[Doc] fix typos in documentation (#9458)

James Lamb 2023-08-10 06:26:36 -05:00 committed by GitHub
parent 4359356d46
commit 9dbb71490c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
18 changed files with 32 additions and 31 deletions

.gitignore (vendored) · 1 change
View File

@@ -48,6 +48,7 @@ Debug
 *.Rproj
 ./xgboost.mpi
 ./xgboost.mock
+*.bak
 #.Rbuildignore
 R-package.Rproj
 *.cache*

View File

@@ -119,7 +119,7 @@ An up-to-date version of the CUDA toolkit is required.
 .. note:: Checking your compiler version
-CUDA is really picky about supported compilers, a table for the compatible compilers for the latests CUDA version on Linux can be seen `here <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_.
+CUDA is really picky about supported compilers, a table for the compatible compilers for the latest CUDA version on Linux can be seen `here <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_.
 Some distros package a compatible ``gcc`` version with CUDA. If you run into compiler errors with ``nvcc``, try specifying the correct compiler with ``-DCMAKE_CXX_COMPILER=/path/to/correct/g++ -DCMAKE_C_COMPILER=/path/to/correct/gcc``. On Arch Linux, for example, both binaries can be found under ``/opt/cuda/bin/``.

View File

@@ -32,7 +32,7 @@ GitHub Actions is also used to build Python wheels targeting MacOS Intel and App
 ``python_wheels`` pipeline sets up environment variables prefixed ``CIBW_*`` to indicate the target
 OS and processor. The pipeline then invokes the script ``build_python_wheels.sh``, which in turns
 calls ``cibuildwheel`` to build the wheel. The ``cibuildwheel`` is a library that sets up a
-suitable Python environment for each OS and processor target. Since we don't have Apple Silion
+suitable Python environment for each OS and processor target. Since we don't have Apple Silicon
 machine in GitHub Actions, cross-compilation is needed; ``cibuildwheel`` takes care of the complex
 task of cross-compiling a Python wheel. (Note that ``cibuildwheel`` will call
 ``pip wheel``. Since XGBoost has a native library component, we created a customized build
@@ -131,7 +131,7 @@ set up a credential pair in order to provision resources on AWS. See
 Worker Image Pipeline
 =====================
 Building images for worker machines used to be a chore: you'd provision an EC2 machine, SSH into it, and
-manually install the necessary packages. This process is not only laborous but also error-prone. You may
+manually install the necessary packages. This process is not only laborious but also error-prone. You may
 forget to install a package or change a system configuration.
 No more. Now we have an automated pipeline for building images for worker machines.

View File

@@ -100,7 +100,7 @@ two automatic checks to enforce coding style conventions. To expedite the code r
 Linter
 ======
-We use `pylint <https://github.com/PyCQA/pylint>`_ and `cpplint <https://github.com/cpplint/cpplint>`_ to enforce style convention and find potential errors. Linting is especially useful for Python, as we can catch many errors that would have otherwise occured at run-time.
+We use `pylint <https://github.com/PyCQA/pylint>`_ and `cpplint <https://github.com/cpplint/cpplint>`_ to enforce style convention and find potential errors. Linting is especially useful for Python, as we can catch many errors that would have otherwise occurred at run-time.
 To run this check locally, run the following command from the top level source tree:

View File

@@ -29,7 +29,7 @@ The Project Management Committee (PMC) of the XGBoost project appointed `Open So
 All expenses incurred for hosting CI will be submitted to the fiscal host with receipts. Only the expenses in the following categories will be approved for reimbursement:
-* Cloud exprenses for the cloud test farm (https://buildkite.com/xgboost)
+* Cloud expenses for the cloud test farm (https://buildkite.com/xgboost)
 * Cost of domain https://xgboost-ci.net
 * Monthly cost of using BuildKite
 * Hosting cost of the User Forum (https://discuss.xgboost.ai)

View File

@@ -169,7 +169,7 @@ supply a specified SANITIZER_PATH.
 How to use sanitizers with CUDA support
 =======================================
-Runing XGBoost on CUDA with address sanitizer (asan) will raise memory error.
+Running XGBoost on CUDA with address sanitizer (asan) will raise memory error.
 To use asan with CUDA correctly, you need to configure asan via ASAN_OPTIONS
 environment variable:

View File

@@ -63,7 +63,7 @@ XGBoost supports missing values by default.
 In tree algorithms, branch directions for missing values are learned during training.
 Note that the gblinear booster treats missing values as zeros.
-When the ``missing`` parameter is specifed, values in the input predictor that is equal to
+When the ``missing`` parameter is specified, values in the input predictor that is equal to
 ``missing`` will be treated as missing and removed. By default it's set to ``NaN``.
 **************************************
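The hunk above touches the ``missing`` parameter; for context, a minimal sketch of how it is used (illustrative, not part of this commit; the ``-999.0`` sentinel and toy data are assumptions):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # Hypothetical data where -999.0 marks missing entries.
   X = np.array([[1.0, -999.0], [2.0, 3.0], [-999.0, 4.0]])
   y = np.array([0.0, 1.0, 1.0])

   # Values equal to `missing` are treated as missing and removed;
   # the default sentinel is NaN.
   dtrain = xgb.DMatrix(X, label=y, missing=-999.0)
   booster = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=2)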

View File

@@ -129,7 +129,7 @@ With parameters and data, you are able to train a booster model.
 booster.saveModel("model.bin");
-* Generaing model dump with feature map
+* Generating model dump with feature map
 .. code-block:: java

View File

@@ -54,7 +54,7 @@ After 1.4 release, we added a new parameter called ``strict_shape``, one can set
 Output is a 4-dim array with ``(n_samples, n_iterations, n_classes, n_trees_in_forest)``
 as shape. ``n_trees_in_forest`` is specified by the ``numb_parallel_tree`` during
 training. When strict shape is set to False, output is a 2-dim array with last 3 dims
-concatenated into 1. Also the last dimension is dropped if it eqauls to 1. When using
+concatenated into 1. Also the last dimension is dropped if it equals to 1. When using
 ``apply`` method in scikit learn interface, this is set to False by default.
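For context on the hunk above, a hedged sketch of ``strict_shape`` with leaf prediction (toy data and parameters are assumptions; the printed shape follows the 4-dim layout described in the passage):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   X = np.random.rand(100, 10)
   y = np.random.randint(0, 3, size=100)  # 3 classes
   dtrain = xgb.DMatrix(X, label=y)
   booster = xgb.train(
       {"objective": "multi:softprob", "num_class": 3, "num_parallel_tree": 1},
       dtrain,
       num_boost_round=4,
   )

   # strict_shape keeps all four dimensions:
   # (n_samples, n_iterations, n_classes, n_trees_in_forest)
   leaf = booster.predict(dtrain, pred_leaf=True, strict_shape=True)
   print(leaf.shape)  # (100, 4, 3, 1)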
@@ -68,7 +68,7 @@ n_classes, n_trees_in_forest)``, while R with ``strict_shape=TRUE`` outputs
 Other than these prediction types, there's also a parameter called ``iteration_range``,
 which is similar to model slicing. But instead of actually splitting up the model into
 multiple stacks, it simply returns the prediction formed by the trees within range.
-Number of trees created in each iteration eqauls to :math:`trees_i = num\_class \times
+Number of trees created in each iteration equals to :math:`trees_i = num\_class \times
 num\_parallel\_tree`. So if you are training a boosted random forest with size of 4, on
 the 3-class classification dataset, and want to use the first 2 iterations of trees for
 prediction, you need to provide ``iteration_range=(0, 2)``. Then the first :math:`2
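A small sketch of the ``iteration_range`` arithmetic in the passage above (illustrative, not part of this commit; data and parameters are made up):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   X = np.random.rand(100, 10)
   y = np.random.randint(0, 3, size=100)
   dtrain = xgb.DMatrix(X, label=y)
   # Boosted random forest: 3 classes, forest size 4.
   params = {"objective": "multi:softprob", "num_class": 3, "num_parallel_tree": 4}
   booster = xgb.train(params, dtrain, num_boost_round=5)

   # Each iteration grows num_class * num_parallel_tree = 12 trees, so the
   # first 2 iterations contribute the first 24 trees to this prediction.
   predt = booster.predict(dtrain, iteration_range=(0, 2))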

View File

@@ -20,7 +20,7 @@ sklearn estimator interface is still working in progress.
 You can find some some quick start examples at
 :ref:`sphx_glr_python_examples_sklearn_examples.py`. The main advantage of using sklearn
-interface is that it works with most of the utilites provided by sklearn like
+interface is that it works with most of the utilities provided by sklearn like
 :py:func:`sklearn.model_selection.cross_validate`. Also, many other libraries recognize
 the sklearn estimator interface thanks to its popularity.
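For context, a minimal example of the interoperability the passage describes (toy data assumed; not part of this commit):

.. code-block:: python

   import numpy as np
   from sklearn.model_selection import cross_validate
   from xgboost import XGBClassifier

   X = np.random.rand(200, 8)
   y = np.random.randint(0, 2, size=200)

   # The estimator drops straight into sklearn utilities.
   results = cross_validate(XGBClassifier(n_estimators=10), X, y, cv=3)
   print(results["test_score"])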

View File

@@ -68,7 +68,7 @@ Other Updaters
 1. ``Prune``: It prunes the existing trees. ``prune`` is usually used as part of other
 tree methods. To use pruner independently, one needs to set the process type to update
 by: ``{"process_type": "update", "updater": "prune"}``. With this set of parameters,
-during trianing, XGBOost will prune the existing trees according to 2 parameters
+during training, XGBoost will prune the existing trees according to 2 parameters
 ``min_split_loss (gamma)`` and ``max_depth``.
 2. ``Refresh``: Refresh the statistic of built trees on a new training dataset. Like the
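A hedged sketch of the pruner invocation described in item 1 of the hunk above (not part of this commit; data and thresholds are assumptions):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   X = np.random.rand(100, 10)
   y = np.random.rand(100)
   dtrain = xgb.DMatrix(X, label=y)
   booster = xgb.train({"max_depth": 6}, dtrain, num_boost_round=4)

   # "update" mode revisits the trees of `xgb_model` instead of growing new
   # ones; the pruner then applies min_split_loss (gamma) and max_depth.
   pruned = xgb.train(
       {"process_type": "update", "updater": "prune", "min_split_loss": 1.0, "max_depth": 3},
       dtrain,
       num_boost_round=4,
       xgb_model=booster,
   )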

View File

@@ -55,7 +55,7 @@ To ensure that CMake can locate the XGBoost library, supply ``-DCMAKE_PREFIX_PAT
 .. code-block:: bash
-# Nagivate to the build directory for your application
+# Navigate to the build directory for your application
 cd build
 # Activate the Conda environment where we previously installed XGBoost
 conda activate [env_name]
@@ -65,7 +65,7 @@ To ensure that CMake can locate the XGBoost library, supply ``-DCMAKE_PREFIX_PAT
 make
 ************************
-Usefull Tips To Remember
+Useful Tips To Remember
 ************************
 Below are some useful tips while using C API:
@@ -151,7 +151,7 @@ c. Assertion technique: It works both in C/ C++. If expression evaluates to 0 (f
 Example if we our training data is in ``dense matrix`` format then your prediction dataset should also be a ``dense matrix`` or if training in ``libsvm`` format then dataset for prediction should also be in ``libsvm`` format.
-4. Always use strings for setting values to the parameters in booster handle object. The paramter value can be of any data type (e.g. int, char, float, double, etc), but they should always be encoded as strings.
+4. Always use strings for setting values to the parameters in booster handle object. The parameter value can be of any data type (e.g. int, char, float, double, etc), but they should always be encoded as strings.
 .. code-block:: c
@@ -168,7 +168,7 @@ Sample examples along with Code snippet to use C API functions
 .. code-block:: c
 DMatrixHandle data; // handle to DMatrix
-// Load the dat from file & store it in data variable of DMatrixHandle datatype
+// Load the data from file & store it in data variable of DMatrixHandle datatype
 safe_xgboost(XGDMatrixCreateFromFile("/path/to/file/filename", silent, &data));
@@ -278,7 +278,7 @@ Sample examples along with Code snippet to use C API functions
 uint64_t const* out_shape;
 /* Dimension of output prediction */
 uint64_t out_dim;
-/* Pointer to a thread local contigious array, assigned in prediction function. */
+/* Pointer to a thread local contiguous array, assigned in prediction function. */
 float const* out_result = NULL;
 safe_xgboost(
 XGBoosterPredictFromDMatrix(booster, dmatrix, config, &out_shape, &out_dim, &out_result));

View File

@@ -38,7 +38,7 @@ Although XGBoost has native support for said functions, using it for demonstrati
 provides us the opportunity of comparing the result from our own implementation and the
 one from XGBoost internal for learning purposes. After finishing this tutorial, we should
 be able to provide our own functions for rapid experiments. And at the end, we will
-provide some notes on non-identy link function along with examples of using custom metric
+provide some notes on non-identity link function along with examples of using custom metric
 and objective with the `scikit-learn` interface.
 If we compute the gradient of said objective function:
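The hunk cuts off before the derivation; in the full tutorial, "said objective function" is the squared log error. A sketch of its gradient and hessian under that assumption (not part of this commit):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   def squared_log(predt: np.ndarray, dtrain: xgb.DMatrix):
       """Gradient and hessian of 1/2 * (log1p(predt) - log1p(y))**2."""
       y = dtrain.get_label()
       predt[predt < -1] = -1 + 1e-6  # keep log1p well defined
       grad = (np.log1p(predt) - np.log1p(y)) / (predt + 1)
       hess = (-np.log1p(predt) + np.log1p(y) + 1) / np.power(predt + 1, 2)
       return grad, hess

   # Supplied to training through the `obj` argument, e.g.:
   # booster = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=10, obj=squared_log)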
@@ -165,7 +165,7 @@ Reverse Link Function
 When using builtin objective, the raw prediction is transformed according to the objective
 function. When a custom objective is provided XGBoost doesn't know its link function so the
 user is responsible for making the transformation for both objective and custom evaluation
-metric. For objective with identiy link like ``squared error`` this is trivial, but for
+metric. For objective with identity link like ``squared error`` this is trivial, but for
 other link functions like log link or inverse link the difference is significant.
 For the Python package, the behaviour of prediction can be controlled by the
@@ -173,7 +173,7 @@ For the Python package, the behaviour of prediction can be controlled by the
 parameter without a custom objective, the metric function will receive transformed
 prediction since the objective is defined by XGBoost. However, when the custom objective is
 also provided along with that metric, then both the objective and custom metric will
-recieve raw prediction. The following example provides a comparison between two different
+receive raw prediction. The following example provides a comparison between two different
 behavior with a multi-class classification model. Firstly we define 2 different Python
 metric functions implementing the same underlying metric for comparison,
 `merror_with_transform` is used when custom objective is also used, otherwise the simpler
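For context, a hedged sketch of what a `merror_with_transform`-style metric looks like (illustrative, not the tutorial's exact code and not part of this commit):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   def merror_with_transform(predt: np.ndarray, dtrain: xgb.DMatrix):
       """Classification error for use alongside a custom objective.

       With a custom objective the metric receives raw (untransformed)
       margins, so the argmax "link" is applied here by hand.
       """
       y = dtrain.get_label()
       n_classes = predt.size // y.shape[0]
       predt = predt.reshape(y.shape[0], n_classes)
       errors = (predt.argmax(axis=1) != y).sum()
       return "PyMError", float(errors) / y.shape[0]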

View File

@@ -256,7 +256,7 @@ In the example below, a ``KubeCluster`` is used for `deploying Dask on Kubernete
 m = 1000
 n = 10
 kWorkers = 2  # assuming you have 2 GPU nodes on that cluster.
-# You need to work out the worker-spec youself. See document in dask_kubernetes for
+# You need to work out the worker-spec yourself. See document in dask_kubernetes for
 # its usage. Here we just want to show that XGBoost works on various clusters.
 cluster = KubeCluster.from_yaml('worker-spec.yaml', deploy_mode='remote')
 cluster.scale(kWorkers)  # scale to use all GPUs
@@ -648,7 +648,7 @@ environment than training the model using a single node due to aforementioned cr
 Memory Usage
 ************
-Here are some pratices on reducing memory usage with dask and xgboost.
+Here are some practices on reducing memory usage with dask and xgboost.
 - In a distributed work flow, data is best loaded by dask collections directly instead of
 loaded by client process. When loading with client process is unavoidable, use

View File

@@ -7,7 +7,7 @@ dataset needs to be loaded into memory. This can be costly and sometimes
 infeasible. Staring from 1.5, users can define a custom iterator to load data in chunks
 for running XGBoost algorithms. External memory can be used for both training and
 prediction, but training is the primary use case and it will be our focus in this
-tutorial. For prediction and evaluation, users can iterate through the data themseleves
+tutorial. For prediction and evaluation, users can iterate through the data themselves
 while training requires the full dataset to be loaded into the memory.
 During training, there are two different modes for external memory support available in
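For context, a minimal sketch of the custom iterator mentioned above (illustrative, not part of this commit; ``load_batch`` and the file list are hypothetical, and the exact ``DataIter`` callback signature may vary by version):

.. code-block:: python

   import os
   from typing import Callable, List

   import xgboost

   class Iterator(xgboost.DataIter):
       def __init__(self, file_paths: List[str]):
           self._file_paths = file_paths
           self._it = 0
           # On-disk cache used by XGBoost between iterations.
           super().__init__(cache_prefix=os.path.join(".", "cache"))

       def next(self, input_data: Callable) -> int:
           """Advance one batch; return 0 once the data is exhausted."""
           if self._it == len(self._file_paths):
               return 0
           X, y = load_batch(self._file_paths[self._it])  # hypothetical loader
           input_data(data=X, label=y)
           self._it += 1
           return 1

       def reset(self) -> None:
           self._it = 0

   # The iterator is consumed by the DMatrix, which fetches batches on demand:
   # Xy = xgboost.DMatrix(Iterator(["batch-0.bin", "batch-1.bin"]))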
@@ -142,7 +142,7 @@ see `this paper <https://arxiv.org/abs/2005.09148>`_.
 .. warning::
 When GPU is running out of memory during iteration on external memory, user might
-recieve a segfault instead of an OOM exception.
+receive a segfault instead of an OOM exception.
 .. _ext_remarks:
@@ -150,7 +150,7 @@ see `this paper <https://arxiv.org/abs/2005.09148>`_.
 Remarks
 *******
-When using external memory with XBGoost, data is divided into smaller chunks so that only
+When using external memory with XGBoost, data is divided into smaller chunks so that only
 a fraction of it needs to be stored in memory at any given time. It's important to note
 that this method only applies to the predictor data (``X``), while other data, like labels
 and internal runtime structures are concatenated. This means that memory reduction is most
@@ -211,7 +211,7 @@ construction of `QuantileDmatrix` with data chunks. On the other hand, if it's p
 doesn't fetch data during training. On the other hand, the external memory `DMatrix`
 fetches data batches from external memory on-demand. Use the `QuantileDMatrix` (with
 iterator if necessary) when you can fit most of your data in memory. The training would be
-an order of magnitute faster than using external memory.
+an order of magnitude faster than using external memory.
 ****************
 Text File Inputs
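A short sketch of the in-core `QuantileDMatrix` recommended in the hunk above (toy data assumed, not part of this commit; building from a ``DataIter`` like the one sketched earlier works the same way):

.. code-block:: python

   import numpy as np
   import xgboost

   X = np.random.rand(1000, 10)
   y = np.random.rand(1000)

   # Quantized once up front; nothing is fetched during training.
   Xy = xgboost.QuantileDMatrix(X, label=y)
   booster = xgboost.train({"tree_method": "hist"}, Xy, num_boost_round=10)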

View File

@@ -233,7 +233,7 @@ This has lead to some interesting implications of feature interaction constraint
 ``[[0, 1], [0, 1, 2], [1, 2]]`` as another example. Assuming we have only 3 available
 features in our training datasets for presentation purpose, careful readers might have
 found out that the above constraint is the same as simply ``[[0, 1, 2]]``. Since no matter which
-feature is chosen for split in the root node, all its descendants are allowd to include every
+feature is chosen for split in the root node, all its descendants are allowed to include every
 feature as legitimate split candidates without violating interaction constraints.
 For one last example, we use ``[[0, 1], [1, 3, 4]]`` and choose feature ``0`` as split for
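For context, a hedged sketch of passing the last example's constraint through the sklearn interface (toy data assumed; not part of this commit):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   X = np.random.rand(200, 5)
   y = np.random.rand(200)

   # Features 0 and 1 may co-occur on a split path, as may 1, 3 and 4,
   # but e.g. features 0 and 3 may never interact.
   model = xgb.XGBRegressor(
       tree_method="hist",
       interaction_constraints=[[0, 1], [1, 3, 4]],
   )
   model.fit(X, y)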

View File

@@ -11,12 +11,12 @@ Learning to Rank
 ********
 Overview
 ********
-Often in the context of information retrieval, learning-to-rank aims to train a model that arranges a set of query results into an ordered list `[1] <#references>`__. For surprivised learning-to-rank, the predictors are sample documents encoded as feature matrix, and the labels are relevance degree for each sample. Relevance degree can be multi-level (graded) or binary (relevant or not). The training samples are often grouped by their query index with each query group containing multiple query results.
+Often in the context of information retrieval, learning-to-rank aims to train a model that arranges a set of query results into an ordered list `[1] <#references>`__. For supervised learning-to-rank, the predictors are sample documents encoded as feature matrix, and the labels are relevance degree for each sample. Relevance degree can be multi-level (graded) or binary (relevant or not). The training samples are often grouped by their query index with each query group containing multiple query results.
 XGBoost implements learning to rank through a set of objective functions and performance metrics. The default objective is ``rank:ndcg`` based on the ``LambdaMART`` `[2] <#references>`__ algorithm, which in turn is an adaptation of the ``LambdaRank`` `[3] <#references>`__ framework to gradient boosting trees. For a history and a summary of the algorithm, see `[5] <#references>`__. The implementation in XGBoost features deterministic GPU computation, distributed training, position debiasing and two different pair construction strategies.
 ************************************
-Training with the Pariwise Objective
+Training with the Pairwise Objective
 ************************************
 ``LambdaMART`` is a pairwise ranking model, meaning that it compares the relevance degree for every pair of samples in a query group and calculate a proxy gradient for each pair. The default objective ``rank:ndcg`` is using the surrogate gradient derived from the ``ndcg`` metric. To train a XGBoost model, we need an additional sorted array called ``qid`` for specifying the query group of input samples. An example input would look like this:
@@ -59,7 +59,7 @@ Notice that the samples are sorted based on their query index in a non-decreasin
 X = X[sorted_idx, :]
 y = y[sorted_idx]
-The simpliest way to train a ranking model is by using the scikit-learn estimator interface. Continuing the previous snippet, we can train a simple ranking model without tuning: The simplest way to train a ranking model is by using the scikit-learn estimator interface. Continuing the previous snippet, we can train a simple ranking model without tuning:
+The simplest way to train a ranking model is by using the scikit-learn estimator interface. Continuing the previous snippet, we can train a simple ranking model without tuning:
 .. code-block:: python
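The trailing code block is elided in this hunk; a minimal sketch of such scikit-learn ranking training (toy data and the ``qid`` layout are assumptions, not part of this commit):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # Three query groups of four documents each; qid is sorted (non-decreasing).
   X = np.random.rand(12, 6)
   qid = np.repeat([0, 1, 2], 4)
   y = np.random.randint(0, 4, size=12)  # graded relevance per document

   ranker = xgb.XGBRanker(objective="rank:ndcg", n_estimators=8)
   ranker.fit(X, y, qid=qid)
   scores = ranker.predict(X)  # higher score = ranked earlier within its query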

View File

@@ -138,7 +138,7 @@ This will train on four GPUs in parallel.
 Note that it usually does not make sense to allocate more than one GPU per actor,
 as XGBoost relies on distributed libraries such as Dask or Ray to utilize multi
-GPU taining.
+GPU training.
 Setting the number of CPUs per actor
 ====================================