[Doc] fix typos in documentation (#9458)

James Lamb
2023-08-10 06:26:36 -05:00
committed by GitHub
parent 4359356d46
commit 9dbb71490c
18 changed files with 32 additions and 31 deletions

@@ -55,7 +55,7 @@ To ensure that CMake can locate the XGBoost library, supply ``-DCMAKE_PREFIX_PAT
.. code-block:: bash
-# Nagivate to the build directory for your application
+# Navigate to the build directory for your application
cd build
# Activate the Conda environment where we previously installed XGBoost
conda activate [env_name]
@@ -65,7 +65,7 @@ To ensure that CMake can locate the XGBoost library, supply ``-DCMAKE_PREFIX_PAT
make
************************
-Usefull Tips To Remember
+Useful Tips To Remember
************************
Below are some useful tips while using C API:
@@ -151,7 +151,7 @@ c. Assertion technique: It works both in C/ C++. If expression evaluates to 0 (f
Example if we our training data is in ``dense matrix`` format then your prediction dataset should also be a ``dense matrix`` or if training in ``libsvm`` format then dataset for prediction should also be in ``libsvm`` format.
-4. Always use strings for setting values to the parameters in booster handle object. The paramter value can be of any data type (e.g. int, char, float, double, etc), but they should always be encoded as strings.
+4. Always use strings for setting values to the parameters in booster handle object. The parameter value can be of any data type (e.g. int, char, float, double, etc), but they should always be encoded as strings.
.. code-block:: c
@@ -168,7 +168,7 @@ Sample examples along with Code snippet to use C API functions
.. code-block:: c
DMatrixHandle data; // handle to DMatrix
-// Load the dat from file & store it in data variable of DMatrixHandle datatype
+// Load the data from file & store it in data variable of DMatrixHandle datatype
safe_xgboost(XGDMatrixCreateFromFile("/path/to/file/filename", silent, &data));
@@ -278,7 +278,7 @@ Sample examples along with Code snippet to use C API functions
uint64_t const* out_shape;
/* Dimension of output prediction */
uint64_t out_dim;
-/* Pointer to a thread local contigious array, assigned in prediction function. */
+/* Pointer to a thread local contiguous array, assigned in prediction function. */
float const* out_result = NULL;
safe_xgboost(
XGBoosterPredictFromDMatrix(booster, dmatrix, config, &out_shape, &out_dim, &out_result));

@@ -38,7 +38,7 @@ Although XGBoost has native support for said functions, using it for demonstrati
provides us the opportunity of comparing the result from our own implementation and the
one from XGBoost internal for learning purposes. After finishing this tutorial, we should
be able to provide our own functions for rapid experiments. And at the end, we will
-provide some notes on non-identy link function along with examples of using custom metric
+provide some notes on non-identity link function along with examples of using custom metric
and objective with the `scikit-learn` interface.
If we compute the gradient of said objective function:
@@ -165,7 +165,7 @@ Reverse Link Function
When using builtin objective, the raw prediction is transformed according to the objective
function. When a custom objective is provided XGBoost doesn't know its link function so the
user is responsible for making the transformation for both objective and custom evaluation
-metric. For objective with identiy link like ``squared error`` this is trivial, but for
+metric. For objective with identity link like ``squared error`` this is trivial, but for
other link functions like log link or inverse link the difference is significant.
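As a rough illustration of the paragraph above, the following plain-Python sketch shows what applying the transformation by hand might look like for a few link functions. It is not XGBoost code: the helper name ``to_response_scale`` and the link labels are hypothetical, introduced only for this example.

```python
import math


def to_response_scale(raw, link="identity"):
    """Map a raw (untransformed) prediction to the response scale.

    Hypothetical helper for illustration only; not part of XGBoost.
    """
    if link == "identity":
        # e.g. squared error: the raw prediction is already the final value
        return raw
    if link == "log":
        # log link: the raw value is log(mean), so exponentiate it
        return math.exp(raw)
    if link == "inverse":
        # inverse (reciprocal) link: the raw value is 1 / mean
        return 1.0 / raw
    raise ValueError(f"unknown link: {link}")


raw_prediction = 2.0
for name in ("identity", "log", "inverse"):
    # With an identity link nothing changes; with the other links the raw
    # and transformed values differ substantially.
    print(name, to_response_scale(raw_prediction, name))
```

With a custom objective, a custom metric would need to apply a transformation of this kind itself before comparing predictions against labels.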
For the Python package, the behaviour of prediction can be controlled by the
@@ -173,7 +173,7 @@ For the Python package, the behaviour of prediction can be controlled by the
parameter without a custom objective, the metric function will receive transformed
prediction since the objective is defined by XGBoost. However, when the custom objective is
also provided along with that metric, then both the objective and custom metric will
-recieve raw prediction. The following example provides a comparison between two different
+receive raw prediction. The following example provides a comparison between two different
behavior with a multi-class classification model. Firstly we define 2 different Python
metric functions implementing the same underlying metric for comparison,
`merror_with_transform` is used when custom objective is also used, otherwise the simpler

@@ -256,7 +256,7 @@ In the example below, a ``KubeCluster`` is used for `deploying Dask on Kubernete
m = 1000
n = 10
kWorkers = 2 # assuming you have 2 GPU nodes on that cluster.
-# You need to work out the worker-spec youself. See document in dask_kubernetes for
+# You need to work out the worker-spec yourself. See document in dask_kubernetes for
# its usage. Here we just want to show that XGBoost works on various clusters.
cluster = KubeCluster.from_yaml('worker-spec.yaml', deploy_mode='remote')
cluster.scale(kWorkers) # scale to use all GPUs
@@ -648,7 +648,7 @@ environment than training the model using a single node due to aforementioned cr
Memory Usage
************
-Here are some pratices on reducing memory usage with dask and xgboost.
+Here are some practices on reducing memory usage with dask and xgboost.
- In a distributed work flow, data is best loaded by dask collections directly instead of
loaded by client process. When loading with client process is unavoidable, use

@@ -7,7 +7,7 @@ dataset needs to be loaded into memory. This can be costly and sometimes
infeasible. Staring from 1.5, users can define a custom iterator to load data in chunks
for running XGBoost algorithms. External memory can be used for both training and
prediction, but training is the primary use case and it will be our focus in this
-tutorial. For prediction and evaluation, users can iterate through the data themseleves
+tutorial. For prediction and evaluation, users can iterate through the data themselves
while training requires the full dataset to be loaded into the memory.
During training, there are two different modes for external memory support available in
@@ -142,7 +142,7 @@ see `this paper <https://arxiv.org/abs/2005.09148>`_.
.. warning::
When GPU is running out of memory during iteration on external memory, user might
-recieve a segfault instead of an OOM exception.
+receive a segfault instead of an OOM exception.
.. _ext_remarks:
@@ -150,7 +150,7 @@ see `this paper <https://arxiv.org/abs/2005.09148>`_.
Remarks
*******
-When using external memory with XBGoost, data is divided into smaller chunks so that only
+When using external memory with XGBoost, data is divided into smaller chunks so that only
a fraction of it needs to be stored in memory at any given time. It's important to note
that this method only applies to the predictor data (``X``), while other data, like labels
and internal runtime structures are concatenated. This means that memory reduction is most
@@ -211,7 +211,7 @@ construction of `QuantileDmatrix` with data chunks. On the other hand, if it's p
doesn't fetch data during training. On the other hand, the external memory `DMatrix`
fetches data batches from external memory on-demand. Use the `QuantileDMatrix` (with
iterator if necessary) when you can fit most of your data in memory. The training would be
-an order of magnitute faster than using external memory.
+an order of magnitude faster than using external memory.
****************
Text File Inputs

@@ -233,7 +233,7 @@ This has lead to some interesting implications of feature interaction constraint
``[[0, 1], [0, 1, 2], [1, 2]]`` as another example. Assuming we have only 3 available
features in our training datasets for presentation purpose, careful readers might have
found out that the above constraint is the same as simply ``[[0, 1, 2]]``. Since no matter which
-feature is chosen for split in the root node, all its descendants are allowd to include every
+feature is chosen for split in the root node, all its descendants are allowed to include every
feature as legitimate split candidates without violating interaction constraints.
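The equivalence described above can be checked with a small plain-Python sketch. It relies on a simplified model that is an assumption of this example, not XGBoost's actual implementation: after a split on a feature, the candidates allowed below it are taken to be the union of every constraint set containing that feature. The function name ``candidates_after_split`` is hypothetical.

```python
def candidates_after_split(constraints, feature):
    """Features allowed below a split on `feature` (simplified model).

    Assumption for illustration: the allowed candidates are the union of
    every constraint set that contains the split feature.
    """
    allowed = set()
    for group in constraints:
        if feature in group:
            allowed.update(group)
    return allowed


overlapping = [[0, 1], [0, 1, 2], [1, 2]]
merged = [[0, 1, 2]]

for root_feature in (0, 1, 2):
    # Whichever feature is chosen at the root, both constraint lists
    # permit the same candidates for the descendants.
    assert candidates_after_split(overlapping, root_feature) == {0, 1, 2}
    assert candidates_after_split(merged, root_feature) == {0, 1, 2}
```

Under the same model, ``[[0, 1], [1, 3, 4]]`` behaves differently: a split on feature ``0`` only permits ``{0, 1}``, while a split on feature ``1`` opens up ``{0, 1, 3, 4}``.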
For one last example, we use ``[[0, 1], [1, 3, 4]]`` and choose feature ``0`` as split for

@@ -11,12 +11,12 @@ Learning to Rank
********
Overview
********
-Often in the context of information retrieval, learning-to-rank aims to train a model that arranges a set of query results into an ordered list `[1] <#references>`__. For surprivised learning-to-rank, the predictors are sample documents encoded as feature matrix, and the labels are relevance degree for each sample. Relevance degree can be multi-level (graded) or binary (relevant or not). The training samples are often grouped by their query index with each query group containing multiple query results.
+Often in the context of information retrieval, learning-to-rank aims to train a model that arranges a set of query results into an ordered list `[1] <#references>`__. For supervised learning-to-rank, the predictors are sample documents encoded as feature matrix, and the labels are relevance degree for each sample. Relevance degree can be multi-level (graded) or binary (relevant or not). The training samples are often grouped by their query index with each query group containing multiple query results.
XGBoost implements learning to rank through a set of objective functions and performance metrics. The default objective is ``rank:ndcg`` based on the ``LambdaMART`` `[2] <#references>`__ algorithm, which in turn is an adaptation of the ``LambdaRank`` `[3] <#references>`__ framework to gradient boosting trees. For a history and a summary of the algorithm, see `[5] <#references>`__. The implementation in XGBoost features deterministic GPU computation, distributed training, position debiasing and two different pair construction strategies.
************************************
-Training with the Pariwise Objective
+Training with the Pairwise Objective
************************************
``LambdaMART`` is a pairwise ranking model, meaning that it compares the relevance degree for every pair of samples in a query group and calculate a proxy gradient for each pair. The default objective ``rank:ndcg`` is using the surrogate gradient derived from the ``ndcg`` metric. To train a XGBoost model, we need an additional sorted array called ``qid`` for specifying the query group of input samples. An example input would look like this:
@@ -59,7 +59,7 @@ Notice that the samples are sorted based on their query index in a non-decreasin
X = X[sorted_idx, :]
y = y[sorted_idx]
-The simpliest way to train a ranking model is by using the scikit-learn estimator interface. Continuing the previous snippet, we can train a simple ranking model without tuning:
+The simplest way to train a ranking model is by using the scikit-learn estimator interface. Continuing the previous snippet, we can train a simple ranking model without tuning:
.. code-block:: python

@@ -138,7 +138,7 @@ This will train on four GPUs in parallel.
Note that it usually does not make sense to allocate more than one GPU per actor,
as XGBoost relies on distributed libraries such as Dask or Ray to utilize multi
-GPU taining.
+GPU training.
Setting the number of CPUs per actor
====================================