* Fix most of the link checks from sphinx.
* Remove duplicate explicit target name.
Commit 4d2ea0d4ef (parent d1052b5cfe)
@@ -95,13 +95,13 @@ XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as
 shap_interaction_values = model.predict(dtrain, pred_interactions=True)

 See examples `here
-<https://github.com/dmlc/xgboost/tree/master/demo/gpu_acceleration>`_.
+<https://github.com/dmlc/xgboost/tree/master/demo/gpu_acceleration>`__.

 Multi-node Multi-GPU Training
 =============================
 XGBoost supports fully distributed GPU training using `Dask <https://dask.org/>`_. For
 getting started see our tutorial :doc:`/tutorials/dask` and worked examples `here
-<https://github.com/dmlc/xgboost/tree/master/demo/dask>`_, also Python documentation
+<https://github.com/dmlc/xgboost/tree/master/demo/dask>`__, also Python documentation
 :ref:`dask_api` for complete reference.

@@ -238,7 +238,7 @@ Working memory is allocated inside the algorithm proportional to the number of r

 The quantile finding algorithm also uses some amount of working device memory. It is able to operate in batches, but is not currently well optimised for sparse data.

-If you are getting out-of-memory errors on a big dataset, try the `external memory version <../tutorials/external_memory.html>`_.
+If you are getting out-of-memory errors on a big dataset, try the :doc:`external memory version </tutorials/external_memory>`.

 Developer notes
 ===============
@@ -79,7 +79,7 @@ The first thing in data transformation is to load the dataset as Spark's structu
     StructField("class", StringType, true)))
 val rawInput = spark.read.schema(schema).csv("input_path")

-At the first line, we create a instance of `SparkSession <http://spark.apache.org/docs/latest/sql-programming-guide.html#starting-point-sparksession>`_ which is the entry of any Spark program working with DataFrame. The ``schema`` variable defines the schema of DataFrame wrapping Iris data. With this explicitly set schema, we can define the columns' name as well as their types; otherwise the column name would be the default ones derived by Spark, such as ``_col0``, etc. Finally, we can use Spark's built-in csv reader to load Iris csv file as a DataFrame named ``rawInput``.
+At the first line, we create a instance of `SparkSession <https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession>`_ which is the entry of any Spark program working with DataFrame. The ``schema`` variable defines the schema of DataFrame wrapping Iris data. With this explicitly set schema, we can define the columns' name as well as their types; otherwise the column name would be the default ones derived by Spark, such as ``_col0``, etc. Finally, we can use Spark's built-in csv reader to load Iris csv file as a DataFrame named ``rawInput``.

 Spark also contains many built-in readers for other format. The latest version of Spark supports CSV, JSON, Parquet, and LIBSVM.

@@ -130,7 +130,7 @@ labels. A DataFrame like this (containing vector-represented features and numeri
 Dealing with missing values
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

-XGBoost supports missing values by default (`as desribed here <https://xgboost.readthedocs.io/en/latest/faq.html#how-to-deal-with-missing-value>`_).
+XGBoost supports missing values by default (`as desribed here <https://xgboost.readthedocs.io/en/latest/faq.html#how-to-deal-with-missing-values>`_).
 If given a SparseVector, XGBoost will treat any values absent from the SparseVector as missing. You are also able to
 specify to XGBoost to treat a specific value in your Dataset as if it was a missing value. By default XGBoost will treat NaN as the value representing missing.

@@ -369,7 +369,7 @@ Then we can load this model with single node Python XGBoost:

 When interacting with other language bindings, XGBoost also supports saving-models-to and loading-models-from file systems other than the local one. You can use HDFS and S3 by prefixing the path with ``hdfs://`` and ``s3://`` respectively. However, for this capability, you must do **one** of the following:

-1. Build XGBoost4J-Spark with the steps described in `here <https://xgboost.readthedocs.io/en/latest/jvm/index.html#installation-from-source>`_, but turning `USE_HDFS <https://github.com/dmlc/xgboost/blob/e939192978a0c152ad7b49b744630e99d54cffa8/jvm-packages/create_jni.py#L18>`_ (or USE_S3, etc. in the same place) switch on. With this approach, you can reuse the above code example by replacing "nativeModelPath" with a HDFS path.
+1. Build XGBoost4J-Spark with the steps described in :ref:`here <install_jvm_packages>`, but turning `USE_HDFS <https://github.com/dmlc/xgboost/blob/e939192978a0c152ad7b49b744630e99d54cffa8/jvm-packages/create_jni.py#L18>`_ (or USE_S3, etc. in the same place) switch on. With this approach, you can reuse the above code example by replacing "nativeModelPath" with a HDFS path.

   - However, if you build with USE_HDFS, etc. you have to ensure that the involved shared object file, e.g. libhdfs.so, is put in the LIBRARY_PATH of your cluster. To avoid the complicated cluster environment configuration, choose the other option.

@@ -366,8 +366,8 @@ Specify the learning task and the corresponding learning objective. The objectiv
 - ``rank:pairwise``: Use LambdaMART to perform pairwise ranking where the pairwise loss is minimized
 - ``rank:ndcg``: Use LambdaMART to perform list-wise ranking where `Normalized Discounted Cumulative Gain (NDCG) <http://en.wikipedia.org/wiki/NDCG>`_ is maximized
 - ``rank:map``: Use LambdaMART to perform list-wise ranking where `Mean Average Precision (MAP) <http://en.wikipedia.org/wiki/Mean_average_precision#Mean_average_precision>`_ is maximized
-- ``reg:gamma``: gamma regression with log-link. Output is a mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be `gamma-distributed <https://en.wikipedia.org/wiki/Gamma_distribution#Applications>`_.
+- ``reg:gamma``: gamma regression with log-link. Output is a mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be `gamma-distributed <https://en.wikipedia.org/wiki/Gamma_distribution#Occurrence_and_applications>`_.
-- ``reg:tweedie``: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be `Tweedie-distributed <https://en.wikipedia.org/wiki/Tweedie_distribution#Applications>`_.
+- ``reg:tweedie``: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be `Tweedie-distributed <https://en.wikipedia.org/wiki/Tweedie_distribution#Occurrence_and_applications>`_.

 * ``base_score`` [default=0.5]

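The hunk above touches the descriptions of the ranking objectives; as a reminder of what ``rank:ndcg`` maximizes, here is a small pure-Python NDCG sketch (the exponential-gain variant; XGBoost's internal implementation may differ in details):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: gain 2^rel - 1, log2 position discount.
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1]))            # 1.0 -- already ideally ordered
print(round(ndcg([1, 2, 3]), 4))  # worst ordering of these gains
```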
@@ -390,7 +390,7 @@ Specify the learning task and the corresponding learning objective. The objectiv
   - ``error@t``: a different than 0.5 binary classification threshold value could be specified by providing a numerical value through 't'.
 - ``merror``: Multiclass classification error rate. It is calculated as ``#(wrong cases)/#(all cases)``.
 - ``mlogloss``: `Multiclass logloss <http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html>`_.
-- ``auc``: `Receiver Operating Characteristic Area under the Curve <http://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve>`_.
+- ``auc``: `Receiver Operating Characteristic Area under the Curve <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`_.
   Available for classification and learning-to-rank tasks.

   - When used with binary classification, the objective should be ``binary:logistic`` or similar functions that work on probability.
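The metrics listed in this hunk are simple ratios; a pure-Python sketch of ``error@t`` and ``merror`` as described above (illustrative data; XGBoost computes these internally):

```python
def error_at_t(preds, labels, t=0.5):
    # Binary classification error with threshold t: predictions above t
    # count as positive; returns #(wrong cases) / #(all cases).
    wrong = sum((p > t) != bool(l) for p, l in zip(preds, labels))
    return wrong / len(labels)

def merror(pred_classes, labels):
    # Multiclass classification error rate: #(wrong cases) / #(all cases).
    return sum(p != l for p, l in zip(pred_classes, labels)) / len(labels)

preds, labels = [0.9, 0.4, 0.7, 0.2], [1, 0, 0, 1]
print(error_at_t(preds, labels))         # 0.5
print(error_at_t(preds, labels, t=0.8))  # 0.25
print(merror([2, 1, 0], [2, 2, 0]))
```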
@@ -11,7 +11,7 @@ In order to run a XGBoost job in a Kubernetes cluster, perform the following ste

 1. Install XGBoost Operator on the Kubernetes cluster.

-   a. XGBoost Operator is designed to manage the scheduling and monitoring of XGBoost jobs. Follow `this installation guide <https://github.com/kubeflow/xgboost-operator#installing-xgboost-operator>`_ to install XGBoost Operator.
+   a. XGBoost Operator is designed to manage the scheduling and monitoring of XGBoost jobs. Follow `this installation guide <https://github.com/kubeflow/xgboost-operator#install-xgboost-operator>`_ to install XGBoost Operator.

 2. Write application code that will be executed by the XGBoost Operator.

@@ -227,15 +227,15 @@ XGBoost has a function called ``dump_model`` in Booster object, which lets you t
 the model in a readable format like ``text``, ``json`` or ``dot`` (graphviz). The primary
 use case for it is for model interpretation or visualization, and is not supposed to be
 loaded back to XGBoost. The JSON version has a `schema
-<https://github.com/dmlc/xgboost/blob/master/doc/dump.schema>`_. See next section for
+<https://github.com/dmlc/xgboost/blob/master/doc/dump.schema>`__. See next section for
 more info.

 ***********
 JSON Schema
 ***********

-Another important feature of JSON format is a documented `Schema
+Another important feature of JSON format is a documented `schema
-<https://json-schema.org/>`_, based on which one can easily reuse the output model from
+<https://json-schema.org/>`__, based on which one can easily reuse the output model from
 XGBoost. Here is the initial draft of JSON schema for the output model (not
 serialization, which will not be stable as noted above). It's subject to change due to
 the beta status. For an example of parsing XGBoost tree model, see ``/demo/json-model``.
@@ -1805,7 +1805,7 @@ class Booster(object):
         .. note::

             See `Prediction
-            <https://xgboost.readthedocs.io/en/latest/tutorials/prediction.html>`_
+            <https://xgboost.readthedocs.io/en/latest/prediction.html>`_
             for issues like thread safety and a summary of outputs from this function.

         Parameters