Address some sphinx warnings and errors, add doc for building doc. (#4589)
parent 6125521caf
commit 9494950ee7
@@ -57,6 +57,7 @@ to ask questions at `the user forum <https://discuss.xgboost.ai>`_.

 * `Python Package Installation`_
 * `R Package Installation`_
 * `Trouble Shooting`_
+* `Building the documentation`_

 ***************************
 Building the Shared Library
@@ -448,3 +449,23 @@ Trouble Shooting
 .. code-block:: bash

   git clone https://github.com/dmlc/xgboost --recursive
+
+
+Building the Documentation
+==========================
+
+XGBoost uses `Sphinx <https://www.sphinx-doc.org/en/stable/>`_ for documentation. To build it locally, you need an installed XGBoost with all its dependencies, along with:
+
+* System dependencies
+
+  - git
+  - graphviz
+
+* Python dependencies
+
+  - sphinx
+  - breathe
+  - guzzle_sphinx_theme
+  - recommonmark
+  - mock
+
+Under the ``xgboost/doc`` directory, run ``make <format>`` with ``<format>`` replaced by the format you want. For a list of supported formats, run ``make help`` under the same directory.
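For example, a minimal sketch that drives the same build from Python (assuming the dependencies above are installed; it simply shells out to ``make``, and the path is a placeholder for wherever the repository was cloned):

.. code-block:: python

   import subprocess

   # Build the HTML docs, equivalent to running ``make html`` under xgboost/doc.
   subprocess.run(["make", "html"], cwd="xgboost/doc", check=True)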
@@ -176,7 +176,8 @@ clang-tidy
 To run clang-tidy on both C++ and CUDA source code, run the following command
 from the top level source tree:

-.. code-black:: bash
+.. code-block:: bash

   cd /path/to/xgboost/
   python3 tests/ci_build/tidy.py --gtest-path=/path/to/google-test

@@ -186,13 +187,15 @@ Also, the script accepts two optional integer arguments, namely ``--cpp`` and ``
 By default they are both set to 1. If you want to exclude CUDA source from
 linting, use:

-.. code-black:: bash
+.. code-block:: bash

   cd /path/to/xgboost/
   python3 tests/ci_build/tidy.py --cuda=0

 Similarly, if you want to exclude C++ source from linting:

-.. code-black:: bash
+.. code-block:: bash

   cd /path/to/xgboost/
   python3 tests/ci_build/tidy.py --cpp=0

@@ -260,7 +263,7 @@ The following steps are followed to add a new Rmarkdown vignettes:

 - If you already cloned the repo to doc, this means ``git add``

-- Create PR for both the markdown and ``dmlc/web-data``.
+- Create PR for both the markdown and ``dmlc/web-data``.
 - You can also build the document locally by typing the following command at the ``doc`` directory:

 .. code-block:: bash

@@ -211,7 +211,7 @@ Training time time on 1,000,000 rows x 50 columns with 500 boosting iterations a
 See `GPU Accelerated XGBoost <https://xgboost.ai/2016/12/14/GPU-accelerated-xgboost.html>`_ and `Updates to the XGBoost GPU algorithms <https://xgboost.ai/2018/07/04/gpu-xgboost-update.html>`_ for additional performance benchmarks of the ``gpu_exact`` and ``gpu_hist`` tree methods.

 Developer notes
-==========
+===============
 The application may be profiled with annotations by specifying USE_NVTX to cmake and providing the path to the stand-alone NVTX header via NVTX_HEADER_DIR. Regions covered by the 'Monitor' class in CUDA code will automatically appear in the Nsight profiler.

 **********
@@ -222,7 +222,7 @@ References
 `Nvidia Parallel Forall: Gradient Boosting, Decision Trees and XGBoost with CUDA <https://devblogs.nvidia.com/parallelforall/gradient-boosting-decision-trees-xgboost-cuda/>`_

 Contributors
-=======
+============
 Many thanks to the following contributors (alphabetical order):

 * Andrey Adinets

@@ -154,7 +154,7 @@ Now, we have a DataFrame containing only two columns, "features" which contains
 labels. A DataFrame like this (containing vector-represented features and numeric labels) can be fed to XGBoost4J-Spark's training engine directly.

 Dealing with missing values
-~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Strategies to handle missing values (and therefore overcome issues such as the above):

@@ -244,7 +244,7 @@ When it comes to custom eval metrics, in additional to ``num_early_stopping_roun
 For example, suppose we need to maximize the evaluation metric (set ``maximize_evaluation_metrics`` to true) and set ``num_early_stopping_rounds`` to 5, and the evaluation metric of the 10th iteration is the maximum so far. If no later iteration produces a greater evaluation metric, training would be stopped early at the 15th iteration.

 Training with Evaluation Sets
------------------
+-----------------------------

 You can also monitor the performance of the model during training with multiple evaluation datasets. By specifying ``eval_sets`` or calling ``setEvalSets`` on an XGBoostClassifier or XGBoostRegressor, you can pass in multiple evaluation datasets typed as a Map from String to DataFrame.

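For readers of the Python package, a rough sketch of the same early-stopping setup (the Python wrapper's ``eval_set`` and ``early_stopping_rounds`` stand in for ``eval_sets`` and ``num_early_stopping_rounds``; this is not XGBoost4J-Spark code, and the synthetic data is illustrative only):

.. code-block:: python

   import xgboost as xgb
   from sklearn.datasets import make_classification
   from sklearn.model_selection import train_test_split

   X, y = make_classification(n_samples=2000, n_features=20)
   X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25)

   clf = xgb.XGBClassifier(n_estimators=100)
   clf.fit(X_train, y_train,
           eval_set=[(X_val, y_val)],    # monitored evaluation dataset
           early_stopping_rounds=5)      # stop if no improvement for 5 rounds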
@@ -83,17 +83,12 @@ Parameters for Tree Booster
   - range: (0,1]

 * ``colsample_bytree``, ``colsample_bylevel``, ``colsample_bynode`` [default=1]

   - This is a family of parameters for subsampling of columns.
-  - All ``colsample_by*`` parameters have a range of (0, 1], the default value of 1, and
-    specify the fraction of columns to be subsampled.
-  - ``colsample_bytree`` is the subsample ratio of columns when constructing each
-    tree. Subsampling occurs once for every tree constructed.
-  - ``colsample_bylevel`` is the subsample ratio of columns for each level. Subsampling
-    occurs once for every new depth level reached in a tree. Columns are subsampled from
-    the set of columns chosen for the current tree.
-  - ``colsample_bynode`` is the subsample ratio of columns for each node
-    (split). Subsampling occurs once every time a new split is evaluated. Columns are
-    subsampled from the set of columns chosen for the current level.
+  - All ``colsample_by*`` parameters have a range of (0, 1], the default value of 1, and specify the fraction of columns to be subsampled.
+  - ``colsample_bytree`` is the subsample ratio of columns when constructing each tree. Subsampling occurs once for every tree constructed.
+  - ``colsample_bylevel`` is the subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree.
+  - ``colsample_bynode`` is the subsample ratio of columns for each node (split). Subsampling occurs once every time a new split is evaluated. Columns are subsampled from the set of columns chosen for the current level.
   - ``colsample_by*`` parameters work cumulatively. For instance,
     the combination ``{'colsample_bytree':0.5, 'colsample_bylevel':0.5,
     'colsample_bynode':0.5}`` with 64 features will leave 8 features to choose from at
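As a concrete illustration of this cumulative behaviour, a small sketch (the synthetic data and the 0.5 values are illustrative only):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # 64 features: 64 * 0.5 * 0.5 * 0.5 = 8 candidate features at each split.
   X = np.random.rand(1024, 64)
   y = np.random.randint(2, size=1024)
   dtrain = xgb.DMatrix(X, label=y)

   params = {'objective': 'binary:logistic',
             'colsample_bytree': 0.5,
             'colsample_bylevel': 0.5,
             'colsample_bynode': 0.5}
   bst = xgb.train(params, dtrain, num_boost_round=10)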
@@ -294,7 +289,7 @@ Specify the learning task and the corresponding learning objective. The objectiv

 * ``objective`` [default=reg:squarederror]

-  - ``reg:squarederror``: regression with squared loss
+  - ``reg:squarederror``: regression with squared loss.
   - ``reg:squaredlogerror``: regression with squared log loss :math:`\frac{1}{2}[log(pred + 1) - log(label + 1)]^2`. All input labels are required to be greater than -1. Also, see metric ``rmsle`` for possible issue with this objective.
   - ``reg:logistic``: logistic regression
   - ``binary:logistic``: logistic regression for binary classification, output probability

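A minimal, hypothetical parameter dictionary using this objective (note the label constraint above):

.. code-block:: python

   # Labels must be greater than -1 when using reg:squaredlogerror.
   params = {'objective': 'reg:squaredlogerror',
             'eval_metric': 'rmsle'}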
@@ -13,9 +13,9 @@ Scikit-Learn wrapper after 0.82 (not included in 0.82). Please note that the ne
 Scikit-Learn wrapper is still **experimental**, which means we might change the interface
 whenever needed.

-****************
+*****************************************
 Standalone Random Forest With XGBoost API
-****************
+*****************************************

 The following parameters must be set to enable random forest training.

@@ -64,9 +64,9 @@ A random forest model can then be trained as follows::
   bst = train(params, dmatrix, num_boost_round=1)


-**************************
+***************************************************
 Standalone Random Forest With Scikit-Learn-Like API
-**************************
+***************************************************

 ``XGBRFClassifier`` and ``XGBRFRegressor`` are SKL-like classes that provide random forest
 functionality. They are basically versions of ``XGBClassifier`` and ``XGBRegressor`` that

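A short usage sketch of these wrappers (the hyperparameter values here are illustrative, not recommendations):

.. code-block:: python

   from sklearn.datasets import make_classification
   from xgboost import XGBRFClassifier

   X, y = make_classification(n_samples=1000, n_features=20)

   # Unlike XGBClassifier, XGBRFClassifier trains many parallel trees in a
   # single boosting round, i.e. a random forest rather than a boosted ensemble.
   clf = XGBRFClassifier(n_estimators=100, subsample=0.8, colsample_bynode=0.8)
   clf.fit(X, y)
   print(clf.predict(X[:5]))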
@@ -189,12 +189,14 @@ def to_graphviz(booster, fmap='', num_trees=0, rankdir='UT',
          'style':'filled,rounded',
          'fillcolor':'#78bceb'
         }

     leaf_node_params : dict (optional)
         leaf node configuration
         {'shape':'box',
          'style':'filled',
          'fillcolor':'#e48038'
         }

     kwargs :
         Other keywords passed to graphviz graph_attr

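For reference, a hedged sketch of calling this function (the tiny synthetic model and the output filename are placeholders; rendering requires the graphviz binaries on PATH):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # Train a throwaway booster so there is a tree to draw.
   dtrain = xgb.DMatrix(np.random.rand(100, 4),
                        label=np.random.randint(2, size=100))
   bst = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=2)

   # Render the first tree, overriding the leaf node style shown above.
   graph = xgb.to_graphviz(bst, num_trees=0, rankdir='UT',
                           leaf_node_params={'shape': 'box',
                                             'style': 'filled',
                                             'fillcolor': '#e48038'})
   graph.render('tree0')  # writes tree0.pdf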