Address some sphinx warnings and errors, add doc for building doc. (#4589)

Jiaming Yuan 2019-06-21 06:07:36 +08:00 committed by Philip Hyunsu Cho
parent 6125521caf
commit 9494950ee7
7 changed files with 44 additions and 23 deletions


@ -57,6 +57,7 @@ to ask questions at `the user forum <https://discuss.xgboost.ai>`_.
* `Python Package Installation`_
* `R Package Installation`_
* `Trouble Shooting`_
* `Building the Documentation`_
***************************
Building the Shared Library
@ -448,3 +449,23 @@ Trouble Shooting
.. code-block:: bash
git clone https://github.com/dmlc/xgboost --recursive
Building the Documentation
==========================
XGBoost uses `Sphinx <https://www.sphinx-doc.org/en/stable/>`_ for documentation. To build it locally, you need an installed XGBoost with all its dependencies, along with:
* System dependencies
- git
- graphviz
* Python dependencies
- sphinx
- breathe
- guzzle_sphinx_theme
- recommonmark
- mock
Under the ``xgboost/doc`` directory, run ``make <format>`` with ``<format>`` replaced by the format you want. For a list of supported formats, run ``make help`` under the same directory.
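For example, ``make html`` typically writes the HTML docs to ``_build/html``. A minimal sketch of the equivalent invocation through Sphinx's Python entry point (requires Sphinx 1.7 or later; assumes the current directory is ``xgboost/doc``):

.. code-block:: python

    from sphinx.cmd.build import build_main

    # Roughly what `make html` runs: build the current source dir into _build/html.
    build_main(['-b', 'html', '.', '_build/html'])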


@ -176,7 +176,8 @@ clang-tidy
To run clang-tidy on both C++ and CUDA source code, run the following command
from the top level source tree:
.. code-black:: bash
.. code-block:: bash
cd /path/to/xgboost/
python3 tests/ci_build/tidy.py --gtest-path=/path/to/google-test
@ -186,13 +187,15 @@ Also, the script accepts two optional integer arguments, namely ``--cpp`` and ``
By default they are both set to 1. If you want to exclude CUDA source from
linting, use:
.. code-black:: bash
.. code-block:: bash
cd /path/to/xgboost/
python3 tests/ci_build/tidy.py --cuda=0
Similarly, if you want to exclude C++ source from linting:
.. code-black:: bash
.. code-block:: bash
cd /path/to/xgboost/
python3 tests/ci_build/tidy.py --cpp=0
@ -260,7 +263,7 @@ The following steps are followed to add a new Rmarkdown vignette:
- If you already cloned the repo to doc, this means ``git add``
- Create PR for both the markdown and ``dmlc/web-data``.
- You can also build the documentation locally by running the following command in the ``doc`` directory:
.. code-block:: bash


@ -211,7 +211,7 @@ Training time on 1,000,000 rows x 50 columns with 500 boosting iterations a
See `GPU Accelerated XGBoost <https://xgboost.ai/2016/12/14/GPU-accelerated-xgboost.html>`_ and `Updates to the XGBoost GPU algorithms <https://xgboost.ai/2018/07/04/gpu-xgboost-update.html>`_ for additional performance benchmarks of the ``gpu_exact`` and ``gpu_hist`` tree methods.
Developer notes
==========
===============
The application may be profiled with annotations by specifying USE_NVTX to CMake and providing the path to the stand-alone NVTX header via NVTX_HEADER_DIR. Regions covered by the 'Monitor' class in CUDA code will automatically appear in the Nsight profiler.
**********
@ -222,7 +222,7 @@ References
`Nvidia Parallel Forall: Gradient Boosting, Decision Trees and XGBoost with CUDA <https://devblogs.nvidia.com/parallelforall/gradient-boosting-decision-trees-xgboost-cuda/>`_
Contributors
=======
============
Many thanks to the following contributors (alphabetical order):
* Andrey Adinets


@ -154,7 +154,7 @@ Now, we have a DataFrame containing only two columns, "features" which contains
labels. A DataFrame like this (containing vector-represented features and numeric labels) can be fed to XGBoost4J-Spark's training engine directly.
Dealing with missing values
~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Strategies to handle missing values (and therefore overcome the issues described above):
@ -244,7 +244,7 @@ When it comes to custom eval metrics, in addition to ``num_early_stopping_roun
For example, suppose we need to maximize the evaluation metric (set ``maximize_evaluation_metrics`` to true) and set ``num_early_stopping_rounds`` to 5, and suppose the evaluation metric of the 10th iteration is the best one so far. If none of the following iterations produces a better evaluation metric than the 10th iteration's, training is stopped early at the 15th iteration.
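The stopping rule itself is easy to state in code. A plain-Python sketch of the logic described above (an illustration only, not the XGBoost4J-Spark implementation; ``metrics`` is a hypothetical list of per-iteration evaluation values):

.. code-block:: python

    def early_stopping_round(metrics, num_early_stopping_rounds=5, maximize=True):
        """Return the 1-based iteration at which training stops, or None."""
        best_metric, best_iter = None, 0
        for i, m in enumerate(metrics, start=1):
            improved = best_metric is None or (m > best_metric if maximize else m < best_metric)
            if improved:
                best_metric, best_iter = m, i
            elif i - best_iter >= num_early_stopping_rounds:
                return i  # no improvement for num_early_stopping_rounds iterations
        return None

    # Best value at iteration 10, no later improvement: training stops at iteration 15.
    history = [0.1 * i for i in range(1, 11)] + [0.5] * 5
    assert early_stopping_round(history) == 15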
Training with Evaluation Sets
----------------
-----------------------------
You can also monitor the performance of the model during training with multiple evaluation datasets. By specifying ``eval_sets`` or calling ``setEvalSets`` on an XGBoostClassifier or XGBoostRegressor, you can pass in multiple evaluation datasets typed as a Map from String to DataFrame.


@ -83,17 +83,12 @@ Parameters for Tree Booster
- range: (0,1]
* ``colsample_bytree``, ``colsample_bylevel``, ``colsample_bynode`` [default=1]
- This is a family of parameters for subsampling of columns.
- All ``colsample_by*`` parameters have a range of (0, 1], the default value of 1, and
specify the fraction of columns to be subsampled.
- ``colsample_bytree`` is the subsample ratio of columns when constructing each
tree. Subsampling occurs once for every tree constructed.
- ``colsample_bylevel`` is the subsample ratio of columns for each level. Subsampling
occurs once for every new depth level reached in a tree. Columns are subsampled from
the set of columns chosen for the current tree.
- ``colsample_bynode`` is the subsample ratio of columns for each node
(split). Subsampling occurs once every time a new split is evaluated. Columns are
subsampled from the set of columns chosen for the current level.
- All ``colsample_by*`` parameters have a range of (0, 1], the default value of 1, and specify the fraction of columns to be subsampled.
- ``colsample_bytree`` is the subsample ratio of columns when constructing each tree. Subsampling occurs once for every tree constructed.
- ``colsample_bylevel`` is the subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree.
- ``colsample_bynode`` is the subsample ratio of columns for each node (split). Subsampling occurs once every time a new split is evaluated. Columns are subsampled from the set of columns chosen for the current level.
- ``colsample_by*`` parameters work cumulatively. For instance, the combination ``{'colsample_bytree':0.5, 'colsample_bylevel':0.5, 'colsample_bynode':0.5}`` with 64 features will leave 8 features to choose from at each split, as the sketch below illustrates.
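To make the cumulative subsampling concrete, a minimal sketch with synthetic data (the ``colsample_by*`` keys are the documented parameter names; everything else is illustrative):

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    X, y = np.random.rand(256, 64), np.random.randint(2, size=256)
    dtrain = xgb.DMatrix(X, label=y)

    # 64 features * 0.5 (per tree) * 0.5 (per level) * 0.5 (per node)
    # leaves about 8 candidate features at each split.
    params = {
        'colsample_bytree': 0.5,
        'colsample_bylevel': 0.5,
        'colsample_bynode': 0.5,
    }
    bst = xgb.train(params, dtrain, num_boost_round=10)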
@ -294,7 +289,7 @@ Specify the learning task and the corresponding learning objective. The objectiv
* ``objective`` [default=reg:squarederror]
- ``reg:squarederror``: regression with squared loss
- ``reg:squarederror``: regression with squared loss.
- ``reg:squaredlogerror``: regression with squared log loss :math:`\frac{1}{2}[\log(pred + 1) - \log(label + 1)]^2`. All input labels are required to be greater than -1. Also, see metric ``rmsle`` for possible issues with this objective; a short sketch of the loss follows this list.
- ``reg:logistic``: logistic regression
- ``binary:logistic``: logistic regression for binary classification, output probability
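As a quick illustration of the squared log loss above, a NumPy sketch of the per-example value (not XGBoost's internal implementation):

.. code-block:: python

    import numpy as np

    def squared_log_error(pred, label):
        # 1/2 * (log(pred + 1) - log(label + 1))^2; both inputs must be > -1.
        return 0.5 * (np.log1p(pred) - np.log1p(label)) ** 2

    print(squared_log_error(np.array([0.5, 2.0]), np.array([1.0, 2.0])))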


@ -13,9 +13,9 @@ Scikit-Learn wrapper after 0.82 (not included in 0.82). Please note that the ne
Scikit-Learn wrapper is still **experimental**, which means we might change the interface
whenever needed.
****************
*****************************************
Standalone Random Forest With XGBoost API
****************
*****************************************
The following parameters must be set to enable random forest training.
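As an illustration, a parameter set consistent with these constraints might look like the sketch below (values are examples, not prescriptions); it feeds the ``train`` call shown in the next hunk:

.. code-block:: python

    import numpy as np
    from xgboost import DMatrix, train

    X, y = np.random.rand(512, 10), np.random.randint(2, size=512)
    dmatrix = DMatrix(X, label=y)

    params = {
        'num_parallel_tree': 100,   # size of the forest
        'learning_rate': 1,         # no shrinkage; the forest is grown in one round
        'subsample': 0.8,           # row subsampling per tree
        'colsample_bynode': 0.8,    # column subsampling per split
        'objective': 'binary:logistic',
    }
    bst = train(params, dmatrix, num_boost_round=1)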
@ -64,9 +64,9 @@ A random forest model can then be trained as follows::
bst = train(params, dmatrix, num_boost_round=1)
**************************
***************************************************
Standalone Random Forest With Scikit-Learn-Like API
**************************
***************************************************
``XGBRFClassifier`` and ``XGBRFRegressor`` are SKL-like classes that provide random forest
functionality. They are basically versions of ``XGBClassifier`` and ``XGBRegressor`` that
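A minimal usage sketch with synthetic data (standard scikit-learn ``fit``/``predict`` calls; ``n_estimators`` sets the forest size):

.. code-block:: python

    import numpy as np
    from xgboost import XGBRFClassifier

    X = np.random.rand(500, 20)
    y = np.random.randint(2, size=500)

    # The whole forest is fit in a single boosting round.
    clf = XGBRFClassifier(n_estimators=100, max_depth=5)
    clf.fit(X, y)
    print(clf.predict(X[:5]))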


@ -189,12 +189,14 @@ def to_graphviz(booster, fmap='', num_trees=0, rankdir='UT',
'style':'filled,rounded',
'fillcolor':'#78bceb'
}
leaf_node_params : dict (optional)
leaf node configuration
{'shape':'box',
'style':'filled',
'fillcolor':'#e48038'
}
kwargs :
Other keywords passed to graphviz graph_attr
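A usage sketch of ``to_graphviz`` with the two styling dictionaries documented above (assumes a trained booster, the ``graphviz`` Python package, and the Graphviz system binaries):

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    X, y = np.random.rand(100, 4), np.random.randint(2, size=100)
    bst = xgb.train({'max_depth': 3}, xgb.DMatrix(X, label=y), num_boost_round=2)

    # Style split nodes and leaves with the dicts shown in the docstring above.
    graph = xgb.to_graphviz(
        bst,
        num_trees=0,
        condition_node_params={'shape': 'box', 'style': 'filled,rounded', 'fillcolor': '#78bceb'},
        leaf_node_params={'shape': 'box', 'style': 'filled', 'fillcolor': '#e48038'},
    )
    graph.render('tree')  # writes tree.pdf by default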