Address some sphinx warnings and errors, add doc for building doc. (#4589)

Jiaming Yuan 2019-06-21 06:07:36 +08:00 committed by Philip Hyunsu Cho
parent 6125521caf
commit 9494950ee7
7 changed files with 44 additions and 23 deletions


@ -57,6 +57,7 @@ to ask questions at `the user forum <https://discuss.xgboost.ai>`_.
* `Python Package Installation`_
* `R Package Installation`_
* `Trouble Shooting`_
* `Building the Documentation`_
***************************
Building the Shared Library
@ -448,3 +449,23 @@ Trouble Shooting
.. code-block:: bash
git clone https://github.com/dmlc/xgboost --recursive
Building the Documentation
==========================
XGBoost uses `Sphinx <https://www.sphinx-doc.org/en/stable/>`_ for documentation. To build it locally, you need an installed XGBoost with all its dependencies, along with:
* System dependencies
- git
- graphviz
* Python dependencies
- sphinx
- breathe
- guzzle_sphinx_theme
- recommonmark
- mock
Under the ``xgboost/doc`` directory, run ``make <format>`` with ``<format>`` replaced by the format you want. For a list of supported formats, run ``make help`` under the same directory.
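For example, ``make html`` typically writes the HTML docs to ``_build/html``. A minimal sketch of the equivalent invocation through Sphinx's Python entry point (requires Sphinx 1.7 or later; assumes the current directory is ``xgboost/doc``):

.. code-block:: python

    from sphinx.cmd.build import build_main

    # Roughly what `make html` runs: build the current source dir into _build/html.
    build_main(['-b', 'html', '.', '_build/html'])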


@ -176,7 +176,8 @@ clang-tidy
To run clang-tidy on both C++ and CUDA source code, run the following command
from the top level source tree:
.. code-black:: bash
.. code-block:: bash
cd /path/to/xgboost/
python3 tests/ci_build/tidy.py --gtest-path=/path/to/google-test
@ -186,13 +187,15 @@ Also, the script accepts two optional integer arguments, namely ``--cpp`` and ``
By default they are both set to 1. If you want to exclude CUDA source from
linting, use:
.. code-black:: bash
.. code-block:: bash
cd /path/to/xgboost/
python3 tests/ci_build/tidy.py --cuda=0
Similarly, if you want to exclude C++ source from linting:
.. code-black:: bash
.. code-block:: bash
cd /path/to/xgboost/
python3 tests/ci_build/tidy.py --cpp=0
@ -260,7 +263,7 @@ The following steps are followed to add a new Rmarkdown vignette:
- If you already cloned the repo to doc, this means ``git add``
- Create PR for both the markdown and ``dmlc/web-data``.
- You can also build the documentation locally by running the following command in the ``doc`` directory:
.. code-block:: bash


@ -211,7 +211,7 @@ Training time on 1,000,000 rows x 50 columns with 500 boosting iterations a
See `GPU Accelerated XGBoost <https://xgboost.ai/2016/12/14/GPU-accelerated-xgboost.html>`_ and `Updates to the XGBoost GPU algorithms <https://xgboost.ai/2018/07/04/gpu-xgboost-update.html>`_ for additional performance benchmarks of the ``gpu_exact`` and ``gpu_hist`` tree methods.
Developer notes
==========
===============
The application may be profiled with annotations by specifying USE_NVTX to CMake and providing the path to the stand-alone NVTX header via NVTX_HEADER_DIR. Regions covered by the 'Monitor' class in CUDA code will automatically appear in the Nsight profiler.
**********
@ -222,7 +222,7 @@ References
`Nvidia Parallel Forall: Gradient Boosting, Decision Trees and XGBoost with CUDA <https://devblogs.nvidia.com/parallelforall/gradient-boosting-decision-trees-xgboost-cuda/>`_
Contributors
=======
============
Many thanks to the following contributors (alphabetical order):
* Andrey Adinets


@ -154,7 +154,7 @@ Now, we have a DataFrame containing only two columns, "features" which contains
labels. A DataFrame like this (containing vector-represented features and numeric labels) can be fed to XGBoost4J-Spark's training engine directly.
Dealing with missing values
~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Strategies to handle missing values (and therefore overcome the issues described above):
@ -244,7 +244,7 @@ When it comes to custom eval metrics, in addition to ``num_early_stopping_roun
For example, suppose we need to maximize the evaluation metric (set ``maximize_evaluation_metrics`` to true) and set ``num_early_stopping_rounds`` to 5, and suppose the evaluation metric of the 10th iteration is the best one so far. If none of the following iterations produces a better evaluation metric than the 10th iteration's, training is stopped early at the 15th iteration.
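The stopping rule itself is easy to state in code. A plain-Python sketch of the logic described above (an illustration only, not the XGBoost4J-Spark implementation; ``metrics`` is a hypothetical list of per-iteration evaluation values):

.. code-block:: python

    def early_stopping_round(metrics, num_early_stopping_rounds=5, maximize=True):
        """Return the 1-based iteration at which training stops, or None."""
        best_metric, best_iter = None, 0
        for i, m in enumerate(metrics, start=1):
            improved = best_metric is None or (m > best_metric if maximize else m < best_metric)
            if improved:
                best_metric, best_iter = m, i
            elif i - best_iter >= num_early_stopping_rounds:
                return i  # no improvement for num_early_stopping_rounds iterations
        return None

    # Best value at iteration 10, no later improvement: training stops at iteration 15.
    history = [0.1 * i for i in range(1, 11)] + [0.5] * 5
    assert early_stopping_round(history) == 15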
Training with Evaluation Sets
----------------
-----------------------------
You can also monitor the performance of the model during training with multiple evaluation datasets. By specifying ``eval_sets`` or calling ``setEvalSets`` on an XGBoostClassifier or XGBoostRegressor, you can pass in multiple evaluation datasets typed as a Map from String to DataFrame.


@ -83,17 +83,12 @@ Parameters for Tree Booster
- range: (0,1]
* ``colsample_bytree``, ``colsample_bylevel``, ``colsample_bynode`` [default=1]
- This is a family of parameters for subsampling of columns.
- All ``colsample_by*`` parameters have a range of (0, 1], the default value of 1, and
specify the fraction of columns to be subsampled.
- ``colsample_bytree`` is the subsample ratio of columns when constructing each
tree. Subsampling occurs once for every tree constructed.
- ``colsample_bylevel`` is the subsample ratio of columns for each level. Subsampling
occurs once for every new depth level reached in a tree. Columns are subsampled from
the set of columns chosen for the current tree.
- ``colsample_bynode`` is the subsample ratio of columns for each node
(split). Subsampling occurs once every time a new split is evaluated. Columns are
subsampled from the set of columns chosen for the current level.
- All ``colsample_by*`` parameters have a range of (0, 1], the default value of 1, and specify the fraction of columns to be subsampled.
- ``colsample_bytree`` is the subsample ratio of columns when constructing each tree. Subsampling occurs once for every tree constructed.
- ``colsample_bylevel`` is the subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree.
- ``colsample_bynode`` is the subsample ratio of columns for each node (split). Subsampling occurs once every time a new split is evaluated. Columns are subsampled from the set of columns chosen for the current level.
- ``colsample_by*`` parameters work cumulatively. For instance, the combination ``{'colsample_bytree':0.5, 'colsample_bylevel':0.5, 'colsample_bynode':0.5}`` with 64 features will leave 8 features to choose from at each split, as the sketch below illustrates.
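To make the cumulative subsampling concrete, a minimal sketch with synthetic data (the ``colsample_by*`` keys are the documented parameter names; everything else is illustrative):

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    X, y = np.random.rand(256, 64), np.random.randint(2, size=256)
    dtrain = xgb.DMatrix(X, label=y)

    # 64 features * 0.5 (per tree) * 0.5 (per level) * 0.5 (per node)
    # leaves about 8 candidate features at each split.
    params = {
        'colsample_bytree': 0.5,
        'colsample_bylevel': 0.5,
        'colsample_bynode': 0.5,
    }
    bst = xgb.train(params, dtrain, num_boost_round=10)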
@ -294,7 +289,7 @@ Specify the learning task and the corresponding learning objective. The objectiv
* ``objective`` [default=reg:squarederror]
- ``reg:squarederror``: regression with squared loss
- ``reg:squarederror``: regression with squared loss.
- ``reg:squaredlogerror``: regression with squared log loss :math:`\frac{1}{2}[\log(pred + 1) - \log(label + 1)]^2`. All input labels are required to be greater than -1. Also, see metric ``rmsle`` for possible issues with this objective; a short sketch of the loss follows this list.
- ``reg:logistic``: logistic regression
- ``binary:logistic``: logistic regression for binary classification, output probability
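As a quick illustration of the squared log loss above, a NumPy sketch of the per-example value (not XGBoost's internal implementation):

.. code-block:: python

    import numpy as np

    def squared_log_error(pred, label):
        # 1/2 * (log(pred + 1) - log(label + 1))^2; both inputs must be > -1.
        return 0.5 * (np.log1p(pred) - np.log1p(label)) ** 2

    print(squared_log_error(np.array([0.5, 2.0]), np.array([1.0, 2.0])))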


@ -13,9 +13,9 @@ Scikit-Learn wrapper after 0.82 (not included in 0.82). Please note that the ne
Scikit-Learn wrapper is still **experimental**, which means we might change the interface
whenever needed.
****************
*****************************************
Standalone Random Forest With XGBoost API
****************
*****************************************
The following parameters must be set to enable random forest training.
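As an illustration, a parameter set consistent with these constraints might look like the sketch below (values are examples, not prescriptions); it feeds the ``train`` call shown in the next hunk:

.. code-block:: python

    import numpy as np
    from xgboost import DMatrix, train

    X, y = np.random.rand(512, 10), np.random.randint(2, size=512)
    dmatrix = DMatrix(X, label=y)

    params = {
        'num_parallel_tree': 100,   # size of the forest
        'learning_rate': 1,         # no shrinkage; the forest is grown in one round
        'subsample': 0.8,           # row subsampling per tree
        'colsample_bynode': 0.8,    # column subsampling per split
        'objective': 'binary:logistic',
    }
    bst = train(params, dmatrix, num_boost_round=1)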
@ -64,9 +64,9 @@ A random forest model can then be trained as follows::
bst = train(params, dmatrix, num_boost_round=1)
**************************
***************************************************
Standalone Random Forest With Scikit-Learn-Like API
**************************
***************************************************
``XGBRFClassifier`` and ``XGBRFRegressor`` are SKL-like classes that provide random forest
functionality. They are basically versions of ``XGBClassifier`` and ``XGBRegressor`` that
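A minimal usage sketch with synthetic data (standard scikit-learn ``fit``/``predict`` calls; ``n_estimators`` sets the forest size):

.. code-block:: python

    import numpy as np
    from xgboost import XGBRFClassifier

    X = np.random.rand(500, 20)
    y = np.random.randint(2, size=500)

    # The whole forest is fit in a single boosting round.
    clf = XGBRFClassifier(n_estimators=100, max_depth=5)
    clf.fit(X, y)
    print(clf.predict(X[:5]))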


@ -189,12 +189,14 @@ def to_graphviz(booster, fmap='', num_trees=0, rankdir='UT',
'style':'filled,rounded',
'fillcolor':'#78bceb'
}
leaf_node_params : dict (optional)
leaf node configuration
{'shape':'box',
'style':'filled',
'fillcolor':'#e48038'
}
kwargs :
Other keywords passed to graphviz graph_attr
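A usage sketch of ``to_graphviz`` with the two styling dictionaries documented above (assumes a trained booster, the ``graphviz`` Python package, and the Graphviz system binaries):

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    X, y = np.random.rand(100, 4), np.random.randint(2, size=100)
    bst = xgb.train({'max_depth': 3}, xgb.DMatrix(X, label=y), num_boost_round=2)

    # Style split nodes and leaves with the dicts shown in the docstring above.
    graph = xgb.to_graphviz(
        bst,
        num_trees=0,
        condition_node_params={'shape': 'box', 'style': 'filled,rounded', 'fillcolor': '#78bceb'},
        leaf_node_params={'shape': 'box', 'style': 'filled', 'fillcolor': '#e48038'},
    )
    graph.render('tree')  # writes tree.pdf by default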