Doc modernization (#3474)
* Change doc build to reST exclusively
* Rewrite Intro doc in reST; create toctree
* Update parameter and contribute
* Convert tutorials to reST
* Convert Python tutorials to reST
* Convert CLI and Julia docs to reST
* Enable markdown for R vignettes
* Done migrating to reST
* Add guzzle_sphinx_theme to requirements
* Add breathe to requirements
* Fix search bar
* Add link to user forum
doc/tutorials/dart.rst (new file, 113 lines)

############
DART booster
############

XGBoost mostly combines a huge number of regression trees with a small learning rate.
In this situation, trees added early are significant and trees added late are unimportant.

Vinayak and Gilad-Bachrach proposed a new method to add dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations.

This page introduces the new tree booster ``dart``.

**************
Original paper
**************

Rashmi Korlakai Vinayak, Ran Gilad-Bachrach. "DART: Dropouts meet Multiple Additive Regression Trees." `JMLR <http://www.jmlr.org/proceedings/papers/v38/korlakaivinayak15.pdf>`_.

********
Features
********

- Drop trees in order to solve over-fitting.

  - Trivial trees (which correct trivial errors) may be prevented.

Because of the randomness introduced in the training, expect the following differences:

- Training can be slower than ``gbtree`` because the random dropout prevents usage of the prediction buffer.
- Early stopping might not be stable, due to the randomness.

************
How it works
************

- In the :math:`m`-th training round, suppose :math:`k` trees are selected to be dropped.
- Let :math:`D = \sum_{i \in \mathbf{K}} F_i` be the leaf scores of the dropped trees and :math:`F_m = \eta \tilde{F}_m` be the leaf scores of the new tree.
- The objective function is as follows:

.. math::

  \mathrm{Obj}
  = \sum_{j=1}^n L \left( y_j, \hat{y}_j^{m-1} - D_j + \tilde{F}_m \right)
  + \Omega \left( \tilde{F}_m \right).

- :math:`D` and :math:`F_m` are overshooting, so a scale factor is applied:

.. math::

  \hat{y}_j^m = \sum_{i \not\in \mathbf{K}} F_i + a \left( \sum_{i \in \mathbf{K}} F_i + b F_m \right) .
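
To make this update concrete, below is a toy NumPy sketch of a single round following this formula, with trees selected uniformly as in ``sample_type='uniform'``. It illustrates only the arithmetic, not XGBoost's internals: ``new_tree`` stands in for the leaf scores of a freshly fitted tree, and the weights :math:`a` and :math:`b` follow the ``tree`` normalization derived in the Parameters section below.

.. code-block:: python

  import numpy as np

  rng = np.random.default_rng(0)

  # Toy ensemble: one row of leaf scores per tree (20 trees, 100 samples).
  tree_preds = rng.normal(size=(20, 100))
  eta, rate_drop = 0.1, 0.1

  # Select trees to drop uniformly (sample_type='uniform').
  dropped = rng.random(tree_preds.shape[0]) < rate_drop
  k = max(int(dropped.sum()), 1)

  D = tree_preds[dropped].sum(axis=0)   # leaf scores of the dropped trees
  new_tree = rng.normal(size=100)       # stand-in for a freshly fitted tree

  # normalize_type='tree' weights: b = 1/k applied to F_m = eta * new_tree,
  # and a = k / (k + eta), keeping the combined contribution at the level of D.
  a, b = k / (k + eta), 1.0 / k
  y_hat = tree_preds[~dropped].sum(axis=0) + a * (D + b * eta * new_tree)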

**********
Parameters
**********

The booster ``dart`` inherits the ``gbtree`` booster, so it supports all of the parameters that ``gbtree`` does, such as ``eta``, ``gamma``, and ``max_depth``.

Additional parameters are noted below:
* ``sample_type``: type of sampling algorithm.

  - ``uniform``: (default) dropped trees are selected uniformly.
  - ``weighted``: dropped trees are selected in proportion to weight.

* ``normalize_type``: type of normalization algorithm (see the sketch after this list).

  - ``tree``: (default) new trees have the same weight as each of the dropped trees.

    .. math::

      a \left( \sum_{i \in \mathbf{K}} F_i + \frac{1}{k} F_m \right)
      &= a \left( \sum_{i \in \mathbf{K}} F_i + \frac{\eta}{k} \tilde{F}_m \right) \\
      &\sim a \left( 1 + \frac{\eta}{k} \right) D \\
      &= a \frac{k + \eta}{k} D = D , \\
      &\quad a = \frac{k}{k + \eta}

  - ``forest``: new trees have the same weight as the sum of the dropped trees (forest).

    .. math::

      a \left( \sum_{i \in \mathbf{K}} F_i + F_m \right)
      &= a \left( \sum_{i \in \mathbf{K}} F_i + \eta \tilde{F}_m \right) \\
      &\sim a \left( 1 + \eta \right) D \\
      &= a (1 + \eta) D = D , \\
      &\quad a = \frac{1}{1 + \eta} .

* ``rate_drop``: dropout rate.

  - range: [0.0, 1.0]

* ``skip_drop``: probability of skipping dropout.

  - If a dropout is skipped, new trees are added in the same manner as ``gbtree``.
  - range: [0.0, 1.0]
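
As a quick check of the two derivations above, here is a small illustrative helper (not part of the XGBoost API) that computes the scale factor :math:`a` for both normalization types:

.. code-block:: python

  def dart_scale(normalize_type, k, eta):
      """Scale factor a from the derivations above (illustrative only)."""
      if normalize_type == 'tree':
          # New tree weighted like one dropped tree: a = k / (k + eta).
          return k / (k + eta)
      if normalize_type == 'forest':
          # New tree weighted like the whole dropped forest: a = 1 / (1 + eta).
          return 1.0 / (1.0 + eta)
      raise ValueError('unknown normalize_type: %s' % normalize_type)

  print(dart_scale('tree', k=3, eta=0.1))    # ~0.9677
  print(dart_scale('forest', k=3, eta=0.1))  # ~0.9091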

*************
Sample Script
*************

.. code-block:: python

  import xgboost as xgb
  # read in data
  dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
  dtest = xgb.DMatrix('demo/data/agaricus.txt.test')
  # specify parameters via map
  param = {'booster': 'dart',
           'max_depth': 5, 'learning_rate': 0.1,
           'objective': 'binary:logistic', 'silent': True,
           'sample_type': 'uniform',
           'normalize_type': 'tree',
           'rate_drop': 0.1,
           'skip_drop': 0.5}
  num_round = 50
  bst = xgb.train(param, dtrain, num_round)
  # make prediction
  # ntree_limit must not be 0
  preds = bst.predict(dtest, ntree_limit=num_round)
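
Note that when the booster is ``dart``, ``predict()`` performs dropouts by default, so only some of the trees are evaluated and the results can be misleading on data other than the training data. Passing a nonzero ``ntree_limit``, as above, makes all trees contribute to the prediction.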