diff --git a/dmlc-core b/dmlc-core
index 9fd3b4846..c39001019 160000
--- a/dmlc-core
+++ b/dmlc-core
@@ -1 +1 @@
-Subproject commit 9fd3b48462a7a651e12a197679f71e043dcb25a2
+Subproject commit c39001019e443c7a061789bd1180f58ce85fc3e6
diff --git a/doc/parameter.md b/doc/parameter.md
index f3bccd001..0c1c989d3 100644
--- a/doc/parameter.md
+++ b/doc/parameter.md
@@ -13,8 +13,7 @@ In R-package, you can use .(dot) to replace under score in the parameters, for e
 General Parameters
 ------------------
 * booster [default=gbtree]
-  - which booster to use, can be gbtree, gblinear or dart.
-    gbtree and dart use tree based model while gblinear uses linear function.
+  - which booster to use, can be gbtree, gblinear or dart. gbtree and dart use tree based models while gblinear uses a linear function.
 * silent [default=0]
   - 0 means printing running messages, 1 means silent mode.
 * nthread [default to maximum number of threads available if not set]
@@ -81,20 +80,20 @@ Additional parameters for Dart Booster
   - type of sampling algorithm.
     - "uniform": dropped trees are selected uniformly.
     - "weighted": dropped trees are selected in proportion to weight.
-* normalize_type [default="tree]
+* normalize_type [default="tree"]
   - type of normalization algorithm.
-  - "tree": New trees have the same weight of each of dropped trees.
-    weight of new trees are 1 / (k + learnig_rate)
-    dropped trees are scaled by a factor of k / (k + learning_rate)
-  - "forest": New trees have the same weight of sum of dropped trees (forest).
-    weight of new trees are 1 / (1 + learning_rate)
-    dropped trees are scaled by a factor of 1 / (1 + learning_rate)
+  - "tree": new trees have the same weight as each of the dropped trees.
+    - weight of new trees is 1 / (k + learning_rate)
+    - dropped trees are scaled by a factor of k / (k + learning_rate)
+  - "forest": new trees have the same weight as the sum of the dropped trees (forest).
+    - weight of new trees is 1 / (1 + learning_rate)
+    - dropped trees are scaled by a factor of 1 / (1 + learning_rate)
 * rate_drop [default=0.0]
   - dropout rate.
   - range: [0.0, 1.0]
 * skip_drop [default=0.0]
   - probability of skip dropout.
-    If a dropout is skipped, new trees are added in the same manner as gbtree.
+  - If a dropout is skipped, new trees are added in the same manner as gbtree.
   - range: [0.0, 1.0]
 
 Parameters for Linear Booster
diff --git a/doc/tutorials/dart.md b/doc/tutorials/dart.md
new file mode 100644
index 000000000..44edcb42e
--- /dev/null
+++ b/doc/tutorials/dart.md
@@ -0,0 +1,101 @@
+DART booster
+============
+[XGBoost](https://github.com/dmlc/xgboost) mostly combines a huge number of regression trees with a small learning rate.
+In this situation, trees added early are significant and trees added late are unimportant.
+
+Vinayak and Gilad-Bachrach proposed a new method to add dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations.
+
+This is an introduction to the new tree booster `dart`.
+
+Original paper
+--------------
+Rashmi Korlakai Vinayak, Ran Gilad-Bachrach. "DART: Dropouts meet Multiple Additive Regression Trees." [JMLR](http://www.jmlr.org/proceedings/papers/v38/korlakaivinayak15.pdf)
+
+Features
+--------
+- Drop trees in order to reduce over-fitting.
+  - Trivial trees (which only correct trivial errors) may be prevented.
+
+Because of the randomness introduced in training, expect the following differences.
+- Training can be slower than `gbtree` because the random dropout prevents use of the prediction buffer.
+- Early stopping might not be stable, due to the randomness.
+
+How it works
+------------
+- In the ``$ m $``-th training round, suppose ``$ k $`` trees are selected to be dropped.
+- Let ``$ D = \sum_{i \in \mathbf{K}} F_i $`` be the leaf scores of the dropped trees and ``$ F_m = \eta \tilde{F}_m $`` be the leaf scores of a new tree.
+- The objective function is as follows:
+```math
+\mathrm{Obj}
+= \sum_{j=1}^n L \left( y_j, \hat{y}_j^{m-1} - D_j + \tilde{F}_m \right)
++ \Omega \left( \tilde{F}_m \right).
+```
+- ``$ D $`` and ``$ F_m $`` are overshooting, so a scale factor is applied:
+```math
+\hat{y}_j^m = \sum_{i \not\in \mathbf{K}} F_i + a \left( \sum_{i \in \mathbf{K}} F_i + b F_m \right) .
+```
+
+Parameters
+----------
+### booster
+* `dart`
+
+This booster inherits from `gbtree`, so `dart` also has `eta`, `gamma`, `max_depth` and so on.
+
+Additional parameters are noted below.
+
+### sample_type
+type of sampling algorithm.
+* `uniform`: (default) dropped trees are selected uniformly.
+* `weighted`: dropped trees are selected in proportion to weight.
+
+### normalize_type
+type of normalization algorithm.
+* `tree`: (default) New trees have the same weight as each of the dropped trees.
+```math
+a \left( \sum_{i \in \mathbf{K}} F_i + \frac{1}{k} F_m \right)
+&= a \left( \sum_{i \in \mathbf{K}} F_i + \frac{\eta}{k} \tilde{F}_m \right) \\
+&\sim a \left( 1 + \frac{\eta}{k} \right) D \\
+&= a \frac{k + \eta}{k} D = D , \\
+&\quad a = \frac{k}{k + \eta} .
+```
+
+* `forest`: New trees have the same weight as the sum of the dropped trees (forest).
+```math
+a \left( \sum_{i \in \mathbf{K}} F_i + F_m \right)
+&= a \left( \sum_{i \in \mathbf{K}} F_i + \eta \tilde{F}_m \right) \\
+&\sim a \left( 1 + \eta \right) D \\
+&= a (1 + \eta) D = D , \\
+&\quad a = \frac{1}{1 + \eta} .
+```
+
+### rate_drop
+dropout rate.
+- range: [0.0, 1.0]
+
+### skip_drop
+probability of skipping dropout.
+- If a dropout is skipped, new trees are added in the same manner as gbtree.
+- range: [0.0, 1.0]
+
+Sample Script
+-------------
+```python
+import xgboost as xgb
+# read in data
+dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
+dtest = xgb.DMatrix('demo/data/agaricus.txt.test')
+# specify parameters via map
+param = {'booster': 'dart',
+         'max_depth': 5, 'learning_rate': 0.1,
+         'objective': 'binary:logistic', 'silent': True,
+         'sample_type': 'uniform',
+         'normalize_type': 'tree',
+         'rate_drop': 0.1,
+         'skip_drop': 0.5}
+num_round = 50
+bst = xgb.train(param, dtrain, num_round)
+# make prediction
+# ntree_limit must not be 0
+preds = bst.predict(dtest, ntree_limit=num_round)
+```
diff --git a/doc/tutorials/index.md b/doc/tutorials/index.md
index a4edf51c1..fdfba3b1a 100644
--- a/doc/tutorials/index.md
+++ b/doc/tutorials/index.md
@@ -6,3 +6,4 @@ See [Awesome XGBoost](https://github.com/dmlc/xgboost/tree/master/demo) for link
 ## Contents
 - [Introduction to Boosted Trees](../model.md)
 - [Distributed XGBoost YARN on AWS](aws_yarn.md)
+- [DART booster](dart.md)
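To make the `normalize_type` weighting rules in the patch above concrete, here is a small standalone sketch. It is plain Python, not an XGBoost API; the helper name `dart_scale` is ours. It computes the weight given to the new tree and the factor applied to the dropped trees for both modes, following the 1 / (k + learning_rate) and 1 / (1 + learning_rate) formulas from doc/parameter.md.

```python
def dart_scale(k, eta, normalize_type="tree"):
    """Weight of the new tree and scale factor for the k dropped trees.

    Illustrative helper following the DART docs: k dropped trees, learning rate eta.
    """
    if normalize_type == "tree":
        # new tree weight: 1 / (k + eta); dropped trees scaled by k / (k + eta)
        return 1.0 / (k + eta), k / (k + eta)
    if normalize_type == "forest":
        # new tree weight: 1 / (1 + eta); dropped trees scaled by 1 / (1 + eta)
        return 1.0 / (1.0 + eta), 1.0 / (1.0 + eta)
    raise ValueError("normalize_type must be 'tree' or 'forest'")

# With 3 dropped trees and eta = 0.1:
print(dart_scale(3, 0.1, "tree"))    # (0.3225..., 0.9677...)
print(dart_scale(3, 0.1, "forest"))  # (0.9090..., 0.9090...)
```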
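The tutorial also notes that early stopping can be unstable under dropout and that `ntree_limit` must not be 0 when predicting. A minimal sketch of what that looks like with the standard `xgb.train` watchlist interface follows; the `evals`/`early_stopping_rounds` arguments and the `best_iteration` attribute are the usual Python-package API rather than something added by this patch, and the exact stopping round can differ between runs because of the random dropout.

```python
import xgboost as xgb

# Same demo data as the sample script in doc/tutorials/dart.md.
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')

param = {'booster': 'dart',
         'objective': 'binary:logistic',
         'max_depth': 5, 'learning_rate': 0.1,
         'sample_type': 'uniform', 'normalize_type': 'tree',
         'rate_drop': 0.1, 'skip_drop': 0.5}

# The dropout makes the evaluation metric noisier than with gbtree,
# so the round chosen by early stopping may vary from run to run.
bst = xgb.train(param, dtrain, num_boost_round=200,
                evals=[(dtrain, 'train'), (dtest, 'eval')],
                early_stopping_rounds=10)

# Pass a non-zero ntree_limit so no trees are dropped at prediction time.
# (binary:logistic builds one tree per round, hence best_iteration + 1 trees.)
preds = bst.predict(dtest, ntree_limit=bst.best_iteration + 1)
```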