add Dart tutorial

marugari 2016-06-12 19:06:28 +09:00
parent f14c160f4f
commit c332eb5a2b
4 changed files with 112 additions and 11 deletions

@@ -1 +1 @@
-Subproject commit 9fd3b48462a7a651e12a197679f71e043dcb25a2
+Subproject commit c39001019e443c7a061789bd1180f58ce85fc3e6


@@ -13,8 +13,7 @@ In R-package, you can use .(dot) to replace under score in the parameters, for e
 General Parameters
 ------------------
 * booster [default=gbtree]
-  - which booster to use, can be gbtree, gblinear or dart.
-    gbtree and dart use tree based model while gblinear uses linear function.
+  - which booster to use, can be gbtree, gblinear or dart. gbtree and dart use tree based model while gblinear uses linear function.
 * silent [default=0]
   - 0 means printing running messages, 1 means silent mode.
 * nthread [default to maximum number of threads available if not set]
@@ -81,20 +80,20 @@ Additional parameters for Dart Booster
   - type of sampling algorithm.
     - "uniform": dropped trees are selected uniformly.
     - "weighted": dropped trees are selected in proportion to weight.
-* normalize_type [default="tree]
+* normalize_type [default="tree"]
   - type of normalization algorithm.
-  - "tree": New trees have the same weight of each of dropped trees.
-    weight of new trees are 1 / (k + learning_rate)
-    dropped trees are scaled by a factor of k / (k + learning_rate)
-  - "forest": New trees have the same weight of sum of dropped trees (forest).
-    weight of new trees are 1 / (1 + learning_rate)
-    dropped trees are scaled by a factor of 1 / (1 + learning_rate)
+  - "tree": new trees have the same weight of each of dropped trees.
+    - weight of new trees are 1 / (k + learning_rate)
+    - dropped trees are scaled by a factor of k / (k + learning_rate)
+  - "forest": new trees have the same weight of sum of dropped trees (forest).
+    - weight of new trees are 1 / (1 + learning_rate)
+    - dropped trees are scaled by a factor of 1 / (1 + learning_rate)
 * rate_drop [default=0.0]
   - dropout rate.
   - range: [0.0, 1.0]
 * skip_drop [default=0.0]
   - probability of skip dropout.
-    If a dropout is skipped, new trees are added in the same manner as gbtree.
+  - If a dropout is skipped, new trees are added in the same manner as gbtree.
   - range: [0.0, 1.0]

 Parameters for Linear Booster

doc/tutorials/dart.md Normal file

@@ -0,0 +1,101 @@
DART booster
============
[XGBoost](https://github.com/dmlc/xgboost) mostly combines a huge number of regression trees with a small learning rate.
In this situation, trees added early are significant and trees added late are unimportant.

Vinayak and Gilad-Bachrach proposed a new method to add dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations.

This is an introduction to the new tree booster `dart`.

Original paper
--------------
Rashmi Korlakai Vinayak, Ran Gilad-Bachrach. "DART: Dropouts meet Multiple Additive Regression Trees." [JMLR](http://www.jmlr.org/proceedings/papers/v38/korlakaivinayak15.pdf)

Features
--------
- Drop trees in order to solve over-fitting.
- Trivial trees (to correct trivial errors) may be prevented.

Because of the randomness introduced in the training, you can expect the following few differences.

- Training can be slower than `gbtree` because the random dropout prevents usage of the prediction buffer.
- Early stopping might not be stable, due to the randomness.

How it works
------------
- In the ``$ m $``-th training round, suppose ``$ k $`` trees are selected to be dropped.
- Let ``$ D = \sum_{i \in \mathbf{K}} F_i $`` be the leaf scores of the dropped trees and ``$ F_m = \eta \tilde{F}_m $`` be the leaf scores of a new tree.
- The objective function is as follows:
```math
\mathrm{Obj}
= \sum_{j=1}^n L \left( y_j, \hat{y}_j^{m-1} - D_j + \tilde{F}_m \right)
+ \Omega \left( \tilde{F}_m \right).
```
- ``$ D $`` and ``$ F_m $`` are overshooting, so a scale factor is applied (see the sketch after these formulas):
```math
\hat{y}_j^m = \sum_{i \not\in \mathbf{K}} F_i + a \left( \sum_{i \in \mathbf{K}} F_i + b F_m \right) .
```
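
The following toy numpy sketch plays out a few such rounds for squared-error loss. It is only an illustration under simplifying assumptions, not the actual XGBoost implementation: each "tree" is represented by nothing more than its vector of predictions on the training points, "fitting" a tree is replaced by taking the exact residual, and the helper `dart_round` plus all variable names are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def dart_round(trees, y, eta=0.1, rate_drop=0.3, normalize_type="tree"):
    """One illustrative DART round for squared-error loss."""
    # 1. choose the set K of trees to drop (uniform sampling)
    drop = rng.random(len(trees)) < rate_drop
    dropped = [t for t, d in zip(trees, drop) if d]
    kept = [t for t, d in zip(trees, drop) if not d]
    if not dropped:
        # nothing dropped: this round behaves exactly like gbtree
        return trees + [eta * (y - sum(trees))]

    k = len(dropped)
    base = sum(kept) if kept else np.zeros_like(y)   # = y_hat^{m-1} - D

    # 2. "fit" the new tree against the prediction without the dropped trees;
    #    for squared error the ideal leaf scores are just the residual
    F_tilde = y - base
    F_m = eta * F_tilde

    # 3. rescale so the dropped group plus the new tree stays on the scale of D
    if normalize_type == "tree":
        a = k / (k + eta)        # factor for each dropped tree
        new_tree = F_m / k       # overall weight of the new tree: 1 / (k + eta)
    else:                        # "forest"
        a = 1.0 / (1.0 + eta)
        new_tree = F_m           # overall weight of the new tree: 1 / (1 + eta)
    return kept + [a * t for t in dropped] + [a * new_tree]

# toy data: 8 points, starting from one crude "tree"
y = rng.normal(size=8)
trees = [np.full(8, y.mean())]
for _ in range(10):
    trees = dart_round(trees, y)
print("ensemble prediction:", sum(trees))
```

With `rate_drop=0` every round takes the `gbtree` branch, so the sketch degenerates to plain gradient boosting with learning rate ``$ \eta $``.
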
Parameters
----------
### booster
* `dart`
This booster inherits `gbtree`, so `dart` also supports `eta`, `gamma`, `max_depth` and so on.
Additional parameters are noted below.
### sample_type
type of sampling algorithm (a small selection sketch follows this list).
* `uniform`: (default) dropped trees are selected uniformly.
* `weighted`: dropped trees are selected in proportion to weight.
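
As a rough, hedged illustration of the difference between the two options (the per-tree weights and the two-tree drop count below are invented for the example; this is not the actual selection code):

```python
import numpy as np

rng = np.random.default_rng(1)

# invented weights for four existing trees
tree_weights = np.array([0.5, 1.0, 2.0, 0.1])

# "uniform": every tree is equally likely to be dropped
uniform_drop = rng.choice(len(tree_weights), size=2, replace=False)

# "weighted": trees with larger weight are more likely to be dropped
p = tree_weights / tree_weights.sum()
weighted_drop = rng.choice(len(tree_weights), size=2, replace=False, p=p)

print(uniform_drop, weighted_drop)
```
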
### normalize_type
type of normalization algorithm (a quick numeric check of both factors follows the list).
* `tree`: (default) new trees have the same weight as each of the dropped trees.
```math
a \left( \sum_{i \in \mathbf{K}} F_i + \frac{1}{k} F_m \right)
&= a \left( \sum_{i \in \mathbf{K}} F_i + \frac{\eta}{k} \tilde{F}_m \right) \\
&\sim a \left( 1 + \frac{\eta}{k} \right) D \\
&= a \frac{k + \eta}{k} D = D , \\
&\quad a = \frac{k}{k + \eta} .
```
* `forest`: new trees have the same weight as the sum of the dropped trees (forest).
```math
a \left( \sum_{i \in \mathbf{K}} F_i + F_m \right)
&= a \left( \sum_{i \in \mathbf{K}} F_i + \eta \tilde{F}_m \right) \\
&\sim a \left( 1 + \eta \right) D \\
&= a (1 + \eta) D = D , \\
&\quad a = \frac{1}{1 + \eta} .
```
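
A quick numeric sanity check of the two factors (illustrative numbers only; as in the derivations above it assumes the new tree roughly reproduces the dropped group, ``$ \tilde{F}_m \sim D $``):

```python
k, eta, D = 3, 0.1, 1.0    # 3 dropped trees, learning rate 0.1, contribution D of the dropped group
F_m = eta * D              # contribution of the new tree, assuming F~_m ~ D

# normalize_type="tree": new tree enters with weight 1/k, everything is scaled by a = k / (k + eta)
a = k / (k + eta)
print(a * (D + F_m / k))   # -> 1.0, i.e. back on the scale of D

# normalize_type="forest": new tree keeps weight 1, everything is scaled by a = 1 / (1 + eta)
a = 1.0 / (1.0 + eta)
print(a * (D + F_m))       # -> 1.0, again on the scale of D
```
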
### rate_drop
dropout rate.
- range: [0.0, 1.0]
### skip_drop
probability of skipping dropout.
- If a dropout is skipped, new trees are added in the same manner as `gbtree` (see the sketch after this list).
- range: [0.0, 1.0]
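
A minimal sketch of how `rate_drop` and `skip_drop` interact in a single boosting round, under the uniform `sample_type` (the helper is illustrative, not the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

def trees_to_drop(n_trees, rate_drop=0.1, skip_drop=0.5):
    # with probability skip_drop the dropout step is skipped entirely,
    # and the round adds its new tree exactly like gbtree
    if rng.random() < skip_drop:
        return []
    # otherwise each existing tree is dropped independently with probability rate_drop
    mask = rng.random(n_trees) < rate_drop
    return list(np.flatnonzero(mask))

for _ in range(3):
    print(trees_to_drop(n_trees=20))
```
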
Sample Script
-------------
```python
import xgboost as xgb
# read in data
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')
# specify parameters via map
param = {'booster': 'dart',
         'max_depth': 5, 'learning_rate': 0.1,
         'objective': 'binary:logistic', 'silent': True,
         'sample_type': 'uniform',
         'normalize_type': 'tree',
         'rate_drop': 0.1,
         'skip_drop': 0.5}
num_round = 50
bst = xgb.train(param, dtrain, num_round)
# make prediction
# ntree_limit must not be 0: with dart, prediction without it would also apply dropout and evaluate only part of the trees
preds = bst.predict(dtest, ntree_limit=num_round)
```
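
If you also want to watch the evaluation metric while boosting, the same parameters work with the usual watchlist arguments of `xgb.train`; a minimal variant (nothing DART-specific is assumed here, and remember from the Features section that early stopping can be unstable with `dart`):

```python
watchlist = [(dtrain, 'train'), (dtest, 'eval')]
bst = xgb.train(param, dtrain, num_round, evals=watchlist)
preds = bst.predict(dtest, ntree_limit=num_round)
```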


@@ -6,3 +6,4 @@ See [Awesome XGBoost](https://github.com/dmlc/xgboost/tree/master/demo) for link
 ## Contents
 - [Introduction to Boosted Trees](../model.md)
 - [Distributed XGBoost YARN on AWS](aws_yarn.md)
+- [DART booster](dart.md)