add Dart tutorial
parent f14c160f4f
commit c332eb5a2b
@@ -1 +1 @@
-Subproject commit 9fd3b48462a7a651e12a197679f71e043dcb25a2
+Subproject commit c39001019e443c7a061789bd1180f58ce85fc3e6
@@ -13,8 +13,7 @@ In R-package, you can use .(dot) to replace under score in the parameters, for e
 General Parameters
 ------------------
 * booster [default=gbtree]
-  - which booster to use, can be gbtree, gblinear or dart.
-    gbtree and dart use tree based model while gblinear uses linear function.
+  - which booster to use, can be gbtree, gblinear or dart. gbtree and dart use tree based model while gblinear uses linear function.
 * silent [default=0]
   - 0 means printing running messages, 1 means silent mode.
 * nthread [default to maximum number of threads available if not set]
@@ -81,20 +80,20 @@ Additional parameters for Dart Booster
   - type of sampling algorithm.
     - "uniform": dropped trees are selected uniformly.
     - "weighted": dropped trees are selected in proportion to weight.
-* normalize_type [default="tree]
+* normalize_type [default="tree"]
   - type of normalization algorithm.
-  - "tree": New trees have the same weight of each of dropped trees.
-    weight of new trees are 1 / (k + learning_rate)
-    dropped trees are scaled by a factor of k / (k + learning_rate)
-  - "forest": New trees have the same weight of sum of dropped trees (forest).
-    weight of new trees are 1 / (1 + learning_rate)
-    dropped trees are scaled by a factor of 1 / (1 + learning_rate)
+  - "tree": new trees have the same weight as each of the dropped trees.
+    - weight of new trees is 1 / (k + learning_rate)
+    - dropped trees are scaled by a factor of k / (k + learning_rate)
+  - "forest": new trees have the same weight as the sum of the dropped trees (forest).
+    - weight of new trees is 1 / (1 + learning_rate)
+    - dropped trees are scaled by a factor of 1 / (1 + learning_rate)
 * rate_drop [default=0.0]
   - dropout rate.
   - range: [0.0, 1.0]
 * skip_drop [default=0.0]
   - probability of skipping dropout.
-  If a dropout is skipped, new trees are added in the same manner as gbtree.
+    - If a dropout is skipped, new trees are added in the same manner as gbtree.
   - range: [0.0, 1.0]
 
 Parameters for Linear Booster
doc/tutorials/dart.md (new file, 101 lines)
@@ -0,0 +1,101 @@
DART booster
============
[XGBoost](https://github.com/dmlc/xgboost) mostly combines a huge number of regression trees with a small learning rate.
In this situation, trees added early are significant and trees added late are unimportant.

Vinayak and Gilad-Bachrach proposed a new method that adds dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations.

This is an introduction to the new tree booster `dart`.

Original paper
--------------
Rashmi Korlakai Vinayak, Ran Gilad-Bachrach. "DART: Dropouts meet Multiple Additive Regression Trees." [JMLR](http://www.jmlr.org/proceedings/papers/v38/korlakaivinayak15.pdf)

Features
--------
- Drop trees in order to solve the over-fitting.
  - Trivial trees (to correct trivial errors) may be prevented.

Because of the randomness introduced in the training, expect the following few differences.
- Training can be slower than `gbtree` because the random dropout prevents use of the prediction buffer.
- The early stop might not be stable, due to the randomness.

How it works
------------
- In the ``$ m $``-th training round, suppose ``$ k $`` trees are selected to be dropped.
- Let ``$ D = \sum_{i \in \mathbf{K}} F_i $`` be the leaf scores of the dropped trees and ``$ F_m = \eta \tilde{F}_m $`` be the leaf scores of a new tree.
- The objective function is the following:

```math
\mathrm{Obj}
= \sum_{j=1}^n L \left( y_j, \hat{y}_j^{m-1} - D_j + \tilde{F}_m \right)
+ \Omega \left( \tilde{F}_m \right) .
```

- ``$ D $`` and ``$ F_m $`` are overshooting, so a scale factor is used:

```math
\hat{y}_j^m = \sum_{i \not\in \mathbf{K}} F_i + a \left( \sum_{i \in \mathbf{K}} F_i + b F_m \right) .
```

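To make the update above concrete, the following is a small hand-worked sketch in plain Python. It is not xgboost's internal code; every variable name is invented for the illustration, and it only traces the arithmetic of a single round under the `tree` normalization described below (``$ b = 1/k $``, ``$ a = k / (k + \eta) $``).

```python
# Toy, single-data-point sketch of one DART round (illustration only,
# not xgboost internals; all names here are made up for the example).
eta = 0.1                                      # learning rate
tree_scores = [0.30, 0.25, 0.15, 0.40, 0.10]   # leaf scores F_i of the existing trees
drop_idx = {0, 2}                              # suppose these k = 2 trees were sampled for dropping

k = len(drop_idx)
D = sum(tree_scores[i] for i in drop_idx)                                  # score of the dropped group
kept_sum = sum(s for i, s in enumerate(tree_scores) if i not in drop_idx)  # untouched trees

# The new tree is fitted against the residual left by removing D; for the
# sketch we simply pretend its leaf score roughly matches D.
F_tilde = D
F_m = eta * F_tilde

# "tree" normalization: the new tree is weighted by 1/k and the dropped
# group plus the new tree are rescaled by a = k / (k + eta).
a = k / (k + eta)
prediction = kept_sum + a * (D + F_m / k)
print(prediction)   # equals kept_sum + D here, i.e. no overshoot
```

Because `F_tilde` was set equal to `D`, the rescaled group contributes exactly `D`, which is the behaviour the scale factor is meant to guarantee.
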
Parameters
----------
### booster
* `dart`

This booster inherits `gbtree`, so `dart` also has `eta`, `gamma`, `max_depth` and so on.

Additional parameters are noted below.

### sample_type
type of sampling algorithm.
* `uniform`: (default) dropped trees are selected uniformly.
* `weighted`: dropped trees are selected in proportion to weight.

### normalize_type
type of normalization algorithm.
* `tree`: (default) New trees have the same weight as each of the dropped trees.

```math
a \left( \sum_{i \in \mathbf{K}} F_i + \frac{1}{k} F_m \right)
&= a \left( \sum_{i \in \mathbf{K}} F_i + \frac{\eta}{k} \tilde{F}_m \right) \\
&\sim a \left( 1 + \frac{\eta}{k} \right) D \\
&= a \frac{k + \eta}{k} D = D , \\
&\quad a = \frac{k}{k + \eta} .
```

* `forest`: New trees have the same weight as the sum of the dropped trees (forest).

```math
a \left( \sum_{i \in \mathbf{K}} F_i + F_m \right)
&= a \left( \sum_{i \in \mathbf{K}} F_i + \eta \tilde{F}_m \right) \\
&\sim a \left( 1 + \eta \right) D \\
&= a (1 + \eta) D = D , \\
&\quad a = \frac{1}{1 + \eta} .
```

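As a quick numeric check on the two derivations above (plain Python, not library code; the values are arbitrary), both choices of ``$ a $`` keep the combined contribution of the dropped group and the new tree at roughly ``$ D $``, under the same assumption used in the derivations that ``$ \tilde{F}_m $`` is about the size of ``$ D $``.

```python
# Compare the "tree" and "forest" normalizations for one dropped group.
# Arbitrary illustrative values, not part of the xgboost API.
eta, k, D = 0.1, 3, 0.70
F_tilde = D                      # assumption from the derivations above

# "tree": new tree weighted by 1/k, group rescaled by a = k / (k + eta)
tree_total = (k / (k + eta)) * (D + (eta / k) * F_tilde)

# "forest": new tree weighted like the whole group, rescaled by a = 1 / (1 + eta)
forest_total = (1.0 / (1.0 + eta)) * (D + eta * F_tilde)

print(tree_total, forest_total)  # both come out to D = 0.70 (up to rounding)
```
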
### rate_drop
dropout rate.
- range: [0.0, 1.0]

### skip_drop
probability of skipping dropout.
- If a dropout is skipped, new trees are added in the same manner as gbtree.
- range: [0.0, 1.0]

Sample Script
-------------
```python
import xgboost as xgb
# read in data
dtrain = xgb.DMatrix('demo/data/agaricus.txt.train')
dtest = xgb.DMatrix('demo/data/agaricus.txt.test')
# specify parameters via map
param = {'booster': 'dart',
         'max_depth': 5, 'learning_rate': 0.1,
         'objective': 'binary:logistic', 'silent': True,
         'sample_type': 'uniform',
         'normalize_type': 'tree',
         'rate_drop': 0.1,
         'skip_drop': 0.5}
num_round = 50
bst = xgb.train(param, dtrain, num_round)
# make prediction
# ntree_limit must not be 0
preds = bst.predict(dtest, ntree_limit=num_round)
```
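The Features section above warns that early stopping can be unstable with `dart` because each round drops a random subset of trees. A minimal sketch of using it anyway is shown below; it assumes the `param`, `dtrain` and `dtest` objects from the sample script, and relies only on the standard `evals` / `early_stopping_rounds` arguments of `xgb.train`.

```python
# Early stopping with dart: the evaluation metric is noisier than with
# gbtree, so the chosen stopping round may differ between runs.
watchlist = [(dtrain, 'train'), (dtest, 'eval')]
num_round = 200
bst = xgb.train(param, dtrain, num_round,
                evals=watchlist, early_stopping_rounds=10)

# As in the sample above, pass an explicit non-zero ntree_limit when predicting.
preds = bst.predict(dtest, ntree_limit=num_round)
```
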
@@ -6,3 +6,4 @@ See [Awesome XGBoost](https://github.com/dmlc/xgboost/tree/master/demo) for link
 ## Contents
 - [Introduction to Boosted Trees](../model.md)
 - [Distributed XGBoost YARN on AWS](aws_yarn.md)
+- [DART booster](dart.md)