Update model.md

Tianqi Chen 2015-08-23 22:46:50 -07:00
parent f305cdbf75
commit c4fa2f6110


@@ -82,10 +82,10 @@ If you look at the example, an important fact is that the two trees try to *co
Mathematically, we can write our model in the form
```math
\hat{y}_i = \sum_{k=1}^K f_k(x_i), f_k \in \mathcal{F}
```
where ``$ K $`` is the number of trees, ``$ f $`` is a function in the functional space ``$ \mathcal{F} $``, and ``$ \mathcal{F} $`` is the set of all possible CARTs. Therefore our objective to optimize can be written as
```math
obj(\Theta) = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_{k=1}^K \Omega(f_k)
```
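To make the structure of this objective concrete, here is a minimal Python sketch under the assumption that each tree is just a callable and that the loss is squared error; the names `ensemble_predict`, `objective`, and `complexity` are illustrative placeholders, not the actual XGBoost API.
```python
import numpy as np

# A minimal sketch of the objective above, with squared-error loss and each tree
# f_k modelled as a plain Python callable. `complexity` stands in for Omega and
# is a placeholder, not part of the actual XGBoost API.
def ensemble_predict(trees, x):
    # y_hat_i = sum_{k=1}^{K} f_k(x_i)
    return sum(f_k(x) for f_k in trees)

def objective(trees, X, y, complexity):
    # sum_i l(y_i, y_hat_i) + sum_k Omega(f_k), here with l = squared error
    y = np.asarray(y, dtype=float)
    preds = np.array([ensemble_predict(trees, x) for x in X])
    return ((y - preds) ** 2).sum() + sum(complexity(f_k) for f_k in trees)
```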
@@ -110,7 +110,7 @@ The first thing we want to ask is what are the ***parameters*** of trees. You can find w
of the tree, and the leaf score. This is much harder than a traditional optimization problem where you can take the gradient and go.
It is not easy to train all the trees at once.
Instead, we use an additive strategy: fix what we have learned, and add one new tree at a time.
We write the prediction value at step ``$t$`` as ``$ \hat{y}_i^{(t)} $``, so we have
```math
\hat{y}_i^{(0)} &= 0\\
\hat{y}_i^{(1)} &= f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i)\\
\hat{y}_i^{(2)} &= f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i)\\
&\dots\\
\hat{y}_i^{(t)} &= \sum_{k=1}^t f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)
```
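As an illustration only (not XGBoost's actual implementation), the additive strategy can be sketched as the following loop, where `fit_one_tree` is a hypothetical helper that learns the next tree from the data and the current predictions:
```python
import numpy as np

# A minimal sketch of the additive strategy: keep the trees learned so far fixed
# and add one new tree per round.
def boost(X, y, num_rounds, fit_one_tree):
    trees = []
    y_hat = np.zeros(len(y))                     # y_hat^{(0)} = 0
    for _ in range(num_rounds):
        f_t = fit_one_tree(X, y, y_hat)          # learn f_t given the current predictions
        y_hat += np.array([f_t(x) for x in X])   # y_hat^{(t)} = y_hat^{(t-1)} + f_t(x)
        trees.append(f_t)
    return trees
```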
@@ -216,7 +216,7 @@ Specifically we try to split a leaf into two leaves, and the score it gains is
Gain = \frac{1}{2} \left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
```
This formula can be decomposed as 1) the score on the new left leaf, 2) the score on the new right leaf, 3) the score on the original leaf, and 4) regularization on the additional leaf.
We can find an important fact here: if the gain is smaller than ``$\gamma$``, we would do better not to add that branch. This is exactly the ***pruning*** technique in tree-based
models! By using the principles of supervised learning, we naturally come up with the reason behind these techniques :)
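For concreteness, here is a small sketch of the gain computation and the pruning decision it implies, assuming ``$G_L, H_L$`` and ``$G_R, H_R$`` are the summed gradient and hessian statistics of the instances going to the left and right leaves, and `lam`, `gamma` stand for ``$\lambda$`` and ``$\gamma$``; this is an illustration, not XGBoost's internal code.
```python
# A minimal sketch of the gain formula above.
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    def score(G, H):
        # structure score of a leaf with gradient sum G and hessian sum H
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)) - gamma

# Pruning in action: only keep the split when its gain is positive, i.e. when the
# improvement over the unsplit leaf exceeds the cost gamma of the extra leaf.
def should_split(G_L, H_L, G_R, H_R, lam, gamma):
    return split_gain(G_L, H_L, G_R, H_R, lam, gamma) > 0
```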
For real-valued data, we usually want to search for an optimal split. To do so efficiently, we place all the instances in sorted order, like the following picture.
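In code, that left-to-right scan over the sorted instances looks roughly like the sketch below; `grad` and `hess` are assumed to hold the per-instance gradient and hessian statistics already sorted by the candidate feature's value, and the function simply evaluates the gain at every split position and keeps the best one.
```python
import numpy as np

# A rough sketch of the split search for one feature: accumulate G_L/H_L from
# left to right over the sorted instances and compute the gain at each position.
def best_split(grad, hess, lam, gamma):
    G, H = grad.sum(), hess.sum()
    G_L = H_L = 0.0
    best_gain, best_pos = 0.0, None      # only positive-gain splits are kept (pruning)
    for i in range(len(grad) - 1):       # candidate split between position i and i+1
        G_L += grad[i]
        H_L += hess[i]
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam)
                      - G ** 2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_pos = gain, i
    return best_pos, best_gain
```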