Update model.md
parent f305cdbf75, commit c4fa2f6110
@@ -82,10 +82,10 @@ If you look at the example, an important fact is that the two trees tries to *co
Mathematically, we can write our model in the form
```math
-\hat{y}_i = \sum_{k=1}^K f_k(x_i), f_k \in F
+\hat{y}_i = \sum_{k=1}^K f_k(x_i), f_k \in \mathcal{F}
```
-where ``$ K $`` is the number of trees, ``$ f $`` is a function in the functional space ``$ F $``, and ``$ F $`` is the set of all possible CARTs. Therefore our objective to optimize can be written as
+where ``$ K $`` is the number of trees, ``$ f $`` is a function in the functional space ``$ \mathcal{F} $``, and ``$ \mathcal{F} $`` is the set of all possible CARTs. Therefore our objective to optimize can be written as
```math
obj(\Theta) = \sum_i^n l(y_i, \hat{y}_i) + \sum_{k=1}^K \Omega(f_k)
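```

To make the two formulas above concrete, here is a minimal sketch in Python of how such an additive model would predict and how the objective could be evaluated. The per-tree callables, the `omega` complexity measure, and the squared-error loss are illustrative assumptions, not the XGBoost implementation.

```python
import numpy as np

def ensemble_predict(trees, X):
    # Additive model: y_hat_i = sum_k f_k(x_i).
    # Each "tree" is a callable mapping the feature matrix X to a vector
    # of leaf scores, standing in for a fitted CART.
    return sum(f(X) for f in trees)

def objective(trees, X, y, omega):
    # obj(Theta) = sum_i l(y_i, y_hat_i) + sum_k Omega(f_k),
    # assuming squared-error loss for l and a user-supplied complexity measure omega.
    y_hat = ensemble_predict(trees, X)
    training_loss = np.sum((y - y_hat) ** 2)
    regularization = sum(omega(f) for f in trees)
    return training_loss + regularization
```

Writing the model this way makes it clear that the ensemble is defined entirely by the ``$ K $`` functions ``$ f_k $``, which is exactly what we need to learn.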
@@ -110,7 +110,7 @@ First thing we want to ask is what are ***parameters*** of trees. You can find w
of the tree, and the leaf score. This is much harder than a traditional optimization problem where you can take the gradient and go.
It is not easy to train all the trees at once.
Instead, we use an additive strategy: fix what we have learned, and add one new tree at a time.
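A minimal sketch of this additive strategy, assuming a hypothetical helper `fit_one_tree(X, y, y_hat)` that fits the next tree against the current predictions (illustrative only, not the actual XGBoost training loop):

```python
import numpy as np

def boost(X, y, num_rounds, fit_one_tree):
    # Additive training: keep the trees learned so far fixed and add one
    # new tree per round, so y_hat^(t) = y_hat^(t-1) + f_t(x).
    y_hat = np.zeros(len(y))                  # y_hat^(0) = 0
    trees = []
    for _ in range(num_rounds):
        f_t = fit_one_tree(X, y, y_hat)       # fit the next tree given what we have learned
        y_hat = y_hat + f_t(X)                # prediction value at the new step
        trees.append(f_t)
    return trees, y_hat
```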
-We note the prediction value at step `t` by ``$ \hat{y}_i^{(t)}$``, so we have
+We note the prediction value at step ``$t$`` by ``$ \hat{y}_i^{(t)}$``, so we have
```math
\hat{y}_i^{(0)} &= 0\\
@@ -216,7 +216,7 @@ Specifically we try to split a leaf into two leaves, and the score it gains is
Gain = \frac{1}{2} \left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
```
This formula can be decomposed as 1) the score on the new left leaf, 2) the score on the new right leaf, 3) the score on the original leaf, and 4) regularization on the additional leaf.
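A minimal sketch of this decomposition, where `g_left`, `h_left`, `g_right`, `h_right` hold the first- and second-order gradient statistics of the instances falling into the candidate left and right leaves (the standalone function and its names are illustrative, not the actual implementation):

```python
def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    G_L, H_L = sum(g_left), sum(h_left)
    G_R, H_R = sum(g_right), sum(h_right)
    left_score = G_L ** 2 / (H_L + lam)                   # 1) score on the new left leaf
    right_score = G_R ** 2 / (H_R + lam)                  # 2) score on the new right leaf
    parent_score = (G_L + G_R) ** 2 / (H_L + H_R + lam)   # 3) score on the original leaf
    # 4) gamma accounts for the additional leaf introduced by the split
    return 0.5 * (left_score + right_score - parent_score) - gamma
```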
-We can find an important fact here: if the gain is smaller than ``$gamma$``, we would better not to add that branch. This is exactly the ***prunning*** techniques in tree based
+We can find an important fact here: if the gain is smaller than ``$\gamma$``, we would better not to add that branch. This is exactly the ***prunning*** techniques in tree based
models! By using the principles of supervised learning, we naturally come up with the reason for these techniques :)
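For a made-up numerical example, take ``$ G_L = 2, H_L = 2, G_R = -1, H_R = 2 $`` and ``$ \lambda = 1 $``. The bracketed term is

```math
\frac{2^2}{2+1} + \frac{(-1)^2}{2+1} - \frac{(2-1)^2}{2+2+1} = \frac{4}{3} + \frac{1}{3} - \frac{1}{5} \approx 1.47
```

so half of it is about ``$ 0.73 $``: with ``$ \gamma = 0.5 $`` the gain is positive and we keep the branch, while with ``$ \gamma = 1 $`` the gain is negative and the branch is pruned away.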
For real valued data, we usually want to search for an optimal split. To do so efficiently, we place all the instances in sorted order, like the following picture.
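A minimal sketch of that left-to-right scan over one feature column, reusing the gain formula above; the per-instance gradient statistics `g` and `h` are assumed to be given as arrays, and tied feature values are ignored for simplicity (again an illustration, not the actual implementation):

```python
import numpy as np

def best_split(x, g, h, lam, gamma):
    # Sort instances by feature value once, then sweep left to right,
    # accumulating G_L and H_L so each candidate split is scored in O(1).
    order = np.argsort(x)
    G, H = g.sum(), h.sum()
    G_L = H_L = 0.0
    best_gain, best_threshold = 0.0, None
    for i in order[:-1]:                      # one candidate split after each instance
        G_L += g[i]
        H_L += h[i]
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam)
                      - G ** 2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_threshold = gain, x[i]
    return best_gain, best_threshold
```

Because ``$ G_L $`` and ``$ H_L $`` are just running sums, all candidate thresholds of a feature are evaluated in a single pass over the sorted instances.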