new rmarkdown

tqchen 2015-05-03 14:02:15 -07:00
parent 32b1d9d6b0
commit a310db86a1


@@ -7,10 +7,9 @@ output: html_document
Introduction
============
XGBoost seems to be one of the most used tools for predicting the classification of the products from the OTTO dataset.
**XGBoost** is an implementation of the famous gradient boosting algorithm. This model is often described as a *blackbox*, meaning it works well but it is not trivial to understand how. Indeed, the model is made of hundreds (thousands?) of decision trees. You may wonder how a human could possibly get a general view of such a model.
While **XGBoost** is known for its speed and predictive accuracy, it also comes with various functions to help you understand the model.
The purpose of this RMarkdown document is to demonstrate how we can leverage the functions already implemented in the **XGBoost R** package for that purpose. Of course, everything shown below can be applied to any dataset you may have to manipulate at work or elsewhere!
First we will train a model on the **OTTO** dataset, then we will generate two visualizations to get a clue of what is important to the model, and finally we will see how we can leverage this information. A rough sketch of that workflow is shown below.
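As a minimal sketch of those steps in R (not the document's actual code): it assumes hypothetical objects `train`, a numeric feature matrix of the OTTO training data, and `labels`, an integer class vector in 0..8; the parameter values are illustrative only.

```r
require(xgboost)

# Hypothetical inputs: `train` is a numeric feature matrix of the OTTO
# training data, `labels` is an integer class vector in 0..8.
bst <- xgboost(data = train, label = labels,
               eta = 0.3, max_depth = 6, nrounds = 10,
               objective = "multi:softprob", num_class = 9)

# Compute and plot feature importance to see which features the model
# relies on most: the first of the two visualizations discussed here.
importance <- xgb.importance(feature_names = colnames(train), model = bst)
xgb.plot.importance(importance)
```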
@@ -211,4 +210,4 @@ xgb.plot.tree(feature_names = names, model = bst, n_first_tree = 1)
We are just displaying the first tree here.
On simple models, the first trees may be enough. Here, that may not be the case.
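When the first tree alone is not representative, the same call can render several trees at once via the `n_first_tree` argument used above. A small sketch, reusing the `names` and `bst` objects from the surrounding document:

```r
# Render the first three trees of the ensemble instead of only the first one,
# to get a broader (if still partial) view of what the model has learned.
xgb.plot.tree(feature_names = names, model = bst, n_first_tree = 3)
```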