From be6bd3859d4083eb89a12f266fc6ee1118aa8a8d Mon Sep 17 00:00:00 2001 From: El Potaeto Date: Sun, 29 Mar 2015 01:52:26 +0100 Subject: [PATCH 01/12] Add Random Forest parameter (num_parallel_tree) in function doc + example in Vignette. --- R-package/R/xgb.train.R | 1 + R-package/man/xgb.train.Rd | 1 + R-package/vignettes/discoverYourData.Rmd | 15 +++++++++++++++ 3 files changed, 17 insertions(+) diff --git a/R-package/R/xgb.train.R b/R-package/R/xgb.train.R index d5cf5cbde..1444964e5 100644 --- a/R-package/R/xgb.train.R +++ b/R-package/R/xgb.train.R @@ -22,6 +22,7 @@ #' \item \code{min_child_weight} minimum sum of instance weight(hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node. The larger, the more conservative the algorithm will be. Default: 1 #' \item \code{subsample} subsample ratio of the training instance. Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees and this will prevent overfitting. Default: 1 #' \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1 +#' \item \code{num_parallel_tree} number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1 #' } #' #' 2.2. Parameter for Linear Booster diff --git a/R-package/man/xgb.train.Rd b/R-package/man/xgb.train.Rd index d56f0b84e..91e21b50c 100644 --- a/R-package/man/xgb.train.Rd +++ b/R-package/man/xgb.train.Rd @@ -28,6 +28,7 @@ xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL, \item \code{min_child_weight} minimum sum of instance weight(hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node. The larger, the more conservative the algorithm will be. Default: 1 \item \code{subsample} subsample ratio of the training instance. Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees and this will prevent overfitting. Default: 1 \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1 + \item \code{num_parallel_tree} number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1 } 2.2. Parameter for Linear Booster diff --git a/R-package/vignettes/discoverYourData.Rmd b/R-package/vignettes/discoverYourData.Rmd index 49d5bf0cd..9419a13ae 100644 --- a/R-package/vignettes/discoverYourData.Rmd +++ b/R-package/vignettes/discoverYourData.Rmd @@ -313,4 +313,19 @@ However, in Random Forests™ this random choice will be done for each tree, bec In boosting, when a specific link between feature and outcome have been learned by the algorithm, it will try to not refocus on it (in theory it is what happens, reality is not always that simple). Therefore, all the importance will be on feature `A` or on feature `B` (but not both). You will know that one feature have an important role in the link between the observations and the label. 
It is still up to you to search for the correlated features to the one detected as important if you need to know all of them. +If you want to try Random Forests™ algorithm, you can tweak Xgboost parameters! For instance, to compute a model with 1000 trees, with a 0.5 factor on sampling rows and columns: + +```{r, warning=FALSE, message=FALSE} +data(agaricus.train, package='xgboost') +data(agaricus.test, package='xgboost') +train <- agaricus.train +test <- agaricus.test + +#Random Forest™ - 1000 trees +bst <- xgboost(data = train$data, label = train$label, max.depth = 4, num_parallel_tree = 1000, subsample = 0.5, colsample_bytree =0.5, nround = 1, objective = "binary:logistic") + +#Boosting - 3 rounds +bst <- xgboost(data = train$data, label = train$label, max.depth = 4, nround = 3, objective = "binary:logistic") +``` + > [**Random Forests™**](https://www.stat.berkeley.edu/~breiman/RandomForests/cc_papers.htm) is a trademark of Leo Breiman and Adele Cutler and is licensed exclusively to Salford Systems for the commercial release of the software. \ No newline at end of file From aa0f612ac91b8091aac17acf4ffaaf0b77462bee Mon Sep 17 00:00:00 2001 From: pommedeterresautee Date: Tue, 14 Apr 2015 00:26:11 +0200 Subject: [PATCH 02/12] git ignore RProject files --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index 8b2c65f62..9fd1e0f72 100644 --- a/.gitignore +++ b/.gitignore @@ -47,6 +47,7 @@ Debug .Rproj.user *.cpage.col *.cpage +*.Rproj xgboost xgboost.mpi xgboost.mock From 4e1002a52c63f418c2178d3df4456c2e0bbbf30e Mon Sep 17 00:00:00 2001 From: pommedeterresautee Date: Tue, 14 Apr 2015 00:30:55 +0200 Subject: [PATCH 03/12] Experimental parameter --- R-package/R/xgb.train.R | 2 +- R-package/man/xgb.train.Rd | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/R-package/R/xgb.train.R b/R-package/R/xgb.train.R index 1444964e5..20908863f 100644 --- a/R-package/R/xgb.train.R +++ b/R-package/R/xgb.train.R @@ -22,7 +22,7 @@ #' \item \code{min_child_weight} minimum sum of instance weight(hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node. The larger, the more conservative the algorithm will be. Default: 1 #' \item \code{subsample} subsample ratio of the training instance. Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees and this will prevent overfitting. Default: 1 #' \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1 -#' \item \code{num_parallel_tree} number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1 +#' \item \code{num_parallel_tree} Experimental parameter. number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1 #' } #' #' 2.2. 
Parameter for Linear Booster diff --git a/R-package/man/xgb.train.Rd b/R-package/man/xgb.train.Rd index 91e21b50c..3f93b3989 100644 --- a/R-package/man/xgb.train.Rd +++ b/R-package/man/xgb.train.Rd @@ -28,7 +28,7 @@ xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL, \item \code{min_child_weight} minimum sum of instance weight(hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node. The larger, the more conservative the algorithm will be. Default: 1 \item \code{subsample} subsample ratio of the training instance. Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees and this will prevent overfitting. Default: 1 \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1 - \item \code{num_parallel_tree} number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1 + \item \code{num_parallel_tree} Experimental parameter. number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1 } 2.2. Parameter for Linear Booster From 12047056ae922020990bc7faad2ad5ad09b085e0 Mon Sep 17 00:00:00 2001 From: pommedeterresautee Date: Tue, 14 Apr 2015 00:39:51 +0200 Subject: [PATCH 04/12] Update vignette --- R-package/vignettes/discoverYourData.Rmd | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/R-package/vignettes/discoverYourData.Rmd b/R-package/vignettes/discoverYourData.Rmd index 9419a13ae..fa780ee94 100644 --- a/R-package/vignettes/discoverYourData.Rmd +++ b/R-package/vignettes/discoverYourData.Rmd @@ -313,7 +313,11 @@ However, in Random Forests™ this random choice will be done for each tree, bec In boosting, when a specific link between feature and outcome have been learned by the algorithm, it will try to not refocus on it (in theory it is what happens, reality is not always that simple). Therefore, all the importance will be on feature `A` or on feature `B` (but not both). You will know that one feature have an important role in the link between the observations and the label. It is still up to you to search for the correlated features to the one detected as important if you need to know all of them. -If you want to try Random Forests™ algorithm, you can tweak Xgboost parameters! For instance, to compute a model with 1000 trees, with a 0.5 factor on sampling rows and columns: +If you want to try Random Forests™ algorithm, you can tweak Xgboost parameters! + +**Warning**: this is still an experimental parameter. + +For instance, to compute a model with 1000 trees, with a 0.5 factor on sampling rows and columns: ```{r, warning=FALSE, message=FALSE} data(agaricus.train, package='xgboost') @@ -328,4 +332,6 @@ bst <- xgboost(data = train$data, label = train$label, max.depth = 4, num_parall bst <- xgboost(data = train$data, label = train$label, max.depth = 4, nround = 3, objective = "binary:logistic") ``` +> Note that the parameter `round` is set to `1`. 
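A quick way to compare the two settings is to score the held-out `agaricus.test` data loaded in the chunk above. The sketch below is illustrative only: it assumes the `bst` object left over from whichever call was run last, and a plain 0.5 probability cut-off.

```{r, warning=FALSE, message=FALSE}
# Illustrative sketch: score the held-out agaricus test set and compute the
# 0/1 classification error for the model trained last (0.5 cut-off assumed).
# Re-run it after each of the two xgboost() calls above to compare them.
pred <- predict(bst, test$data)
err <- mean(as.numeric(pred > 0.5) != test$label)
print(paste("test error:", err))
```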
+ > [**Random Forests™**](https://www.stat.berkeley.edu/~breiman/RandomForests/cc_papers.htm) is a trademark of Leo Breiman and Adele Cutler and is licensed exclusively to Salford Systems for the commercial release of the software. \ No newline at end of file From 20dfcd7ceced716c8cb63aae117a7dc2cb84b45a Mon Sep 17 00:00:00 2001 From: pommedeterresautee Date: Tue, 14 Apr 2015 00:48:11 +0200 Subject: [PATCH 05/12] Add slides to readme + group documentation together --- README.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 6a05ce0c4..17edf4658 100644 --- a/README.md +++ b/README.md @@ -5,25 +5,27 @@ It implements machine learning algorithm under gradient boosting framework, incl Contributors: https://github.com/dmlc/xgboost/graphs/contributors -Turorial and Documentation: https://github.com/dmlc/xgboost/wiki - Issues Tracker: [https://github.com/dmlc/xgboost/issues](https://github.com/dmlc/xgboost/issues?q=is%3Aissue+label%3Aquestion) Please join [XGBoost User Group](https://groups.google.com/forum/#!forum/xgboost-user/) to ask questions and share your experience on xgboost. Examples Code: [Learning to use xgboost by examples](demo) -Video tutorial: [Better Optimization with Repeated Cross Validation and the XGBoost model - Machine Learning with R](https://www.youtube.com/watch?v=Og7CGAfSr_Y) - Distributed Version: [Distributed XGBoost](multi-node) Notes on the Code: [Code Guide](src) +Turorial and Documentation: https://github.com/dmlc/xgboost/wiki + +Video tutorial: [Better Optimization with Repeated Cross Validation and the XGBoost model - Machine Learning with R](https://www.youtube.com/watch?v=Og7CGAfSr_Y) + Learning about the model: [Introduction to Boosted Trees](http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf) * This slide is made by Tianqi Chen to introduce gradient boosting in a statistical view. * It present boosted tree learning as formal functional space optimization of defined objective. * The model presented is used by xgboost for boosted trees +Presention of a real use case of XGBoost to prepare tax audit in France: [Feature Importance Analysis with XGBoost in Tax audit](http://fr.slideshare.net/MichaelBENESTY/feature-importance-analysis-with-xgboost-in-tax-audit) + What's New ========== * [Distributed XGBoost now runs on YARN](multi-node/hadoop)! From 2034b91b7d47e08ae0d732e368fc68bd9e4d905f Mon Sep 17 00:00:00 2001 From: El Potaeto Date: Wed, 15 Apr 2015 18:30:46 +0200 Subject: [PATCH 06/12] commit emtpy --- demo/kaggle-otto/readme.md | 1 + 1 file changed, 1 insertion(+) diff --git a/demo/kaggle-otto/readme.md b/demo/kaggle-otto/readme.md index af95dd47a..0c7bd45a4 100644 --- a/demo/kaggle-otto/readme.md +++ b/demo/kaggle-otto/readme.md @@ -7,6 +7,7 @@ This is a folder containing the benchmark for the [Otto Group Competition on Kag 1. Put `train.csv` and `test.csv` under the `data` folder 2. Run the script +3. Submit the `submission.csv` The parameter `nthread` controls the number of cores to run on, please set it to suit your machine. 
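For readers of the Otto benchmark readme above, the note that `nthread` controls the number of cores is easiest to see in an actual call. The sketch below is illustrative only: it uses a synthetic stand-in for the Otto data and arbitrary parameter values, not the script shipped in `demo/kaggle-otto`.

```r
# Illustrative sketch with synthetic stand-in data (not the real Otto set).
# Otto is a 9-class problem, so the multiclass softprob objective is used,
# and `nthread` caps the number of CPU cores xgboost may use.
require(xgboost)

set.seed(1)
x <- matrix(rnorm(900 * 20), nrow = 900)   # stand-in feature matrix
y <- sample(0:8, 900, replace = TRUE)      # stand-in 0-based class labels

bst <- xgboost(data = x, label = y,
               objective = "multi:softprob", num_class = 9,
               max.depth = 6, eta = 0.1,
               nthread = 2,                 # set this to suit your machine
               nrounds = 10)
```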
From 925fa30316ad70685bcbc565cb96d27b2442059f Mon Sep 17 00:00:00 2001 From: El Potaeto Date: Wed, 15 Apr 2015 18:32:04 +0200 Subject: [PATCH 07/12] Cancel readme modif --- README.md | 43 +++++++++++++++++++++++++++++++------------ 1 file changed, 31 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 17edf4658..1155550b0 100644 --- a/README.md +++ b/README.md @@ -5,30 +5,29 @@ It implements machine learning algorithm under gradient boosting framework, incl Contributors: https://github.com/dmlc/xgboost/graphs/contributors -Issues Tracker: [https://github.com/dmlc/xgboost/issues](https://github.com/dmlc/xgboost/issues?q=is%3Aissue+label%3Aquestion) +Turorial and Documentation: https://github.com/dmlc/xgboost/wiki -Please join [XGBoost User Group](https://groups.google.com/forum/#!forum/xgboost-user/) to ask questions and share your experience on xgboost. +Issues Tracker: [https://github.com/dmlc/xgboost/issues](https://github.com/dmlc/xgboost/issues?q=is%3Aissue+label%3Aquestion) for bugreport and other issues + +Please join [XGBoost User Group](https://groups.google.com/forum/#!forum/xgboost-user/) to ask usage questions and share your experience on xgboost. Examples Code: [Learning to use xgboost by examples](demo) +Video tutorial: [Better Optimization with Repeated Cross Validation and the XGBoost model - Machine Learning with R](https://www.youtube.com/watch?v=Og7CGAfSr_Y) + Distributed Version: [Distributed XGBoost](multi-node) Notes on the Code: [Code Guide](src) -Turorial and Documentation: https://github.com/dmlc/xgboost/wiki - -Video tutorial: [Better Optimization with Repeated Cross Validation and the XGBoost model - Machine Learning with R](https://www.youtube.com/watch?v=Og7CGAfSr_Y) - Learning about the model: [Introduction to Boosted Trees](http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf) * This slide is made by Tianqi Chen to introduce gradient boosting in a statistical view. * It present boosted tree learning as formal functional space optimization of defined objective. * The model presented is used by xgboost for boosted trees -Presention of a real use case of XGBoost to prepare tax audit in France: [Feature Importance Analysis with XGBoost in Tax audit](http://fr.slideshare.net/MichaelBENESTY/feature-importance-analysis-with-xgboost-in-tax-audit) - What's New ========== -* [Distributed XGBoost now runs on YARN](multi-node/hadoop)! +* XGBoost now support HDFS and S3 +* [Distributed XGBoost now runs on YARN](https://github.com/dmlc/wormhole/tree/master/learn/xgboost)! * [xgboost user group](https://groups.google.com/forum/#!forum/xgboost-user/) for tracking changes, sharing your experience on xgboost * [Distributed XGBoost](multi-node) is now available!! 
* New features in the lastest changes :) @@ -37,8 +36,6 @@ What's New - Predict leaf index, see [demo/guide-python/predict_leaf_indices.py](demo/guide-python/predict_leaf_indices.py) * XGBoost wins [Tradeshift Text Classification](https://kaggle2.blob.core.windows.net/forum-message-attachments/60041/1813/TradeshiftTextClassification.pdf?sv=2012-02-12&se=2015-01-02T13%3A55%3A16Z&sr=b&sp=r&sig=5MHvyjCLESLexYcvbSRFumGQXCS7MVmfdBIY3y01tMk%3D) * XGBoost wins [HEP meets ML Award in Higgs Boson Challenge](http://atlas.ch/news/2014/machine-learning-wins-the-higgs-challenge.html) -* Thanks to Bing Xu, [XGBoost.jl](https://github.com/antinucleon/XGBoost.jl) allows you to use xgboost from Julia -* Thanks to Tong He, the new [R package](R-package) is available Features ======== @@ -76,6 +73,28 @@ Build export CXX = g++-4.9 ``` Then run ```bash build.sh``` normally. + + - For users who want to use [High Performance Computing for Mac OS X](http://hpc.sourceforge.net/), download the GCC 4.9 binary tar ball and follow the installation guidance to install them under `/usr/local`. Then edit [Makefile](Makefile/) by replacing: + ``` + export CC = gcc + export CXX = g++ + ``` + with + ``` + export CC = /usr/local/bin/gcc + export CXX = /usr/local/bin/g++ + ``` + Then run ```bash build.sh``` normally. This solution is given by [Phil Culliton](https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/12947/achieve-0-50776-on-the-leaderboard-in-a-minute-with-xgboost/68308#post68308). + +Build with HDFS and S3 Support +===== +* To build xgboost use with HDFS/S3 support and distributed learnig. It is recommended to build with dmlc, with the following steps + - ```git clone https://github.com/dmlc/dmlc-core``` + - Follow instruction in dmlc-core/make/config.mk to compile libdmlc.a + - In root folder of xgboost, type ```make dmlc=dmlc-core``` +* This will allow xgboost to directly load data and save model from/to hdfs and s3 + - Simply replace the filename with prefix s3:// or hdfs:// +* This xgboost that can be used for distributed learning Version ======= @@ -88,4 +107,4 @@ Version XGBoost in Graphlab Create ========================== * XGBoost is adopted as part of boosted tree toolkit in Graphlab Create (GLC). Graphlab Create is a powerful python toolkit that allows you to data manipulation, graph processing, hyper-parameter search, and visualization of TeraBytes scale data in one framework. Try the Graphlab Create in http://graphlab.com/products/create/quick-start-guide.html -* Nice blogpost by Jay Gu using GLC boosted tree to solve kaggle bike sharing challenge: http://blog.graphlab.com/using-gradient-boosted-trees-to-predict-bike-sharing-demand +* Nice blogpost by Jay Gu using GLC boosted tree to solve kaggle bike sharing challenge: http://blog.graphlab.com/using-gradient-boosted-trees-to-predict-bike-sharing-demand \ No newline at end of file From 0ae6d470c7248d9d30a6ccf4e3038ced5487f068 Mon Sep 17 00:00:00 2001 From: El Potaeto Date: Wed, 15 Apr 2015 18:36:53 +0200 Subject: [PATCH 08/12] test --- demo/kaggle-otto/readme.md | 1 - 1 file changed, 1 deletion(-) diff --git a/demo/kaggle-otto/readme.md b/demo/kaggle-otto/readme.md index 0c7bd45a4..94e422a13 100644 --- a/demo/kaggle-otto/readme.md +++ b/demo/kaggle-otto/readme.md @@ -22,4 +22,3 @@ devtools::install_github('tqchen/xgboost',subdir='R-package') Windows users may need to install [RTools](http://cran.r-project.org/bin/windows/Rtools/) first. 
- From ab8cf14fb98c632b8aa4bd643f0d1c130ba35f99 Mon Sep 17 00:00:00 2001 From: El Potaeto Date: Wed, 15 Apr 2015 18:44:06 +0200 Subject: [PATCH 09/12] cleaning --- demo/kaggle-otto/readme.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/demo/kaggle-otto/readme.md b/demo/kaggle-otto/readme.md index 94e422a13..53b0bd8d5 100644 --- a/demo/kaggle-otto/readme.md +++ b/demo/kaggle-otto/readme.md @@ -19,6 +19,4 @@ To install the R-package of xgboost, please run devtools::install_github('tqchen/xgboost',subdir='R-package') ``` -Windows users may need to install [RTools](http://cran.r-project.org/bin/windows/Rtools/) first. - - +Windows users may need to install [RTools](http://cran.r-project.org/bin/windows/Rtools/) first. \ No newline at end of file From 511d74c63111d34788021e0a8973ab4e94508f25 Mon Sep 17 00:00:00 2001 From: El Potaeto Date: Wed, 15 Apr 2015 18:46:28 +0200 Subject: [PATCH 10/12] clean --- demo/kaggle-otto/README.MD | 25 ------------------------- demo/kaggle-otto/readme.md | 22 ---------------------- 2 files changed, 47 deletions(-) delete mode 100644 demo/kaggle-otto/README.MD delete mode 100644 demo/kaggle-otto/readme.md diff --git a/demo/kaggle-otto/README.MD b/demo/kaggle-otto/README.MD deleted file mode 100644 index 0c7bd45a4..000000000 --- a/demo/kaggle-otto/README.MD +++ /dev/null @@ -1,25 +0,0 @@ -Benckmark for Otto Group Competition -========= - -This is a folder containing the benchmark for the [Otto Group Competition on Kaggle](http://www.kaggle.com/c/otto-group-product-classification-challenge). - -## Getting started - -1. Put `train.csv` and `test.csv` under the `data` folder -2. Run the script -3. Submit the `submission.csv` - -The parameter `nthread` controls the number of cores to run on, please set it to suit your machine. - -## R-package - -To install the R-package of xgboost, please run - -```r -devtools::install_github('tqchen/xgboost',subdir='R-package') -``` - -Windows users may need to install [RTools](http://cran.r-project.org/bin/windows/Rtools/) first. - - - diff --git a/demo/kaggle-otto/readme.md b/demo/kaggle-otto/readme.md deleted file mode 100644 index 53b0bd8d5..000000000 --- a/demo/kaggle-otto/readme.md +++ /dev/null @@ -1,22 +0,0 @@ -Benckmark for Otto Group Competition -========= - -This is a folder containing the benchmark for the [Otto Group Competition on Kaggle](http://www.kaggle.com/c/otto-group-product-classification-challenge). - -## Getting started - -1. Put `train.csv` and `test.csv` under the `data` folder -2. Run the script -3. Submit the `submission.csv` - -The parameter `nthread` controls the number of cores to run on, please set it to suit your machine. - -## R-package - -To install the R-package of xgboost, please run - -```r -devtools::install_github('tqchen/xgboost',subdir='R-package') -``` - -Windows users may need to install [RTools](http://cran.r-project.org/bin/windows/Rtools/) first. 
\ No newline at end of file From e4c8d9d2e12cb4f02b1cdc115a48b402c27e93ac Mon Sep 17 00:00:00 2001 From: El Potaeto Date: Wed, 15 Apr 2015 18:47:31 +0200 Subject: [PATCH 11/12] clean --- demo/kaggle-otto/README.MD | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) create mode 100644 demo/kaggle-otto/README.MD diff --git a/demo/kaggle-otto/README.MD b/demo/kaggle-otto/README.MD new file mode 100644 index 000000000..94e422a13 --- /dev/null +++ b/demo/kaggle-otto/README.MD @@ -0,0 +1,24 @@ +Benckmark for Otto Group Competition +========= + +This is a folder containing the benchmark for the [Otto Group Competition on Kaggle](http://www.kaggle.com/c/otto-group-product-classification-challenge). + +## Getting started + +1. Put `train.csv` and `test.csv` under the `data` folder +2. Run the script +3. Submit the `submission.csv` + +The parameter `nthread` controls the number of cores to run on, please set it to suit your machine. + +## R-package + +To install the R-package of xgboost, please run + +```r +devtools::install_github('tqchen/xgboost',subdir='R-package') +``` + +Windows users may need to install [RTools](http://cran.r-project.org/bin/windows/Rtools/) first. + + From a49150a6d2ce40f40994ef51d5b1fd3fc5405b60 Mon Sep 17 00:00:00 2001 From: El Potaeto Date: Wed, 15 Apr 2015 18:49:52 +0200 Subject: [PATCH 12/12] Redo readme modification --- README.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 4acdc03dc..ec03ca336 100644 --- a/README.md +++ b/README.md @@ -5,25 +5,27 @@ It implements machine learning algorithm under gradient boosting framework, incl Contributors: https://github.com/dmlc/xgboost/graphs/contributors -Turorial and Documentation: https://github.com/dmlc/xgboost/wiki +Issues Tracker: [https://github.com/dmlc/xgboost/issues](https://github.com/dmlc/xgboost/issues?q=is%3Aissue+label%3Aquestion) -Issues Tracker: [https://github.com/dmlc/xgboost/issues](https://github.com/dmlc/xgboost/issues?q=is%3Aissue+label%3Aquestion) for bugreport and other issues - -Please join [XGBoost User Group](https://groups.google.com/forum/#!forum/xgboost-user/) to ask usage questions and share your experience on xgboost. +Please join [XGBoost User Group](https://groups.google.com/forum/#!forum/xgboost-user/) to ask questions and share your experience on xgboost. Examples Code: [Learning to use xgboost by examples](demo) -Video tutorial: [Better Optimization with Repeated Cross Validation and the XGBoost model - Machine Learning with R](https://www.youtube.com/watch?v=Og7CGAfSr_Y) - Distributed Version: [Distributed XGBoost](multi-node) Notes on the Code: [Code Guide](src) +Turorial and Documentation: https://github.com/dmlc/xgboost/wiki + +Video tutorial: [Better Optimization with Repeated Cross Validation and the XGBoost model - Machine Learning with R](https://www.youtube.com/watch?v=Og7CGAfSr_Y) + Learning about the model: [Introduction to Boosted Trees](http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf) * This slide is made by Tianqi Chen to introduce gradient boosting in a statistical view. * It present boosted tree learning as formal functional space optimization of defined objective. 
* The model presented is used by xgboost for boosted trees
 
+Presentation of a real use case of XGBoost to prepare tax audits in France: [Feature Importance Analysis with XGBoost in Tax audit](http://fr.slideshare.net/MichaelBENESTY/feature-importance-analysis-with-xgboost-in-tax-audit)
+
 What's New
==========
* XGBoost now support HDFS and S3