commit f378fac6a1

.gitignore (vendored, 5 changes)
@@ -66,3 +66,8 @@ java/xgboost4j-demo/tmp/
java/xgboost4j-demo/model/
nb-configuration*
dmlc-core
# Eclipse
.project
.cproject
.pydevproject
.settings/

CHANGES.md (25 changes)
@@ -37,11 +37,22 @@ xgboost-0.4

on going at master
==================
* Fix List
- Fixed possible problem of poisson regression for R.
* Python module now throw exception instead of crash terminal when a parameter error happens.
* Python module now has importance plot and tree plot functions.
* Changes in R library
- fixed possible problem of poisson regression.
- switched from 0 to NA for missing values.
- exposed access to additional model parameters.
* Changes in Python library
- throws exception instead of crash terminal when a parameter error happens.
- has importance plot and tree plot functions.
- accepts different learning rates for each boosting round.
- allows model training continuation from previously saved model.
- allows early stopping in CV.
- allows feval to return a list of tuples.
- allows eval_metric to handle additional format.
- improved compatibility in sklearn module.
- additional parameters added for sklearn wrapper.
- added pip installation functionality.
- supports more Pandas DataFrame dtypes.
- added best_ntree_limit attribute, in addition to best_score and best_iteration.
* Java api is ready for use
* Added more test cases and continuous integration to make each build more robust
* Improvements in sklearn compatible module
* Added pip installation functionality for python module
* Added more test cases and continuous integration to make each build more robust.

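The R changelog entry "switched from 0 to NA for missing values" is the change that drives the missing = NA defaults in the R diffs further down (xgb.DMatrix, predict, xgb.cv). A minimal usage sketch of what that means for user code, assuming the 0.4-era R API as changed by this commit; it is illustrative only and not part of the diff:

```r
library(xgboost)

# Dense matrix with genuinely missing entries encoded as NA (not 0).
x <- matrix(c(1, NA, 3,
              4,  5, NA,
              7,  8, 9,
              2,  1, 0,
              NA, 4, 6,
              3,  3, 3), ncol = 3, byrow = TRUE)
y <- c(0, 1, 0, 1, 0, 1)

# With missing = NA as the new default, NA cells are treated as missing
# when the DMatrix is built; zeros stay ordinary feature values.
dtrain <- xgb.DMatrix(data = x, label = y, missing = NA)

bst <- xgboost(data = dtrain, max.depth = 2, eta = 1, nrounds = 2,
               objective = "binary:logistic", verbose = 0)
```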
@@ -13,6 +13,8 @@ Committers are people who have made substantial contribution to the project and
- Bing is the original creator of xgboost python package and currently the maintainer of [XGBoost.jl](https://github.com/antinucleon/XGBoost.jl).
* [Michael Benesty](https://github.com/pommedeterresautee)
- Micheal is a lawyer, data scientist in France, he is the creator of xgboost interactive analysis module in R.
* [Yuan Tang](https://github.com/terrytangyuan)
- Yuan is a data scientist in Chicago, US. He contributed mostly in R and Python packages.

Become a Comitter
-----------------
@@ -34,7 +36,6 @@ List of Contributors
* [Zygmunt Zając](https://github.com/zygmuntz)
- Zygmunt is the master behind the early stopping feature frequently used by kagglers.
* [Ajinkya Kale](https://github.com/ajkl)
* [Yuan Tang](https://github.com/terrytangyuan)
* [Boliang Chen](https://github.com/cblsjtu)
* [Vadim Khotilovich](https://github.com/khotilov)
* [Yangqing Men](https://github.com/yanqingmen)
@@ -49,4 +50,10 @@ List of Contributors
- Masaaki is the initial creator of xgboost python plotting module.
* [Hongliang Liu](https://github.com/phunterlau)
- Hongliang is the maintainer of xgboost python PyPI package for pip installation.
* [daiyl0320](https://github.com/daiyl0320)
- daiyl0320 contributed patch to xgboost distributed version more robust, and scales stably on TB scale datasets.
* [Huayi Zhang](https://github.com/irachex)
* [Johan Manders](https://github.com/johanmanders)
* [yoori](https://github.com/yoori)
* [Mathias Müller](https://github.com/far0n)
* [Sam Thomson](https://github.com/sammthomson)

Makefile (6 changes)
@@ -177,11 +177,11 @@ Rcheck:
R CMD check --as-cran xgboost*.tar.gz

pythonpack:
#make clean
#for pip maintainer only
cd subtree/rabit;make clean;cd ..
rm -rf xgboost-deploy xgboost*.tar.gz
cp -r python-package xgboost-deploy
cp *.md xgboost-deploy/
#cp *.md xgboost-deploy/
cp LICENSE xgboost-deploy/
cp Makefile xgboost-deploy/xgboost
cp -r wrapper xgboost-deploy/xgboost
@@ -189,7 +189,7 @@ pythonpack:
cp -r multi-node xgboost-deploy/xgboost
cp -r windows xgboost-deploy/xgboost
cp -r src xgboost-deploy/xgboost

cp python-package/setup_pip.py xgboost-deploy/setup.py
#make python

pythonbuild:

@@ -3,16 +3,16 @@ Type: Package
Title: Extreme Gradient Boosting
Version: 0.4-2
Date: 2015-08-01
Author: Tianqi Chen <tianqi.tchen@gmail.com>, Tong He <hetong007@gmail.com>, Michael Benesty <michael@benesty.fr>
Author: Tianqi Chen <tianqi.tchen@gmail.com>, Tong He <hetong007@gmail.com>,
Michael Benesty <michael@benesty.fr>
Maintainer: Tong He <hetong007@gmail.com>
Description: Extreme Gradient Boosting, which is an
efficient implementation of gradient boosting framework.
This package is its R interface. The package includes efficient
linear model solver and tree learning algorithms. The package can automatically
do parallel computation on a single machine which could be more than 10 times faster
than existing gradient boosting packages. It supports various
objective functions, including regression, classification and ranking. The
package is made to be extensible, so that users are also allowed to define
Description: Extreme Gradient Boosting, which is an efficient implementation
of gradient boosting framework. This package is its R interface. The package
includes efficient linear model solver and tree learning algorithms. The package
can automatically do parallel computation on a single machine which could be
more than 10 times faster than existing gradient boosting packages. It supports
various objective functions, including regression, classification and ranking.
The package is made to be extensible, so that users are also allowed to define
their own objectives easily.
License: Apache License (== 2.0) | file LICENSE
URL: https://github.com/dmlc/xgboost
@@ -20,16 +20,18 @@ BugReports: https://github.com/dmlc/xgboost/issues
VignetteBuilder: knitr
Suggests:
knitr,
ggplot2 (>= 1.0.0),
DiagrammeR (>= 0.6),
ggplot2 (>= 1.0.1),
DiagrammeR (>= 0.8.1),
Ckmeans.1d.dp (>= 3.3.1),
vcd (>= 1.3),
testthat
testthat,
igraph (>= 1.0.1)
Depends:
R (>= 2.10)
Imports:
Matrix (>= 1.1-0),
methods,
data.table (>= 1.9.4),
data.table (>= 1.9.6),
magrittr (>= 1.5),
stringr (>= 0.6.2)
RoxygenNote: 5.0.1

@@ -1,16 +1,19 @@
# Generated by roxygen2 (4.1.1): do not edit by hand
# Generated by roxygen2: do not edit by hand

export(getinfo)
export(setinfo)
export(slice)
export(xgb.DMatrix)
export(xgb.DMatrix.save)
export(xgb.create.features)
export(xgb.cv)
export(xgb.dump)
export(xgb.importance)
export(xgb.load)
export(xgb.model.dt.tree)
export(xgb.plot.deepness)
export(xgb.plot.importance)
export(xgb.plot.multi.trees)
export(xgb.plot.tree)
export(xgb.save)
export(xgb.save.raw)
@@ -23,6 +26,7 @@ importClassesFrom(Matrix,dgCMatrix)
importClassesFrom(Matrix,dgeMatrix)
importFrom(Matrix,cBind)
importFrom(Matrix,colSums)
importFrom(Matrix,sparse.model.matrix)
importFrom(Matrix,sparseVector)
importFrom(data.table,":=")
importFrom(data.table,as.data.table)
@@ -35,6 +39,7 @@ importFrom(data.table,setnames)
importFrom(magrittr,"%>%")
importFrom(magrittr,add)
importFrom(magrittr,not)
importFrom(stringr,str_detect)
importFrom(stringr,str_extract)
importFrom(stringr,str_extract_all)
importFrom(stringr,str_match)

@@ -23,7 +23,6 @@ setClass('xgb.DMatrix')
#' stopifnot(all(labels2 == 1-labels))
#' @rdname getinfo
#' @export
#'
getinfo <- function(object, ...){
UseMethod("getinfo")
}
@@ -35,7 +34,7 @@ getinfo <- function(object, ...){
#' @param ... other parameters
#' @rdname getinfo
#' @method getinfo xgb.DMatrix
setMethod("getinfo", signature = "xgb.DMatrix",
setMethod("getinfo", signature = "xgb.DMatrix",
definition = function(object, name) {
if (typeof(name) != "character") {
stop("xgb.getinfo: name must be character")
@@ -43,7 +42,7 @@ setMethod("getinfo", signature = "xgb.DMatrix",
if (class(object) != "xgb.DMatrix") {
stop("xgb.setinfo: first argument dtrain must be xgb.DMatrix")
}
if (name != "label" && name != "weight" &&
if (name != "label" && name != "weight" &&
name != "base_margin" && name != "nrow") {
stop(paste("xgb.getinfo: unknown info name", name))
}
@@ -54,4 +53,3 @@ setMethod("getinfo", signature = "xgb.DMatrix",
}
return(ret)
})

@@ -20,6 +20,17 @@ setClass("xgb.Booster",
#' only valid for gbtree, but not for gblinear. set it to be value bigger
#' than 0. It will use all trees by default.
#' @param predleaf whether predict leaf index instead. If set to TRUE, the output will be a matrix object.
#'
#' @details
#' The option \code{ntreelimit} purpose is to let the user train a model with lots
#' of trees but use only the first trees for prediction to avoid overfitting
#' (without having to train a new model with less trees).
#'
#' The option \code{predleaf} purpose is inspired from §3.1 of the paper
#' \code{Practical Lessons from Predicting Clicks on Ads at Facebook}.
#' The idea is to use the model as a generator of new features which capture non linear link
#' from original features.
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' data(agaricus.test, package='xgboost')
@@ -29,9 +40,8 @@ setClass("xgb.Booster",
#' eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
#' pred <- predict(bst, test$data)
#' @export
#'
setMethod("predict", signature = "xgb.Booster",
definition = function(object, newdata, missing = NULL,
setMethod("predict", signature = "xgb.Booster",
definition = function(object, newdata, missing = NA,
outputmargin = FALSE, ntreelimit = NULL, predleaf = FALSE) {
if (class(object) != "xgb.Booster"){
stop("predict: model in prediction must be of class xgb.Booster")
@@ -39,11 +49,7 @@ setMethod("predict", signature = "xgb.Booster",
object <- xgb.Booster.check(object, saveraw = FALSE)
}
if (class(newdata) != "xgb.DMatrix") {
if (is.null(missing)) {
newdata <- xgb.DMatrix(newdata)
} else {
newdata <- xgb.DMatrix(newdata, missing = missing)
}
newdata <- xgb.DMatrix(newdata, missing = missing)
}
if (is.null(ntreelimit)) {
ntreelimit <- 0
@@ -52,14 +58,14 @@ setMethod("predict", signature = "xgb.Booster",
stop("predict: ntreelimit must be equal to or greater than 1")
}
}
option = 0
option <- 0
if (outputmargin) {
option <- option + 1
}
if (predleaf) {
option <- option + 2
}
ret <- .Call("XGBoosterPredict_R", object$handle, newdata, as.integer(option),
ret <- .Call("XGBoosterPredict_R", object$handle, newdata, as.integer(option),
as.integer(ntreelimit), PACKAGE = "xgboost")
if (predleaf){
len <- getinfo(newdata, "nrow")
@@ -72,4 +78,3 @@ setMethod("predict", signature = "xgb.Booster",
}
return(ret)
})

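The new \code{predleaf} documentation above is easier to follow with a concrete call. A short illustrative sketch, assuming the predict signature as changed in this hunk (it is not part of the diff):

```r
library(xgboost)
data(agaricus.train, package = 'xgboost')
data(agaricus.test, package = 'xgboost')

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max.depth = 2, eta = 1, nthread = 2, nrounds = 4,
               objective = "binary:logistic", verbose = 0)

# predleaf = TRUE returns a matrix with one row per observation and one
# column per tree; each entry is the index of the leaf that observation
# falls into. One-hot encoding these indices is what xgb.create.features
# (added later in this commit) automates.
leaf_index <- predict(bst, agaricus.test$data, predleaf = TRUE)
dim(leaf_index)   # n_test x number of trees
```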
@@ -5,15 +5,14 @@
|
||||
#' @param object Object of class "xgb.Boost.handle"
|
||||
#' @param ... Parameters pass to \code{predict.xgb.Booster}
|
||||
#'
|
||||
setMethod("predict", signature = "xgb.Booster.handle",
|
||||
setMethod("predict", signature = "xgb.Booster.handle",
|
||||
definition = function(object, ...) {
|
||||
if (class(object) != "xgb.Booster.handle"){
|
||||
stop("predict: model in prediction must be of class xgb.Booster.handle")
|
||||
}
|
||||
|
||||
|
||||
bst <- xgb.handleToBooster(object)
|
||||
|
||||
ret = predict(bst, ...)
|
||||
|
||||
ret <- predict(bst, ...)
|
||||
return(ret)
|
||||
})
|
||||
|
||||
|
||||
@@ -21,7 +21,6 @@
|
||||
#' stopifnot(all(labels2 == 1-labels))
|
||||
#' @rdname setinfo
|
||||
#' @export
|
||||
#'
|
||||
setinfo <- function(object, ...){
|
||||
UseMethod("setinfo")
|
||||
}
|
||||
@@ -32,7 +31,7 @@ setinfo <- function(object, ...){
|
||||
#' @param ... other parameters
|
||||
#' @rdname setinfo
|
||||
#' @method setinfo xgb.DMatrix
|
||||
setMethod("setinfo", signature = "xgb.DMatrix",
|
||||
setMethod("setinfo", signature = "xgb.DMatrix",
|
||||
definition = function(object, name, info) {
|
||||
xgb.setinfo(object, name, info)
|
||||
})
|
||||
|
||||
@@ -13,7 +13,6 @@ setClass('xgb.DMatrix')
|
||||
#' dsub <- slice(dtrain, 1:3)
|
||||
#' @rdname slice
|
||||
#' @export
|
||||
#'
|
||||
slice <- function(object, ...){
|
||||
UseMethod("slice")
|
||||
}
|
||||
@@ -23,19 +22,19 @@ slice <- function(object, ...){
|
||||
#' @param ... other parameters
|
||||
#' @rdname slice
|
||||
#' @method slice xgb.DMatrix
|
||||
setMethod("slice", signature = "xgb.DMatrix",
|
||||
setMethod("slice", signature = "xgb.DMatrix",
|
||||
definition = function(object, idxset, ...) {
|
||||
if (class(object) != "xgb.DMatrix") {
|
||||
stop("slice: first argument dtrain must be xgb.DMatrix")
|
||||
}
|
||||
ret <- .Call("XGDMatrixSliceDMatrix_R", object, idxset,
|
||||
ret <- .Call("XGDMatrixSliceDMatrix_R", object, idxset,
|
||||
PACKAGE = "xgboost")
|
||||
|
||||
|
||||
attr_list <- attributes(object)
|
||||
nr <- xgb.numrow(object)
|
||||
len <- sapply(attr_list,length)
|
||||
ind <- which(len==nr)
|
||||
if (length(ind)>0) {
|
||||
ind <- which(len == nr)
|
||||
if (length(ind) > 0) {
|
||||
nms <- names(attr_list)[ind]
|
||||
for (i in 1:length(ind)) {
|
||||
attr(ret,nms[i]) <- attr(object,nms[i])[idxset]
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
#' @importClassesFrom Matrix dgCMatrix dgeMatrix
|
||||
#' @importClassesFrom Matrix dgCMatrix dgeMatrix
|
||||
#' @import methods
|
||||
|
||||
# depends on matrix
|
||||
@@ -15,30 +15,30 @@ xgb.setinfo <- function(dmat, name, info) {
|
||||
stop("xgb.setinfo: first argument dtrain must be xgb.DMatrix")
|
||||
}
|
||||
if (name == "label") {
|
||||
if (length(info)!=xgb.numrow(dmat))
|
||||
if (length(info) != xgb.numrow(dmat))
|
||||
stop("The length of labels must equal to the number of rows in the input data")
|
||||
.Call("XGDMatrixSetInfo_R", dmat, name, as.numeric(info),
|
||||
.Call("XGDMatrixSetInfo_R", dmat, name, as.numeric(info),
|
||||
PACKAGE = "xgboost")
|
||||
return(TRUE)
|
||||
}
|
||||
if (name == "weight") {
|
||||
if (length(info)!=xgb.numrow(dmat))
|
||||
if (length(info) != xgb.numrow(dmat))
|
||||
stop("The length of weights must equal to the number of rows in the input data")
|
||||
.Call("XGDMatrixSetInfo_R", dmat, name, as.numeric(info),
|
||||
.Call("XGDMatrixSetInfo_R", dmat, name, as.numeric(info),
|
||||
PACKAGE = "xgboost")
|
||||
return(TRUE)
|
||||
}
|
||||
if (name == "base_margin") {
|
||||
# if (length(info)!=xgb.numrow(dmat))
|
||||
# stop("The length of base margin must equal to the number of rows in the input data")
|
||||
.Call("XGDMatrixSetInfo_R", dmat, name, as.numeric(info),
|
||||
.Call("XGDMatrixSetInfo_R", dmat, name, as.numeric(info),
|
||||
PACKAGE = "xgboost")
|
||||
return(TRUE)
|
||||
}
|
||||
if (name == "group") {
|
||||
if (sum(info)!=xgb.numrow(dmat))
|
||||
if (sum(info) != xgb.numrow(dmat))
|
||||
stop("The sum of groups must equal to the number of rows in the input data")
|
||||
.Call("XGDMatrixSetInfo_R", dmat, name, as.integer(info),
|
||||
.Call("XGDMatrixSetInfo_R", dmat, name, as.integer(info),
|
||||
PACKAGE = "xgboost")
|
||||
return(TRUE)
|
||||
}
|
||||
@@ -68,7 +68,7 @@ xgb.Booster <- function(params = list(), cachelist = list(), modelfile = NULL) {
|
||||
if (typeof(modelfile) == "character") {
|
||||
.Call("XGBoosterLoadModel_R", handle, modelfile, PACKAGE = "xgboost")
|
||||
} else if (typeof(modelfile) == "raw") {
|
||||
.Call("XGBoosterLoadModelFromRaw_R", handle, modelfile, PACKAGE = "xgboost")
|
||||
.Call("XGBoosterLoadModelFromRaw_R", handle, modelfile, PACKAGE = "xgboost")
|
||||
} else {
|
||||
stop("xgb.Booster: modelfile must be character or raw vector")
|
||||
}
|
||||
@@ -103,18 +103,13 @@ xgb.Booster.check <- function(bst, saveraw = TRUE)
|
||||
## ----the following are low level iteratively function, not needed if
|
||||
## you do not want to use them ---------------------------------------
|
||||
# get dmatrix from data, label
|
||||
xgb.get.DMatrix <- function(data, label = NULL, missing = NULL, weight = NULL) {
|
||||
xgb.get.DMatrix <- function(data, label = NULL, missing = NA, weight = NULL) {
|
||||
inClass <- class(data)
|
||||
if (inClass == "dgCMatrix" || inClass == "matrix") {
|
||||
if (is.null(label)) {
|
||||
stop("xgboost: need label when data is a matrix")
|
||||
}
|
||||
dtrain <- xgb.DMatrix(data, label = label)
|
||||
if (is.null(missing)){
|
||||
dtrain <- xgb.DMatrix(data, label = label)
|
||||
} else {
|
||||
dtrain <- xgb.DMatrix(data, label = label, missing = missing)
|
||||
}
|
||||
dtrain <- xgb.DMatrix(data, label = label, missing = missing)
|
||||
if (!is.null(weight)){
|
||||
xgb.setinfo(dtrain, "weight", weight)
|
||||
}
|
||||
@@ -127,8 +122,8 @@ xgb.get.DMatrix <- function(data, label = NULL, missing = NULL, weight = NULL) {
|
||||
} else if (inClass == "xgb.DMatrix") {
|
||||
dtrain <- data
|
||||
} else if (inClass == "data.frame") {
|
||||
stop("xgboost only support numerical matrix input,
|
||||
use 'data.frame' to transform the data.")
|
||||
stop("xgboost only support numerical matrix input,
|
||||
use 'data.matrix' to transform the data.")
|
||||
} else {
|
||||
stop("xgboost: Invalid input of data")
|
||||
}
|
||||
@@ -147,8 +142,7 @@ xgb.iter.boost <- function(booster, dtrain, gpair) {
|
||||
if (class(dtrain) != "xgb.DMatrix") {
|
||||
stop("xgb.iter.update: second argument must be type xgb.DMatrix")
|
||||
}
|
||||
.Call("XGBoosterBoostOneIter_R", booster, dtrain, gpair$grad, gpair$hess,
|
||||
PACKAGE = "xgboost")
|
||||
.Call("XGBoosterBoostOneIter_R", booster, dtrain, gpair$grad, gpair$hess, PACKAGE = "xgboost")
|
||||
return(TRUE)
|
||||
}
|
||||
|
||||
@@ -162,9 +156,9 @@ xgb.iter.update <- function(booster, dtrain, iter, obj = NULL) {
|
||||
}
|
||||
|
||||
if (is.null(obj)) {
|
||||
.Call("XGBoosterUpdateOneIter_R", booster, as.integer(iter), dtrain,
|
||||
.Call("XGBoosterUpdateOneIter_R", booster, as.integer(iter), dtrain,
|
||||
PACKAGE = "xgboost")
|
||||
} else {
|
||||
} else {
|
||||
pred <- predict(booster, dtrain)
|
||||
gpair <- obj(pred, dtrain)
|
||||
succ <- xgb.iter.boost(booster, dtrain, gpair)
|
||||
@@ -195,7 +189,7 @@ xgb.iter.eval <- function(booster, watchlist, iter, feval = NULL, prediction = F
|
||||
}
|
||||
evnames <- append(evnames, names(w))
|
||||
}
|
||||
msg <- .Call("XGBoosterEvalOneIter_R", booster, as.integer(iter), watchlist,
|
||||
msg <- .Call("XGBoosterEvalOneIter_R", booster, as.integer(iter), watchlist,
|
||||
evnames, PACKAGE = "xgboost")
|
||||
} else {
|
||||
msg <- paste("[", iter, "]", sep="")
|
||||
@@ -253,21 +247,21 @@ xgb.cv.mknfold <- function(dall, nfold, param, stratified, folds) {
|
||||
if (length(unique(y)) <= 5) y <- factor(y)
|
||||
}
|
||||
folds <- xgb.createFolds(y, nfold)
|
||||
} else {
|
||||
} else {
|
||||
# make simple non-stratified folds
|
||||
kstep <- length(randidx) %/% nfold
|
||||
folds <- list()
|
||||
for (i in 1:(nfold-1)) {
|
||||
folds[[i]] = randidx[1:kstep]
|
||||
randidx = setdiff(randidx, folds[[i]])
|
||||
for (i in 1:(nfold - 1)) {
|
||||
folds[[i]] <- randidx[1:kstep]
|
||||
randidx <- setdiff(randidx, folds[[i]])
|
||||
}
|
||||
folds[[nfold]] = randidx
|
||||
folds[[nfold]] <- randidx
|
||||
}
|
||||
}
|
||||
ret <- list()
|
||||
for (k in 1:nfold) {
|
||||
dtest <- slice(dall, folds[[k]])
|
||||
didx = c()
|
||||
didx <- c()
|
||||
for (i in 1:nfold) {
|
||||
if (i != k) {
|
||||
didx <- append(didx, folds[[i]])
|
||||
@@ -275,7 +269,7 @@ xgb.cv.mknfold <- function(dall, nfold, param, stratified, folds) {
|
||||
}
|
||||
dtrain <- slice(dall, didx)
|
||||
bst <- xgb.Booster(param, list(dtrain, dtest))
|
||||
watchlist = list(train=dtrain, test=dtest)
|
||||
watchlist <- list(train=dtrain, test=dtest)
|
||||
ret[[k]] <- list(dtrain=dtrain, booster=bst, watchlist=watchlist, index=folds[[k]])
|
||||
}
|
||||
return (ret)
|
||||
@@ -288,7 +282,7 @@ xgb.cv.aggcv <- function(res, showsd = TRUE) {
|
||||
kv <- strsplit(header[i], ":")[[1]]
|
||||
ret <- paste(ret, "\t", kv[1], ":", sep="")
|
||||
stats <- c()
|
||||
stats[1] <- as.numeric(kv[2])
|
||||
stats[1] <- as.numeric(kv[2])
|
||||
for (j in 2:length(res)) {
|
||||
tkv <- strsplit(res[[j]][i], ":")[[1]]
|
||||
stats[j] <- as.numeric(tkv[2])
|
||||
@@ -316,9 +310,9 @@ xgb.createFolds <- function(y, k = 10)
|
||||
## At most, we will use quantiles. If the sample
|
||||
## is too small, we just do regular unstratified
|
||||
## CV
|
||||
cuts <- floor(length(y)/k)
|
||||
if(cuts < 2) cuts <- 2
|
||||
if(cuts > 5) cuts <- 5
|
||||
cuts <- floor(length(y) / k)
|
||||
if (cuts < 2) cuts <- 2
|
||||
if (cuts > 5) cuts <- 5
|
||||
y <- cut(y,
|
||||
unique(stats::quantile(y, probs = seq(0, 1, length = cuts))),
|
||||
include.lowest = TRUE)
|
||||
@@ -330,7 +324,7 @@ xgb.createFolds <- function(y, k = 10)
|
||||
y <- factor(as.character(y))
|
||||
numInClass <- table(y)
|
||||
foldVector <- vector(mode = "integer", length(y))
|
||||
|
||||
|
||||
## For each class, balance the fold allocation as far
|
||||
## as possible, then resample the remainder.
|
||||
## The final assignment of folds is also randomized.
|
||||
|
||||
@@ -17,29 +17,28 @@
|
||||
#' xgb.DMatrix.save(dtrain, 'xgb.DMatrix.data')
|
||||
#' dtrain <- xgb.DMatrix('xgb.DMatrix.data')
|
||||
#' @export
|
||||
#'
|
||||
xgb.DMatrix <- function(data, info = list(), missing = 0, ...) {
|
||||
xgb.DMatrix <- function(data, info = list(), missing = NA, ...) {
|
||||
if (typeof(data) == "character") {
|
||||
handle <- .Call("XGDMatrixCreateFromFile_R", data, as.integer(FALSE),
|
||||
handle <- .Call("XGDMatrixCreateFromFile_R", data, as.integer(FALSE),
|
||||
PACKAGE = "xgboost")
|
||||
} else if (is.matrix(data)) {
|
||||
handle <- .Call("XGDMatrixCreateFromMat_R", data, missing,
|
||||
handle <- .Call("XGDMatrixCreateFromMat_R", data, missing,
|
||||
PACKAGE = "xgboost")
|
||||
} else if (class(data) == "dgCMatrix") {
|
||||
handle <- .Call("XGDMatrixCreateFromCSC_R", data@p, data@i, data@x,
|
||||
handle <- .Call("XGDMatrixCreateFromCSC_R", data@p, data@i, data@x,
|
||||
PACKAGE = "xgboost")
|
||||
} else {
|
||||
stop(paste("xgb.DMatrix: does not support to construct from ",
|
||||
stop(paste("xgb.DMatrix: does not support to construct from ",
|
||||
typeof(data)))
|
||||
}
|
||||
dmat <- structure(handle, class = "xgb.DMatrix")
|
||||
|
||||
|
||||
info <- append(info, list(...))
|
||||
if (length(info) == 0)
|
||||
if (length(info) == 0)
|
||||
return(dmat)
|
||||
for (i in 1:length(info)) {
|
||||
p <- info[i]
|
||||
xgb.setinfo(dmat, names(p), p[[1]])
|
||||
}
|
||||
return(dmat)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -12,16 +12,15 @@
|
||||
#' xgb.DMatrix.save(dtrain, 'xgb.DMatrix.data')
|
||||
#' dtrain <- xgb.DMatrix('xgb.DMatrix.data')
|
||||
#' @export
|
||||
#'
|
||||
xgb.DMatrix.save <- function(DMatrix, fname) {
|
||||
if (typeof(fname) != "character") {
|
||||
stop("xgb.save: fname must be character")
|
||||
}
|
||||
if (class(DMatrix) == "xgb.DMatrix") {
|
||||
.Call("XGDMatrixSaveBinary_R", DMatrix, fname, as.integer(FALSE),
|
||||
.Call("XGDMatrixSaveBinary_R", DMatrix, fname, as.integer(FALSE),
|
||||
PACKAGE = "xgboost")
|
||||
return(TRUE)
|
||||
}
|
||||
stop("xgb.DMatrix.save: the input must be xgb.DMatrix")
|
||||
return(FALSE)
|
||||
}
|
||||
}
|
||||
|
||||
R-package/R/xgb.create.features.R (new file, 91 lines)
@@ -0,0 +1,91 @@
|
||||
#' Create new features from a previously learned model
|
||||
#'
|
||||
#' May improve the learning by adding new features to the training data based on the decision trees from a previously learned model.
|
||||
#'
|
||||
#' @importFrom magrittr %>%
|
||||
#' @importFrom Matrix cBind
|
||||
#' @importFrom Matrix sparse.model.matrix
|
||||
#'
|
||||
#' @param model decision tree boosting model learned on the original data
|
||||
#' @param training.data original data (usually provided as a \code{dgCMatrix} matrix)
|
||||
#'
|
||||
#' @return \code{dgCMatrix} matrix including both the original data and the new features.
|
||||
#'
|
||||
#' @details
|
||||
#' This is the function inspired from the paragraph 3.1 of the paper:
|
||||
#'
|
||||
#' \strong{Practical Lessons from Predicting Clicks on Ads at Facebook}
|
||||
#'
|
||||
#' \emph{(Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yan, xin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers,
|
||||
#' Joaquin Quiñonero Candela)}
|
||||
#'
|
||||
#' International Workshop on Data Mining for Online Advertising (ADKDD) - August 24, 2014
|
||||
#'
|
||||
#' \url{https://research.facebook.com/publications/758569837499391/practical-lessons-from-predicting-clicks-on-ads-at-facebook/}.
|
||||
#'
|
||||
#' Extract explaining the method:
|
||||
#'
|
||||
#' "\emph{We found that boosted decision trees are a powerful and very
|
||||
#' convenient way to implement non-linear and tuple transformations
|
||||
#' of the kind we just described. We treat each individual
|
||||
#' tree as a categorical feature that takes as value the
|
||||
#' index of the leaf an instance ends up falling in. We use
|
||||
#' 1-of-K coding of this type of features.
|
||||
#'
|
||||
#' For example, consider the boosted tree model in Figure 1 with 2 subtrees,
|
||||
#' where the first subtree has 3 leafs and the second 2 leafs. If an
|
||||
#' instance ends up in leaf 2 in the first subtree and leaf 1 in
|
||||
#' second subtree, the overall input to the linear classifier will
|
||||
#' be the binary vector \code{[0, 1, 0, 1, 0]}, where the first 3 entries
|
||||
#' correspond to the leaves of the first subtree and last 2 to
|
||||
#' those of the second subtree.
|
||||
#'
|
||||
#' [...]
|
||||
#'
|
||||
#' We can understand boosted decision tree
|
||||
#' based transformation as a supervised feature encoding that
|
||||
#' converts a real-valued vector into a compact binary-valued
|
||||
#' vector. A traversal from root node to a leaf node represents
|
||||
#' a rule on certain features.}"
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#' data(agaricus.test, package='xgboost')
|
||||
#' dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)
|
||||
#' dtest <- xgb.DMatrix(data = agaricus.test$data, label = agaricus.test$label)
|
||||
#'
|
||||
#' param <- list(max.depth=2, eta=1, silent=1, objective='binary:logistic')
|
||||
#' nround = 4
|
||||
#'
|
||||
#' bst = xgb.train(params = param, data = dtrain, nrounds = nround, nthread = 2)
|
||||
#'
|
||||
#' # Model accuracy without new features
|
||||
#' accuracy.before <- sum((predict(bst, agaricus.test$data) >= 0.5) == agaricus.test$label) / length(agaricus.test$label)
|
||||
#'
|
||||
#' # Convert previous features to one hot encoding
|
||||
#' new.features.train <- xgb.create.features(model = bst, agaricus.train$data)
|
||||
#' new.features.test <- xgb.create.features(model = bst, agaricus.test$data)
|
||||
#'
|
||||
#' # learning with new features
|
||||
#' new.dtrain <- xgb.DMatrix(data = new.features.train, label = agaricus.train$label)
|
||||
#' new.dtest <- xgb.DMatrix(data = new.features.test, label = agaricus.test$label)
|
||||
#' watchlist <- list(train = new.dtrain)
|
||||
#' bst <- xgb.train(params = param, data = new.dtrain, nrounds = nround, nthread = 2)
|
||||
#'
|
||||
#' # Model accuracy with new features
|
||||
#' accuracy.after <- sum((predict(bst, new.dtest) >= 0.5) == agaricus.test$label) / length(agaricus.test$label)
|
||||
#'
|
||||
#' # Here the accuracy was already good and is now perfect.
|
||||
#' cat(paste("The accuracy was", accuracy.before, "before adding leaf features and it is now", accuracy.after, "!\n"))
|
||||
#'
|
||||
#' @export
|
||||
xgb.create.features <- function(model, training.data){
|
||||
pred_with_leaf = predict(model, training.data, predleaf = TRUE)
|
||||
cols <- list()
|
||||
for(i in 1:length(trees)){
|
||||
# max is not the real max but it s not important for the purpose of adding features
|
||||
leaf.id <- sort(unique(pred_with_leaf[,i]))
|
||||
cols[[i]] <- factor(x = pred_with_leaf[,i], level = leaf.id)
|
||||
}
|
||||
cBind(training.data, sparse.model.matrix( ~ . -1, as.data.frame(cols)))
|
||||
}
|
||||
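One caveat in the new file above: the loop runs over 1:length(trees), and trees does not appear to be defined anywhere in the file as shown, so the body only works if such a variable exists elsewhere. A self-contained variant that derives the number of trees from the leaf-prediction matrix itself is sketched below; the function name and loop bound are my assumptions, not the committed code:

```r
library(xgboost)
library(Matrix)   # for cBind() and sparse.model.matrix()

# Hypothetical corrected body: predict(..., predleaf = TRUE) yields one
# column per tree, so the tree count can be taken from that matrix.
xgb.create.features.sketch <- function(model, training.data){
  pred_with_leaf <- predict(model, training.data, predleaf = TRUE)
  cols <- list()
  for (i in 1:ncol(pred_with_leaf)) {
    # distinct leaf indices reached in tree i become the factor levels
    leaf.id <- sort(unique(pred_with_leaf[, i]))
    cols[[i]] <- factor(x = pred_with_leaf[, i], levels = leaf.id)
  }
  # one-hot encode the leaf factors and append them to the original features
  cBind(training.data, sparse.model.matrix(~ . - 1, as.data.frame(cols)))
}
```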
@@ -90,16 +90,15 @@
|
||||
#' max.depth =3, eta = 1, objective = "binary:logistic")
|
||||
#' print(history)
|
||||
#' @export
|
||||
#'
|
||||
xgb.cv <- function(params=list(), data, nrounds, nfold, label = NULL, missing = NULL,
|
||||
prediction = FALSE, showsd = TRUE, metrics=list(),
|
||||
xgb.cv <- function(params=list(), data, nrounds, nfold, label = NULL, missing = NA,
|
||||
prediction = FALSE, showsd = TRUE, metrics=list(),
|
||||
obj = NULL, feval = NULL, stratified = TRUE, folds = NULL, verbose = T, print.every.n=1L,
|
||||
early.stop.round = NULL, maximize = NULL, ...) {
|
||||
if (typeof(params) != "list") {
|
||||
stop("xgb.cv: first argument params must be list")
|
||||
}
|
||||
if(!is.null(folds)) {
|
||||
if(class(folds)!="list" | length(folds) < 2) {
|
||||
if(class(folds) != "list" | length(folds) < 2) {
|
||||
stop("folds must be a list with 2 or more elements that are vectors of indices for each CV-fold")
|
||||
}
|
||||
nfold <- length(folds)
|
||||
@@ -107,38 +106,34 @@ xgb.cv <- function(params=list(), data, nrounds, nfold, label = NULL, missing =
|
||||
if (nfold <= 1) {
|
||||
stop("nfold must be bigger than 1")
|
||||
}
|
||||
if (is.null(missing)) {
|
||||
dtrain <- xgb.get.DMatrix(data, label)
|
||||
} else {
|
||||
dtrain <- xgb.get.DMatrix(data, label, missing)
|
||||
}
|
||||
dot.params = list(...)
|
||||
nms.params = names(params)
|
||||
nms.dot.params = names(dot.params)
|
||||
if (length(intersect(nms.params,nms.dot.params))>0)
|
||||
dtrain <- xgb.get.DMatrix(data, label, missing)
|
||||
dot.params <- list(...)
|
||||
nms.params <- names(params)
|
||||
nms.dot.params <- names(dot.params)
|
||||
if (length(intersect(nms.params,nms.dot.params)) > 0)
|
||||
stop("Duplicated defined term in parameters. Please check your list of params.")
|
||||
params <- append(params, dot.params)
|
||||
params <- append(params, list(silent=1))
|
||||
for (mc in metrics) {
|
||||
params <- append(params, list("eval_metric"=mc))
|
||||
}
|
||||
|
||||
|
||||
# customized objective and evaluation metric interface
|
||||
if (!is.null(params$objective) && !is.null(obj))
|
||||
stop("xgb.cv: cannot assign two different objectives")
|
||||
if (!is.null(params$objective))
|
||||
if (class(params$objective)=='function') {
|
||||
obj = params$objective
|
||||
params[['objective']] = NULL
|
||||
if (class(params$objective) == 'function') {
|
||||
obj <- params$objective
|
||||
params[['objective']] <- NULL
|
||||
}
|
||||
# if (!is.null(params$eval_metric) && !is.null(feval))
|
||||
# stop("xgb.cv: cannot assign two different evaluation metrics")
|
||||
if (!is.null(params$eval_metric))
|
||||
if (class(params$eval_metric)=='function') {
|
||||
feval = params$eval_metric
|
||||
params[['eval_metric']] = NULL
|
||||
if (class(params$eval_metric) == 'function') {
|
||||
feval <- params$eval_metric
|
||||
params[['eval_metric']] <- NULL
|
||||
}
|
||||
|
||||
|
||||
# Early Stopping
|
||||
if (!is.null(early.stop.round)){
|
||||
if (!is.null(feval) && is.null(maximize))
|
||||
@@ -148,39 +143,39 @@ xgb.cv <- function(params=list(), data, nrounds, nfold, label = NULL, missing =
|
||||
if (is.null(maximize))
|
||||
{
|
||||
if (params$eval_metric %in% c('rmse','logloss','error','merror','mlogloss')) {
|
||||
maximize = FALSE
|
||||
maximize <- FALSE
|
||||
} else {
|
||||
maximize = TRUE
|
||||
maximize <- TRUE
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
if (maximize) {
|
||||
bestScore = 0
|
||||
bestScore <- 0
|
||||
} else {
|
||||
bestScore = Inf
|
||||
bestScore <- Inf
|
||||
}
|
||||
bestInd = 0
|
||||
earlyStopflag = FALSE
|
||||
|
||||
if (length(metrics)>1)
|
||||
bestInd <- 0
|
||||
earlyStopflag <- FALSE
|
||||
|
||||
if (length(metrics) > 1)
|
||||
warning('Only the first metric is used for early stopping process.')
|
||||
}
|
||||
|
||||
|
||||
xgb_folds <- xgb.cv.mknfold(dtrain, nfold, params, stratified, folds)
|
||||
obj_type = params[['objective']]
|
||||
mat_pred = FALSE
|
||||
if (!is.null(obj_type) && obj_type=='multi:softprob')
|
||||
obj_type <- params[['objective']]
|
||||
mat_pred <- FALSE
|
||||
if (!is.null(obj_type) && obj_type == 'multi:softprob')
|
||||
{
|
||||
num_class = params[['num_class']]
|
||||
num_class <- params[['num_class']]
|
||||
if (is.null(num_class))
|
||||
stop('must set num_class to use softmax')
|
||||
predictValues <- matrix(0,xgb.numrow(dtrain),num_class)
|
||||
mat_pred = TRUE
|
||||
mat_pred <- TRUE
|
||||
}
|
||||
else
|
||||
predictValues <- rep(0,xgb.numrow(dtrain))
|
||||
history <- c()
|
||||
print.every.n = max(as.integer(print.every.n), 1L)
|
||||
print.every.n <- max(as.integer(print.every.n), 1L)
|
||||
for (i in 1:nrounds) {
|
||||
msg <- list()
|
||||
for (k in 1:nfold) {
|
||||
@@ -191,62 +186,60 @@ xgb.cv <- function(params=list(), data, nrounds, nfold, label = NULL, missing =
|
||||
ret <- xgb.cv.aggcv(msg, showsd)
|
||||
history <- c(history, ret)
|
||||
if(verbose)
|
||||
if (0==(i-1L)%%print.every.n)
|
||||
if (0 == (i - 1L) %% print.every.n)
|
||||
cat(ret, "\n", sep="")
|
||||
|
||||
|
||||
# early_Stopping
|
||||
if (!is.null(early.stop.round)){
|
||||
score = strsplit(ret,'\\s+')[[1]][1+length(metrics)+2]
|
||||
score = strsplit(score,'\\+|:')[[1]][[2]]
|
||||
score = as.numeric(score)
|
||||
if ((maximize && score>bestScore) || (!maximize && score<bestScore)) {
|
||||
bestScore = score
|
||||
bestInd = i
|
||||
score <- strsplit(ret,'\\s+')[[1]][1 + length(metrics) + 2]
|
||||
score <- strsplit(score,'\\+|:')[[1]][[2]]
|
||||
score <- as.numeric(score)
|
||||
if ( (maximize && score > bestScore) || (!maximize && score < bestScore)) {
|
||||
bestScore <- score
|
||||
bestInd <- i
|
||||
} else {
|
||||
if (i-bestInd>=early.stop.round) {
|
||||
earlyStopflag = TRUE
|
||||
if (i - bestInd >= early.stop.round) {
|
||||
earlyStopflag <- TRUE
|
||||
cat('Stopping. Best iteration:',bestInd)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
|
||||
if (prediction) {
|
||||
for (k in 1:nfold) {
|
||||
fd = xgb_folds[[k]]
|
||||
fd <- xgb_folds[[k]]
|
||||
if (!is.null(early.stop.round) && earlyStopflag) {
|
||||
res = xgb.iter.eval(fd$booster, fd$watchlist, bestInd - 1, feval, prediction)
|
||||
res <- xgb.iter.eval(fd$booster, fd$watchlist, bestInd - 1, feval, prediction)
|
||||
} else {
|
||||
res = xgb.iter.eval(fd$booster, fd$watchlist, nrounds - 1, feval, prediction)
|
||||
res <- xgb.iter.eval(fd$booster, fd$watchlist, nrounds - 1, feval, prediction)
|
||||
}
|
||||
if (mat_pred) {
|
||||
pred_mat = matrix(res[[2]],num_class,length(fd$index))
|
||||
predictValues[fd$index,] = t(pred_mat)
|
||||
pred_mat <- matrix(res[[2]],num_class,length(fd$index))
|
||||
predictValues[fd$index,] <- t(pred_mat)
|
||||
} else {
|
||||
predictValues[fd$index] = res[[2]]
|
||||
predictValues[fd$index] <- res[[2]]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
colnames <- str_split(string = history[1], pattern = "\t")[[1]] %>% .[2:length(.)] %>% str_extract(".*:") %>% str_replace(":","") %>% str_replace("-", ".")
|
||||
colnamesMean <- paste(colnames, "mean")
|
||||
if(showsd) colnamesStd <- paste(colnames, "std")
|
||||
|
||||
|
||||
colnames <- c()
|
||||
if(showsd) for(i in 1:length(colnamesMean)) colnames <- c(colnames, colnamesMean[i], colnamesStd[i])
|
||||
else colnames <- colnamesMean
|
||||
|
||||
|
||||
type <- rep(x = "numeric", times = length(colnames))
|
||||
dt <- utils::read.table(text = "", colClasses = type, col.names = colnames) %>% as.data.table
|
||||
split <- str_split(string = history, pattern = "\t")
|
||||
|
||||
for(line in split) dt <- line[2:length(line)] %>% str_extract_all(pattern = "\\d*\\.+\\d*") %>% unlist %>% as.numeric %>% as.list %>% {rbindlist(list(dt, .), use.names = F, fill = F)}
|
||||
|
||||
|
||||
for(line in split) dt <- line[2:length(line)] %>% str_extract_all(pattern = "\\d*\\.+\\d*") %>% unlist %>% as.numeric %>% as.list %>% {rbindlist( list( dt, .), use.names = F, fill = F)}
|
||||
|
||||
if (prediction) {
|
||||
return(list(dt = dt,pred = predictValues))
|
||||
return( list( dt = dt,pred = predictValues))
|
||||
}
|
||||
return(dt)
|
||||
}
|
||||
|
||||
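To make the reworked early-stopping and prediction paths in xgb.cv above concrete, an illustrative call, assuming the signature after this commit (missing = NA, and prediction = TRUE returning a list with the history and the out-of-fold predictions); this is a sketch, not part of the diff:

```r
library(xgboost)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# eval_metric is supplied through params so the early-stopping code can
# pick the test metric out of the per-round evaluation string.
param <- list(max.depth = 2, eta = 1,
              objective = "binary:logistic", eval_metric = "error")

# early.stop.round halts the CV loop once the test metric has not improved
# for 3 consecutive rounds; prediction = TRUE additionally returns the
# out-of-fold predictions next to the per-round history data.table.
res <- xgb.cv(params = param, data = dtrain, nrounds = 50, nfold = 5,
              maximize = FALSE, early.stop.round = 3, prediction = TRUE)

res$dt           # evaluation history, one row per boosting round
head(res$pred)   # out-of-fold prediction for every training row
```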
@@ -36,7 +36,6 @@
|
||||
#' # print the model without saving it to a file
|
||||
#' print(xgb.dump(bst))
|
||||
#' @export
|
||||
#'
|
||||
xgb.dump <- function(model = NULL, fname = NULL, fmap = "", with.stats=FALSE) {
|
||||
if (class(model) != "xgb.Booster") {
|
||||
stop("model: argument must be type xgb.Booster")
|
||||
@@ -49,13 +48,13 @@ xgb.dump <- function(model = NULL, fname = NULL, fmap = "", with.stats=FALSE) {
|
||||
if (!(class(fmap) %in% c("character", "NULL") && length(fname) <= 1)) {
|
||||
stop("fmap: argument must be type character (when provided)")
|
||||
}
|
||||
|
||||
|
||||
longString <- .Call("XGBoosterDumpModel_R", model$handle, fmap, as.integer(with.stats), PACKAGE = "xgboost")
|
||||
|
||||
|
||||
dt <- fread(paste(longString, collapse = ""), sep = "\n", header = F)
|
||||
|
||||
setnames(dt, "Lines")
|
||||
|
||||
|
||||
if(is.null(fname)) {
|
||||
result <- dt[Lines != "0"][, Lines := str_replace(Lines, "^\t+", "")][Lines != ""][, paste(Lines)]
|
||||
return(result)
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
#' Show importance of features in a model
|
||||
#'
|
||||
#' Read a xgboost model text dump.
|
||||
#' Can be tree or linear model (text dump of linear model are only supported in dev version of \code{Xgboost} for now).
|
||||
#' Create a \code{data.table} of the most important features of a model.
|
||||
#'
|
||||
#' @importFrom data.table data.table
|
||||
#' @importFrom data.table setnames
|
||||
@@ -11,34 +10,30 @@
|
||||
#' @importFrom Matrix cBind
|
||||
#' @importFrom Matrix sparseVector
|
||||
#'
|
||||
#' @param feature_names names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.
|
||||
#'
|
||||
#' @param filename_dump the path to the text file storing the model. Model dump must include the gain per feature and per tree (\code{with.stats = T} in function \code{xgb.dump}).
|
||||
#'
|
||||
#' @param model generated by the \code{xgb.train} function. Avoid the creation of a dump file.
|
||||
#'
|
||||
#' @param feature_names names of each feature as a \code{character} vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.
|
||||
#' @param model generated by the \code{xgb.train} function.
|
||||
#' @param data the dataset used for the training step. Will be used with \code{label} parameter for co-occurence computation. More information in \code{Detail} part. This parameter is optional.
|
||||
#'
|
||||
#' @param label the label vetor used for the training step. Will be used with \code{data} parameter for co-occurence computation. More information in \code{Detail} part. This parameter is optional.
|
||||
#'
|
||||
#' @param target a function which returns \code{TRUE} or \code{1} when an observation should be count as a co-occurence and \code{FALSE} or \code{0} otherwise. Default function is provided for computing co-occurences in a binary classification. The \code{target} function should have only one parameter. This parameter will be used to provide each important feature vector after having applied the split condition, therefore these vector will be only made of 0 and 1 only, whatever was the information before. More information in \code{Detail} part. This parameter is optional.
|
||||
#'
|
||||
#' @return A \code{data.table} of the features used in the model with their average gain (and their weight for boosted tree model) in the model.
|
||||
#'
|
||||
#' @details
|
||||
#' This is the function to understand the model trained (and through your model, your data).
|
||||
#'
|
||||
#' Results are returned for both linear and tree models.
|
||||
#' This function is for both linear and tree models.
|
||||
#'
|
||||
#' \code{data.table} is returned by the function.
|
||||
#' There are 3 columns :
|
||||
#' The columns are :
|
||||
#' \itemize{
|
||||
#' \item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump.
|
||||
#' \item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training ;
|
||||
#' \item \code{Cover} metric of the number of observation related to this feature (only available for tree models) ;
|
||||
#' \item \code{Weight} percentage representing the relative number of times a feature have been taken into trees. \code{Gain} should be prefered to search the most important feature. For boosted linear model, this column has no meaning.
|
||||
#' \item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump;
|
||||
#' \item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training (only available for tree models);
|
||||
#' \item \code{Cover} metric of the number of observation related to this feature (only available for tree models);
|
||||
#' \item \code{Weight} percentage representing the relative number of times a feature have been taken into trees.
|
||||
#' }
|
||||
#'
|
||||
#' If you don't provide \code{feature_names}, index of the features will be used instead.
|
||||
#'
|
||||
#' Because the index is extracted from the model dump (made on the C++ side), it starts at 0 (usual in C++) instead of 1 (usual in R).
|
||||
#'
|
||||
#' Co-occurence count
|
||||
#' ------------------
|
||||
#'
|
||||
@@ -51,57 +46,55 @@
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#'
|
||||
#' # Both dataset are list with two items, a sparse matrix and labels
|
||||
#' # (labels = outcome column which will be learned).
|
||||
#' # Each column of the sparse Matrix is a feature in one hot encoding format.
|
||||
#' train <- agaricus.train
|
||||
#'
|
||||
#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
|
||||
#' eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
#'
|
||||
#' # train$data@@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
#' xgb.importance(train$data@@Dimnames[[2]], model = bst)
|
||||
#' # agaricus.train$data@@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
#' xgb.importance(agaricus.train$data@@Dimnames[[2]], model = bst)
|
||||
#'
|
||||
#' # Same thing with co-occurence computation this time
|
||||
#' xgb.importance(train$data@@Dimnames[[2]], model = bst, data = train$data, label = train$label)
|
||||
#' xgb.importance(agaricus.train$data@@Dimnames[[2]], model = bst, data = agaricus.train$data, label = agaricus.train$label)
|
||||
#'
|
||||
#' @export
|
||||
xgb.importance <- function(feature_names = NULL, filename_dump = NULL, model = NULL, data = NULL, label = NULL, target = function(x) ((x + label) == 2)){
|
||||
if (!class(feature_names) %in% c("character", "NULL")) {
|
||||
stop("feature_names: Has to be a vector of character or NULL if the model dump already contains feature name. Look at this function documentation to see where to get feature names.")
|
||||
xgb.importance <- function(feature_names = NULL, model = NULL, data = NULL, label = NULL, target = function(x) ( (x + label) == 2)){
|
||||
if (!class(feature_names) %in% c("character", "NULL")) {
|
||||
stop("feature_names: Has to be a vector of character or NULL if the model already contains feature name. Look at this function documentation to see where to get feature names.")
|
||||
}
|
||||
|
||||
if (!(class(filename_dump) %in% c("character", "NULL") && length(filename_dump) <= 1)) {
|
||||
stop("filename_dump: Has to be a path to the model dump file.")
|
||||
}
|
||||
|
||||
if (!class(model) %in% c("xgb.Booster", "NULL")) {
|
||||
|
||||
if (class(model) != "xgb.Booster") {
|
||||
stop("model: Has to be an object of class xgb.Booster model generaged by the xgb.train function.")
|
||||
}
|
||||
|
||||
if((is.null(data) & !is.null(label)) |(!is.null(data) & is.null(label))) {
|
||||
|
||||
if((is.null(data) & !is.null(label)) | (!is.null(data) & is.null(label))) {
|
||||
stop("data/label: Provide the two arguments if you want co-occurence computation or none of them if you are not interested but not one of them only.")
|
||||
}
|
||||
|
||||
|
||||
if(class(label) == "numeric"){
|
||||
if(sum(label == 0) / length(label) > 0.5) label <- as(label, "sparseVector")
|
||||
}
|
||||
|
||||
if(is.null(model)){
|
||||
text <- readLines(filename_dump)
|
||||
} else {
|
||||
text <- xgb.dump(model = model, with.stats = T)
|
||||
}
|
||||
treeDump <- function(feature_names, text, keepDetail){
|
||||
if(keepDetail) groupBy <- c("Feature", "Split", "MissingNo") else groupBy <- "Feature"
|
||||
xgb.model.dt.tree(feature_names = feature_names, text = text)[,"MissingNo" := Missing == No ][Feature != "Leaf",.(Gain = sum(Quality), Cover = sum(Cover), Frequency = .N), by = groupBy, with = T][,`:=`(Gain = Gain / sum(Gain), Cover = Cover / sum(Cover), Frequency = Frequency / sum(Frequency))][order(Gain, decreasing = T)]
|
||||
}
|
||||
|
||||
if(text[2] == "bias:"){
|
||||
result <- readLines(filename_dump) %>% linearDump(feature_names, .)
|
||||
linearDump <- function(feature_names, text){
|
||||
weights <- which(text == "weight:") %>% {a =. + 1; text[a:length(text)]} %>% as.numeric
|
||||
if(is.null(feature_names)) feature_names <- seq(to = length(weights))
|
||||
data.table(Feature = feature_names, Weight = weights)
|
||||
}
|
||||
|
||||
model.text.dump <- xgb.dump(model = model, with.stats = T)
|
||||
|
||||
if(model.text.dump[2] == "bias:"){
|
||||
result <- model.text.dump %>% linearDump(feature_names, .)
|
||||
if(!is.null(data) | !is.null(label)) warning("data/label: these parameters should only be provided with decision tree based models.")
|
||||
} else {
|
||||
result <- treeDump(feature_names, text = text, keepDetail = !is.null(data))
|
||||
|
||||
result <- treeDump(feature_names, text = model.text.dump, keepDetail = !is.null(data))
|
||||
|
||||
# Co-occurence computation
|
||||
if(!is.null(data) & !is.null(label) & nrow(result) > 0) {
|
||||
# Take care of missing column
|
||||
# Take care of missing column
|
||||
a <- data[, result[MissingNo == T,Feature], drop=FALSE] != 0
|
||||
# Bind the two Matrix and reorder columns
|
||||
c <- data[, result[MissingNo == F,Feature], drop=FALSE] %>% cBind(a,.) %>% .[,result[,Feature]]
|
||||
@@ -109,25 +102,13 @@ xgb.importance <- function(feature_names = NULL, filename_dump = NULL, model = N
|
||||
# Apply split
|
||||
d <- data[, result[,Feature], drop=FALSE] < as.numeric(result[,Split])
|
||||
apply(c & d, 2, . %>% target %>% sum) -> vec
|
||||
|
||||
result <- result[, "RealCover":= as.numeric(vec), with = F][, "RealCover %" := RealCover / sum(label)][,MissingNo:=NULL]
|
||||
}
|
||||
|
||||
result <- result[, "RealCover" := as.numeric(vec), with = F][, "RealCover %" := RealCover / sum(label)][,MissingNo := NULL]
|
||||
}
|
||||
}
|
||||
result
|
||||
}
|
||||
|
||||
treeDump <- function(feature_names, text, keepDetail){
|
||||
if(keepDetail) groupBy <- c("Feature", "Split", "MissingNo") else groupBy <- "Feature"
|
||||
|
||||
result <- xgb.model.dt.tree(feature_names = feature_names, text = text)[,"MissingNo":= Missing == No ][Feature!="Leaf",.(Gain = sum(Quality), Cover = sum(Cover), Frequence = .N), by = groupBy, with = T][,`:=`(Gain = Gain/sum(Gain), Cover = Cover/sum(Cover), Frequence = Frequence/sum(Frequence))][order(Gain, decreasing = T)]
|
||||
|
||||
result
|
||||
}
|
||||
|
||||
linearDump <- function(feature_names, text){
|
||||
which(text == "weight:") %>% {a=.+1;text[a:length(text)]} %>% as.numeric %>% data.table(Feature = feature_names, Weight = .)
|
||||
}
|
||||
|
||||
# Avoid error messages during CRAN check.
|
||||
# The reason is that these variables are never declared
|
||||
# They are mainly column names inferred by Data.table...
|
||||
|
||||
@@ -15,11 +15,10 @@
|
||||
#' bst <- xgb.load('xgb.model')
|
||||
#' pred <- predict(bst, test$data)
|
||||
#' @export
|
||||
#'
|
||||
xgb.load <- function(modelfile) {
|
||||
if (is.null(modelfile))
|
||||
if (is.null(modelfile))
|
||||
stop("xgb.load: modelfile cannot be NULL")
|
||||
|
||||
|
||||
handle <- xgb.Booster(modelfile = modelfile)
|
||||
# re-use modelfile if it is raw so we donot need to serialize
|
||||
if (typeof(modelfile) == "raw") {
|
||||
@@ -29,4 +28,4 @@ xgb.load <- function(modelfile) {
|
||||
}
|
||||
bst <- xgb.Booster.check(bst)
|
||||
return(bst)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
#' Convert tree model dump to data.table
|
||||
#' Parse boosted tree model text dump
|
||||
#'
|
||||
#' Read a tree model text dump and return a data.table.
|
||||
#' Parse a boosted tree model text dump and return a \code{data.table}.
|
||||
#'
|
||||
#' @importFrom data.table data.table
|
||||
#' @importFrom data.table set
|
||||
@@ -12,20 +12,20 @@
|
||||
#' @importFrom magrittr add
|
||||
#' @importFrom stringr str_extract
|
||||
#' @importFrom stringr str_split
|
||||
#' @importFrom stringr str_extract
|
||||
#' @importFrom stringr str_trim
|
||||
#' @param feature_names names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.
|
||||
#' @param filename_dump the path to the text file storing the model. Model dump must include the gain per feature and per tree (parameter \code{with.stats = T} in function \code{xgb.dump}).
|
||||
#' @param model dump generated by the \code{xgb.train} function. Avoid the creation of a dump file.
|
||||
#' @param text dump generated by the \code{xgb.dump} function. Avoid the creation of a dump file. Model dump must include the gain per feature and per tree (parameter \code{with.stats = T} in function \code{xgb.dump}).
|
||||
#' @param n_first_tree limit the plot to the n first trees. If \code{NULL}, all trees of the model are plotted. Performance can be low for huge models.
|
||||
#' @param feature_names names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If the model already contains feature names, this argument should be \code{NULL} (default value).
|
||||
#' @param model object created by the \code{xgb.train} function.
|
||||
#' @param text \code{character} vector generated by the \code{xgb.dump} function. Model dump must include the gain per feature and per tree (parameter \code{with.stats = TRUE} in function \code{xgb.dump}).
|
||||
#' @param n_first_tree limit the plot to the \code{n} first trees. If set to \code{NULL}, all trees of the model are plotted. Performance can be low depending of the size of the model.
|
||||
#'
|
||||
#' @return A \code{data.table} of the features used in the model with their gain, cover and few other thing.
|
||||
#' @return A \code{data.table} of the features used in the model with their gain, cover and few other information.
|
||||
#'
|
||||
#' @details
|
||||
#' General function to convert a text dump of tree model to a Matrix. The purpose is to help user to explore the model and get a better understanding of it.
|
||||
#' General function to convert a text dump of tree model to a \code{data.table}.
|
||||
#'
|
||||
#' The content of the \code{data.table} is organised that way:
|
||||
#' The purpose is to help user to explore the model and get a better understanding of it.
|
||||
#'
|
||||
#' The columns of the \code{data.table} are:
|
||||
#'
|
||||
#' \itemize{
|
||||
#' \item \code{ID}: unique identifier of a node ;
|
||||
@@ -37,89 +37,73 @@
|
||||
#' \item \code{Quality}: it's the gain related to the split in this specific node ;
|
||||
#' \item \code{Cover}: metric to measure the number of observation affected by the split ;
|
||||
#' \item \code{Tree}: ID of the tree. It is included in the main ID ;
|
||||
#' \item \code{Yes.X} or \code{No.X}: data related to the pointer in \code{Yes} or \code{No} column ;
|
||||
#' \item \code{Yes.Feature}, \code{No.Feature}, \code{Yes.Cover}, \code{No.Cover}, \code{Yes.Quality} and \code{No.Quality}: data related to the pointer in \code{Yes} or \code{No} column ;
|
||||
#' }
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#'
|
||||
#' #Both dataset are list with two items, a sparse matrix and labels
|
||||
#' #(labels = outcome column which will be learned).
|
||||
#' #Each column of the sparse Matrix is a feature in one hot encoding format.
|
||||
#' train <- agaricus.train
|
||||
#'
|
||||
#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
|
||||
#' eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
#'
|
||||
#' #agaricus.test$data@@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
#' xgb.model.dt.tree(agaricus.train$data@@Dimnames[[2]], model = bst)
|
||||
#' # agaricus.train$data@@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
#' xgb.model.dt.tree(feature_names = agaricus.train$data@@Dimnames[[2]], model = bst)
|
||||
#'
|
||||
#' @export
|
||||
xgb.model.dt.tree <- function(feature_names = NULL, filename_dump = NULL, model = NULL, text = NULL, n_first_tree = NULL){
|
||||
|
||||
if (!class(feature_names) %in% c("character", "NULL")) {
|
||||
xgb.model.dt.tree <- function(feature_names = NULL, model = NULL, text = NULL, n_first_tree = NULL){
|
||||
|
||||
if (!class(feature_names) %in% c("character", "NULL")) {
|
||||
stop("feature_names: Has to be a vector of character or NULL if the model dump already contains feature name. Look at this function documentation to see where to get feature names.")
|
||||
}
|
||||
if (!(class(filename_dump) %in% c("character", "NULL") && length(filename_dump) <= 1)) {
|
||||
stop("filename_dump: Has to be a character vector of size 1 representing the path to the model dump file.")
|
||||
} else if (!is.null(filename_dump) && !file.exists(filename_dump)) {
|
||||
stop("filename_dump: path to the model doesn't exist.")
|
||||
} else if(is.null(filename_dump) && is.null(model) && is.null(text)){
|
||||
stop("filename_dump & model & text: no path to dump model, no model, no text dump, have been provided.")
|
||||
|
||||
if (class(model) != "xgb.Booster" & class(text) != "character") {
|
||||
"model: Has to be an object of class xgb.Booster model generaged by the xgb.train function.\n" %>%
|
||||
paste0("text: Has to be a vector of character or NULL if a path to the model dump has already been provided.") %>%
|
||||
stop()
|
||||
}
|
||||
|
||||
if (!class(model) %in% c("xgb.Booster", "NULL")) {
|
||||
stop("model: Has to be an object of class xgb.Booster model generaged by the xgb.train function.")
|
||||
}
|
||||
|
||||
if (!class(text) %in% c("character", "NULL")) {
|
||||
stop("text: Has to be a vector of character or NULL if a path to the model dump has already been provided.")
|
||||
}
|
||||
|
||||
|
||||
if (!class(n_first_tree) %in% c("numeric", "NULL") | length(n_first_tree) > 1) {
|
||||
stop("n_first_tree: Has to be a numeric vector of size 1.")
|
||||
}
|
||||
|
||||
if(!is.null(model)){
|
||||
text = xgb.dump(model = model, with.stats = T)
|
||||
} else if(!is.null(filename_dump)){
|
||||
text <- readLines(filename_dump) %>% str_trim(side = "both")
|
||||
|
||||
if(is.null(text)){
|
||||
text <- xgb.dump(model = model, with.stats = T)
|
||||
}
|
||||
|
||||
position <- str_match(text, "booster") %>% is.na %>% not %>% which %>% c(length(text)+1)
|
||||
|
||||
position <- str_match(text, "booster") %>% is.na %>% not %>% which %>% c(length(text) + 1)
|
||||
|
||||
extract <- function(x, pattern) str_extract(x, pattern) %>% str_split("=") %>% lapply(function(x) x[2] %>% as.numeric) %>% unlist
|
||||
|
||||
|
||||
n_round <- min(length(position) - 1, n_first_tree)
|
||||
|
||||
|
||||
addTreeId <- function(x, i) paste(i,x,sep = "-")
|
||||
|
||||
|
||||
allTrees <- data.table()
|
||||
|
||||
anynumber_regex<-"[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"
|
||||
for(i in 1:n_round){
|
||||
|
||||
tree <- text[(position[i]+1):(position[i+1]-1)]
|
||||
|
||||
|
||||
anynumber_regex <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"
|
||||
for (i in 1:n_round){
|
||||
|
||||
tree <- text[(position[i] + 1):(position[i + 1] - 1)]
|
||||
|
||||
# avoid tree made of a leaf only (no split)
|
||||
if(length(tree) <2) next
|
||||
|
||||
treeID <- i-1
|
||||
|
||||
if(length(tree) < 2) next
|
||||
|
||||
treeID <- i - 1
|
||||
|
||||
notLeaf <- str_match(tree, "leaf") %>% is.na
|
||||
leaf <- notLeaf %>% not %>% tree[.]
|
||||
branch <- notLeaf %>% tree[.]
|
||||
idBranch <- str_extract(branch, "\\d*:") %>% str_replace(":", "") %>% addTreeId(treeID)
|
||||
idLeaf <- str_extract(leaf, "\\d*:") %>% str_replace(":", "") %>% addTreeId(treeID)
|
||||
featureBranch <- str_extract(branch, "f\\d*<") %>% str_replace("<", "") %>% str_replace("f", "") %>% as.numeric
|
||||
featureBranch <- str_extract(branch, "f\\d*<") %>% str_replace("<", "") %>% str_replace("f", "") %>% as.numeric
|
||||
if(!is.null(feature_names)){
|
||||
featureBranch <- feature_names[featureBranch + 1]
|
||||
}
|
||||
featureLeaf <- rep("Leaf", length(leaf))
|
||||
splitBranch <- str_extract(branch, paste0("<",anynumber_regex,"\\]")) %>% str_replace("<", "") %>% str_replace("\\]", "")
|
||||
splitLeaf <- rep(NA, length(leaf))
|
||||
splitBranch <- str_extract(branch, paste0("<",anynumber_regex,"\\]")) %>% str_replace("<", "") %>% str_replace("\\]", "")
|
||||
splitLeaf <- rep(NA, length(leaf))
|
||||
yesBranch <- extract(branch, "yes=\\d*") %>% addTreeId(treeID)
|
||||
yesLeaf <- rep(NA, length(leaf))
|
||||
yesLeaf <- rep(NA, length(leaf))
|
||||
noBranch <- extract(branch, "no=\\d*") %>% addTreeId(treeID)
|
||||
noLeaf <- rep(NA, length(leaf))
|
||||
missingBranch <- extract(branch, "missing=\\d+") %>% addTreeId(treeID)
|
||||
@ -128,42 +112,42 @@ xgb.model.dt.tree <- function(feature_names = NULL, filename_dump = NULL, model
|
||||
qualityLeaf <- extract(leaf, paste0("leaf=",anynumber_regex))
|
||||
coverBranch <- extract(branch, "cover=\\d*\\.*\\d*")
|
||||
coverLeaf <- extract(leaf, "cover=\\d*\\.*\\d*")
|
||||
dt <- data.table(ID = c(idBranch, idLeaf), Feature = c(featureBranch, featureLeaf), Split = c(splitBranch, splitLeaf), Yes = c(yesBranch, yesLeaf), No = c(noBranch, noLeaf), Missing = c(missingBranch, missingLeaf), Quality = c(qualityBranch, qualityLeaf), Cover = c(coverBranch, coverLeaf))[order(ID)][,Tree:=treeID]
|
||||
|
||||
dt <- data.table(ID = c(idBranch, idLeaf), Feature = c(featureBranch, featureLeaf), Split = c(splitBranch, splitLeaf), Yes = c(yesBranch, yesLeaf), No = c(noBranch, noLeaf), Missing = c(missingBranch, missingLeaf), Quality = c(qualityBranch, qualityLeaf), Cover = c(coverBranch, coverLeaf))[order(ID)][,Tree := treeID]
|
||||
|
||||
allTrees <- rbindlist(list(allTrees, dt), use.names = T, fill = F)
|
||||
}
|
||||
|
||||
|
||||
yes <- allTrees[!is.na(Yes), Yes]
|
||||
|
||||
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
|
||||
j = "Yes.Feature",
|
||||
|
||||
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
|
||||
j = "Yes.Feature",
|
||||
value = allTrees[ID %in% yes, Feature])
|
||||
|
||||
|
||||
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
|
||||
j = "Yes.Cover",
|
||||
j = "Yes.Cover",
|
||||
value = allTrees[ID %in% yes, Cover])
|
||||
|
||||
|
||||
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
|
||||
j = "Yes.Quality",
|
||||
j = "Yes.Quality",
|
||||
value = allTrees[ID %in% yes, Quality])
|
||||
no <- allTrees[!is.na(No), No]
|
||||
|
||||
|
||||
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
|
||||
j = "No.Feature",
|
||||
j = "No.Feature",
|
||||
value = allTrees[ID %in% no, Feature])
|
||||
|
||||
|
||||
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
|
||||
j = "No.Cover",
|
||||
j = "No.Cover",
|
||||
value = allTrees[ID %in% no, Cover])
|
||||
|
||||
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
|
||||
j = "No.Quality",
|
||||
|
||||
set(allTrees, i = which(allTrees[, Feature] != "Leaf"),
|
||||
j = "No.Quality",
|
||||
value = allTrees[ID %in% no, Quality])
|
||||
|
||||
|
||||
allTrees
|
||||
}
|
||||
|
||||
# Avoid error messages during CRAN check.
|
||||
# The reason is that these variables are never declared
|
||||
# They are mainly column names inferred by Data.table...
|
||||
globalVariables(c("ID", "Tree", "Yes", ".", ".N", "Feature", "Cover", "Quality", "No", "Gain", "Frequence"))
|
||||
globalVariables(c("ID", "Tree", "Yes", ".", ".N", "Feature", "Cover", "Quality", "No", "Gain", "Frequency"))
|
||||
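# A hedged usage sketch (not from the original file), assuming the `bst` booster and
# `agaricus.train` data used in the roxygen example above: aggregate the per-node gain
# (`Quality`) by `Feature` to get a quick, hand-rolled importance table.
library(xgboost)
library(data.table)

dt <- xgb.model.dt.tree(feature_names = agaricus.train$data@Dimnames[[2]], model = bst)

# total gain and number of splits per feature, leaf nodes excluded
dt[Feature != "Leaf", .(TotalGain = sum(Quality), Splits = .N), by = Feature][order(-TotalGain)]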
160
R-package/R/xgb.plot.deepness.R
Normal file
@ -0,0 +1,160 @@
|
||||
#' Plot multiple graphs at the same time
|
||||
#'
|
||||
#' Plot multiple graph aligned by rows and columns.
|
||||
#'
|
||||
#' @importFrom data.table data.table
|
||||
#' @param cols number of columns
|
||||
#' @return NULL
|
||||
multiplot <- function(..., cols = 1) {
|
||||
plots <- list(...)
|
||||
numPlots = length(plots)
|
||||
|
||||
layout <- matrix(seq(1, cols * ceiling(numPlots / cols)),
|
||||
ncol = cols, nrow = ceiling(numPlots / cols))
|
||||
|
||||
if (numPlots == 1) {
|
||||
print(plots[[1]])
|
||||
} else {
|
||||
grid::grid.newpage()
|
||||
grid::pushViewport(grid::viewport(layout = grid::grid.layout(nrow(layout), ncol(layout))))
|
||||
for (i in 1:numPlots) {
|
||||
# Get the i,j matrix positions of the regions that contain this subplot
|
||||
matchidx <- as.data.table(which(layout == i, arr.ind = TRUE))
|
||||
|
||||
print(
|
||||
plots[[i]], vp = grid::viewport(
|
||||
layout.pos.row = matchidx$row,
|
||||
layout.pos.col = matchidx$col
|
||||
)
|
||||
)
|
||||
}
|
||||
}
|
||||
}
|
||||
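# A hedged usage sketch for this helper (not from the original file). `multiplot` is internal,
# so from outside the package it would be reached as `xgboost:::multiplot`; the data here is
# made up purely for illustration.
library(ggplot2)
d <- data.frame(x = 1:10, y = (1:10) ^ 2)
p1 <- ggplot(d, aes(x, y)) + geom_line()
p2 <- ggplot(d, aes(x, sqrt(y))) + geom_point()
multiplot(p1, p2, cols = 1)   # stacks the two plots in a single column, as xgb.plot.deepness does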
|
||||
#' Parse the graph to extract vector of edges
|
||||
#' @param element igraph object containing the path from the root to the leaf.
|
||||
edge.parser <- function(element) {
|
||||
edges.vector <- igraph::as_ids(element)
|
||||
t <- tail(edges.vector, n = 1)
|
||||
l <- length(edges.vector)
|
||||
list(t,l)
|
||||
}
|
||||
|
||||
#' Extract path from root to leaf from data.table
|
||||
#' @param dt.tree data.table containing the nodes and edges of the trees
|
||||
get.paths.to.leaf <- function(dt.tree) {
|
||||
dt.not.leaf.edges <-
|
||||
dt.tree[Feature != "Leaf",.(ID, Yes, Tree)] %>% list(dt.tree[Feature != "Leaf",.(ID, No, Tree)]) %>% rbindlist(use.names = F)
|
||||
|
||||
trees <- dt.tree[,unique(Tree)]
|
||||
|
||||
paths <- list()
|
||||
for (tree in trees) {
|
||||
graph <-
|
||||
igraph::graph_from_data_frame(dt.not.leaf.edges[Tree == tree])
|
||||
paths.tmp <-
|
||||
igraph::shortest_paths(graph, from = paste0(tree, "-0"), to = dt.tree[Tree == tree &
|
||||
Feature == "Leaf", c(ID)])
|
||||
paths <- c(paths, paths.tmp$vpath)
|
||||
}
|
||||
paths
|
||||
}
|
||||
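# A hedged sketch (not from the original file) of what these two helpers are used for:
# the length of each root-to-leaf path is the depth of that leaf. Assumes a trained
# booster `bst`; both helpers are internal (`xgboost:::` from outside the package).
dt <- xgb.model.dt.tree(model = bst)
paths <- get.paths.to.leaf(dt)
depths <- sapply(paths, function(p) length(igraph::as_ids(p)))  # number of nodes on each path
table(depths)   # distribution of leaf depths, which xgb.plot.deepness plots below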
|
||||
#' Plot model trees deepness
|
||||
#'
|
||||
#' Generate a graph to plot the distribution of deepness among trees.
|
||||
#'
|
||||
#' @importFrom data.table data.table
|
||||
#' @importFrom data.table rbindlist
|
||||
#' @importFrom data.table setnames
|
||||
#' @importFrom data.table :=
|
||||
#' @importFrom magrittr %>%
|
||||
#' @param model generated by the \code{xgb.train} function.
|
||||
#'
|
||||
#' @return Two graphs showing the distribution of the model deepness.
|
||||
#'
|
||||
#' @details
|
||||
#' Display both the number of \code{leaf} nodes and the distribution of \code{weighted observations}
|
||||
#' by tree deepness level.
|
||||
#'
|
||||
#' The purpose of this function is to help the user find the best way to set
|
||||
#' the \code{max.depth} and \code{min_child_weight} parameters according to the bias / variance trade-off.
|
||||
#'
|
||||
#' See \link{xgb.train} for more information about these parameters.
|
||||
#'
|
||||
#' The graph is made of two parts:
|
||||
#'
|
||||
#' \itemize{
|
||||
#' \item Count: number of leaves per deepness level;
|
||||
#' \item Weighted cover: normalized weighted cover per leaf (weighted number of instances).
|
||||
#' }
|
||||
#'
|
||||
#' This function is inspired by the blog post \url{http://aysent.github.io/2015/11/08/random-forest-leaf-visualization.html}
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#'
|
||||
#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 15,
|
||||
#' eta = 1, nthread = 2, nround = 30, objective = "binary:logistic",
|
||||
#' min_child_weight = 50)
|
||||
#'
|
||||
#' xgb.plot.deepness(model = bst)
|
||||
#'
|
||||
#' @export
|
||||
xgb.plot.deepness <- function(model = NULL) {
|
||||
if (!requireNamespace("ggplot2", quietly = TRUE)) {
|
||||
stop("ggplot2 package is required for plotting the graph deepness.",
|
||||
call. = FALSE)
|
||||
}
|
||||
|
||||
if (!requireNamespace("igraph", quietly = TRUE)) {
|
||||
stop("igraph package is required for plotting the graph deepness.",
|
||||
call. = FALSE)
|
||||
}
|
||||
|
||||
if (!requireNamespace("grid", quietly = TRUE)) {
|
||||
stop("grid package is required for plotting the graph deepness.",
|
||||
call. = FALSE)
|
||||
}
|
||||
|
||||
if (class(model) != "xgb.Booster") {
|
||||
stop("model: Has to be an object of class xgb.Booster model generaged by the xgb.train function.")
|
||||
}
|
||||
|
||||
dt.tree <- xgb.model.dt.tree(model = model)
|
||||
|
||||
dt.edge.elements <- data.table()
|
||||
paths <- get.paths.to.leaf(dt.tree)
|
||||
|
||||
dt.edge.elements <-
|
||||
lapply(paths, edge.parser) %>% rbindlist %>% setnames(c("last.edge", "size")) %>%
|
||||
merge(dt.tree, by.x = "last.edge", by.y = "ID") %>% rbind(dt.edge.elements)
|
||||
|
||||
dt.edge.summary <-
|
||||
dt.edge.elements[, .(.N, Cover = sum(Cover)), size][,Cover:= Cover / sum(Cover)]
|
||||
|
||||
p1 <-
|
||||
ggplot2::ggplot(dt.edge.summary) + ggplot2::geom_line(ggplot2::aes(x = size, y = N, group = 1)) +
|
||||
ggplot2::xlab("") + ggplot2::ylab("Count") + ggplot2::ggtitle("Model complexity") +
|
||||
ggplot2::theme(
|
||||
plot.title = ggplot2::element_text(lineheight = 0.9, face = "bold"),
|
||||
panel.grid.major.y = ggplot2::element_blank(),
|
||||
axis.ticks = ggplot2::element_blank(),
|
||||
axis.text.x = ggplot2::element_blank()
|
||||
)
|
||||
|
||||
p2 <-
|
||||
ggplot2::ggplot(dt.edge.summary) + ggplot2::geom_line(ggplot2::aes(x = size, y = Cover, group = 1)) +
|
||||
ggplot2::xlab("From root to leaf path length") + ggplot2::ylab("Weighted cover")
|
||||
|
||||
multiplot(p1,p2,cols = 1)
|
||||
}
|
||||
|
||||
# Avoid error messages during CRAN check.
|
||||
# The reason is that these variables are never declared
|
||||
# They are mainly column names inferred by Data.table...
|
||||
globalVariables(
|
||||
c(
|
||||
"Feature", "Count", "ggplot", "aes", "geom_bar", "xlab", "ylab", "ggtitle", "theme", "element_blank", "element_text", "ID", "Yes", "No", "Tree"
|
||||
)
|
||||
)
|
||||
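# A hedged sketch (not from the original file) that rebuilds the numbers behind the two
# panels above, for readers who want the raw table rather than the plot. Assumes a
# trained booster `bst`; the helpers are internal to the package.
dt <- xgb.model.dt.tree(model = bst)
leaf.info <- data.table::rbindlist(lapply(get.paths.to.leaf(dt), edge.parser))
data.table::setnames(leaf.info, c("last.edge", "size"))
leaf.info <- merge(leaf.info, dt, by.x = "last.edge", by.y = "ID")
# leaf count and normalized weighted cover per depth level, as in the Count / Weighted cover panels
leaf.info[, .(Count = .N, Cover = sum(Cover)), by = size][, Cover := Cover / sum(Cover)][]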
@ -1,57 +1,79 @@
|
||||
#' Plot feature importance bar graph
|
||||
#'
|
||||
#' Read a data.table containing feature importance details and plot it.
|
||||
#'
|
||||
#'
|
||||
#' Read a data.table containing feature importance details and plot it (for both GLM and Trees).
|
||||
#'
|
||||
#' @importFrom magrittr %>%
|
||||
#' @param importance_matrix a \code{data.table} returned by the \code{xgb.importance} function.
|
||||
#' @param numberOfClusters a \code{numeric} vector containing the min and the max range of the possible number of clusters of bars.
|
||||
#'
|
||||
#' @return A \code{ggplot2} bar graph representing each feature by a horizontal bar. The longer the bar, the more important the feature. Features are sorted by importance and clustered into groups of similar importance; the group is represented through the color of the bar.
|
||||
#'
|
||||
#' @details
|
||||
#' @details
|
||||
#' The purpose of this function is to easily represent the importance of each feature of a model.
|
||||
#' The function return a ggplot graph, therefore each of its characteristic can be overriden (to customize it).
|
||||
#' In particular you may want to override the title of the graph. To do so, add \code{+ ggtitle("A GRAPH NAME")} next to the value returned by this function.
|
||||
#'
|
||||
#' The function returns a ggplot graph, therefore each of its characteristics can be overridden (to customize it).
|
||||
#' In particular you may want to override the title of the graph. To do so, add \code{+ ggtitle("A GRAPH NAME")} next to the value returned by this function.
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#'
|
||||
#' #Both datasets are lists with two items, a sparse matrix and labels
|
||||
#' #(labels = outcome column which will be learned).
|
||||
#'
|
||||
#' #Both datasets are lists with two items, a sparse matrix and labels
|
||||
#' #(labels = outcome column which will be learned).
|
||||
#' #Each column of the sparse Matrix is a feature in one hot encoding format.
|
||||
#' train <- agaricus.train
|
||||
#'
|
||||
#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
#'
|
||||
#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
|
||||
#' eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
#'
|
||||
#' #train$data@@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
#' importance_matrix <- xgb.importance(train$data@@Dimnames[[2]], model = bst)
|
||||
#'
|
||||
#' #agaricus.train$data@@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
#' importance_matrix <- xgb.importance(agaricus.train$data@@Dimnames[[2]], model = bst)
|
||||
#' xgb.plot.importance(importance_matrix)
|
||||
#'
|
||||
#'
|
||||
#' @export
|
||||
xgb.plot.importance <- function(importance_matrix = NULL, numberOfClusters = c(1:10)){
|
||||
if (!"data.table" %in% class(importance_matrix)) {
|
||||
stop("importance_matrix: Should be a data.table.")
|
||||
}
|
||||
if (!requireNamespace("ggplot2", quietly = TRUE)) {
|
||||
stop("ggplot2 package is required for plotting the importance", call. = FALSE)
|
||||
}
|
||||
if (!requireNamespace("Ckmeans.1d.dp", quietly = TRUE)) {
|
||||
stop("Ckmeans.1d.dp package is required for plotting the importance", call. = FALSE)
|
||||
}
|
||||
|
||||
# To avoid issues in clustering when co-occurrences are used
|
||||
importance_matrix <- importance_matrix[, .(Gain = sum(Gain)), by = Feature]
|
||||
|
||||
clusters <- suppressWarnings(Ckmeans.1d.dp::Ckmeans.1d.dp(importance_matrix[,Gain], numberOfClusters))
|
||||
importance_matrix[,"Cluster":=clusters$cluster %>% as.character]
|
||||
xgb.plot.importance <-
|
||||
function(importance_matrix = NULL, numberOfClusters = c(1:10)) {
|
||||
if (!"data.table" %in% class(importance_matrix)) {
|
||||
stop("importance_matrix: Should be a data.table.")
|
||||
}
|
||||
if (!requireNamespace("ggplot2", quietly = TRUE)) {
|
||||
stop("ggplot2 package is required for plotting the importance", call. = FALSE)
|
||||
}
|
||||
if (!requireNamespace("Ckmeans.1d.dp", quietly = TRUE)) {
|
||||
stop("Ckmeans.1d.dp package is required for plotting the importance", call. = FALSE)
|
||||
}
|
||||
|
||||
plot <- ggplot2::ggplot(importance_matrix, ggplot2::aes(x=stats::reorder(Feature, Gain), y = Gain, width= 0.05), environment = environment())+ ggplot2::geom_bar(ggplot2::aes(fill=Cluster), stat="identity", position="identity") + ggplot2::coord_flip() + ggplot2::xlab("Features") + ggplot2::ylab("Gain") + ggplot2::ggtitle("Feature importance") + ggplot2::theme(plot.title = ggplot2::element_text(lineheight=.9, face="bold"), panel.grid.major.y = ggplot2::element_blank() )
|
||||
|
||||
return(plot)
|
||||
}
|
||||
if(isTRUE(all.equal(colnames(importance_matrix), c("Feature", "Gain", "Cover", "Frequency")))){
|
||||
y.axe.name <- "Gain"
|
||||
} else if(isTRUE(all.equal(colnames(importance_matrix), c("Feature", "Weight")))){
|
||||
y.axe.name <- "Weight"
|
||||
} else {
|
||||
stop("Importance matrix is not correct (column names issue)")
|
||||
}
|
||||
|
||||
# To avoid issues in clustering when co-occurrences are used
|
||||
importance_matrix <-
|
||||
importance_matrix[, .(Gain.or.Weight = sum(get(y.axe.name))), by = Feature]
|
||||
|
||||
clusters <-
|
||||
suppressWarnings(Ckmeans.1d.dp::Ckmeans.1d.dp(importance_matrix[,Gain.or.Weight], numberOfClusters))
|
||||
importance_matrix[,"Cluster":= clusters$cluster %>% as.character]
|
||||
|
||||
plot <-
|
||||
ggplot2::ggplot(
|
||||
importance_matrix, ggplot2::aes(
|
||||
x = stats::reorder(Feature, Gain.or.Weight), y = Gain.or.Weight, width = 0.05
|
||||
), environment = environment()
|
||||
) + ggplot2::geom_bar(ggplot2::aes(fill = Cluster), stat = "identity", position =
|
||||
"identity") + ggplot2::coord_flip() + ggplot2::xlab("Features") + ggplot2::ylab(y.axe.name) + ggplot2::ggtitle("Feature importance") + ggplot2::theme(
|
||||
plot.title = ggplot2::element_text(lineheight = .9, face = "bold"), panel.grid.major.y = ggplot2::element_blank()
|
||||
)
|
||||
|
||||
return(plot)
|
||||
}
|
||||
|
||||
# Avoid error messages during CRAN check.
|
||||
# The reason is that these variables are never declared
|
||||
# They are mainly column names inferred by Data.table...
|
||||
globalVariables(c("Feature", "Gain", "Cluster", "ggplot", "aes", "geom_bar", "coord_flip", "xlab", "ylab", "ggtitle", "theme", "element_blank", "element_text"))
|
||||
globalVariables(
|
||||
c(
|
||||
"Feature", "Gain.or.Weight", "Cluster", "ggplot", "aes", "geom_bar", "coord_flip", "xlab", "ylab", "ggtitle", "theme", "element_blank", "element_text", "Gain.or.Weight"
|
||||
)
|
||||
)
|
||||
|
||||
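# A hedged usage sketch (not from the original file) of the customization described in the
# details above: the returned object is a plain ggplot, so it can be re-titled or re-themed.
# Assumes the `importance_matrix` built in the roxygen example.
p <- xgb.plot.importance(importance_matrix)
p + ggplot2::ggtitle("Mushroom model - feature importance") +
  ggplot2::theme(legend.position = "none")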
114
R-package/R/xgb.plot.multi.trees.R
Normal file
@ -0,0 +1,114 @@
|
||||
#' Project all trees on one tree and plot it
|
||||
#'
|
||||
#' Visualization of the ensemble of trees as a single collective unit.
|
||||
#'
|
||||
#' @importFrom data.table data.table
|
||||
#' @importFrom data.table rbindlist
|
||||
#' @importFrom data.table setnames
|
||||
#' @importFrom data.table :=
|
||||
#' @importFrom magrittr %>%
|
||||
#' @importFrom stringr str_detect
|
||||
#' @importFrom stringr str_extract
|
||||
#'
|
||||
#' @param model generated by the \code{xgb.train} function.
|
||||
#' @param feature_names names of each feature as a \code{character} vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.
|
||||
#' @param features.keep number of features to keep in each position of the multi trees.
|
||||
#' @param plot.width width in pixels of the graph to produce
|
||||
#' @param plot.height height in pixels of the graph to produce
|
||||
#'
|
||||
#' @return A \code{DiagrammeR} object showing the trees of the model projected onto a single tree.
|
||||
#'
|
||||
#' @details
|
||||
#'
|
||||
#' This function tries to capture the complexity of a gradient boosted tree ensemble
|
||||
#' in a cohesive way.
|
||||
#'
|
||||
#' The goal is to improve the interpretability of a model generally seen as a black box.
|
||||
#' The function is dedicated to boosting applied to decision trees only.
|
||||
#'
|
||||
#' The purpose is to move from an ensemble of trees to a single tree only.
|
||||
#'
|
||||
#' It takes advantage of the fact that the shape of a binary tree is only defined by
|
||||
#' its deepness (therefore in a boosting model, all trees have the same shape).
|
||||
#'
|
||||
#' Moreover, the trees tend to reuse the same features.
|
||||
#'
|
||||
#' The function will project each tree onto a single one and keep, for each position, the
|
||||
#' first \code{features.keep} features (based on the per-feature Gain measure).
|
||||
#'
|
||||
#' This function is inspired by this blog post:
|
||||
#' \url{https://wellecks.wordpress.com/2015/02/21/peering-into-the-black-box-visualizing-lambdamart/}
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#'
|
||||
#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 15,
|
||||
#' eta = 1, nthread = 2, nround = 30, objective = "binary:logistic",
|
||||
#' min_child_weight = 50)
|
||||
#'
|
||||
#' p <- xgb.plot.multi.trees(model = bst, feature_names = agaricus.train$data@Dimnames[[2]], features.keep = 3)
|
||||
#' print(p)
|
||||
#'
|
||||
#' @export
|
||||
xgb.plot.multi.trees <- function(model, feature_names = NULL, features.keep = 5, plot.width = NULL, plot.height = NULL){
|
||||
tree.matrix <- xgb.model.dt.tree(feature_names = feature_names, model = model)
|
||||
|
||||
# first number of the path represents the tree, then the following numbers are related to the path to follow
|
||||
# root init
|
||||
root.nodes <- tree.matrix[str_detect(ID, "\\d+-0"), ID]
|
||||
tree.matrix[ID %in% root.nodes, abs.node.position:=root.nodes]
|
||||
|
||||
precedent.nodes <- root.nodes
|
||||
|
||||
while(tree.matrix[,sum(is.na(abs.node.position))] > 0) {
|
||||
yes.row.nodes <- tree.matrix[abs.node.position %in% precedent.nodes & !is.na(Yes)]
|
||||
no.row.nodes <- tree.matrix[abs.node.position %in% precedent.nodes & !is.na(No)]
|
||||
yes.nodes.abs.pos <- yes.row.nodes[, abs.node.position] %>% paste0("_0")
|
||||
no.nodes.abs.pos <- no.row.nodes[, abs.node.position] %>% paste0("_1")
|
||||
|
||||
tree.matrix[ID %in% yes.row.nodes[, Yes], abs.node.position := yes.nodes.abs.pos]
|
||||
tree.matrix[ID %in% no.row.nodes[, No], abs.node.position := no.nodes.abs.pos]
|
||||
precedent.nodes <- c(yes.nodes.abs.pos, no.nodes.abs.pos)
|
||||
}
|
||||
|
||||
tree.matrix[!is.na(Yes),Yes:= paste0(abs.node.position, "_0")]
|
||||
tree.matrix[!is.na(No),No:= paste0(abs.node.position, "_1")]
|
||||
|
||||
|
||||
|
||||
remove.tree <- . %>% str_replace(pattern = "^\\d+-", replacement = "")
|
||||
|
||||
tree.matrix[,`:=`(abs.node.position=remove.tree(abs.node.position), Yes=remove.tree(Yes), No=remove.tree(No))]
|
||||
|
||||
nodes.dt <- tree.matrix[,.(Quality = sum(Quality)),by = .(abs.node.position, Feature)][,.(Text =paste0(Feature[1:min(length(Feature), features.keep)], " (", Quality[1:min(length(Quality), features.keep)], ")") %>% paste0(collapse = "\n")), by=abs.node.position]
|
||||
edges.dt <- tree.matrix[Feature != "Leaf",.(abs.node.position, Yes)] %>% list(tree.matrix[Feature != "Leaf",.(abs.node.position, No)]) %>% rbindlist() %>% setnames(c("From", "To")) %>% .[,.N,.(From, To)] %>% .[,N:=NULL]
|
||||
|
||||
nodes <- DiagrammeR::create_nodes(nodes = nodes.dt[,abs.node.position],
|
||||
label = nodes.dt[,Text],
|
||||
style = "filled",
|
||||
color = "DimGray",
|
||||
fillcolor= "Beige",
|
||||
shape = "oval",
|
||||
fontname = "Helvetica"
|
||||
)
|
||||
|
||||
edges <- DiagrammeR::create_edges(from = edges.dt[,From],
|
||||
to = edges.dt[,To],
|
||||
color = "DimGray",
|
||||
arrowsize = "1.5",
|
||||
arrowhead = "vee",
|
||||
fontname = "Helvetica",
|
||||
rel = "leading_to")
|
||||
|
||||
graph <- DiagrammeR::create_graph(nodes_df = nodes,
|
||||
edges_df = edges,
|
||||
graph_attrs = "rankdir = LR")
|
||||
|
||||
DiagrammeR::render_graph(graph, width = plot.width, height = plot.height)
|
||||
}
|
||||
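# A hedged walk-through (not from the original file) of the position encoding built by the
# loop above, with made-up IDs: after stripping the tree prefix, every tree shares the same
# set of positions, so per-node statistics can be aggregated position by position.
ids <- c("0-0", "0-0_0", "0-0_1", "1-0", "1-0_0")
stringr::str_replace(ids, pattern = "^\\d+-", replacement = "")
# -> "0" "0_0" "0_1" "0" "0_0"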
|
||||
globalVariables(
|
||||
c(
|
||||
"Feature", "no.nodes.abs.pos", "ID", "Yes", "No", "Tree", "yes.nodes.abs.pos", "abs.node.position"
|
||||
)
|
||||
)
|
||||
@ -1,27 +1,15 @@
|
||||
#' Plot a boosted tree model
|
||||
#'
|
||||
#' Read a tree model text dump.
|
||||
#' Plotting only works for boosted tree model (not linear model).
|
||||
#' Read a tree model text dump and plot the model.
|
||||
#'
|
||||
#' @importFrom data.table data.table
|
||||
#' @importFrom data.table set
|
||||
#' @importFrom data.table rbindlist
|
||||
#' @importFrom data.table :=
|
||||
#' @importFrom data.table copy
|
||||
#' @importFrom magrittr %>%
|
||||
#' @importFrom magrittr not
|
||||
#' @importFrom magrittr add
|
||||
#' @importFrom stringr str_extract
|
||||
#' @importFrom stringr str_split
|
||||
#' @importFrom stringr str_extract
|
||||
#' @importFrom stringr str_trim
|
||||
#' @param feature_names names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.
|
||||
#' @param filename_dump the path to the text file storing the model. Model dump must include the gain per feature and per tree (parameter \code{with.stats = T} in function \code{xgb.dump}). Possible to provide a model directly (see \code{model} argument).
|
||||
#' @param feature_names names of each feature as a \code{character} vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.
|
||||
#' @param model generated by the \code{xgb.train} function. Avoid the creation of a dump file.
|
||||
#' @param n_first_tree limit the plot to the n first trees. If \code{NULL}, all trees of the model are plotted. Performance can be low for huge models.
|
||||
#' @param CSSstyle a \code{character} vector storing a css style to customize the appearance of nodes. Look at the \href{https://github.com/knsv/mermaid/wiki}{Mermaid wiki} for more information.
|
||||
#' @param width the width of the diagram in pixels.
|
||||
#' @param height the height of the diagram in pixels.
|
||||
#' @param plot.width the width of the diagram in pixels.
|
||||
#' @param plot.height the height of the diagram in pixels.
|
||||
#'
|
||||
#' @return A \code{DiagrammeR} of the model.
|
||||
#'
|
||||
@ -30,68 +18,67 @@
|
||||
#' The content of each node is organised that way:
|
||||
#'
|
||||
#' \itemize{
|
||||
#' \item \code{feature} value ;
|
||||
#' \item \code{cover}: the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be ;
|
||||
#' \item \code{feature} value;
|
||||
#' \item \code{cover}: the sum of the second order gradients of the training data classified to the leaf. For square loss, this simply corresponds to the number of instances in that branch. The deeper in the tree a node is, the lower this metric will be;
|
||||
#' \item \code{gain}: metric measuring the importance of the node in the model.
|
||||
#' }
|
||||
#'
|
||||
#' Each branch finishes with a leaf. For each leaf, only the \code{cover} is indicated.
|
||||
#' It uses \href{https://github.com/knsv/mermaid/}{Mermaid} library for that purpose.
|
||||
#' The function uses the \href{http://www.graphviz.org/}{GraphViz} library for that purpose.
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#'
|
||||
#' #Both datasets are lists with two items, a sparse matrix and labels
|
||||
#' #(labels = outcome column which will be learned).
|
||||
#' #Each column of the sparse Matrix is a feature in one hot encoding format.
|
||||
#' train <- agaricus.train
|
||||
#'
|
||||
#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
|
||||
#' eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
#'
|
||||
#' #agaricus.test$data@@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
#' xgb.plot.tree(agaricus.train$data@@Dimnames[[2]], model = bst)
|
||||
#' # agaricus.train$data@@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
#' xgb.plot.tree(feature_names = agaricus.train$data@@Dimnames[[2]], model = bst)
|
||||
#'
|
||||
#' @export
|
||||
#'
|
||||
xgb.plot.tree <- function(feature_names = NULL, filename_dump = NULL, model = NULL, n_first_tree = NULL, CSSstyle = NULL, width = NULL, height = NULL){
|
||||
|
||||
if (!(class(CSSstyle) %in% c("character", "NULL") && length(CSSstyle) <= 1)) {
|
||||
stop("style: Has to be a character vector of size 1.")
|
||||
}
|
||||
|
||||
if (!class(model) %in% c("xgb.Booster", "NULL")) {
|
||||
xgb.plot.tree <- function(feature_names = NULL, model = NULL, n_first_tree = NULL, plot.width = NULL, plot.height = NULL){
|
||||
|
||||
if (class(model) != "xgb.Booster") {
|
||||
stop("model: Has to be an object of class xgb.Booster model generaged by the xgb.train function.")
|
||||
}
|
||||
|
||||
|
||||
if (!requireNamespace("DiagrammeR", quietly = TRUE)) {
|
||||
stop("DiagrammeR package is required for xgb.plot.tree", call. = FALSE)
|
||||
}
|
||||
|
||||
if(is.null(model)){
|
||||
allTrees <- xgb.model.dt.tree(feature_names = feature_names, filename_dump = filename_dump, n_first_tree = n_first_tree)
|
||||
} else {
|
||||
allTrees <- xgb.model.dt.tree(feature_names = feature_names, model = model, n_first_tree = n_first_tree)
|
||||
}
|
||||
allTrees <- xgb.model.dt.tree(feature_names = feature_names, model = model, n_first_tree = n_first_tree)
|
||||
|
||||
allTrees[Feature!="Leaf" ,yesPath:= paste(ID,"(", Feature, "<br/>Cover: ", Cover, "<br/>Gain: ", Quality, ")-->|< ", Split, "|", Yes, ">", Yes.Feature, "]", sep = "")]
|
||||
allTrees[, label:= paste0(Feature, "\nCover: ", Cover, "\nGain: ", Quality)]
|
||||
allTrees[, shape:= "rectangle"][Feature == "Leaf", shape:= "oval"]
|
||||
allTrees[, filledcolor:= "Beige"][Feature == "Leaf", filledcolor:= "Khaki"]
|
||||
|
||||
allTrees[Feature!="Leaf" ,noPath:= paste(ID,"(", Feature, ")-->|>= ", Split, "|", No, ">", No.Feature, "]", sep = "")]
|
||||
# rev is used to put the first tree on top.
|
||||
nodes <- DiagrammeR::create_nodes(nodes = allTrees[,ID] %>% rev,
|
||||
label = allTrees[,label] %>% rev,
|
||||
style = "filled",
|
||||
color = "DimGray",
|
||||
fillcolor= allTrees[,filledcolor] %>% rev,
|
||||
shape = allTrees[,shape] %>% rev,
|
||||
data = allTrees[,Feature] %>% rev,
|
||||
fontname = "Helvetica"
|
||||
)
|
||||
|
||||
edges <- DiagrammeR::create_edges(from = allTrees[Feature != "Leaf", c(ID)] %>% rep(2),
|
||||
to = allTrees[Feature != "Leaf", c(Yes, No)],
|
||||
label = allTrees[Feature != "Leaf", paste("<",Split)] %>% c(rep("",nrow(allTrees[Feature != "Leaf"]))),
|
||||
color = "DimGray",
|
||||
arrowsize = "1.5",
|
||||
arrowhead = "vee",
|
||||
fontname = "Helvetica",
|
||||
rel = "leading_to")
|
||||
|
||||
graph <- DiagrammeR::create_graph(nodes_df = nodes,
|
||||
edges_df = edges,
|
||||
graph_attrs = "rankdir = LR")
|
||||
|
||||
if(is.null(CSSstyle)){
|
||||
CSSstyle <- "classDef greenNode fill:#A2EB86, stroke:#04C4AB, stroke-width:2px;classDef redNode fill:#FFA070, stroke:#FF5E5E, stroke-width:2px"
|
||||
}
|
||||
|
||||
yes <- allTrees[Feature!="Leaf", c(Yes)] %>% paste(collapse = ",") %>% paste("class ", ., " greenNode", sep = "")
|
||||
|
||||
no <- allTrees[Feature!="Leaf", c(No)] %>% paste(collapse = ",") %>% paste("class ", ., " redNode", sep = "")
|
||||
|
||||
path <- allTrees[Feature!="Leaf", c(yesPath, noPath)] %>% .[order(.)] %>% paste(sep = "", collapse = ";") %>% paste("graph LR", .,collapse = "", sep = ";") %>% paste(CSSstyle, yes, no, sep = ";")
|
||||
DiagrammeR::mermaid(path, width, height)
|
||||
DiagrammeR::render_graph(graph, width = plot.width, height = plot.height)
|
||||
}
|
||||
|
||||
# Avoid error messages during CRAN check.
|
||||
# The reason is that these variables are never declared
|
||||
# They are mainly column names inferred by Data.table...
|
||||
globalVariables(c("Feature", "yesPath", "ID", "Cover", "Quality", "Split", "Yes", "Yes.Feature", "noPath", "No", "No.Feature", "."))
|
||||
globalVariables(c("Feature", "ID", "Cover", "Quality", "Split", "Yes", "No", ".", "shape", "filledcolor", "label"))
|
||||
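# A hedged usage sketch (not from the original file): rendering every tree of a large model
# can be slow, so `n_first_tree` limits the plot; assumes `bst` and `agaricus.train` from
# the roxygen example above.
xgb.plot.tree(feature_names = agaricus.train$data@Dimnames[[2]], model = bst,
              n_first_tree = 1, plot.width = 800, plot.height = 600)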
|
||||
@ -16,7 +16,6 @@
|
||||
#' bst <- xgb.load('xgb.model')
|
||||
#' pred <- predict(bst, test$data)
|
||||
#' @export
|
||||
#'
|
||||
xgb.save <- function(model, fname) {
|
||||
if (typeof(fname) != "character") {
|
||||
stop("xgb.save: fname must be character")
|
||||
@ -29,4 +28,4 @@ xgb.save <- function(model, fname) {
|
||||
stop("xgb.save: the input must be xgb.Booster. Use xgb.DMatrix.save to save
|
||||
xgb.DMatrix object.")
|
||||
return(FALSE)
|
||||
}
|
||||
}
|
||||
|
||||
@ -16,7 +16,6 @@
|
||||
#' bst <- xgb.load(raw)
|
||||
#' pred <- predict(bst, test$data)
|
||||
#' @export
|
||||
#'
|
||||
xgb.save.raw <- function(model) {
|
||||
if (class(model) == "xgb.Booster"){
|
||||
model <- model$handle
|
||||
|
||||
@ -19,7 +19,7 @@
|
||||
#' \item \code{eta} control the learning rate: scale the contribution of each tree by a factor of \code{0 < eta < 1} when it is added to the current approximation. Used to prevent overfitting by making the boosting process more conservative. Lower value for \code{eta} implies larger value for \code{nrounds}: low \code{eta} value means model more robust to overfitting but slower to compute. Default: 0.3
|
||||
#' \item \code{gamma} minimum loss reduction required to make a further partition on a leaf node of the tree. the larger, the more conservative the algorithm will be.
|
||||
#' \item \code{max_depth} maximum depth of a tree. Default: 6
|
||||
#' \item \code{min_child_weight} minimum sum of instance weight(hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node. The larger, the more conservative the algorithm will be. Default: 1
|
||||
#' \item \code{min_child_weight} minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node. The larger, the more conservative the algorithm will be. Default: 1
|
||||
#' \item \code{subsample} subsample ratio of the training instance. Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees and this will prevent overfitting. It makes computation shorter (because less data to analyse). It is advised to use this parameter with \code{eta} and increase \code{nround}. Default: 1
|
||||
#' \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1
|
||||
#' \item \code{num_parallel_tree} Experimental parameter. number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1
|
||||
@ -43,7 +43,7 @@
|
||||
#' \item \code{binary:logistic} logistic regression for binary classification. Output probability.
|
||||
#' \item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
|
||||
#' \item \code{num_class} set the number of classes. To use only with multiclass objectives.
|
||||
#' \item \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective. Class is represented by a number and should be from 0 to \code{tonum_class}.
|
||||
#' \item \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective. Class is represented by a number and should be from 0 to \code{num_class}.
|
||||
#' \item \code{multi:softprob} same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging to each class.
|
||||
#' \item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
|
||||
#' }
|
||||
@ -89,6 +89,7 @@
|
||||
#' \itemize{
|
||||
#' \item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
|
||||
#' \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
|
||||
#' \item \code{mlogloss} multiclass logloss. \url{https://www.kaggle.com/wiki/MultiClassLogLoss}
|
||||
#' \item \code{error} Binary classification error rate. It is calculated as \code{(wrong cases) / (all cases)}. For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
|
||||
#' \item \code{merror} Multiclass classification error rate. It is calculated as \code{(wrong cases) / (all cases)}.
|
||||
#' \item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
|
||||
@ -119,10 +120,9 @@
|
||||
#' param <- list(max.depth = 2, eta = 1, silent = 1, objective=logregobj,eval_metric=evalerror)
|
||||
#' bst <- xgb.train(param, dtrain, nthread = 2, nround = 2, watchlist)
|
||||
#' @export
|
||||
#'
|
||||
xgb.train <- function(params=list(), data, nrounds, watchlist = list(),
|
||||
xgb.train <- function(params=list(), data, nrounds, watchlist = list(),
|
||||
obj = NULL, feval = NULL, verbose = 1, print.every.n=1L,
|
||||
early.stop.round = NULL, maximize = NULL,
|
||||
early.stop.round = NULL, maximize = NULL,
|
||||
save_period = 0, save_name = "xgboost.model", ...) {
|
||||
dtrain <- data
|
||||
if (typeof(params) != "list") {
|
||||
@ -139,30 +139,31 @@ xgb.train <- function(params=list(), data, nrounds, watchlist = list(),
|
||||
if (length(watchlist) != 0 && verbose == 0) {
|
||||
warning('watchlist is provided but verbose=0, no evaluation information will be printed')
|
||||
}
|
||||
|
||||
dot.params = list(...)
|
||||
nms.params = names(params)
|
||||
nms.dot.params = names(dot.params)
|
||||
if (length(intersect(nms.params,nms.dot.params))>0)
|
||||
|
||||
fit.call <- match.call()
|
||||
dot.params <- list(...)
|
||||
nms.params <- names(params)
|
||||
nms.dot.params <- names(dot.params)
|
||||
if (length(intersect(nms.params,nms.dot.params)) > 0)
|
||||
stop("Duplicated term in parameters. Please check your list of params.")
|
||||
params = append(params, dot.params)
|
||||
|
||||
params <- append(params, dot.params)
|
||||
|
||||
# customized objective and evaluation metric interface
|
||||
if (!is.null(params$objective) && !is.null(obj))
|
||||
stop("xgb.train: cannot assign two different objectives")
|
||||
if (!is.null(params$objective))
|
||||
if (class(params$objective)=='function') {
|
||||
obj = params$objective
|
||||
params$objective = NULL
|
||||
if (class(params$objective) == 'function') {
|
||||
obj <- params$objective
|
||||
params$objective <- NULL
|
||||
}
|
||||
if (!is.null(params$eval_metric) && !is.null(feval))
|
||||
stop("xgb.train: cannot assign two different evaluation metrics")
|
||||
if (!is.null(params$eval_metric))
|
||||
if (class(params$eval_metric)=='function') {
|
||||
feval = params$eval_metric
|
||||
params$eval_metric = NULL
|
||||
if (class(params$eval_metric) == 'function') {
|
||||
feval <- params$eval_metric
|
||||
params$eval_metric <- NULL
|
||||
}
|
||||
|
||||
|
||||
# Early stopping
|
||||
if (!is.null(early.stop.round)){
|
||||
if (!is.null(feval) && is.null(maximize))
|
||||
@ -174,44 +175,43 @@ xgb.train <- function(params=list(), data, nrounds, watchlist = list(),
|
||||
if (is.null(maximize))
|
||||
{
|
||||
if (params$eval_metric %in% c('rmse','logloss','error','merror','mlogloss')) {
|
||||
maximize = FALSE
|
||||
maximize <- FALSE
|
||||
} else {
|
||||
maximize = TRUE
|
||||
maximize <- TRUE
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
if (maximize) {
|
||||
bestScore = 0
|
||||
bestScore <- 0
|
||||
} else {
|
||||
bestScore = Inf
|
||||
bestScore <- Inf
|
||||
}
|
||||
bestInd = 0
|
||||
bestInd <- 0
|
||||
earlyStopflag = FALSE
|
||||
|
||||
if (length(watchlist)>1)
|
||||
|
||||
if (length(watchlist) > 1)
|
||||
warning('Only the first data set in watchlist is used for early stopping process.')
|
||||
}
|
||||
|
||||
|
||||
|
||||
handle <- xgb.Booster(params, append(watchlist, dtrain))
|
||||
bst <- xgb.handleToBooster(handle)
|
||||
print.every.n=max( as.integer(print.every.n), 1L)
|
||||
print.every.n <- max( as.integer(print.every.n), 1L)
|
||||
for (i in 1:nrounds) {
|
||||
succ <- xgb.iter.update(bst$handle, dtrain, i - 1, obj)
|
||||
if (length(watchlist) != 0) {
|
||||
msg <- xgb.iter.eval(bst$handle, watchlist, i - 1, feval)
|
||||
if (0== ( (i-1) %% print.every.n))
|
||||
cat(paste(msg, "\n", sep=""))
|
||||
if (0 == ( (i - 1) %% print.every.n))
|
||||
cat(paste(msg, "\n", sep = ""))
|
||||
if (!is.null(early.stop.round))
|
||||
{
|
||||
score = strsplit(msg,':|\\s+')[[1]][3]
|
||||
score = as.numeric(score)
|
||||
if ((maximize && score>bestScore) || (!maximize && score<bestScore)) {
|
||||
bestScore = score
|
||||
bestInd = i
|
||||
score <- strsplit(msg,':|\\s+')[[1]][3]
|
||||
score <- as.numeric(score)
|
||||
if ( (maximize && score > bestScore) || (!maximize && score < bestScore)) {
|
||||
bestScore <- score
|
||||
bestInd <- i
|
||||
} else {
|
||||
if (i-bestInd>=early.stop.round) {
|
||||
earlyStopflag = TRUE
|
||||
earlyStopflag = TRUE
|
||||
if (i - bestInd >= early.stop.round) {
|
||||
cat('Stopping. Best iteration:',bestInd)
|
||||
break
|
||||
}
|
||||
@ -225,9 +225,13 @@ xgb.train <- function(params=list(), data, nrounds, watchlist = list(),
|
||||
}
|
||||
}
|
||||
bst <- xgb.Booster.check(bst)
|
||||
|
||||
if (!is.null(early.stop.round)) {
|
||||
bst$bestScore = bestScore
|
||||
bst$bestInd = bestInd
|
||||
bst$bestScore <- bestScore
|
||||
bst$bestInd <- bestInd
|
||||
}
|
||||
|
||||
attr(bst, "call") <- fit.call
|
||||
attr(bst, "params") <- params
|
||||
return(bst)
|
||||
}
|
||||
}
|
||||
|
||||
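# A hedged usage sketch (not from the original file) of the early-stopping path handled above.
# Assumes agaricus `dtrain` / `dtest` DMatrix objects; the first element of the watchlist
# drives the stopping decision, and the best round is stored on the returned booster.
watchlist <- list(eval = dtest, train = dtrain)
param <- list(max.depth = 2, eta = 0.3, objective = "binary:logistic", eval_metric = "logloss")
bst <- xgb.train(params = param, data = dtrain, nrounds = 100, watchlist = watchlist,
                 early.stop.round = 5, maximize = FALSE, nthread = 2)
bst$bestInd     # best iteration
bst$bestScore   # evaluation metric at that iteration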
@ -58,29 +58,26 @@
|
||||
#' pred <- predict(bst, test$data)
|
||||
#'
|
||||
#' @export
|
||||
#'
|
||||
xgboost <- function(data = NULL, label = NULL, missing = NULL, weight = NULL,
|
||||
params = list(), nrounds,
|
||||
xgboost <- function(data = NULL, label = NULL, missing = NA, weight = NULL,
|
||||
params = list(), nrounds,
|
||||
verbose = 1, print.every.n = 1L, early.stop.round = NULL,
|
||||
maximize = NULL, save_period = 0, save_name = "xgboost.model", ...) {
|
||||
dtrain <- xgb.get.DMatrix(data, label, missing, weight)
|
||||
|
||||
|
||||
params <- append(params, list(...))
|
||||
|
||||
|
||||
if (verbose > 0) {
|
||||
watchlist <- list(train = dtrain)
|
||||
} else {
|
||||
watchlist <- list()
|
||||
}
|
||||
|
||||
|
||||
bst <- xgb.train(params, dtrain, nrounds, watchlist, verbose = verbose, print.every.n=print.every.n,
|
||||
early.stop.round = early.stop.round, maximize = maximize,
|
||||
save_period = save_period, save_name = save_name)
|
||||
|
||||
|
||||
return(bst)
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
#' Training part from Mushroom Data Set
|
||||
#'
|
||||
#' This data set is originally from the Mushroom data set,
|
||||
|
||||
@ -14,28 +14,28 @@ class(train$data)
|
||||
# this is the basic usage of xgboost you can put matrix in data field
|
||||
# note: we are putting in sparse matrix here, xgboost naturally handles sparse input
|
||||
# use sparse matrix when your features are sparse (e.g. when you are using one-hot encoded vectors)
|
||||
print("training xgboost with sparseMatrix")
|
||||
print("Training xgboost with sparseMatrix")
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2, eta = 1, nround = 2,
|
||||
nthread = 2, objective = "binary:logistic")
|
||||
# alternatively, you can put in dense matrix, i.e. basic R-matrix
|
||||
print("training xgboost with Matrix")
|
||||
print("Training xgboost with Matrix")
|
||||
bst <- xgboost(data = as.matrix(train$data), label = train$label, max.depth = 2, eta = 1, nround = 2,
|
||||
nthread = 2, objective = "binary:logistic")
|
||||
|
||||
# you can also put in an xgb.DMatrix object, which stores label, data and other metadata needed for advanced features
|
||||
print("training xgboost with xgb.DMatrix")
|
||||
print("Training xgboost with xgb.DMatrix")
|
||||
dtrain <- xgb.DMatrix(data = train$data, label = train$label)
|
||||
bst <- xgboost(data = dtrain, max.depth = 2, eta = 1, nround = 2, nthread = 2,
|
||||
objective = "binary:logistic")
|
||||
|
||||
# Verbose = 0,1,2
|
||||
print ('train xgboost with verbose 0, no message')
|
||||
print("Train xgboost with verbose 0, no message")
|
||||
bst <- xgboost(data = dtrain, max.depth = 2, eta = 1, nround = 2,
|
||||
nthread = 2, objective = "binary:logistic", verbose = 0)
|
||||
print ('train xgboost with verbose 1, print evaluation metric')
|
||||
print("Train xgboost with verbose 1, print evaluation metric")
|
||||
bst <- xgboost(data = dtrain, max.depth = 2, eta = 1, nround = 2,
|
||||
nthread = 2, objective = "binary:logistic", verbose = 1)
|
||||
print ('train xgboost with verbose 2, also print information about tree')
|
||||
print("Train xgboost with verbose 2, also print information about tree")
|
||||
bst <- xgboost(data = dtrain, max.depth = 2, eta = 1, nround = 2,
|
||||
nthread = 2, objective = "binary:logistic", verbose = 2)
|
||||
|
||||
@ -76,11 +76,11 @@ dtest <- xgb.DMatrix(data = test$data, label=test$label)
|
||||
watchlist <- list(train=dtrain, test=dtest)
|
||||
# to train with watchlist, use xgb.train, which contains more advanced features
|
||||
# watchlist allows us to monitor the evaluation result on all data in the list
|
||||
print ('train xgboost using xgb.train with watchlist')
|
||||
print("Train xgboost using xgb.train with watchlist")
|
||||
bst <- xgb.train(data=dtrain, max.depth=2, eta=1, nround=2, watchlist=watchlist,
|
||||
nthread = 2, objective = "binary:logistic")
|
||||
# we can change evaluation metrics, or use multiple evaluation metrics
|
||||
print ('train xgboost using xgb.train with watchlist, watch logloss and error')
|
||||
print("train xgboost using xgb.train with watchlist, watch logloss and error")
|
||||
bst <- xgb.train(data=dtrain, max.depth=2, eta=1, nround=2, watchlist=watchlist,
|
||||
eval.metric = "error", eval.metric = "logloss",
|
||||
nthread = 2, objective = "binary:logistic")
|
||||
@ -102,4 +102,9 @@ xgb.dump(bst, "dump.raw.txt", with.stats = T)
|
||||
|
||||
# Finally, you can check which features are the most important.
|
||||
print("Most important features (look at column Gain):")
|
||||
print(xgb.importance(feature_names = train$data@Dimnames[[2]], filename_dump = "dump.raw.txt"))
|
||||
imp_matrix <- xgb.importance(feature_names = train$data@Dimnames[[2]], model = bst)
|
||||
print(imp_matrix)
|
||||
|
||||
# Feature importance bar plot by gain
|
||||
print("Feature importance Plot : ")
|
||||
print(xgb.plot.importance(importance_matrix = imp_matrix))
|
||||
|
||||
@ -23,4 +23,4 @@ setinfo(dtrain, "base_margin", ptrain)
|
||||
setinfo(dtest, "base_margin", ptest)
|
||||
|
||||
print('this is result of boost from initial prediction')
|
||||
bst <- xgb.train( param, dtrain, 1, watchlist )
|
||||
bst <- xgb.train(params = param, data = dtrain, nrounds = 1, watchlist = watchlist)
|
||||
|
||||
@ -67,10 +67,9 @@ output_vector = df[,Y:=0][Improved == "Marked",Y:=1][,Y]
|
||||
cat("Learning...\n")
|
||||
bst <- xgboost(data = sparse_matrix, label = output_vector, max.depth = 9,
|
||||
eta = 1, nthread = 2, nround = 10,objective = "binary:logistic")
|
||||
xgb.dump(bst, 'xgb.model.dump', with.stats = T)
|
||||
|
||||
# sparse_matrix@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
importance <- xgb.importance(sparse_matrix@Dimnames[[2]], 'xgb.model.dump')
|
||||
importance <- xgb.importance(feature_names = sparse_matrix@Dimnames[[2]], model = bst)
|
||||
print(importance)
|
||||
# According to the matrix below, the most important feature in this dataset for predicting whether the treatment will work is Age. The second most important feature is having received a placebo or not. Sex is third. Then come our generated features (AgeDiscret), whose contribution is very low (Gain column).
|
||||
|
||||
|
||||
@ -43,9 +43,9 @@ evalerror <- function(preds, dtrain) {
|
||||
param <- list(max.depth=2,eta=1,silent=1,
|
||||
objective = logregobj, eval_metric = evalerror)
|
||||
# train with customized objective
|
||||
xgb.cv(param, dtrain, nround, nfold = 5)
|
||||
xgb.cv(params = param, data = dtrain, nrounds = nround, nfold = 5)
|
||||
|
||||
# do cross validation with prediction values for each fold
|
||||
res <- xgb.cv(param, dtrain, nround, nfold=5, prediction = TRUE)
|
||||
res <- xgb.cv(params = param, data = dtrain, nrounds = nround, nfold = 5, prediction = TRUE)
|
||||
res$dt
|
||||
length(res$pred)
|
||||
|
||||
@ -1,21 +1,52 @@
|
||||
require(xgboost)
|
||||
require(data.table)
|
||||
require(Matrix)
|
||||
|
||||
set.seed(1982)
|
||||
|
||||
# load in the agaricus dataset
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
|
||||
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
|
||||
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)
|
||||
dtest <- xgb.DMatrix(data = agaricus.test$data, label = agaricus.test$label)
|
||||
|
||||
param <- list(max.depth=2,eta=1,silent=1,objective='binary:logistic')
|
||||
watchlist <- list(eval = dtest, train = dtrain)
|
||||
nround = 5
|
||||
param <- list(max.depth=2, eta=1, silent=1, objective='binary:logistic')
|
||||
nround = 4
|
||||
|
||||
# training the model for two rounds
|
||||
bst = xgb.train(param, dtrain, nround, nthread = 2, watchlist)
|
||||
cat('start testing prediction from first n trees\n')
|
||||
bst = xgb.train(params = param, data = dtrain, nrounds = nround, nthread = 2)
|
||||
|
||||
# Model accuracy without new features
|
||||
accuracy.before <- sum((predict(bst, agaricus.test$data) >= 0.5) == agaricus.test$label) / length(agaricus.test$label)
|
||||
|
||||
### predict using first 2 tree
|
||||
pred_with_leaf = predict(bst, dtest, ntreelimit = 2, predleaf = TRUE)
|
||||
head(pred_with_leaf)
|
||||
# by default, we predict using all the trees
|
||||
|
||||
pred_with_leaf = predict(bst, dtest, predleaf = TRUE)
|
||||
head(pred_with_leaf)
|
||||
|
||||
create.new.tree.features <- function(model, original.features){
|
||||
pred_with_leaf <- predict(model, original.features, predleaf = TRUE)
|
||||
cols <- list()
|
||||
for(i in 1:ncol(pred_with_leaf)){  # one column of leaf indices per tree
|
||||
# max is not the real max, but it's not important for the purpose of adding features
|
||||
leaf.id <- sort(unique(pred_with_leaf[,i]))
|
||||
cols[[i]] <- factor(x = pred_with_leaf[,i], level = leaf.id)
|
||||
}
|
||||
cBind(original.features, sparse.model.matrix( ~ . -1, as.data.frame(cols)))
|
||||
}
|
||||
|
||||
# Convert previous features to one hot encoding
|
||||
new.features.train <- create.new.tree.features(bst, agaricus.train$data)
|
||||
new.features.test <- create.new.tree.features(bst, agaricus.test$data)
|
||||
|
||||
# learning with new features
|
||||
new.dtrain <- xgb.DMatrix(data = new.features.train, label = agaricus.train$label)
|
||||
new.dtest <- xgb.DMatrix(data = new.features.test, label = agaricus.test$label)
|
||||
watchlist <- list(train = new.dtrain)
|
||||
bst <- xgb.train(params = param, data = new.dtrain, nrounds = nround, nthread = 2)
|
||||
|
||||
# Model accuracy with new features
|
||||
accuracy.after <- sum((predict(bst, new.dtest) >= 0.5) == agaricus.test$label) / length(agaricus.test$label)
|
||||
|
||||
# Here the accuracy was already good and is now perfect.
|
||||
cat(paste("The accuracy was", accuracy.before, "before adding leaf features and it is now", accuracy.after, "!\n"))
|
||||
|
||||
@ -1,10 +1,10 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgboost.R
|
||||
\docType{data}
|
||||
\name{agaricus.test}
|
||||
\alias{agaricus.test}
|
||||
\title{Test part from Mushroom Data Set}
|
||||
\format{A list containing a label vector, and a dgCMatrix object with 1611
|
||||
\format{A list containing a label vector, and a dgCMatrix object with 1611
|
||||
rows and 126 variables}
|
||||
\usage{
|
||||
data(agaricus.test)
|
||||
@ -24,8 +24,8 @@ This data set includes the following fields:
|
||||
\references{
|
||||
https://archive.ics.uci.edu/ml/datasets/Mushroom
|
||||
|
||||
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
|
||||
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
|
||||
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
|
||||
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
|
||||
School of Information and Computer Science.
|
||||
}
|
||||
\keyword{datasets}
|
||||
|
||||
@ -1,10 +1,10 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgboost.R
|
||||
\docType{data}
|
||||
\name{agaricus.train}
|
||||
\alias{agaricus.train}
|
||||
\title{Training part from Mushroom Data Set}
|
||||
\format{A list containing a label vector, and a dgCMatrix object with 6513
|
||||
\format{A list containing a label vector, and a dgCMatrix object with 6513
|
||||
rows and 127 variables}
|
||||
\usage{
|
||||
data(agaricus.train)
|
||||
@ -24,8 +24,8 @@ This data set includes the following fields:
|
||||
\references{
|
||||
https://archive.ics.uci.edu/ml/datasets/Mushroom
|
||||
|
||||
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
|
||||
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
|
||||
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
|
||||
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
|
||||
School of Information and Computer Science.
|
||||
}
|
||||
\keyword{datasets}
|
||||
|
||||
15
R-package/man/edge.parser.Rd
Normal file
@ -0,0 +1,15 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.plot.deepness.R
|
||||
\name{edge.parser}
|
||||
\alias{edge.parser}
|
||||
\title{Parse the graph to extract vector of edges}
|
||||
\usage{
|
||||
edge.parser(element)
|
||||
}
|
||||
\arguments{
|
||||
\item{element}{igraph object containing the path from the root to the leaf.}
|
||||
}
|
||||
\description{
|
||||
Parse the graph to extract vector of edges
|
||||
}
|
||||
|
||||
15
R-package/man/get.paths.to.leaf.Rd
Normal file
@ -0,0 +1,15 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.plot.deepness.R
|
||||
\name{get.paths.to.leaf}
|
||||
\alias{get.paths.to.leaf}
|
||||
\title{Extract path from root to leaf from data.table}
|
||||
\usage{
|
||||
get.paths.to.leaf(dt.tree)
|
||||
}
|
||||
\arguments{
|
||||
\item{dt.tree}{data.table containing the nodes and edges of the trees}
|
||||
}
|
||||
\description{
|
||||
Extract path from root to leaf from data.table
|
||||
}
|
||||
|
||||
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/getinfo.xgb.DMatrix.R
|
||||
\docType{methods}
|
||||
\name{getinfo}
|
||||
|
||||
15
R-package/man/multiplot.Rd
Normal file
@ -0,0 +1,15 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.plot.deepness.R
|
||||
\name{multiplot}
|
||||
\alias{multiplot}
|
||||
\title{Plot multiple graphs at the same time}
|
||||
\usage{
|
||||
multiplot(..., cols = 1)
|
||||
}
|
||||
\arguments{
|
||||
\item{cols}{number of columns}
|
||||
}
|
||||
\description{
|
||||
Plot multiple graph aligned by rows and columns.
|
||||
}
|
||||
|
||||
@ -1,4 +1,4 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nrow.xgb.DMatrix.R
\docType{methods}
\name{nrow,xgb.DMatrix-method}
@ -18,5 +18,6 @@ data(agaricus.train, package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(train$data, label=train$label)
stopifnot(nrow(dtrain) == nrow(train$data))

}
@ -1,29 +1,29 @@
% Generated by roxygen2 (4.1.1): do not edit by hand
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/predict.xgb.Booster.R
\docType{methods}
\name{predict,xgb.Booster-method}
\alias{predict,xgb.Booster-method}
\title{Predict method for eXtreme Gradient Boosting model}
\usage{
\S4method{predict}{xgb.Booster}(object, newdata, missing = NULL,
\S4method{predict}{xgb.Booster}(object, newdata, missing = NA,
  outputmargin = FALSE, ntreelimit = NULL, predleaf = FALSE)
}
\arguments{
\item{object}{Object of class "xgb.Booster"}

\item{newdata}{takes \code{matrix}, \code{dgCMatrix}, local data file or
\code{xgb.DMatrix}.}

\item{missing}{Missing is only used when input is a dense matrix; pick a float
value that represents the missing value. Sometimes a dataset uses 0 or another extreme value to represent missing values.}

\item{outputmargin}{whether the prediction should be shown in the original
value of sum of functions, when outputmargin=TRUE, the prediction is the
untransformed margin value. In logistic regression, outputmargin=T will
output the value before the logistic transformation.}

\item{ntreelimit}{limit the number of trees used in prediction; this parameter is
only valid for gbtree, but not for gblinear. Set it to a value bigger
than 0. It will use all trees by default.}

\item{predleaf}{whether to predict leaf indices instead. If set to TRUE, the output will be a matrix object.}
@ -31,12 +31,22 @@ than 0. It will use all trees by default.}
\description{
Predicted values based on an xgboost model object.
}
\details{
The purpose of the \code{ntreelimit} option is to let the user train a model with lots
of trees but use only the first trees for prediction, to avoid overfitting
(without having to train a new model with fewer trees).

The \code{predleaf} option is inspired by §3.1 of the paper
\code{Practical Lessons from Predicting Clicks on Ads at Facebook}.
The idea is to use the model as a generator of new features which capture non-linear links
from the original features.
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
               eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")
pred <- predict(bst, test$data)
}
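As a quick illustration of the two options documented above, here is a minimal R sketch (not part of the commit) assuming the bundled Mushroom data and a 10-round model:

require(xgboost)
data(agaricus.train, package = 'xgboost')
data(agaricus.test, package = 'xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max.depth = 2, eta = 1, nthread = 2, nround = 10,
               objective = "binary:logistic")

# Predict with only the first 2 of the 10 trained trees.
pred.2trees <- predict(bst, agaricus.test$data, ntreelimit = 2)

# Predict leaf indices instead of probabilities:
# a matrix of leaf indices, one row per test observation.
leaf.index <- predict(bst, agaricus.test$data, predleaf = TRUE)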
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/predict.xgb.Booster.handle.R
|
||||
\docType{methods}
|
||||
\name{predict,xgb.Booster.handle-method}
|
||||
|
||||
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/setinfo.xgb.DMatrix.R
|
||||
\docType{methods}
|
||||
\name{setinfo}
|
||||
|
||||
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/slice.xgb.DMatrix.R
|
||||
\docType{methods}
|
||||
\name{slice}
|
||||
|
||||
@ -1,13 +1,13 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.DMatrix.R
|
||||
\name{xgb.DMatrix}
|
||||
\alias{xgb.DMatrix}
|
||||
\title{Construct xgb.DMatrix object}
|
||||
\usage{
|
||||
xgb.DMatrix(data, info = list(), missing = 0, ...)
|
||||
xgb.DMatrix(data, info = list(), missing = NA, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{data}{a \code{matrix} object, a \code{dgCMatrix} object or a character
|
||||
\item{data}{a \code{matrix} object, a \code{dgCMatrix} object or a character
|
||||
indicating the data file.}
|
||||
|
||||
\item{info}{a list of information of the xgb.DMatrix object}
|
||||
|
||||
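To make the new \code{missing = NA} default visible, a small sketch on illustrative toy data (not from the commit) showing a DMatrix built from a dense matrix that marks missing entries with NA, and from one that uses 0 instead:

require(xgboost)
# Toy dense matrix where NA marks missing entries.
x <- matrix(c(1, NA, 3,
              0,  2, NA), nrow = 2, byrow = TRUE)
y <- c(0, 1)

# With the new default, NA is the missing-value marker.
dtrain <- xgb.DMatrix(data = x, label = y, missing = NA)

# A dataset that encodes missing values as 0 can still say so explicitly.
x0 <- matrix(c(1, 0, 3,
               0, 2, 0), nrow = 2, byrow = TRUE)
dtrain0 <- xgb.DMatrix(data = x0, label = y, missing = 0)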
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.DMatrix.save.R
|
||||
\name{xgb.DMatrix.save}
|
||||
\alias{xgb.DMatrix.save}
|
||||
|
||||
88
R-package/man/xgb.create.features.Rd
Normal file
@ -0,0 +1,88 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.create.features.R
\name{xgb.create.features}
\alias{xgb.create.features}
\title{Create new features from a previously learned model}
\usage{
xgb.create.features(model, training.data)
}
\arguments{
\item{model}{decision tree boosting model learned on the original data}

\item{training.data}{original data (usually provided as a \code{dgCMatrix} matrix)}
}
\value{
\code{dgCMatrix} matrix including both the original data and the new features.
}
\description{
May improve the learning by adding new features to the training data based on the decision trees from a previously learned model.
}
\details{
This function is inspired by paragraph 3.1 of the paper:

\strong{Practical Lessons from Predicting Clicks on Ads at Facebook}

\emph{(Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers,
Joaquin Quiñonero Candela)}

International Workshop on Data Mining for Online Advertising (ADKDD) - August 24, 2014

\url{https://research.facebook.com/publications/758569837499391/practical-lessons-from-predicting-clicks-on-ads-at-facebook/}.

Extract explaining the method:

"\emph{We found that boosted decision trees are a powerful and very
convenient way to implement non-linear and tuple transformations
of the kind we just described. We treat each individual
tree as a categorical feature that takes as value the
index of the leaf an instance ends up falling in. We use
1-of-K coding of this type of features.

For example, consider the boosted tree model in Figure 1 with 2 subtrees,
where the first subtree has 3 leafs and the second 2 leafs. If an
instance ends up in leaf 2 in the first subtree and leaf 1 in
second subtree, the overall input to the linear classifier will
be the binary vector \code{[0, 1, 0, 1, 0]}, where the first 3 entries
correspond to the leaves of the first subtree and last 2 to
those of the second subtree.

[...]

We can understand boosted decision tree
based transformation as a supervised feature encoding that
converts a real-valued vector into a compact binary-valued
vector. A traversal from root node to a leaf node represents
a rule on certain features.}"
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(data = agaricus.test$data, label = agaricus.test$label)

param <- list(max.depth = 2, eta = 1, silent = 1, objective = 'binary:logistic')
nround <- 4

bst <- xgb.train(params = param, data = dtrain, nrounds = nround, nthread = 2)

# Model accuracy without new features
accuracy.before <- sum((predict(bst, agaricus.test$data) >= 0.5) == agaricus.test$label) / length(agaricus.test$label)

# Convert previous features to one hot encoding
new.features.train <- xgb.create.features(model = bst, agaricus.train$data)
new.features.test <- xgb.create.features(model = bst, agaricus.test$data)

# learning with new features
new.dtrain <- xgb.DMatrix(data = new.features.train, label = agaricus.train$label)
new.dtest <- xgb.DMatrix(data = new.features.test, label = agaricus.test$label)
watchlist <- list(train = new.dtrain)
bst <- xgb.train(params = param, data = new.dtrain, nrounds = nround, nthread = 2)

# Model accuracy with new features
accuracy.after <- sum((predict(bst, new.dtest) >= 0.5) == agaricus.test$label) / length(agaricus.test$label)

# Here the accuracy was already good and is now perfect.
cat(paste("The accuracy was", accuracy.before, "before adding leaf features and it is now", accuracy.after, "!\\n"))

}
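The \code{[0, 1, 0, 1, 0]} encoding from the quoted passage can be reproduced in a few lines of R; this is only a sketch of the 1-of-K coding, not code from the commit:

# Two subtrees with 3 and 2 leaves; the instance falls in leaf 2 and leaf 1.
encode <- function(leaf, n.leaves) replace(rep(0, n.leaves), leaf, 1)
c(encode(2, 3), encode(1, 2))
# [1] 0 1 0 1 0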
@ -1,14 +1,13 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.cv.R
|
||||
\name{xgb.cv}
|
||||
\alias{xgb.cv}
|
||||
\title{Cross Validation}
|
||||
\usage{
|
||||
xgb.cv(params = list(), data, nrounds, nfold, label = NULL,
|
||||
missing = NULL, prediction = FALSE, showsd = TRUE, metrics = list(),
|
||||
obj = NULL, feval = NULL, stratified = TRUE, folds = NULL,
|
||||
verbose = T, print.every.n = 1L, early.stop.round = NULL,
|
||||
maximize = NULL, ...)
|
||||
xgb.cv(params = list(), data, nrounds, nfold, label = NULL, missing = NA,
|
||||
prediction = FALSE, showsd = TRUE, metrics = list(), obj = NULL,
|
||||
feval = NULL, stratified = TRUE, folds = NULL, verbose = T,
|
||||
print.every.n = 1L, early.stop.round = NULL, maximize = NULL, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{params}{the list of parameters. Commonly used ones are:
|
||||
@ -41,7 +40,7 @@ value that represents missing value. Sometime a data use 0 or other extreme valu
|
||||
|
||||
\item{showsd}{\code{boolean}, whether show standard deviation of cross validation}
|
||||
|
||||
\item{metrics,}{list of evaluation metrics to be used in cross validation,
\item{metrics, }{list of evaluation metrics to be used in cross validation,
|
||||
when it is not specified, the evaluation metric is chosen according to objective function.
|
||||
Possible options are:
|
||||
\itemize{
|
||||
@ -52,11 +51,11 @@ value that represents missing value. Sometime a data use 0 or other extreme valu
|
||||
\item \code{merror} Exact matching error, used to evaluate multi-class classification
|
||||
}}
|
||||
|
||||
\item{obj}{customized objective function. Returns gradient and second order
|
||||
\item{obj}{customized objective function. Returns gradient and second order
|
||||
gradient with given prediction and dtrain.}
|
||||
|
||||
\item{feval}{customized evaluation function. Returns
\code{list(metric='metric-name', value='metric-value')} with given
|
||||
prediction and dtrain.}
|
||||
|
||||
\item{stratified}{\code{boolean} whether sampling of folds should be stratified by the values of labels in \code{data}}
|
||||
@ -68,12 +67,12 @@ If folds are supplied, the nfold and stratified parameters would be ignored.}
|
||||
|
||||
\item{print.every.n}{Print every N progress messages when \code{verbose>0}. Default is 1 which means all messages are printed.}
|
||||
|
||||
\item{early.stop.round}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
\item{early.stop.round}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
keeps getting worse consecutively for \code{k} rounds.}
|
||||
|
||||
\item{maximize}{If \code{feval} and \code{early.stop.round} are set, then \code{maximize} must be set as well.
|
||||
\code{maximize=TRUE} means the larger the evaluation score the better.}
|
||||
\code{maximize=TRUE} means the larger the evaluation score the better.}
|
||||
|
||||
\item{...}{other parameters to pass to \code{params}.}
|
||||
}
|
||||
@ -90,9 +89,9 @@ If \code{prediction = FALSE}, just a \code{data.table} with each mean and standa
|
||||
The cross validation function of xgboost
|
||||
}
|
||||
\details{
|
||||
The original sample is randomly partitioned into \code{nfold} equal size subsamples.
|
||||
The original sample is randomly partitioned into \code{nfold} equal size subsamples.
|
||||
|
||||
Of the \code{nfold} subsamples, a single subsample is retained as the validation data for testing the model, and the remaining \code{nfold - 1} subsamples are used as training data.
|
||||
Of the \code{nfold} subsamples, a single subsample is retained as the validation data for testing the model, and the remaining \code{nfold - 1} subsamples are used as training data.
|
||||
|
||||
The cross-validation process is then repeated \code{nrounds} times, with each of the \code{nfold} subsamples used exactly once as the validation data.
|
||||
|
||||
|
||||
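A minimal usage sketch of the procedure described above (not part of the commit), assuming the bundled Mushroom data; it combines the documented \code{nfold} and \code{early.stop.round} arguments:

require(xgboost)
data(agaricus.train, package = 'xgboost')

# 5-fold cross validation, stopping after 3 consecutive non-improving rounds.
cv <- xgb.cv(params = list(max.depth = 2, eta = 0.3, nthread = 2,
                           objective = "binary:logistic"),
             data = agaricus.train$data, label = agaricus.train$label,
             nrounds = 20, nfold = 5,
             early.stop.round = 3, maximize = FALSE)

# A data.table with the mean and standard deviation of the metric per round.
print(cv)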
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.dump.R
|
||||
\name{xgb.dump}
|
||||
\alias{xgb.dump}
|
||||
@ -11,17 +11,17 @@ xgb.dump(model = NULL, fname = NULL, fmap = "", with.stats = FALSE)
|
||||
|
||||
\item{fname}{the name of the text file where to save the model text dump. If not provided or set to \code{NULL} the function will return the model as a \code{character} vector.}
|
||||
|
||||
\item{fmap}{feature map file representing the type of feature.
|
||||
Detailed description could be found at
|
||||
\item{fmap}{feature map file representing the type of feature.
|
||||
Detailed description could be found at
|
||||
\url{https://github.com/dmlc/xgboost/wiki/Binary-Classification#dump-model}.
|
||||
See demo/ for walkthrough example in R, and
|
||||
\url{https://github.com/dmlc/xgboost/blob/master/demo/data/featmap.txt}
|
||||
\url{https://github.com/dmlc/xgboost/blob/master/demo/data/featmap.txt}
|
||||
for example Format.}
|
||||
|
||||
\item{with.stats}{whether dump statistics of splits
|
||||
When this option is on, the model dump comes with two additional statistics:
|
||||
gain is the approximate loss function gain we get in each split;
|
||||
cover is the sum of second order gradient in each node.}
|
||||
\item{with.stats}{whether dump statistics of splits
|
||||
When this option is on, the model dump comes with two additional statistics:
|
||||
gain is the approximate loss function gain we get in each split;
|
||||
cover is the sum of second order gradient in each node.}
|
||||
}
|
||||
\value{
|
||||
if fname is not provided or set to \code{NULL} the function will return the model as a \code{character} vector. Otherwise it will return \code{TRUE}.
|
||||
@ -34,7 +34,7 @@ data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
train <- agaricus.train
|
||||
test <- agaricus.test
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
# save the model in file 'xgb.model.dump'
|
||||
xgb.dump(bst, 'xgb.model.dump', with.stats = TRUE)
|
||||
|
||||
@ -1,18 +1,16 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.importance.R
|
||||
\name{xgb.importance}
|
||||
\alias{xgb.importance}
|
||||
\title{Show importance of features in a model}
|
||||
\usage{
|
||||
xgb.importance(feature_names = NULL, filename_dump = NULL, model = NULL,
|
||||
data = NULL, label = NULL, target = function(x) ((x + label) == 2))
|
||||
xgb.importance(feature_names = NULL, model = NULL, data = NULL,
|
||||
label = NULL, target = function(x) ((x + label) == 2))
|
||||
}
|
||||
\arguments{
|
||||
\item{feature_names}{names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.}
|
||||
\item{feature_names}{names of each feature as a \code{character} vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.}
|
||||
|
||||
\item{filename_dump}{the path to the text file storing the model. Model dump must include the gain per feature and per tree (\code{with.stats = T} in function \code{xgb.dump}).}
|
||||
|
||||
\item{model}{generated by the \code{xgb.train} function. Avoid the creation of a dump file.}
|
||||
\item{model}{generated by the \code{xgb.train} function.}
|
||||
|
||||
\item{data}{the dataset used for the training step. Will be used with \code{label} parameter for co-occurence computation. More information in \code{Detail} part. This parameter is optional.}
|
||||
|
||||
@ -24,23 +22,24 @@ xgb.importance(feature_names = NULL, filename_dump = NULL, model = NULL,
|
||||
A \code{data.table} of the features used in the model with their average gain (and their weight for boosted tree model) in the model.
|
||||
}
|
||||
\description{
|
||||
Read a xgboost model text dump.
|
||||
Can be tree or linear model (text dump of linear model are only supported in dev version of \code{Xgboost} for now).
|
||||
Create a \code{data.table} of the most important features of a model.
|
||||
}
|
||||
\details{
|
||||
This is the function to understand the model trained (and through your model, your data).
|
||||
This function is for both linear and tree models.
|
||||
|
||||
Results are returned for both linear and tree models.
|
||||
|
||||
\code{data.table} is returned by the function.
|
||||
There are 3 columns :
|
||||
\code{data.table} is returned by the function.
|
||||
The columns are :
|
||||
\itemize{
|
||||
\item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump.
|
||||
\item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training ;
|
||||
\item \code{Cover} metric of the number of observation related to this feature (only available for tree models) ;
|
||||
\item \code{Weight} percentage representing the relative number of times a feature have been taken into trees. \code{Gain} should be prefered to search the most important feature. For boosted linear model, this column has no meaning.
|
||||
\item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump;
|
||||
\item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training (only available for tree models);
|
||||
\item \code{Cover} metric of the number of observation related to this feature (only available for tree models);
|
||||
\item \code{Weight} percentage representing the relative number of times a feature has been taken into trees.
|
||||
}
|
||||
|
||||
If you don't provide \code{feature_names}, index of the features will be used instead.
|
||||
|
||||
Because the index is extracted from the model dump (made on the C++ side), it starts at 0 (usual in C++) instead of 1 (usual in R).
|
||||
|
||||
Co-occurence count
|
||||
------------------
|
||||
|
||||
@ -53,18 +52,14 @@ If you need to remember one thing only: until you want to leave us early, don't
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
|
||||
# Both dataset are list with two items, a sparse matrix and labels
|
||||
# (labels = outcome column which will be learned).
|
||||
# Each column of the sparse Matrix is a feature in one hot encoding format.
|
||||
train <- agaricus.train
|
||||
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
|
||||
# train$data@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
xgb.importance(train$data@Dimnames[[2]], model = bst)
|
||||
# agaricus.train$data@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
xgb.importance(agaricus.train$data@Dimnames[[2]], model = bst)
|
||||
|
||||
# Same thing with co-occurence computation this time
|
||||
xgb.importance(train$data@Dimnames[[2]], model = bst, data = train$data, label = train$label)
|
||||
xgb.importance(agaricus.train$data@Dimnames[[2]], model = bst, data = agaricus.train$data, label = agaricus.train$label)
|
||||
|
||||
}
|
||||
|
||||
|
||||
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.load.R
|
||||
\name{xgb.load}
|
||||
\alias{xgb.load}
|
||||
@ -17,7 +17,7 @@ data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
train <- agaricus.train
|
||||
test <- agaricus.test
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
xgb.save(bst, 'xgb.model')
|
||||
bst <- xgb.load('xgb.model')
|
||||
|
||||
@ -1,33 +1,33 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.model.dt.tree.R
|
||||
\name{xgb.model.dt.tree}
|
||||
\alias{xgb.model.dt.tree}
|
||||
\title{Convert tree model dump to data.table}
|
||||
\title{Parse boosted tree model text dump}
|
||||
\usage{
|
||||
xgb.model.dt.tree(feature_names = NULL, filename_dump = NULL,
|
||||
model = NULL, text = NULL, n_first_tree = NULL)
|
||||
xgb.model.dt.tree(feature_names = NULL, model = NULL, text = NULL,
|
||||
n_first_tree = NULL)
|
||||
}
|
||||
\arguments{
|
||||
\item{feature_names}{names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.}
|
||||
\item{feature_names}{names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If the model already contains feature names, this argument should be \code{NULL} (default value).}
|
||||
|
||||
\item{filename_dump}{the path to the text file storing the model. Model dump must include the gain per feature and per tree (parameter \code{with.stats = T} in function \code{xgb.dump}).}
|
||||
\item{model}{object created by the \code{xgb.train} function.}
|
||||
|
||||
\item{model}{dump generated by the \code{xgb.train} function. Avoid the creation of a dump file.}
|
||||
\item{text}{\code{character} vector generated by the \code{xgb.dump} function. Model dump must include the gain per feature and per tree (parameter \code{with.stats = TRUE} in function \code{xgb.dump}).}
|
||||
|
||||
\item{text}{dump generated by the \code{xgb.dump} function. Avoid the creation of a dump file. Model dump must include the gain per feature and per tree (parameter \code{with.stats = T} in function \code{xgb.dump}).}
|
||||
|
||||
\item{n_first_tree}{limit the plot to the n first trees. If \code{NULL}, all trees of the model are plotted. Performance can be low for huge models.}
|
||||
\item{n_first_tree}{limit the plot to the \code{n} first trees. If set to \code{NULL}, all trees of the model are plotted. Performance can be low depending of the size of the model.}
|
||||
}
|
||||
\value{
|
||||
A \code{data.table} of the features used in the model with their gain, cover and few other thing.
|
||||
A \code{data.table} of the features used in the model with their gain, cover and few other information.
|
||||
}
|
||||
\description{
|
||||
Read a tree model text dump and return a data.table.
|
||||
Parse a boosted tree model text dump and return a \code{data.table}.
|
||||
}
|
||||
\details{
|
||||
General function to convert a text dump of tree model to a Matrix. The purpose is to help user to explore the model and get a better understanding of it.
|
||||
General function to convert a text dump of tree model to a \code{data.table}.
|
||||
|
||||
The content of the \code{data.table} is organised that way:
|
||||
The purpose is to help user to explore the model and get a better understanding of it.
|
||||
|
||||
The columns of the \code{data.table} are:
|
||||
|
||||
\itemize{
|
||||
\item \code{ID}: unique identifier of a node ;
|
||||
@ -39,21 +39,17 @@ The content of the \code{data.table} is organised that way:
|
||||
\item \code{Quality}: it's the gain related to the split in this specific node ;
|
||||
\item \code{Cover}: metric to measure the number of observation affected by the split ;
|
||||
\item \code{Tree}: ID of the tree. It is included in the main ID ;
|
||||
\item \code{Yes.X} or \code{No.X}: data related to the pointer in \code{Yes} or \code{No} column ;
|
||||
\item \code{Yes.Feature}, \code{No.Feature}, \code{Yes.Cover}, \code{No.Cover}, \code{Yes.Quality} and \code{No.Quality}: data related to the pointer in \code{Yes} or \code{No} column ;
|
||||
}
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
|
||||
#Both dataset are list with two items, a sparse matrix and labels
|
||||
#(labels = outcome column which will be learned).
|
||||
#Each column of the sparse Matrix is a feature in one hot encoding format.
|
||||
train <- agaricus.train
|
||||
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
|
||||
#agaricus.test$data@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], model = bst)
|
||||
# agaricus.train$data@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
xgb.model.dt.tree(feature_names = agaricus.train$data@Dimnames[[2]], model = bst)
|
||||
|
||||
}
|
||||
|
||||
|
||||
46
R-package/man/xgb.plot.deepness.Rd
Normal file
@ -0,0 +1,46 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.plot.deepness.R
|
||||
\name{xgb.plot.deepness}
|
||||
\alias{xgb.plot.deepness}
|
||||
\title{Plot model trees deepness}
|
||||
\usage{
|
||||
xgb.plot.deepness(model = NULL)
|
||||
}
|
||||
\arguments{
|
||||
\item{model}{dump generated by the \code{xgb.train} function.}
|
||||
}
|
||||
\value{
|
||||
Two graphs showing the distribution of the model deepness.
|
||||
}
|
||||
\description{
|
||||
Generate a graph to plot the distribution of deepness among trees.
|
||||
}
|
||||
\details{
|
||||
Display both the number of \code{leaf} and the distribution of \code{weighted observations}
|
||||
by tree deepness level.
|
||||
|
||||
The purpose of this function is to help the user to find the best trade-off to set
|
||||
the \code{max.depth} and \code{min_child_weight} parameters according to the bias / variance trade-off.
|
||||
|
||||
See \link{xgb.train} for more information about these parameters.
|
||||
|
||||
The graph is made of two parts:
|
||||
|
||||
\itemize{
|
||||
\item Count: number of leaf per level of deepness;
|
||||
\item Weighted cover: normalized weighted cover per leaf (weighted number of instances).
|
||||
}
|
||||
|
||||
This function is inspired by the blog post \url{http://aysent.github.io/2015/11/08/random-forest-leaf-visualization.html}
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
|
||||
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 15,
|
||||
eta = 1, nthread = 2, nround = 30, objective = "binary:logistic",
|
||||
min_child_weight = 50)
|
||||
|
||||
xgb.plot.deepness(model = bst)
|
||||
|
||||
}
|
||||
|
||||
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.plot.importance.R
|
||||
\name{xgb.plot.importance}
|
||||
\alias{xgb.plot.importance}
|
||||
@ -15,11 +15,11 @@ xgb.plot.importance(importance_matrix = NULL, numberOfClusters = c(1:10))
|
||||
A \code{ggplot2} bar graph representing each feature by a horizontal bar. Longer is the bar, more important is the feature. Features are classified by importance and clustered by importance. The group is represented through the color of the bar.
|
||||
}
|
||||
\description{
|
||||
Read a data.table containing feature importance details and plot it.
|
||||
Read a data.table containing feature importance details and plot it (for both GLM and Trees).
|
||||
}
|
||||
\details{
|
||||
The purpose of this function is to easily represent the importance of each feature of a model.
|
||||
The function return a ggplot graph, therefore each of its characteristic can be overriden (to customize it).
|
||||
The function returns a ggplot graph, therefore each of its characteristic can be overriden (to customize it).
|
||||
In particular you may want to override the title of the graph. To do so, add \code{+ ggtitle("A GRAPH NAME")} next to the value returned by this function.
|
||||
}
|
||||
\examples{
|
||||
@ -28,13 +28,13 @@ data(agaricus.train, package='xgboost')
|
||||
#Both dataset are list with two items, a sparse matrix and labels
|
||||
#(labels = outcome column which will be learned).
|
||||
#Each column of the sparse Matrix is a feature in one hot encoding format.
|
||||
train <- agaricus.train
|
||||
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
|
||||
#train$data@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
importance_matrix <- xgb.importance(train$data@Dimnames[[2]], model = bst)
|
||||
#agaricus.train$data@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
importance_matrix <- xgb.importance(agaricus.train$data@Dimnames[[2]], model = bst)
|
||||
xgb.plot.importance(importance_matrix)
|
||||
|
||||
}
|
||||
|
||||
|
||||
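As a small usage note for the \code{ggtitle} override mentioned in the details above (a sketch, not part of the commit):

require(xgboost)
require(ggplot2)
data(agaricus.train, package = 'xgboost')

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
               eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")
importance_matrix <- xgb.importance(agaricus.train$data@Dimnames[[2]], model = bst)

# The returned ggplot object can be customised, e.g. by overriding its title.
xgb.plot.importance(importance_matrix) + ggtitle("Mushroom data - feature importance")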
58
R-package/man/xgb.plot.multi.trees.Rd
Normal file
@ -0,0 +1,58 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.plot.multi.trees.R
|
||||
\name{xgb.plot.multi.trees}
|
||||
\alias{xgb.plot.multi.trees}
|
||||
\title{Project all trees on one tree and plot it}
|
||||
\usage{
|
||||
xgb.plot.multi.trees(model, feature_names = NULL, features.keep = 5,
|
||||
plot.width = NULL, plot.height = NULL)
|
||||
}
|
||||
\arguments{
|
||||
\item{model}{dump generated by the \code{xgb.train} function.}
|
||||
|
||||
\item{feature_names}{names of each feature as a \code{character} vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.}
|
||||
|
||||
\item{features.keep}{number of features to keep in each position of the multi trees.}
|
||||
|
||||
\item{plot.width}{width in pixels of the graph to produce}
|
||||
|
||||
\item{plot.height}{height in pixels of the graph to produce}
|
||||
}
|
||||
\value{
|
||||
Two graphs showing the distribution of the model deepness.
|
||||
}
|
||||
\description{
|
||||
Visualization of the ensemble of trees as a single collective unit.
|
||||
}
|
||||
\details{
|
||||
This function tries to capture the complexity of gradient boosted tree ensemble
|
||||
in a cohesive way.
|
||||
|
||||
The goal is to improve the interpretability of the model generally seen as black box.
|
||||
The function is dedicated to boosting applied to decision trees only.
|
||||
|
||||
The purpose is to move from an ensemble of trees to a single tree only.
|
||||
|
||||
It takes advantage of the fact that the shape of a binary tree is only defined by
|
||||
its deepness (therefore in a boosting model, all trees have the same shape).
|
||||
|
||||
Moreover, the trees tend to reuse the same features.
|
||||
|
||||
The function will project each tree on one, and keep for each position the
|
||||
\code{features.keep} first features (based on Gain per feature measure).
|
||||
|
||||
This function is inspired by this blog post:
|
||||
\url{https://wellecks.wordpress.com/2015/02/21/peering-into-the-black-box-visualizing-lambdamart/}
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
|
||||
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 15,
|
||||
eta = 1, nthread = 2, nround = 30, objective = "binary:logistic",
|
||||
min_child_weight = 50)
|
||||
|
||||
p <- xgb.plot.multi.trees(model = bst, feature_names = agaricus.train$data@Dimnames[[2]], features.keep = 3)
|
||||
print(p)
|
||||
|
||||
}
|
||||
|
||||
@ -1,58 +1,48 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.plot.tree.R
|
||||
\name{xgb.plot.tree}
|
||||
\alias{xgb.plot.tree}
|
||||
\title{Plot a boosted tree model}
|
||||
\usage{
|
||||
xgb.plot.tree(feature_names = NULL, filename_dump = NULL, model = NULL,
|
||||
n_first_tree = NULL, CSSstyle = NULL, width = NULL, height = NULL)
|
||||
xgb.plot.tree(feature_names = NULL, model = NULL, n_first_tree = NULL,
|
||||
plot.width = NULL, plot.height = NULL)
|
||||
}
|
||||
\arguments{
|
||||
\item{feature_names}{names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.}
|
||||
|
||||
\item{filename_dump}{the path to the text file storing the model. Model dump must include the gain per feature and per tree (parameter \code{with.stats = T} in function \code{xgb.dump}). Possible to provide a model directly (see \code{model} argument).}
|
||||
\item{feature_names}{names of each feature as a \code{character} vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.}
|
||||
|
||||
\item{model}{generated by the \code{xgb.train} function. Avoid the creation of a dump file.}
|
||||
|
||||
\item{n_first_tree}{limit the plot to the n first trees. If \code{NULL}, all trees of the model are plotted. Performance can be low for huge models.}
|
||||
|
||||
\item{CSSstyle}{a \code{character} vector storing a css style to customize the appearance of nodes. Look at the \href{https://github.com/knsv/mermaid/wiki}{Mermaid wiki} for more information.}
|
||||
\item{plot.width}{the width of the diagram in pixels.}
|
||||
|
||||
\item{width}{the width of the diagram in pixels.}
|
||||
|
||||
\item{height}{the height of the diagram in pixels.}
|
||||
\item{plot.height}{the height of the diagram in pixels.}
|
||||
}
|
||||
\value{
|
||||
A \code{DiagrammeR} of the model.
|
||||
}
|
||||
\description{
|
||||
Read a tree model text dump.
|
||||
Plotting only works for boosted tree model (not linear model).
|
||||
Read a tree model text dump and plot the model.
|
||||
}
|
||||
\details{
|
||||
The content of each node is organised that way:
|
||||
|
||||
\itemize{
|
||||
\item \code{feature} value ;
|
||||
\item \code{cover}: the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be ;
|
||||
\item \code{feature} value;
|
||||
\item \code{cover}: the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be;
|
||||
\item \code{gain}: metric the importance of the node in the model.
|
||||
}
|
||||
}
|
||||
|
||||
Each branch finishes with a leaf. For each leaf, only the \code{cover} is indicated.
|
||||
It uses \href{https://github.com/knsv/mermaid/}{Mermaid} library for that purpose.
|
||||
The function uses \href{http://www.graphviz.org/}{GraphViz} library for that purpose.
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
|
||||
#Both dataset are list with two items, a sparse matrix and labels
|
||||
#(labels = outcome column which will be learned).
|
||||
#Each column of the sparse Matrix is a feature in one hot encoding format.
|
||||
train <- agaricus.train
|
||||
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
|
||||
#agaricus.test$data@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
xgb.plot.tree(agaricus.train$data@Dimnames[[2]], model = bst)
|
||||
# agaricus.train$data@Dimnames[[2]] represents the column names of the sparse matrix.
|
||||
xgb.plot.tree(feature_names = agaricus.train$data@Dimnames[[2]], model = bst)
|
||||
|
||||
}
|
||||
|
||||
|
||||
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.save.R
|
||||
\name{xgb.save}
|
||||
\alias{xgb.save}
|
||||
@ -19,7 +19,7 @@ data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
train <- agaricus.train
|
||||
test <- agaricus.test
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
xgb.save(bst, 'xgb.model')
|
||||
bst <- xgb.load('xgb.model')
|
||||
|
||||
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.save.raw.R
|
||||
\name{xgb.save.raw}
|
||||
\alias{xgb.save.raw}
|
||||
@ -18,7 +18,7 @@ data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
train <- agaricus.train
|
||||
test <- agaricus.test
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2,objective = "binary:logistic")
|
||||
raw <- xgb.save.raw(bst)
|
||||
bst <- xgb.load(raw)
|
||||
|
||||
@ -1,4 +1,4 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.train.R
|
||||
\name{xgb.train}
|
||||
\alias{xgb.train}
|
||||
@ -10,7 +10,7 @@ xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
|
||||
save_name = "xgboost.model", ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{params}{the list of parameters.
|
||||
\item{params}{the list of parameters.
|
||||
|
||||
1. General Parameters
|
||||
|
||||
@ -18,30 +18,30 @@ xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
|
||||
\item \code{booster} which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}
|
||||
\item \code{silent} 0 means printing running messages, 1 means silent mode. Default: 0
|
||||
}
|
||||
|
||||
|
||||
2. Booster Parameters
|
||||
|
||||
2.1. Parameter for Tree Booster
|
||||
|
||||
\itemize{
|
||||
\item \code{eta} control the learning rate: scale the contribution of each tree by a factor of \code{0 < eta < 1} when it is added to the current approximation. Used to prevent overfitting by making the boosting process more conservative. Lower value for \code{eta} implies larger value for \code{nrounds}: low \code{eta} value means model more robust to overfitting but slower to compute. Default: 0.3
|
||||
\item \code{gamma} minimum loss reduction required to make a further partition on a leaf node of the tree. the larger, the more conservative the algorithm will be.
|
||||
\item \code{gamma} minimum loss reduction required to make a further partition on a leaf node of the tree. the larger, the more conservative the algorithm will be.
|
||||
\item \code{max_depth} maximum depth of a tree. Default: 6
|
||||
\item \code{min_child_weight} minimum sum of instance weight(hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node. The larger, the more conservative the algorithm will be. Default: 1
|
||||
\item \code{subsample} subsample ratio of the training instance. Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees and this will prevent overfitting. It makes computation shorter (because less data to analyse). It is advised to use this parameter with \code{eta} and increase \code{nround}. Default: 1
|
||||
\item \code{min_child_weight} minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning. In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node. The larger, the more conservative the algorithm will be. Default: 1
|
||||
\item \code{subsample} subsample ratio of the training instance. Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees and this will prevent overfitting. It makes computation shorter (because less data to analyse). It is advised to use this parameter with \code{eta} and increase \code{nround}. Default: 1
|
||||
\item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1
|
||||
\item \code{num_parallel_tree} Experimental parameter. number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1
|
||||
}
|
||||
|
||||
2.2. Parameter for Linear Booster
|
||||
|
||||
|
||||
\itemize{
|
||||
\item \code{lambda} L2 regularization term on weights. Default: 0
|
||||
\item \code{lambda_bias} L2 regularization term on bias. Default: 0
|
||||
\item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
|
||||
}
|
||||
|
||||
3. Task Parameters
|
||||
3. Task Parameters
|
||||
|
||||
\itemize{
|
||||
\item \code{objective} specify the learning task and the corresponding learning objective, users can pass a self-defined function to it. The default objective options are below:
|
||||
@ -51,7 +51,7 @@ xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
|
||||
\item \code{binary:logistic} logistic regression for binary classification. Output probability.
|
||||
\item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
|
||||
\item \code{num_class} set the number of classes. To use only with multiclass objectives.
|
||||
\item \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective. Class is represented by a number and should be from 0 to \code{tonum_class}.
|
||||
\item \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective. Class is represented by a number and should be from 0 to \code{num_class}.
|
||||
\item \code{multi:softprob} same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging to each class.
|
||||
\item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
|
||||
}
|
||||
@ -64,25 +64,25 @@ xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
|
||||
\item{nrounds}{the max number of iterations}
|
||||
|
||||
\item{watchlist}{what information should be printed when \code{verbose=1} or
|
||||
\code{verbose=2}. Watchlist is used to specify validation set monitoring
|
||||
during training. For example user can specify
|
||||
watchlist=list(validation1=mat1, validation2=mat2) to watch
|
||||
the performance of each round's model on mat1 and mat2}
|
||||
\code{verbose=2}. Watchlist is used to specify validation set monitoring
|
||||
during training. For example user can specify
|
||||
watchlist=list(validation1=mat1, validation2=mat2) to watch
|
||||
the performance of each round's model on mat1 and mat2}
|
||||
|
||||
\item{obj}{customized objective function. Returns gradient and second order
|
||||
\item{obj}{customized objective function. Returns gradient and second order
|
||||
gradient with given prediction and dtrain,}
|
||||
|
||||
\item{feval}{customized evaluation function. Returns
\code{list(metric='metric-name', value='metric-value')} with given
|
||||
prediction and dtrain,}
|
||||
|
||||
\item{verbose}{If 0, xgboost will stay silent. If 1, xgboost will print
|
||||
\item{verbose}{If 0, xgboost will stay silent. If 1, xgboost will print
|
||||
information of performance. If 2, xgboost will print information of both}
|
||||
|
||||
\item{print.every.n}{Print every N progress messages when \code{verbose>0}. Default is 1 which means all messages are printed.}
|
||||
|
||||
\item{early.stop.round}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
\item{early.stop.round}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
keeps getting worse consecutively for \code{k} rounds.}
|
||||
|
||||
\item{maximize}{If \code{feval} and \code{early.stop.round} are set, then \code{maximize} must be set as well.
|
||||
@ -98,24 +98,25 @@ keeps getting worse consecutively for \code{k} rounds.}
|
||||
An advanced interface for training xgboost model. Look at \code{\link{xgboost}} function for a simpler interface.
|
||||
}
|
||||
\details{
|
||||
This is the training function for \code{xgboost}.
|
||||
This is the training function for \code{xgboost}.
|
||||
|
||||
It supports advanced features such as \code{watchlist}, customized objective function (\code{feval}),
|
||||
therefore it is more flexible than \code{\link{xgboost}} function.
|
||||
|
||||
Parallelization is automatically enabled if \code{OpenMP} is present.
|
||||
Parallelization is automatically enabled if \code{OpenMP} is present.
|
||||
Number of threads can also be manually specified via \code{nthread} parameter.
|
||||
|
||||
\code{eval_metric} parameter (not listed above) is set automatically by Xgboost but can be overriden by parameter. Below is provided the list of different metric optimized by Xgboost to help you to understand how it works inside or to use them with the \code{watchlist} parameter.
|
||||
\itemize{
|
||||
\item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
|
||||
\item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
|
||||
\item \code{mlogloss} multiclass logloss. \url{https://www.kaggle.com/wiki/MultiClassLogLoss}
|
||||
\item \code{error} Binary classification error rate. It is calculated as \code{(wrong cases) / (all cases)}. For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
|
||||
\item \code{merror} Multiclass classification error rate. It is calculated as \code{(wrong cases) / (all cases)}.
|
||||
\item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
|
||||
\item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{http://en.wikipedia.org/wiki/NDCG}
|
||||
}
|
||||
|
||||
|
||||
Full list of parameters is available in the Wiki \url{https://github.com/dmlc/xgboost/wiki/Parameters}.
|
||||
|
||||
This function only accepts an \code{\link{xgb.DMatrix}} object as the input.
|
||||
|
||||
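To tie the \code{watchlist} and \code{eval_metric} paragraphs above together, a minimal sketch (not part of the commit) assuming the bundled Mushroom data:

require(xgboost)
data(agaricus.train, package = 'xgboost')
data(agaricus.test, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest  <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)

# Override the default metric and monitor it on a held-out set every 2 rounds.
param <- list(max.depth = 2, eta = 1, nthread = 2, silent = 1,
              objective = "binary:logistic", eval_metric = "auc")
bst <- xgb.train(params = param, data = dtrain, nrounds = 10,
                 watchlist = list(train = dtrain, eval = dtest),
                 print.every.n = 2)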
@ -1,22 +1,22 @@
|
||||
% Generated by roxygen2 (4.1.1): do not edit by hand
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgboost.R
|
||||
\name{xgboost}
|
||||
\alias{xgboost}
|
||||
\title{eXtreme Gradient Boosting (Tree) library}
|
||||
\usage{
|
||||
xgboost(data = NULL, label = NULL, missing = NULL, weight = NULL,
|
||||
xgboost(data = NULL, label = NULL, missing = NA, weight = NULL,
|
||||
params = list(), nrounds, verbose = 1, print.every.n = 1L,
|
||||
early.stop.round = NULL, maximize = NULL, save_period = 0,
|
||||
save_name = "xgboost.model", ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{data}{takes \code{matrix}, \code{dgCMatrix}, local data file or
|
||||
\item{data}{takes \code{matrix}, \code{dgCMatrix}, local data file or
|
||||
\code{xgb.DMatrix}.}
|
||||
|
||||
\item{label}{the response variable. User should not set this field,
|
||||
if data is local data file or \code{xgb.DMatrix}.}
|
||||
|
||||
\item{missing}{Missing is only used when input is dense matrix, pick a float
|
||||
\item{missing}{Missing is only used when input is dense matrix, pick a float
|
||||
value that represents missing value. Sometimes a data use 0 or other extreme value to represents missing values.}
|
||||
|
||||
\item{weight}{a vector indicating the weight for each row of the input.}
|
||||
@ -34,21 +34,21 @@ Commonly used ones are:
|
||||
\item \code{max.depth} maximum depth of the tree
|
||||
\item \code{nthread} number of thread used in training, if not set, all threads are used
|
||||
}
|
||||
|
||||
|
||||
Look at \code{\link{xgb.train}} for a more complete list of parameters or \url{https://github.com/dmlc/xgboost/wiki/Parameters} for the full list.
|
||||
|
||||
|
||||
See also \code{demo/} for walkthrough example in R.}
|
||||
|
||||
\item{nrounds}{the max number of iterations}
|
||||
|
||||
\item{verbose}{If 0, xgboost will stay silent. If 1, xgboost will print
|
||||
\item{verbose}{If 0, xgboost will stay silent. If 1, xgboost will print
|
||||
information of performance. If 2, xgboost will print information of both
|
||||
performance and construction progress information}
|
||||
|
||||
\item{print.every.n}{Print every N progress messages when \code{verbose>0}. Default is 1 which means all messages are printed.}
|
||||
|
||||
\item{early.stop.round}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
\item{early.stop.round}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
keeps getting worse consecutively for \code{k} rounds.}
|
||||
|
||||
\item{maximize}{If \code{feval} and \code{early.stop.round} are set, then \code{maximize} must be set as well.
|
||||
@ -75,8 +75,9 @@ data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
train <- agaricus.train
|
||||
test <- agaricus.test
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")
|
||||
pred <- predict(bst, test$data)
|
||||
|
||||
}
|
||||
|
||||
|
||||
@ -4,30 +4,33 @@ context("basic functions")
|
||||
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
train = agaricus.train
|
||||
test = agaricus.test
|
||||
train <- agaricus.train
|
||||
test <- agaricus.test
|
||||
set.seed(1994)
|
||||
|
||||
test_that("train and predict", {
|
||||
bst = xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
eta = 1, nthread = 2, nround = 2, objective = "binary:logistic")
|
||||
pred = predict(bst, test$data)
|
||||
pred <- predict(bst, test$data)
|
||||
expect_equal(length(pred), 1611)
|
||||
})
|
||||
|
||||
|
||||
test_that("early stopping", {
|
||||
res = xgb.cv(data = train$data, label = train$label, max.depth = 2, nfold = 5,
|
||||
res <- xgb.cv(data = train$data, label = train$label, max.depth = 2, nfold = 5,
|
||||
eta = 0.3, nthread = 2, nround = 20, objective = "binary:logistic",
|
||||
early.stop.round = 3, maximize = FALSE)
|
||||
expect_true(nrow(res)<20)
|
||||
bst = xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
expect_true(nrow(res) < 20)
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
eta = 0.3, nthread = 2, nround = 20, objective = "binary:logistic",
|
||||
early.stop.round = 3, maximize = FALSE)
|
||||
pred = predict(bst, test$data)
|
||||
pred <- predict(bst, test$data)
|
||||
expect_equal(length(pred), 1611)
|
||||
})
|
||||
|
||||
test_that("save_period", {
|
||||
bst = xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
|
||||
eta = 0.3, nthread = 2, nround = 20, objective = "binary:logistic",
|
||||
save_period = 10, save_name = "xgb.model")
|
||||
pred = predict(bst, test$data)
|
||||
pred <- predict(bst, test$data)
|
||||
expect_equal(length(pred), 1611)
|
||||
})
|
||||
|
||||
@ -2,46 +2,47 @@ context('Test models with custom objective')
|
||||
|
||||
require(xgboost)
|
||||
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
|
||||
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
|
||||
|
||||
test_that("custom objective works", {
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
|
||||
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
|
||||
|
||||
|
||||
watchlist <- list(eval = dtest, train = dtrain)
|
||||
num_round <- 2
|
||||
|
||||
|
||||
logregobj <- function(preds, dtrain) {
|
||||
labels <- getinfo(dtrain, "label")
|
||||
preds <- 1/(1 + exp(-preds))
|
||||
preds <- 1 / (1 + exp(-preds))
|
||||
grad <- preds - labels
|
||||
hess <- preds * (1 - preds)
|
||||
return(list(grad = grad, hess = hess))
|
||||
}
|
||||
evalerror <- function(preds, dtrain) {
|
||||
labels <- getinfo(dtrain, "label")
|
||||
err <- as.numeric(sum(labels != (preds > 0)))/length(labels)
|
||||
err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
|
||||
return(list(metric = "error", value = err))
|
||||
}
|
||||
|
||||
param <- list(max.depth=2, eta=1, nthread = 2, silent=1,
|
||||
|
||||
param <- list(max.depth=2, eta=1, nthread = 2, silent=1,
|
||||
objective=logregobj, eval_metric=evalerror)
|
||||
|
||||
|
||||
bst <- xgb.train(param, dtrain, num_round, watchlist)
|
||||
expect_equal(class(bst), "xgb.Booster")
|
||||
expect_equal(length(bst$raw), 1064)
|
||||
attr(dtrain, 'label') <- getinfo(dtrain, 'label')
|
||||
|
||||
|
||||
logregobjattr <- function(preds, dtrain) {
|
||||
labels <- attr(dtrain, 'label')
|
||||
preds <- 1/(1 + exp(-preds))
|
||||
preds <- 1 / (1 + exp(-preds))
|
||||
grad <- preds - labels
|
||||
hess <- preds * (1 - preds)
|
||||
return(list(grad = grad, hess = hess))
|
||||
}
|
||||
param <- list(max.depth=2, eta=1, nthread = 2, silent=1,
|
||||
objective=logregobjattr, eval_metric=evalerror)
|
||||
param <- list(max.depth=2, eta=1, nthread = 2, silent = 1,
|
||||
objective = logregobjattr, eval_metric = evalerror)
|
||||
bst <- xgb.train(param, dtrain, num_round, watchlist)
|
||||
expect_equal(class(bst), "xgb.Booster")
|
||||
expect_equal(length(bst$raw), 1064)
|
||||
})
|
||||
})
|
||||
|
||||
@ -5,28 +5,64 @@ require(data.table)
|
||||
require(Matrix)
|
||||
require(vcd)
|
||||
|
||||
set.seed(1982)
|
||||
data(Arthritis)
|
||||
data(agaricus.train, package='xgboost')
|
||||
df <- data.table(Arthritis, keep.rownames = F)
|
||||
df[,AgeDiscret:= as.factor(round(Age/10,0))]
|
||||
df[,AgeCat:= as.factor(ifelse(Age > 30, "Old", "Young"))]
|
||||
df[,ID:=NULL]
|
||||
sparse_matrix = sparse.model.matrix(Improved~.-1, data = df)
|
||||
output_vector = df[,Y:=0][Improved == "Marked",Y:=1][,Y]
|
||||
bst <- xgboost(data = sparse_matrix, label = output_vector, max.depth = 9,
|
||||
eta = 1, nthread = 2, nround = 10,objective = "binary:logistic")
|
||||
df[,AgeDiscret := as.factor(round(Age / 10,0))]
|
||||
df[,AgeCat := as.factor(ifelse(Age > 30, "Old", "Young"))]
|
||||
df[,ID := NULL]
|
||||
sparse_matrix <- sparse.model.matrix(Improved~.-1, data = df)
|
||||
output_vector <- df[,Y := 0][Improved == "Marked",Y := 1][,Y]
|
||||
bst.Tree <- xgboost(data = sparse_matrix, label = output_vector, max.depth = 9,
|
||||
eta = 1, nthread = 2, nround = 10, objective = "binary:logistic", booster = "gbtree")
|
||||
|
||||
bst.GLM <- xgboost(data = sparse_matrix, label = output_vector,
|
||||
eta = 1, nthread = 2, nround = 10, objective = "binary:logistic", booster = "gblinear")
|
||||
|
||||
feature.names <- agaricus.train$data@Dimnames[[2]]
|
||||
|
||||
test_that("xgb.dump works", {
|
||||
capture.output(print(xgb.dump(bst)))
|
||||
capture.output(print(xgb.dump(bst.Tree)))
|
||||
capture.output(print(xgb.dump(bst.GLM)))
|
||||
expect_true(xgb.dump(bst.Tree, 'xgb.model.dump', with.stats = T))
|
||||
})
|
||||
|
||||
test_that("xgb.importance works", {
|
||||
xgb.dump(bst, 'xgb.model.dump', with.stats = T)
|
||||
importance <- xgb.importance(sparse_matrix@Dimnames[[2]], 'xgb.model.dump')
|
||||
expect_equal(dim(importance), c(7, 4))
|
||||
test_that("xgb.model.dt.tree works with and without feature names", {
|
||||
names.dt.trees <- c("ID", "Feature", "Split", "Yes", "No", "Missing", "Quality", "Cover",
|
||||
"Tree", "Yes.Feature", "Yes.Cover", "Yes.Quality", "No.Feature", "No.Cover", "No.Quality")
|
||||
dt.tree <- xgb.model.dt.tree(feature_names = feature.names, model = bst.Tree)
|
||||
expect_equal(names.dt.trees, names(dt.tree))
|
||||
expect_equal(dim(dt.tree), c(162, 15))
|
||||
xgb.model.dt.tree(model = bst.Tree)
|
||||
})
|
||||
|
||||
test_that("xgb.plot.tree works", {
|
||||
xgb.plot.tree(agaricus.train$data@Dimnames[[2]], model = bst)
|
||||
})
|
||||
test_that("xgb.importance works with and without feature names", {
|
||||
importance.Tree <- xgb.importance(feature_names = sparse_matrix@Dimnames[[2]], model = bst.Tree)
|
||||
expect_equal(dim(importance.Tree), c(7, 4))
|
||||
expect_equal(colnames(importance.Tree), c("Feature", "Gain", "Cover", "Frequency"))
|
||||
xgb.importance(model = bst.Tree)
|
||||
xgb.plot.importance(importance_matrix = importance.Tree)
|
||||
})
|
||||
|
||||
test_that("xgb.importance works with GLM model", {
|
||||
importance.GLM <- xgb.importance(feature_names = sparse_matrix@Dimnames[[2]], model = bst.GLM)
|
||||
expect_equal(dim(importance.GLM), c(10, 2))
|
||||
expect_equal(colnames(importance.GLM), c("Feature", "Weight"))
|
||||
xgb.importance(model = bst.GLM)
|
||||
xgb.plot.importance(importance.GLM)
|
||||
})
|
||||
|
||||
test_that("xgb.plot.tree works with and without feature names", {
|
||||
xgb.plot.tree(feature_names = feature.names, model = bst.Tree)
|
||||
xgb.plot.tree(model = bst.Tree)
|
||||
})
|
||||
|
||||
test_that("xgb.plot.multi.trees works with and without feature names", {
|
||||
xgb.plot.multi.trees(model = bst.Tree, feature_names = feature.names, features.keep = 3)
|
||||
xgb.plot.multi.trees(model = bst.Tree, features.keep = 3)
|
||||
})
|
||||
|
||||
test_that("xgb.plot.deepness works", {
|
||||
xgb.plot.deepness(model = bst.Tree)
|
||||
})
|
||||
|
||||
27
R-package/tests/testthat/test_lint.R
Normal file
@ -0,0 +1,27 @@
|
||||
context("Code is of high quality and lint free")
|
||||
test_that("Code Lint", {
|
||||
skip_on_cran()
|
||||
skip_on_travis()
|
||||
skip_if_not_installed("lintr")
|
||||
my_linters <- list(
|
||||
absolute_paths_linter=lintr::absolute_paths_linter,
|
||||
assignment_linter=lintr::assignment_linter,
|
||||
closed_curly_linter=lintr::closed_curly_linter,
|
||||
commas_linter=lintr::commas_linter,
|
||||
# commented_code_linter=lintr::commented_code_linter,
|
||||
infix_spaces_linter=lintr::infix_spaces_linter,
|
||||
line_length_linter=lintr::line_length_linter,
|
||||
no_tab_linter=lintr::no_tab_linter,
|
||||
object_usage_linter=lintr::object_usage_linter,
|
||||
# snake_case_linter=lintr::snake_case_linter,
|
||||
# multiple_dots_linter=lintr::multiple_dots_linter,
|
||||
object_length_linter=lintr::object_length_linter,
|
||||
open_curly_linter=lintr::open_curly_linter,
|
||||
# single_quotes_linter=lintr::single_quotes_linter,
|
||||
spaces_inside_linter=lintr::spaces_inside_linter,
|
||||
spaces_left_parentheses_linter=lintr::spaces_left_parentheses_linter,
|
||||
trailing_blank_lines_linter=lintr::trailing_blank_lines_linter,
|
||||
trailing_whitespace_linter=lintr::trailing_whitespace_linter
|
||||
)
|
||||
# lintr::expect_lint_free(linters=my_linters) # uncomment this if you want to check code quality
|
||||
})
|
||||
32
R-package/tests/testthat/test_parameter_exposure.R
Normal file
@ -0,0 +1,32 @@
|
||||
context('Test model params and call are exposed to R')
|
||||
|
||||
require(xgboost)
|
||||
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
|
||||
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
|
||||
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
|
||||
|
||||
bst <- xgboost(data = dtrain,
|
||||
max.depth = 2,
|
||||
eta = 1,
|
||||
nround = 10,
|
||||
nthread = 1,
|
||||
verbose = 0,
|
||||
objective = "binary:logistic")
|
||||
|
||||
test_that("call is exposed to R", {
|
||||
model_call <- attr(bst, "call")
|
||||
expect_is(model_call, "call")
|
||||
})
|
||||
|
||||
test_that("params is exposed to R", {
|
||||
model_params <- attr(bst, "params")
|
||||
|
||||
expect_is(model_params, "list")
|
||||
|
||||
expect_equal(model_params$eta, 1)
|
||||
expect_equal(model_params$max.depth, 2)
|
||||
expect_equal(model_params$objective, "binary:logistic")
|
||||
})
|
||||
@ -1,13 +1,14 @@
|
||||
context('Test poisson regression model')
|
||||
|
||||
require(xgboost)
|
||||
set.seed(1994)
|
||||
|
||||
test_that("poisson regression works", {
|
||||
data(mtcars)
|
||||
bst = xgboost(data=as.matrix(mtcars[,-11]),label=mtcars[,11],
|
||||
objective='count:poisson',nrounds=5)
|
||||
bst <- xgboost(data = as.matrix(mtcars[,-11]),label = mtcars[,11],
|
||||
objective = 'count:poisson', nrounds=5)
|
||||
expect_equal(class(bst), "xgb.Booster")
|
||||
pred = predict(bst,as.matrix(mtcars[,-11]))
|
||||
pred <- predict(bst,as.matrix(mtcars[, -11]))
|
||||
expect_equal(length(pred), 32)
|
||||
sqrt(mean((pred-mtcars[,11])^2))
|
||||
})
|
||||
expect_equal(sqrt(mean( (pred - mtcars[,11]) ^ 2)), 1.16, tolerance = 0.01)
|
||||
})
|
||||
|
||||
@ -190,7 +190,7 @@ Measure feature importance
|
||||
In the code below, `sparse_matrix@Dimnames[[2]]` represents the column names of the sparse matrix. These names are the original values of the features (remember, each binary column == one value of one *categorical* feature).
|
||||
|
||||
```{r}
|
||||
importance <- xgb.importance(sparse_matrix@Dimnames[[2]], model = bst)
|
||||
importance <- xgb.importance(feature_names = sparse_matrix@Dimnames[[2]], model = bst)
|
||||
head(importance)
|
||||
```
|
||||
|
||||
@ -202,7 +202,7 @@ head(importance)
|
||||
|
||||
`Cover` measures the relative quantity of observations concerned by a feature.
|
||||
|
||||
`Frequence` is a simpler way to measure the `Gain`. It just counts the number of times a feature is used in all generated trees. You should not use it (unless you know why you want to use it).
|
||||
`Frequency` is a simpler way to measure the `Gain`. It just counts the number of times a feature is used in all generated trees. You should not use it (unless you know why you want to use it).
|
||||
|
||||
### Improvement in the interpretability of feature importance data.table
|
||||
|
||||
@ -213,10 +213,10 @@ One simple solution is to count the co-occurrences of a feature and a class of t
|
||||
For that purpose we will execute the same function as above but using two more parameters, `data` and `label`.
|
||||
|
||||
```{r}
|
||||
importanceRaw <- xgb.importance(sparse_matrix@Dimnames[[2]], model = bst, data = sparse_matrix, label = output_vector)
|
||||
importanceRaw <- xgb.importance(feature_names = sparse_matrix@Dimnames[[2]], model = bst, data = sparse_matrix, label = output_vector)
|
||||
|
||||
# Cleaning for better display
|
||||
importanceClean <- importanceRaw[,`:=`(Cover=NULL, Frequence=NULL)]
|
||||
importanceClean <- importanceRaw[,`:=`(Cover=NULL, Frequency=NULL)]
|
||||
|
||||
head(importanceClean)
|
||||
```
|
||||
|
||||
@ -345,7 +345,7 @@ Feature importance is similar to R gbm package's relative influence (rel.inf).
|
||||
```
|
||||
importance_matrix <- xgb.importance(model = bst)
|
||||
print(importance_matrix)
|
||||
xgb.plot.importance(importance_matrix)
|
||||
xgb.plot.importance(importance_matrix = importance_matrix)
|
||||
```
|
||||
|
||||
View the trees from a model
|
||||
|
||||
@ -2,7 +2,9 @@
|
||||
===========
|
||||
[](https://travis-ci.org/dmlc/xgboost)
|
||||
[](https://xgboost.readthedocs.org)
|
||||
[](./LICENSE)
|
||||
[](http://cran.r-project.org/web/packages/xgboost)
|
||||
[](https://pypi.python.org/pypi/xgboost/)
|
||||
[](https://gitter.im/dmlc/xgboost?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
|
||||
|
||||
An optimized general purpose gradient boosting library. The library is parallelized, and also provides an optimized distributed version.
|
||||
@ -29,6 +31,9 @@ Contents
|
||||
What's New
|
||||
----------
|
||||
|
||||
* XGBoost helps Vlad Mironov, Alexander Guschin to win the [CERN LHCb experiment Flavour of Physics competition](https://www.kaggle.com/c/flavours-of-physics). Check out the [interview from Kaggle](http://blog.kaggle.com/2015/11/30/flavour-of-physics-technical-write-up-1st-place-go-polar-bears/).
|
||||
* XGBoost helps Mario Filho, Josef Feigl, Lucas, Gilberto to win the [Caterpillar Tube Pricing competition](https://www.kaggle.com/c/caterpillar-tube-pricing). Check out the [interview from Kaggle](http://blog.kaggle.com/2015/09/22/caterpillar-winners-interview-1st-place-gilberto-josef-leustagos-mario/).
|
||||
* XGBoost helps Halla Yang to win the [Recruit Coupon Purchase Prediction Challenge](https://www.kaggle.com/c/coupon-purchase-prediction). Check out the [interview from Kaggle](http://blog.kaggle.com/2015/10/21/recruit-coupon-purchase-winners-interview-2nd-place-halla-yang/).
|
||||
* XGBoost helps Owen Zhang to win the [Avito Context Ad Click competition](https://www.kaggle.com/c/avito-context-ad-clicks). Check out the [interview from Kaggle](http://blog.kaggle.com/2015/08/26/avito-winners-interview-1st-place-owen-zhang/).
|
||||
* XGBoost helps Chenglong Chen to win [Kaggle CrowdFlower Competition](https://www.kaggle.com/c/crowdflower-search-relevance)
|
||||
Check out the [winning solution](https://github.com/ChenglongChen/Kaggle_CrowdFlower)
|
||||
|
||||
@ -22,8 +22,8 @@ This is a list of short codes introducing different functionalities of xgboost p
|
||||
[Julia](https://github.com/antinucleon/XGBoost.jl/blob/master/demo/boost_from_prediction.jl)
|
||||
* Predicting using first n trees
|
||||
[python](guide-python/predict_first_ntree.py)
|
||||
[R](../R-package/demo/boost_from_prediction.R)
|
||||
[Julia](https://github.com/antinucleon/XGBoost.jl/blob/master/demo/boost_from_prediction.jl)
|
||||
[R](../R-package/demo/predict_first_ntree.R)
|
||||
[Julia](https://github.com/antinucleon/XGBoost.jl/blob/master/demo/predict_first_ntree.jl)
|
||||
* Generalized Linear Model
|
||||
[python](guide-python/generalized_linear_model.py)
|
||||
[R](../R-package/demo/generalized_linear_model.R)
|
||||
@ -49,4 +49,3 @@ Benchmarks
|
||||
----------
|
||||
* [Starter script for Kaggle Higgs Boson](kaggle-higgs)
|
||||
* [Kaggle Tradeshift winning solution by daxiongshu](https://github.com/daxiongshu/kaggle-tradeshift-winning-solution)
|
||||
|
||||
|
||||
@ -9,4 +9,6 @@ XGBoost Python Feature Walkthrough
|
||||
* [Predicting leaf indices](predict_leaf_indices.py)
|
||||
* [Sklearn Wrapper](sklearn_examples.py)
|
||||
* [Sklearn Parallel](sklearn_parallel.py)
|
||||
* [Sklearn access evals result](sklearn_evals_result.py)
|
||||
* [Access evals result](evals_result.py)
|
||||
* [External Memory](external_memory.py)
|
||||
|
||||
30
demo/guide-python/evals_result.py
Normal file
@ -0,0 +1,30 @@
|
||||
##
|
||||
# This script demonstrates how to access the eval metrics in xgboost
|
||||
##
|
||||
|
||||
import xgboost as xgb
|
||||
dtrain = xgb.DMatrix('../data/agaricus.txt.train', silent=True)
|
||||
dtest = xgb.DMatrix('../data/agaricus.txt.test', silent=True)
|
||||
|
||||
param = [('max_depth', 2), ('objective', 'binary:logistic'), ('eval_metric', 'logloss'), ('eval_metric', 'error')]
|
||||
|
||||
num_round = 2
|
||||
watchlist = [(dtest,'eval'), (dtrain,'train')]
|
||||
|
||||
evals_result = {}
|
||||
bst = xgb.train(param, dtrain, num_round, watchlist, evals_result=evals_result)
|
||||
|
||||
print('Access logloss metric directly from evals_result:')
|
||||
print(evals_result['eval']['logloss'])
|
||||
|
||||
print('')
|
||||
print('Access metrics through a loop:')
|
||||
for e_name, e_mtrs in evals_result.items():
|
||||
print('- {}'.format(e_name))
|
||||
for e_mtr_name, e_mtr_vals in e_mtrs.items():
|
||||
print(' - {}'.format(e_mtr_name))
|
||||
print(' - {}'.format(e_mtr_vals))
|
||||
|
||||
print('')
|
||||
print('Access complete dictionary:')
|
||||
print(evals_result)
|
||||
43
demo/guide-python/sklearn_evals_result.py
Normal file
@ -0,0 +1,43 @@
|
||||
##
|
||||
# This script demonstrates how to access the xgboost eval metrics by using sklearn
|
||||
##
|
||||
|
||||
import xgboost as xgb
|
||||
import numpy as np
|
||||
from sklearn.datasets import make_hastie_10_2
|
||||
|
||||
X, y = make_hastie_10_2(n_samples=2000, random_state=42)
|
||||
|
||||
# Map labels from {-1, 1} to {0, 1}
|
||||
labels, y = np.unique(y, return_inverse=True)
|
||||
|
||||
X_train, X_test = X[:1600], X[1600:]
|
||||
y_train, y_test = y[:1600], y[1600:]
|
||||
|
||||
param_dist = {'objective':'binary:logistic', 'n_estimators':2}
|
||||
|
||||
clf = xgb.XGBModel(**param_dist)
|
||||
# Or you can use: clf = xgb.XGBClassifier(**param_dist)
|
||||
|
||||
clf.fit(X_train, y_train,
|
||||
eval_set=[(X_train, y_train), (X_test, y_test)],
|
||||
eval_metric='logloss',
|
||||
verbose=True)
|
||||
|
||||
# Load evals result by calling the evals_result() function
|
||||
evals_result = clf.evals_result()
|
||||
|
||||
print('Access logloss metric directly from validation_0:')
|
||||
print(evals_result['validation_0']['logloss'])
|
||||
|
||||
print('')
|
||||
print('Access metrics through a loop:')
|
||||
for e_name, e_mtrs in evals_result.items():
|
||||
print('- {}'.format(e_name))
|
||||
for e_mtr_name, e_mtr_vals in e_mtrs.items():
|
||||
print(' - {}'.format(e_mtr_name))
|
||||
print(' - {}'.format(e_mtr_vals))
|
||||
|
||||
print('')
|
||||
print('Access complete dict:')
|
||||
print(evals_result)
|
||||
59
doc/build.md
@ -15,66 +15,17 @@ Build XGBoost in OS X with OpenMP
|
||||
---------------------------------
|
||||
Here is the complete solution for using OpenMP-enabled compilers to install XGBoost.
|
||||
|
||||
1. Obtain gcc with openmp support by `brew install gcc --without-multilib` **or** clang with openmp by `brew install clang-omp`. The clang one is recommended because the first method requires us compiling gcc inside the machine (more than an hour in mine)! (BTW, `brew` is the de facto standard of `apt-get` on OS X. So installing [HPC](http://hpc.sourceforge.net/) separately is not recommended, but it should work.)
|
||||
1. Obtain gcc-5.x.x with openmp support by `brew install gcc --without-multilib`. (`brew` is the de facto standard of `apt-get` on OS X. So installing [HPC](http://hpc.sourceforge.net/) separately is not recommended, but it should work.)
|
||||
|
||||
2. **if you are planing to use clang-omp** - in step 3 and/or 4, change line 9 in `xgboost/src/utils/omp.h` to
|
||||
2. `cd xgboost` then `bash build.sh` to compile XGBoost.
|
||||
|
||||
```C++
|
||||
#include <libiomp/omp.h> /* instead of #include <omp.h> */`
|
||||
```
|
||||
3. Install the xgboost package for Python and R
|
||||
|
||||
to make it work, otherwise you might get this error
|
||||
|
||||
`src/tree/../utils/omp.h:9:10: error: 'omp.h' file not found...`
|
||||
|
||||
|
||||
|
||||
3. Set the `Makefile` correctly for compiling cpp version xgboost then python version xgboost.
|
||||
|
||||
```Makefile
|
||||
export CC = gcc-4.9
|
||||
export CXX = g++-4.9
|
||||
```
|
||||
|
||||
Or
|
||||
|
||||
```Makefile
|
||||
export CC = clang-omp
|
||||
export CXX = clang-omp++
|
||||
```
|
||||
|
||||
Remember to change `header` (mentioned in step 2) if using clang-omp.
|
||||
|
||||
Then `cd xgboost` then `bash build.sh` to compile XGBoost. And go to `wrapper` sub-folder to install python version.
|
||||
|
||||
4. Set the `Makevars` file in highest piority for R.
|
||||
- For Python: go to the `python-package` sub-folder and install the Python version with `python setup.py install` (or `sudo python setup.py install`).
|
||||
- For R: set the `Makevars` file with the highest priority for R.
|
||||
|
||||
The point is, there are three `Makevars`: `~/.R/Makevars`, `xgboost/R-package/src/Makevars`, and `/usr/local/Cellar/r/3.2.0/R.framework/Resources/etc/Makeconf` (the last one obtained by running `file.path(R.home("etc"), "Makeconf")` in R), and `SHLIB_OPENMP_CXXFLAGS` is not set by default!! After trying, it seems that the first one has the highest priority (surprise!).
|
||||
|
||||
So, **add** or **change** `~/.R/Makevars` to the following lines:
|
||||
|
||||
```Makefile
|
||||
CC=gcc-4.9
|
||||
CXX=g++-4.9
|
||||
SHLIB_OPENMP_CFLAGS = -fopenmp
|
||||
SHLIB_OPENMP_CXXFLAGS = -fopenmp
|
||||
SHLIB_OPENMP_FCFLAGS = -fopenmp
|
||||
SHLIB_OPENMP_FFLAGS = -fopenmp
|
||||
```
|
||||
|
||||
Or
|
||||
|
||||
```Makefile
|
||||
CC=clang-omp
|
||||
CXX=clang-omp++
|
||||
SHLIB_OPENMP_CFLAGS = -fopenmp
|
||||
SHLIB_OPENMP_CXXFLAGS = -fopenmp
|
||||
SHLIB_OPENMP_FCFLAGS = -fopenmp
|
||||
SHLIB_OPENMP_FFLAGS = -fopenmp
|
||||
```
|
||||
|
||||
Again, remember to change `header` if using clang-omp.
|
||||
|
||||
Then inside R, run
|
||||
|
||||
```R
|
||||
|
||||
@ -11,7 +11,7 @@ This document is hosted at http://xgboost.readthedocs.org/. You can also browse
|
||||
How to Get Started
|
||||
------------------
|
||||
The best way to get started to learn xgboost is by the examples. There are three types of examples you can find in xgboost.
|
||||
* [Tutorials](#tutorials) are self-conatained tutorials on a complete data science tasks.
|
||||
* [Tutorials](#tutorials) are self-contained tutorials on complete data science tasks.
|
||||
* [XGBoost Code Examples](../demo/) are collections of code and benchmarks of xgboost.
|
||||
- There is a walkthrough section in this to walk you through specific API features.
|
||||
* [Highlight Solutions](#highlight-solutions) are presentations using xgboost to solve real world problems.
|
||||
|
||||
101
doc/model.md
@ -1,8 +1,12 @@
|
||||
Introduction to Boosted Trees
|
||||
=============================
|
||||
XGBoost is short for "Extreme Gradient Boosting", where the term "Gradient Boosting" is proposed in the paper _Greedy Function Approximation: A Gradient Boosting Machine_, Friedman. Based on this original model. This is a tutorial on boosted trees, most of content are based on this [slide](http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf) by the author of xgboost.
|
||||
XGBoost is short for "Extreme Gradient Boosting", where the term "Gradient Boosting" is proposed in the paper _Greedy Function Approximation: A Gradient Boosting Machine_, by Friedman.
|
||||
XGBoost is based on this original model.
|
||||
This is a tutorial on gradient boosted trees, and most of the content is based on these [slides](http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf) by the author of xgboost.
|
||||
|
||||
The GBM(boosted trees) has been around for really a while, and there are a lot of materials on the topic. This tutorial tries to explain boosted trees in a self-contained and principled way of supervised learning. We think this explanation is cleaner, more formal, and motivates the variant used in xgboost.
|
||||
The GBM (boosted trees) has been around for a while, and there are a lot of materials on the topic.
|
||||
This tutorial tries to explain boosted trees in a self-contained and principled way using the elements of supervised learning.
|
||||
We think this explanation is cleaner, more formal, and motivates the variant used in xgboost.
|
||||
|
||||
Elements of Supervised Learning
|
||||
-------------------------------
|
||||
@ -10,21 +14,21 @@ XGBoost is used for supervised learning problems, where we use the training data
|
||||
Before we dive into trees, let us start by reviewing the basic elements in supervised learning.
|
||||
|
||||
### Model and Parameters
|
||||
The ***model*** in supervised learning usually refers to the mathematical structure on how to given the prediction ``$ y_i $`` given ``$ x_i $``.
|
||||
For example, a common model is *linear model*, where the prediction is given by ``$ \hat{y}_i = \sum_j w_j x_{ij} $``, a linear combination of weighted input features.
|
||||
The ***model*** in supervised learning usually refers to the mathematical structure of how to make the prediction ``$ y_i $`` given ``$ x_i $``.
|
||||
For example, a common model is a *linear model*, where the prediction is given by ``$ \hat{y}_i = \sum_j w_j x_{ij} $``, a linear combination of weighted input features.
|
||||
The prediction value can have different interpretations, depending on the task.
|
||||
For example, it can be logistic transformed to get the probability of positive class in logistic regression, and it can also be used as ranking score when we want to rank the outputs.
|
||||
For example, it can be logistic transformed to get the probability of positive class in logistic regression, and it can also be used as a ranking score when we want to rank the outputs.
|
||||
|
||||
The ***parameters*** are the undermined part that we need to learn from data. In linear regression problem, the parameters are the co-efficients ``$ w $``.
|
||||
The ***parameters*** are the undetermined part that we need to learn from data. In linear regression problems, the parameters are the coefficients ``$ w $``.
|
||||
Usually we will use ``$ \Theta $`` to denote the parameters.
|
||||
|
||||
### Objective Function : Training Loss + Regularization
|
||||
|
||||
Based on different understanding or assumption of ``$ y_i $``, we can have different problems as regression, classification, ordering, etc.
|
||||
We need to find a way to find the best parameters given the training data. In order to do so, we need to define a so called ***objective function***,
|
||||
to measure the performance of the model under certain set of parameters.
|
||||
Based on different understandings of ``$ y_i $`` we can have different problems, such as regression, classification, ordering, etc.
|
||||
We need to find a way to find the best parameters given the training data. In order to do so, we need to define a so-called ***objective function***,
|
||||
to measure the performance of the model given a certain set of parameters.
|
||||
|
||||
A very important fact about objective functions, is they ***must always*** contains two parts: training loss and regularization.
|
||||
A very important fact about objective functions is they ***must always*** contain two parts: training loss and regularization.
|
||||
|
||||
```math
|
||||
Obj(\Theta) = L(\Theta) + \Omega(\Theta)
|
||||
@ -44,7 +48,8 @@ L(\theta) = \sum_i[ y_i\ln (1+e^{-\hat{y}_i}) + (1-y_i)\ln (1+e^{\hat{y}_i})]
|
||||
|
||||
The ***regularization term*** is what people usually forget to add. The regularization term controls the complexity of the model, which helps us to avoid overfitting.
|
||||
This sounds a bit abstract, so let us consider the following problem in the following picture. You are asked to *fit* visually a step function given the input data points
|
||||
on the upper left corner of the image, which solution among the tree you think is the best fit?
|
||||
on the upper left corner of the image.
|
||||
Which solution among the three do you think is the best fit?
|
||||
|
||||

|
||||
|
||||
@ -53,26 +58,26 @@ The tradeoff between the two is also referred as bias-variance tradeoff in machi
|
||||
|
||||
|
||||
### Why introduce the general principle
|
||||
The elements introduced in above forms the basic elements of supervised learning, and they are naturally the building blocks of machine learning toolkits.
|
||||
For example, you should be able to answer what is the difference and common parts between boosted trees and random forest.
|
||||
Understanding the process in a formalized way also helps us to understand the objective that we are learning and the reason behind the heurestics such as
|
||||
The elements introduced above form the basic elements of supervised learning, and they are naturally the building blocks of machine learning toolkits.
|
||||
For example, you should be able to describe the differences and commonalities between boosted trees and random forests.
|
||||
Understanding the process in a formalized way also helps us to understand the objective that we are learning and the reason behind the heuristics such as
|
||||
pruning and smoothing.
|
||||
|
||||
Tree Ensemble
|
||||
-------------
|
||||
Now that we have introduced the elements of supervised learning, let us get started with real trees.
|
||||
To begin with, let us first learn what is the ***model*** of xgboost: tree ensembles.
|
||||
To begin with, let us first learn about the ***model*** of xgboost: tree ensembles.
|
||||
The tree ensemble model is a set of classification and regression trees (CART). Here's a simple example of a CART
|
||||
that classifies is someone will like computer games.
|
||||
that classifies whether someone will like computer games.
|
||||
|
||||

|
||||
|
||||
We classify the members in thie family into different leaves, and assign them the score on corresponding leaf.
|
||||
A CART is a bit different from decision trees, where the leaf only contain decision values. In CART, a real score
|
||||
We classify the members of a family into different leaves, and assign them the score on the corresponding leaf.
|
||||
A CART is a bit different from decision trees, where the leaf only contains decision values. In CART, a real score
|
||||
is associated with each of the leaves, which gives us richer interpretations that go beyond classification.
|
||||
This also makes the unified optimization step easier, as we will see in later part of this tutorial.
|
||||
|
||||
Usually, a single tree is not so strong enough to be used in practice. What is actually used is the so called
|
||||
Usually, a single tree is not strong enough to be used in practice. What is actually used is the so-called
|
||||
tree ensemble model, that sums the prediction of multiple trees together.
|
||||
|
||||

|
||||
@ -90,9 +95,9 @@ where ``$ K $`` is the number of trees, ``$ f $`` is a function in the functiona
|
||||
```math
|
||||
obj(\Theta) = \sum_i^n l(y_i, \hat{y}_i) + \sum_{k=1}^K \Omega(f_k)
|
||||
```
|
||||
Now here comes the question, what is the *model* of random forest? It is exactly tree ensembles! So random forest and boosted trees are not different in terms of model,
|
||||
Now here comes the question, what is the *model* for random forests? It is exactly tree ensembles! So random forests and boosted trees are not different in terms of model,
|
||||
the difference is how we train them. This means if you write a predictive service of tree ensembles, you only need to write one of them and they should directly work
|
||||
for both random forest and boosted trees. One example of elements of supervised learning rocks.
|
||||
for both random forests and boosted trees. One example of why the elements of supervised learning rock.
|
||||
|
||||
Tree Boosting
|
||||
-------------
|
||||
@ -106,10 +111,11 @@ Obj = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^t\Omega(f_i) \\
|
||||
|
||||
### Additive Training
|
||||
|
||||
First thing we want to ask is what are ***parameters*** of trees. You can find what we need to learn are those functions ``$f_i$``, with each contains the structure
|
||||
of the tree, and the leaf score. This is much harder than traditional optimization problem where you can take the gradient and go.
|
||||
First thing we want to ask is what are the ***parameters*** of trees.
|
||||
You can find what we need to learn are those functions ``$f_i$``, with each containing the structure
|
||||
of the tree and the leaf scores. This is much harder than a traditional optimization problem where you can simply take the gradient and go.
|
||||
It is not easy to train all the trees at once.
|
||||
Instead, we use an additive strategy: fix what we have learned, add a new tree at a time.
|
||||
Instead, we use an additive strategy: fix what we have learned, add one new tree at a time.
|
||||
We note the prediction value at step ``$t$`` by ``$ \hat{y}_i^{(t)}$``, so we have
|
||||
|
||||
```math
|
||||
@ -120,7 +126,7 @@ We note the prediction value at step ``$t$`` by ``$ \hat{y}_i^{(t)}$``, so we ha
|
||||
\hat{y}_i^{(t)} &= \sum_{k=1}^t f_k(x_i)= \hat{y}_i^{(t-1)} + f_t(x_i)
|
||||
```
|
||||
|
||||
It remains to ask Which tree do we want at each step? A natural thing is to add the one that optimizes our objective.
|
||||
It remains to ask, which tree do we want at each step? A natural thing is to add the one that optimizes our objective.
|
||||
|
||||
```math
|
||||
Obj^{(t)} & = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^t\Omega(f_i) \\
|
||||
@ -135,8 +141,8 @@ Obj^{(t)} & = \sum_{i=1}^n (y_i - (\hat{y}_i^{(t-1)} + f_t(x_i)))^2 + \sum_{i=1}
|
||||
```
|
||||
|
||||
The form of MSE is friendly, with a first order term (usually called residual) and a quadratic term.
|
||||
For other loss of interest (for example, logistic loss), it is not so easy to get such a nice form.
|
||||
So in general case, we take the Taylor expansion of the loss function up to the second order
|
||||
For other losses of interest (for example, logistic loss), it is not so easy to get such a nice form.
|
||||
So in the general case, we take the Taylor expansion of the loss function up to the second order
|
||||
|
||||
```math
|
||||
Obj^{(t)} = \sum_{i=1}^n [l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t) + constant
|
||||
@ -148,15 +154,15 @@ g_i &= \partial_{\hat{y}_i^{(t)}} l(y_i, \hat{y}_i^{(t-1)})\\
|
||||
h_i &= \partial_{\hat{y}_i^{(t)}}^2 l(y_i, \hat{y}_i^{(t-1)})
|
||||
```
|
||||
|
||||
After we remove all the constants, the specific objective at t step becomes
|
||||
After we remove all the constants, the specific objective at step ``$t$`` becomes
|
||||
|
||||
```math
|
||||
\sum_{i=1}^n [g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t)
|
||||
```
|
||||
|
||||
This becomes our optimization goal for the new tree. One important advantage of this definition, is that
|
||||
it only depends on ``$g_i$`` and ``$h_i$``, this is how xgboost allows support of customization of loss functions.
|
||||
We can optimized every loss function, including logistic regression, weighted logistic regression, using the exactly
|
||||
This becomes our optimization goal for the new tree. One important advantage of this definition is that
|
||||
it only depends on ``$g_i$`` and ``$h_i$``. This is how xgboost can support custom loss functions.
|
||||
We can optimize every loss function, including logistic regression and weighted logistic regression, using exactly
|
||||
the same solver that takes ``$g_i$`` and ``$h_i$`` as input!
|
||||
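For instance, a minimal sketch of such a custom objective in the Python interface (the data paths are assumptions borrowed from the demo scripts; any ``DMatrix`` would do):

```python
import numpy as np
import xgboost as xgb

# any DMatrix works here; these paths assume the demo data layout
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')

def logregobj(preds, dtrain):
    """Return g_i and h_i of the logistic loss for every data point."""
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # raw margin -> probability
    grad = preds - labels                 # first order statistics g_i
    hess = preds * (1.0 - preds)          # second order statistics h_i
    return grad, hess

# the same solver consumes g_i and h_i, whatever loss produced them
param = {'max_depth': 2, 'eta': 1, 'silent': 1}
bst = xgb.train(param, dtrain, num_boost_round=2,
                evals=[(dtrain, 'train'), (dtest, 'eval')],
                obj=logregobj)
```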
|
||||
### Model Complexity
|
||||
@ -173,9 +179,9 @@ In XGBoost, we define the complexity as
|
||||
```math
|
||||
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^T w_j^2
|
||||
```
|
||||
Of course there is more than one way to define the complexity, but this specific one works well in practice. The regularization is one part most tree packages takes
|
||||
less carefully, or simply ignore. This was due to the traditional treatment tree learning only emphasize improving impurity, while the complexity control part
|
||||
are more lies as part of heuristics. By defining it formally, we can get a better idea of what we are learning, and yes it works well in practice.
|
||||
Of course there is more than one way to define the complexity, but this specific one works well in practice. The regularization is one part most tree packages treat
|
||||
less carefully, or simply ignore. This was because the traditional treatment of tree learning only emphasized improving impurity, while the complexity control was left to heuristics.
|
||||
By defining it formally, we can get a better idea of what we are learning, and yes it works well in practice.
|
||||
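As a tiny made-up example, a tree with ``$T = 3$`` leaves and leaf weights ``$w = (2, 0.1, -1)$``, taking ``$\gamma = 1$`` and ``$\lambda = 1$``, has complexity

```math
\Omega(f) = 1 \cdot 3 + \frac{1}{2} \cdot 1 \cdot \left(2^2 + 0.1^2 + (-1)^2\right) = 3 + 2.505 = 5.505
```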
|
||||
### The Structure Score
|
||||
|
||||
@ -186,13 +192,15 @@ Obj^{(t)} &\approx \sum_{i=1}^n [g_i w_q(x_i) + \frac{1}{2} h_i w_{q(x_i)}^2] +
|
||||
&= \sum^T_{j=1} [(\sum_{i\in I_j} g_i) w_j + \frac{1}{2} (\sum_{i\in I_j} h_i + \lambda) w_j^2 ] + \gamma T
|
||||
```
|
||||
|
||||
where ``$ I_j = \{i|q(x_i)=j\} $`` is the set of indices of data points assigned to the ``$ j $``-th leaf. Notice that in the second line we have change the index of the summation because all the data points on the same leaf get the same score. We could further compress the expression by defining ``$ G_j = \sum_{i\in I_j} g_i $`` and ``$ H_j = \sum_{i\in I_j} h_i $``:
|
||||
where ``$ I_j = \{i|q(x_i)=j\} $`` is the set of indices of data points assigned to the ``$ j $``-th leaf.
|
||||
Notice that in the second line we have changed the index of the summation because all the data points on the same leaf get the same score.
|
||||
We could further compress the expression by defining ``$ G_j = \sum_{i\in I_j} g_i $`` and ``$ H_j = \sum_{i\in I_j} h_i $``:
|
||||
|
||||
```math
|
||||
Obj^{(t)} = \sum^T_{j=1} [G_jw_j + \frac{1}{2} (H_j+\lambda) w_j^2] +\gamma T
|
||||
```
|
||||
|
||||
In this equation ``$ w_j $`` are independent to each other, the form ``$ G_jw_j+\frac{1}{2}(H_j+\lambda)w_j^2 $`` is quadratic and the best ``$ w_j $`` for a given structure ``$q(x)$`` and the best objective reduction we can get:
|
||||
In this equation the ``$ w_j $`` are independent of each other, the form ``$ G_jw_j+\frac{1}{2}(H_j+\lambda)w_j^2 $`` is quadratic, and the best ``$ w_j $`` for a given structure ``$q(x)$`` and the best objective reduction we can get is:
|
||||
|
||||
```math
|
||||
w_j^\ast = -\frac{G_j}{H_j+\lambda}\\
|
||||
@ -202,30 +210,31 @@ The last equation measures ***how good*** a tree structure ``$q(x)$`` is.
|
||||
|
||||

|
||||
|
||||
If all these sounds a bit complicated. Let us take a look the the picture, and see how the scores can be calculated.
|
||||
If all this sounds a bit complicated, let's take a look at the picture, and see how the scores can be calculated.
|
||||
Basically, for a given tree structure, we push the statistics ``$g_i$`` and ``$h_i$`` to the leaves they belong to,
|
||||
sum the statistics together, and use the formula to calulate how good the tree is.
|
||||
This score is like impurity measure in decision tree, except that it also takes the model complexity into account.
|
||||
sum the statistics together, and use the formula to calculate how good the tree is.
|
||||
This score is like the impurity measure in a decision tree, except that it also takes the model complexity into account.
|
||||
|
||||
### Learn the tree structure
|
||||
Now we have a way to measure how good a tree is ideally we can enumerate all possible trees and pick the best one.
|
||||
In practice it is impossible, so we will try to one level of the tree at a time.
|
||||
Now that we have a way to measure how good a tree is, ideally we would enumerate all possible trees and pick the best one.
|
||||
In practice it is intractable, so we will try to optimize one level of the tree at a time.
|
||||
Specifically we try to split a leaf into two leaves, and the score it gains is
|
||||
|
||||
```math
|
||||
Gain = \frac{1}{2} \left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
|
||||
```
|
||||
This formula can be decomposited as 1) the score on the new left leaf 2) the score on the new right leaf 3) The score on the original leaf 4) regularization on the additional leaf.
|
||||
We can find an important fact here: if the gain is smaller than ``$\gamma$``, we would better not to add that branch. This is exactly the ***prunning*** techniques in tree based
|
||||
models! By using the principles of supervised learning, we can naturally comes up with the reason these techniques :)
|
||||
This formula can be decomposed as 1) the score on the new left leaf, 2) the score on the new right leaf, 3) the score on the original leaf, and 4) regularization on the additional leaf.
|
||||
We can see an important fact here: if the gain is smaller than ``$\gamma$``, we would do better not to add that branch. This is exactly the ***pruning*** technique in tree-based
|
||||
models! By using the principles of supervised learning, we can naturally come up with the reason these techniques work :)
|
||||
|
||||
For real valued data, we usually want to search for an optimal split. To efficiently do so, we place all the instances in a sorted way, like the following picture.
|
||||
For real valued data, we usually want to search for an optimal split. To efficiently do so, we place all the instances in sorted order, like the following picture.
|
||||

|
||||
|
||||
Then a left to right scan is sufficient to calculate the structure score of all possible split solutions, and we can find the best split efficiently.
|
||||
|
||||
Final words on XGBoost
|
||||
----------------------
|
||||
Now you have understand what is a boosted tree, you may ask, where is the introduction on [XGBoost](https://github.com/dmlc/xgboost)?
|
||||
Now that you understand what boosted trees are, you may ask, where is the introduction on [XGBoost](https://github.com/dmlc/xgboost)?
|
||||
XGBoost is exactly a tool motivated by the formal principle introduced in this tutorial!
|
||||
More importantly, it is developed with both deep consideration in terms of ***systems optimization*** and ***principles in machine learning***.
|
||||
The goal of this library is to push the extreme of the computation limits of machines to provide a ***scalable***, ***portable*** and ***accurate*** library.
|
||||
|
||||
@ -1,6 +1,6 @@
|
||||
XGBoost Parameters
|
||||
==================
|
||||
Before running XGboost, we must set three types of parameters, general parameters, booster parameters and task parameters:
|
||||
Before running XGBoost, we must set three types of parameters: general parameters, booster parameters and task parameters.
|
||||
- General parameters relate to which booster we are using to do boosting, commonly a tree or linear model
|
||||
- Booster parameters depend on which booster you have chosen
|
||||
- Learning task parameters decide on the learning scenario; for example, regression tasks may use different parameters than ranking tasks (a short illustrative example follows below).
|
||||
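For instance, a typical parameter set in the Python interface mixes all three groups (the values below are purely illustrative):

```python
param = {
    # general parameter: which booster to use
    'booster': 'gbtree',
    # booster parameters: control the tree booster
    'max_depth': 4,
    'eta': 0.3,
    # learning task parameters: objective and evaluation metric
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
}
```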
@ -62,8 +62,8 @@ Parameters for Linear Booster
|
||||
|
||||
Learning Task Parameters
|
||||
------------------------
|
||||
Specify the learning task and the corresponding learning objective. The objective options are below:
|
||||
* objective [ default=reg:linear ]
|
||||
- specify the learning task and the corresponding learning objective, and the objective options are below:
|
||||
- "reg:linear" --linear regression
|
||||
- "reg:logistic" --logistic regression
|
||||
- "binary:logistic" --logistic regression for binary classification, output probability
|
||||
@ -97,9 +97,9 @@ Command Line Parameters
|
||||
-----------------------
|
||||
The following parameters are only used in the console version of xgboost
|
||||
* use_buffer [ default=1 ]
|
||||
- whether create binary buffer for text input, this normally will speedup loading when do
|
||||
- Whether to create a binary buffer from text input. Doing so normally will speed up loading times
|
||||
* num_round
|
||||
- the number of round for boosting.
|
||||
- The number of rounds for boosting
|
||||
* data
|
||||
- The path of training data
|
||||
* test:data
|
||||
|
||||
@ -8,7 +8,7 @@ This document gives a basic walkthrough of xgboost python package.
|
||||
|
||||
Install XGBoost
|
||||
---------------
|
||||
To install XGBoost, do the following steps.
|
||||
To install XGBoost, do the following steps:
|
||||
|
||||
* You need to run `make` in the root directory of the project
|
||||
* In the `python-package` directory run
|
||||
@ -22,34 +22,39 @@ import xgboost as xgb
|
||||
|
||||
Data Interface
|
||||
--------------
|
||||
XGBoost python module is able to loading from libsvm txt format file, Numpy 2D array and xgboost binary buffer file. The data will be store in ```DMatrix``` object.
|
||||
The XGBoost python module is able to load data from:
|
||||
- libsvm txt format file
|
||||
- Numpy 2D array, and
|
||||
- xgboost binary buffer file.
|
||||
|
||||
* To load libsvm text format file and XGBoost binary file into ```DMatrix```, the usage is like
|
||||
The data will be stored in a ```DMatrix``` object.
|
||||
|
||||
* To load a libsvm text file or a XGBoost binary file into ```DMatrix```, the command is:
|
||||
```python
|
||||
dtrain = xgb.DMatrix('train.svm.txt')
|
||||
dtest = xgb.DMatrix('test.svm.buffer')
|
||||
```
|
||||
* To load numpy array into ```DMatrix```, the usage is like
|
||||
* To load a numpy array into ```DMatrix```, the command is:
|
||||
```python
|
||||
data = np.random.rand(5,10) # 5 entities, each contains 10 features
|
||||
label = np.random.randint(2, size=5) # binary target
|
||||
dtrain = xgb.DMatrix( data, label=label)
|
||||
```
|
||||
* Build ```DMatrix``` from ```scipy.sparse```
|
||||
* To load a scipy.sparse array into ```DMatrix```, the command is:
|
||||
```python
|
||||
csr = scipy.sparse.csr_matrix((dat, (row, col)))
|
||||
dtrain = xgb.DMatrix(csr)
|
||||
```
|
||||
* Saving ```DMatrix``` into XGBoost binary file will make loading faster in next time. The usage is like:
|
||||
* Saving ```DMatrix``` into an XGBoost binary file will make loading faster the next time:
|
||||
```python
|
||||
dtrain = xgb.DMatrix('train.svm.txt')
|
||||
dtrain.save_binary("train.buffer")
|
||||
```
|
||||
* To handle missing value in ```DMatrix```, you can initialize the ```DMatrix``` like:
|
||||
* To handle missing values in ```DMatrix```, you can initialize the ```DMatrix``` by specifying which value represents missing data:
|
||||
```python
|
||||
dtrain = xgb.DMatrix(data, label=label, missing = -999.0)
|
||||
```
|
||||
* Weight can be set when needed, like
|
||||
* Weight can be set when needed:
|
||||
```python
|
||||
w = np.random.rand(5, 1)
|
||||
dtrain = xgb.DMatrix(data, label=label, missing = -999.0, weight=w)
|
||||
@ -62,10 +67,17 @@ XGBoost use list of pair to save [parameters](../parameter.md). Eg
|
||||
```python
|
||||
param = {'bst:max_depth':2, 'bst:eta':1, 'silent':1, 'objective':'binary:logistic' }
|
||||
param['nthread'] = 4
|
||||
plst = param.items()
|
||||
plst += [('eval_metric', 'auc')] # Multiple evals can be handled in this way
|
||||
plst += [('eval_metric', 'ams@0')]
|
||||
param['eval_metric'] = 'auc'
|
||||
```
|
||||
* You can also specify multiple eval metrics:
|
||||
```python
|
||||
param['eval_metric'] = ['auc', 'ams@0']
|
||||
|
||||
# alternatively:
|
||||
# plst = param.items()
|
||||
# plst += [('eval_metric', 'ams@0')]
|
||||
```
|
||||
|
||||
* Specify validations set to watch performance
|
||||
```python
|
||||
evallist = [(dtest,'eval'), (dtrain,'train')]
|
||||
@ -109,9 +121,9 @@ Early stopping requires at least one set in `evals`. If there's more than one, i
|
||||
|
||||
The model will train until the validation score stops improving. Validation error needs to decrease at least every `early_stopping_rounds` to continue training.
|
||||
|
||||
If early stopping occurs, the model will have two additional fields: `bst.best_score` and `bst.best_iteration`. Note that `train()` will return a model from the last iteration, not the best one.
|
||||
If early stopping occurs, the model will have three additional fields: `bst.best_score`, `bst.best_iteration` and `bst.best_ntree_limit`. Note that `train()` will return a model from the last iteration, not the best one.
|
||||
|
||||
This works with both metrics to minimize (RMSE, log loss, etc.) and to maximize (MAP, NDCG, AUC).
|
||||
This works with both metrics to minimize (RMSE, log loss, etc.) and to maximize (MAP, NDCG, AUC). Note that if you specify more than one evaluation metric the last one in `param['eval_metric']` is used for early stopping.
|
||||
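A minimal sketch, reusing the `param`, `dtrain`, `num_round` and `evallist` objects from the snippets above (the stopping value is illustrative):

```python
bst = xgb.train(param, dtrain, num_round, evallist,
                early_stopping_rounds=10)
print(bst.best_score, bst.best_iteration, bst.best_ntree_limit)
```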
|
||||
Prediction
|
||||
----------
|
||||
@ -123,9 +135,9 @@ dtest = xgb.DMatrix(data)
|
||||
ypred = bst.predict(xgmat)
|
||||
```
|
||||
|
||||
If early stopping is enabled during training, you can predict with the best iteration.
|
||||
If early stopping is enabled during training, you can get predictions from the best iteration with `bst.best_ntree_limit`:
|
||||
```python
|
||||
ypred = bst.predict(xgmat,ntree_limit=bst.best_iteration)
|
||||
ypred = bst.predict(xgmat,ntree_limit=bst.best_ntree_limit)
|
||||
```
|
||||
|
||||
Plotting
|
||||
@ -150,4 +162,4 @@ When you use ``IPython``, you can use ``to_graphviz`` function which converts th
|
||||
|
||||
```python
|
||||
xgb.to_graphviz(bst, num_trees=2)
|
||||
```
|
||||
```
|
||||
|
||||
9
python-package/.pylintrc
Normal file
@ -0,0 +1,9 @@
|
||||
[MASTER]
|
||||
|
||||
ignore=tests
|
||||
|
||||
unexpected-special-method-signature,too-many-nested-blocks
|
||||
|
||||
dummy-variables-rgx=(unused|)_.*
|
||||
|
||||
reports=no
|
||||
@ -1,7 +1,14 @@
|
||||
include *.sh *.md
|
||||
include *.sh *.md *.rst
|
||||
recursive-include xgboost *
|
||||
recursive-include xgboost/wrapper *
|
||||
recursive-include xgboost/windows *
|
||||
recursive-include xgboost/subtree *
|
||||
recursive-include xgboost/src *
|
||||
recursive-include xgboost/multi-node *
|
||||
#exclude pre-compiled .o file for less confusions
|
||||
#include the pre-compiled .so is needed as a placeholder
|
||||
#since it will be copy after compiling on the fly
|
||||
global-exclude xgboost/wrapper/*.so.gz
|
||||
global-exclude xgboost/*.o
|
||||
global-exclude *.pyo
|
||||
global-exclude *.pyc
|
||||
|
||||
@ -1,27 +0,0 @@
|
||||
XGBoost Python Package
|
||||
======================
|
||||
Installation
|
||||
------------
|
||||
We are on [PyPI](https://pypi.python.org/pypi/xgboost) now. For stable version, please install using pip:
|
||||
|
||||
* ```pip install xgboost```
|
||||
* Note for windows users: this pip installation may not work on some windows environment, and it may cause unexpected errors. pip installation on windows is currently disabled for further invesigation, please install from github.
|
||||
|
||||
For up-to-date version, please install from github.
|
||||
|
||||
* To make the python module, type ```./build.sh``` in the root directory of project
|
||||
* Make sure you have [setuptools](https://pypi.python.org/pypi/setuptools)
|
||||
* Install with `python setup.py install` from this directory.
|
||||
* For windows users, please use the Visual Studio project file under [windows folder](../windows/). See also the [installation tutorial](https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/13043/run-xgboost-from-windows-and-python) from Kaggle Otto Forum.
|
||||
|
||||
Examples
|
||||
------
|
||||
|
||||
* Refer also to the walk through example in [demo folder](../demo/guide-python)
|
||||
* See also the [example scripts](../demo/kaggle-higgs) for Kaggle Higgs Challenge, including [speedtest script](../demo/kaggle-higgs/speedtest.py) on this dataset.
|
||||
|
||||
Note
|
||||
-----
|
||||
|
||||
* If you want to build xgboost on Mac OS X with multiprocessing support where clang in XCode by default doesn't support, please install gcc 4.9 or higher using [homebrew](http://brew.sh/) ```brew tap homebrew/versions; brew install gcc49```
|
||||
* If you want to run XGBoost process in parallel using the fork backend for joblib/multiprocessing, you must build XGBoost without support for OpenMP by `make no_omp=1`. Otherwise, use the forkserver (in Python 3.4) or spawn backend. See the [sklearn_parallel.py](../demo/guide-python/sklearn_parallel.py) demo.
|
||||
56
python-package/README.rst
Normal file
@ -0,0 +1,56 @@
|
||||
XGBoost Python Package
|
||||
======================
|
||||
|
||||
|PyPI version| |PyPI downloads|
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
We are on `PyPI <https://pypi.python.org/pypi/xgboost>`__ now. For
|
||||
stable version, please install using pip:
|
||||
|
||||
- ``pip install xgboost``
|
||||
- Note for windows users: this pip installation may not work on some
|
||||
Windows environments, and it may cause unexpected errors. pip
|
||||
installation on windows is currently disabled for further
|
||||
investigation, please install from github.
|
||||
|
||||
For up-to-date version, please install from github.
|
||||
|
||||
- To make the python module, type ``./build.sh`` in the root directory
|
||||
of project
|
||||
- Make sure you have
|
||||
`setuptools <https://pypi.python.org/pypi/setuptools>`__
|
||||
- Install with ``cd python-package; python setup.py install`` from this directory.
|
||||
- For windows users, please use the Visual Studio project file under
|
||||
`windows folder <../windows/>`__. See also the `installation
|
||||
tutorial <https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/13043/run-xgboost-from-windows-and-python>`__
|
||||
from Kaggle Otto Forum.
|
||||
|
||||
Examples
|
||||
--------
|
||||
|
||||
- Refer also to the walk through example in `demo
|
||||
folder <../demo/guide-python>`__
|
||||
- See also the `example scripts <../demo/kaggle-higgs>`__ for Kaggle
|
||||
Higgs Challenge, including `speedtest
|
||||
script <../demo/kaggle-higgs/speedtest.py>`__ on this dataset.
|
||||
|
||||
Note
|
||||
----
|
||||
|
||||
- If you want to build xgboost on Mac OS X with multiprocessing support
|
||||
where clang in XCode by default doesn't support, please install gcc
|
||||
4.9 or higher using `homebrew <http://brew.sh/>`__
|
||||
``brew tap homebrew/versions; brew install gcc49``
|
||||
- If you want to run XGBoost process in parallel using the fork backend
|
||||
for joblib/multiprocessing, you must build XGBoost without support
|
||||
for OpenMP by ``make no_omp=1``. Otherwise, use the forkserver (in
|
||||
Python 3.4) or spawn backend. See the
|
||||
`sklearn\_parallel.py <../demo/guide-python/sklearn_parallel.py>`__
|
||||
demo.
|
||||
|
||||
.. |PyPI version| image:: https://badge.fury.io/py/xgboost.svg
|
||||
:target: http://badge.fury.io/py/xgboost
|
||||
.. |PyPI downloads| image:: https://img.shields.io/pypi/dm/xgboost.svg
|
||||
:target: https://pypi.python.org/pypi/xgboost/
|
||||
52
python-package/build_trouble_shooting.md
Normal file
@ -0,0 +1,52 @@
|
||||
XGBoost Python Package Troubleshooting
|
||||
======================
|
||||
Windows platform
|
||||
------------
|
||||
The current best solution for installing xgboost on a Windows machine is building from GitHub. Please go to [windows](/windows/), build with the Visual Studio project file, and install. Additional detailed instructions can be found in this [installation tutorial](https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/13043/run-xgboost-from-windows-and-python) from the Kaggle Otto Forum.
|
||||
|
||||
`pip install xgboost` is **not** tested or supported on the Windows platform for now.
|
||||
|
||||
Linux platform (also Mac OS X in general)
|
||||
------------
|
||||
**Trouble 0**: I see error messages like this when installing from GitHub using `python setup.py install`.
|
||||
|
||||
XGBoostLibraryNotFound: Cannot find XGBoost Libarary in the candicate path, did you install compilers and run build.sh in root path?
|
||||
List of candidates:
|
||||
/home/dmlc/anaconda/lib/python2.7/site-packages/xgboost-0.4-py2.7.egg/xgboost/libxgboostwrapper.so
|
||||
/home/dmlc/anaconda/lib/python2.7/site-packages/xgboost-0.4-py2.7.egg/xgboost/../../wrapper/libxgboostwrapper.so
|
||||
/home/dmlc/anaconda/lib/python2.7/site-packages/xgboost-0.4-py2.7.egg/xgboost/./wrapper/libxgboostwrapper.so
|
||||
|
||||
**Solution 0**: Please check if you have:
|
||||
|
||||
* installed the latest C++ compilers and `make`, for example `g++` and `gcc` (Linux) or `clang LLVM` (Mac OS X). Recommended compilers are `g++-5` or newer (Linux and Mac), or the `clang` that comes with Xcode on Mac OS X. For installing compilers, please refer to your system package management commands, e.g. `apt-get`, `yum` or `brew` (Mac).
|
||||
* compilers in your `$PATH`. Try typing `gcc` and see if you have it in your path.
|
||||
* Do you use a shell other than `bash` and install from `pip`? In some old versions of the pip installation, the shell script used `pushd` to change directory and trigger the build process, which may fail in shells without the `pushd` command. Please update to the latest version by removing the old installation and re-running `pip install xgboost`.
|
||||
* Some outdated versions of `make` may not recognize the recent changes in the `Makefile` and give this error; please update to the latest `make`:
|
||||
|
||||
`/usr/lib/ruby/gems/1.8/gems/make-0.3.1/bin/make:4: undefined local variable or method 'make' for main:Object (NameError)`
|
||||
|
||||
**Trouble 1**: I see the same error message in **Trouble 0** when install from `pip install xgboost`.
|
||||
|
||||
**Solution 1**: the problem is the same as in **Trouble 0**, please see **Solution 0**.
|
||||
|
||||
**Trouble 2**: I see this error message when running `pip install xgboost`. It says I have `libxgboostwrapper.so` but it is not valid.
|
||||
|
||||
OSError: /home/dmlc/anaconda/lib/python2.7/site-packages/xgboost/./wrapper/libxgboostwrapper.so: invalid ELF header
|
||||
|
||||
**Solution 2**: The solution is the same as in 0 and 1: install the latest `g++` compiler and the latest `make`. The reason for this rare error is that `pip` ships with a pre-compiled `libxgboostwrapper.so` built on Mac as a placeholder that allows `setup.py` to find the right lib path. If the system fails to compile, it may refer to this placeholder lib and fail. The placeholder `libxgboostwrapper.so` is automatically removed and correctly regenerated by the on-the-fly compilation for the system.
|
||||
|
||||
**Trouble 3**: My system's `pip` says it can't find a valid `xgboost` installation release on `PyPI`.
|
||||
**Solution 3**: Some linux system comes with an old `pip` version. Please update to the latest `pip` by following the official installation document at <http://pip.readthedocs.org/en/stable/installing/>
|
||||
|
||||
**Trouble 4**: I tried `python setup.py install` but it says `setuptools` import fail.
|
||||
**Solution 4**: Please make sure you have [setuptools](https://pypi.python.org/pypi/setuptools) before installing the python package.
|
||||
|
||||
Mac OS X (specific)
------------

Most of the troubles and solutions are the same as on the Linux platform. Mac has the following specific problems.

**Trouble 0**: I successfully installed `xgboost` from the GitHub installation or via `pip install xgboost`, but it runs very slowly with only a single thread. What is going on?

**Solution 0**: The `clang LLVM` compiler on Mac OS X from Xcode doesn't support OpenMP multi-threading. An alternative is installing `homebrew` <http://brew.sh/> and running `brew install g++-5`, which provides multi-thread OpenMP support.
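
To check whether the rebuilt library actually uses multiple threads, a rough sketch is to time a short training run with different `nthread` settings (the thread counts and the synthetic data below are arbitrary assumptions for illustration, not part of the original instructions):

```python
import time
import numpy as np
import xgboost as xgb

# Synthetic data, just large enough for threading to matter.
X = np.random.rand(100000, 20)
y = np.random.randint(2, size=100000)
dtrain = xgb.DMatrix(X, label=y)

for nthread in (1, 4):  # compare single-threaded vs. multi-threaded training
    params = {'objective': 'binary:logistic', 'nthread': nthread}
    start = time.time()
    xgb.train(params, dtrain, num_boost_round=10)
    print('nthread=%d: %.2f seconds' % (nthread, time.time() - start))
```

With a multi-thread-capable build, the `nthread=4` run should be noticeably faster; with the single-threaded `clang` build, the timings stay roughly the same.
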
**Trouble 1**: Can I install `clang-omp` to get OpenMP support without using `gcc`?

**Solution 1**: It is not supported and may cause linking errors.

@ -1,2 +1,2 @@
|
||||
[metadata]
|
||||
description-file = README.md
|
||||
description-file = README.rst
|
||||
|
||||
@ -2,21 +2,10 @@
|
||||
"""Setup xgboost package."""
|
||||
from __future__ import absolute_import
|
||||
import sys
|
||||
from setuptools import setup, find_packages
|
||||
import subprocess
|
||||
sys.path.insert(0, '.')
|
||||
|
||||
import os
|
||||
#build on the fly if install in pip
|
||||
#otherwise, use build.sh in the parent directory
|
||||
|
||||
if 'pip' in __file__:
|
||||
if not os.name == 'nt': #if not windows
|
||||
build_sh = subprocess.Popen(['sh', 'xgboost/build-python.sh'])
|
||||
build_sh.wait()
|
||||
output = build_sh.communicate()
|
||||
print(output)
|
||||
|
||||
from setuptools import setup, find_packages
|
||||
#import subprocess
|
||||
sys.path.insert(0, '.')
|
||||
|
||||
CURRENT_DIR = os.path.dirname(__file__)
|
||||
|
||||
@ -28,16 +17,13 @@ libpath = {'__file__': libpath_py}
|
||||
exec(compile(open(libpath_py, "rb").read(), libpath_py, 'exec'), libpath, libpath)
|
||||
|
||||
LIB_PATH = libpath['find_lib_path']()
|
||||
#print LIB_PATH
|
||||
|
||||
#to deploy to pip, please use
|
||||
#make pythonpack
|
||||
#python setup.py register sdist upload
|
||||
#and be sure to test it firstly using "python setup.py register sdist upload -r pypitest"
|
||||
#Please use setup_pip.py for generating and deploying pip installation
|
||||
#detailed instruction in setup_pip.py
|
||||
setup(name='xgboost',
|
||||
version=open(os.path.join(CURRENT_DIR, 'xgboost/VERSION')).read().strip(),
|
||||
#version='0.4a13',
|
||||
description=open(os.path.join(CURRENT_DIR, 'README.md')).read(),
|
||||
#version='0.4a23',
|
||||
description=open(os.path.join(CURRENT_DIR, 'README.rst')).read(),
|
||||
install_requires=[
|
||||
'numpy',
|
||||
'scipy',
|
||||
@ -46,10 +32,6 @@ setup(name='xgboost',
|
||||
maintainer_email='phunter.lau@gmail.com',
|
||||
zip_safe=False,
|
||||
packages=find_packages(),
|
||||
#don't need this and don't use this, give everything to MANIFEST.in
|
||||
#package_dir = {'':'xgboost'},
|
||||
#package_data = {'': ['*.txt','*.md','*.sh'],
|
||||
# }
|
||||
#this will use MANIFEST.in during install where we specify additional files,
|
||||
#this is the golden line
|
||||
include_package_data=True,
|
||||
|
||||
58
python-package/setup_pip.py
Normal file
@ -0,0 +1,58 @@
|
||||
# pylint: disable=invalid-name, exec-used
|
||||
"""Setup xgboost package."""
|
||||
from __future__ import absolute_import
|
||||
import sys
|
||||
import os
|
||||
from setuptools import setup, find_packages
|
||||
#import subprocess
|
||||
sys.path.insert(0, '.')
|
||||
|
||||
#this script is for packing and shipping pip installation
|
||||
#it builds xgboost code on the fly and packs for pip
|
||||
#please don't use this file for installing from github
|
||||
|
||||
if os.name != 'nt': #if not windows, compile and install
|
||||
os.system('sh ./xgboost/build-python.sh')
|
||||
else:
|
||||
print('Windows users please use github installation.')
|
||||
sys.exit()
|
||||
|
||||
CURRENT_DIR = os.path.dirname(__file__)
|
||||
|
||||
# We can not import `xgboost.libpath` in setup.py directly since xgboost/__init__.py
|
||||
# import `xgboost.core` and finally will import `numpy` and `scipy` which are setup
|
||||
# `install_requires`. That's why we're using `exec` here.
|
||||
libpath_py = os.path.join(CURRENT_DIR, 'xgboost/libpath.py')
|
||||
libpath = {'__file__': libpath_py}
|
||||
exec(compile(open(libpath_py, "rb").read(), libpath_py, 'exec'), libpath, libpath)
|
||||
|
||||
LIB_PATH = libpath['find_lib_path']()
|
||||
|
||||
#to deploy to pip, please use
|
||||
#make pythonpack
|
||||
#python setup.py register sdist upload
|
||||
#and be sure to test it firstly using "python setup.py register sdist upload -r pypitest"
|
||||
setup(name='xgboost',
|
||||
#version=open(os.path.join(CURRENT_DIR, 'xgboost/VERSION')).read().strip(),
|
||||
version='0.4a30',
|
||||
description=open(os.path.join(CURRENT_DIR, 'README.rst')).read(),
|
||||
install_requires=[
|
||||
'numpy',
|
||||
'scipy',
|
||||
],
|
||||
maintainer='Hongliang Liu',
|
||||
maintainer_email='phunter.lau@gmail.com',
|
||||
zip_safe=False,
|
||||
packages=find_packages(),
|
||||
#don't need this and don't use this, give everything to MANIFEST.in
|
||||
#package_dir = {'':'xgboost'},
|
||||
#package_data = {'': ['*.txt','*.md','*.sh'],
|
||||
# }
|
||||
#this will use MANIFEST.in during install where we specify additional files,
|
||||
#this is the golden line
|
||||
include_package_data=True,
|
||||
#!!! don't use data_files for creating pip installation,
|
||||
#otherwise install_data process will copy it to
|
||||
#root directory for some machines, and cause confusions on building
|
||||
#data_files=[('xgboost', LIB_PATH)],
|
||||
url='https://github.com/dmlc/xgboost')
|
||||
@ -10,8 +10,11 @@ import os
|
||||
|
||||
from .core import DMatrix, Booster
|
||||
from .training import train, cv
|
||||
from .sklearn import XGBModel, XGBClassifier, XGBRegressor
|
||||
from .plotting import plot_importance, plot_tree, to_graphviz
|
||||
try:
|
||||
from .sklearn import XGBModel, XGBClassifier, XGBRegressor
|
||||
from .plotting import plot_importance, plot_tree, to_graphviz
|
||||
except ImportError:
|
||||
print('Error when loading sklearn/plotting. Please install scikit-learn')
|
||||
|
||||
VERSION_FILE = os.path.join(os.path.dirname(__file__), 'VERSION')
|
||||
__version__ = open(VERSION_FILE).read().strip()
|
||||
|
||||
@ -10,7 +10,11 @@
|
||||
# conflict with build.sh which is for everything.
|
||||
|
||||
|
||||
pushd xgboost
|
||||
#pushd xgboost
|
||||
oldpath=`pwd`
|
||||
cd ./xgboost/
|
||||
#remove the pre-compiled .so and trigger the system's on-the-fly compiling
|
||||
make clean
|
||||
if make python; then
|
||||
echo "Successfully build multi-thread xgboost"
|
||||
else
|
||||
@ -23,4 +27,4 @@ else
|
||||
echo "If you want multi-threaded version"
|
||||
echo "See additional instructions in doc/build.md"
|
||||
fi
|
||||
popd
|
||||
cd $oldpath
|
||||
|
||||
47
python-package/xgboost/compat.py
Normal file
@ -0,0 +1,47 @@
|
||||
# coding: utf-8
|
||||
# pylint: disable=unused-import, invalid-name, wrong-import-position
|
||||
"""For compatibility"""
|
||||
|
||||
from __future__ import absolute_import
|
||||
|
||||
import sys
|
||||
|
||||
|
||||
PY3 = (sys.version_info[0] == 3)
|
||||
|
||||
if PY3:
|
||||
# pylint: disable=invalid-name, redefined-builtin
|
||||
STRING_TYPES = str,
|
||||
else:
|
||||
# pylint: disable=invalid-name
|
||||
STRING_TYPES = basestring,
|
||||
|
||||
# pandas
|
||||
try:
|
||||
from pandas import DataFrame
|
||||
PANDAS_INSTALLED = True
|
||||
except ImportError:
|
||||
|
||||
class DataFrame(object):
|
||||
""" dummy for pandas.DataFrame """
|
||||
pass
|
||||
|
||||
PANDAS_INSTALLED = False
|
||||
|
||||
# sklearn
|
||||
try:
|
||||
from sklearn.base import BaseEstimator
|
||||
from sklearn.base import RegressorMixin, ClassifierMixin
|
||||
from sklearn.preprocessing import LabelEncoder
|
||||
SKLEARN_INSTALLED = True
|
||||
|
||||
XGBModelBase = BaseEstimator
|
||||
XGBRegressorBase = RegressorMixin
|
||||
XGBClassifierBase = ClassifierMixin
|
||||
except ImportError:
|
||||
SKLEARN_INSTALLED = False
|
||||
|
||||
# used for compatiblity without sklearn
|
||||
XGBModelBase = object
|
||||
XGBClassifierBase = object
|
||||
XGBRegressorBase = object
|
||||
@ -4,7 +4,6 @@
|
||||
from __future__ import absolute_import
|
||||
|
||||
import os
|
||||
import sys
|
||||
import ctypes
|
||||
import collections
|
||||
|
||||
@ -13,20 +12,12 @@ import scipy.sparse
|
||||
|
||||
from .libpath import find_lib_path
|
||||
|
||||
from .compat import STRING_TYPES, PY3, DataFrame
|
||||
|
||||
class XGBoostError(Exception):
|
||||
"""Error throwed by xgboost trainer."""
|
||||
pass
|
||||
|
||||
PY3 = (sys.version_info[0] == 3)
|
||||
|
||||
if PY3:
|
||||
# pylint: disable=invalid-name, redefined-builtin
|
||||
STRING_TYPES = str,
|
||||
else:
|
||||
# pylint: disable=invalid-name
|
||||
STRING_TYPES = basestring,
|
||||
|
||||
|
||||
def from_pystr_to_cstr(data):
|
||||
"""Convert a list of Python str to C pointer
|
||||
@ -138,28 +129,50 @@ def c_array(ctype, values):
|
||||
return (ctype * len(values))(*values)
|
||||
|
||||
|
||||
def _maybe_from_pandas(data, feature_names, feature_types):
|
||||
""" Extract internal data from pd.DataFrame """
|
||||
try:
|
||||
import pandas as pd
|
||||
except ImportError:
|
||||
|
||||
PANDAS_DTYPE_MAPPER = {'int8': 'int', 'int16': 'int', 'int32': 'int', 'int64': 'int',
|
||||
'uint8': 'int', 'uint16': 'int', 'uint32': 'int', 'uint64': 'int',
|
||||
'float16': 'float', 'float32': 'float', 'float64': 'float',
|
||||
'bool': 'i'}
|
||||
|
||||
|
||||
def _maybe_pandas_data(data, feature_names, feature_types):
|
||||
""" Extract internal data from pd.DataFrame for DMatrix data """
|
||||
|
||||
if not isinstance(data, DataFrame):
|
||||
return data, feature_names, feature_types
|
||||
|
||||
if not isinstance(data, pd.DataFrame):
|
||||
return data, feature_names, feature_types
|
||||
|
||||
dtypes = data.dtypes
|
||||
if not all(dtype.name in ('int64', 'float64', 'bool') for dtype in dtypes):
|
||||
raise ValueError('DataFrame.dtypes must be int, float or bool')
|
||||
data_dtypes = data.dtypes
|
||||
if not all(dtype.name in PANDAS_DTYPE_MAPPER for dtype in data_dtypes):
|
||||
raise ValueError('DataFrame.dtypes for data must be int, float or bool')
|
||||
|
||||
if feature_names is None:
|
||||
feature_names = data.columns.format()
|
||||
|
||||
if feature_types is None:
|
||||
mapper = {'int64': 'int', 'float64': 'q', 'bool': 'i'}
|
||||
feature_types = [mapper[dtype.name] for dtype in dtypes]
|
||||
feature_types = [PANDAS_DTYPE_MAPPER[dtype.name] for dtype in data_dtypes]
|
||||
|
||||
data = data.values.astype('float')
|
||||
|
||||
return data, feature_names, feature_types
|
||||
|
||||
|
||||
def _maybe_pandas_label(label):
|
||||
""" Extract internal data from pd.DataFrame for DMatrix label """
|
||||
|
||||
if isinstance(label, DataFrame):
|
||||
if len(label.columns) > 1:
|
||||
raise ValueError('DataFrame for label cannot have multiple columns')
|
||||
|
||||
label_dtypes = label.dtypes
|
||||
if not all(dtype.name in PANDAS_DTYPE_MAPPER for dtype in label_dtypes):
|
||||
raise ValueError('DataFrame.dtypes for label must be int, float or bool')
|
||||
else:
|
||||
label = label.values.astype('float')
|
||||
# pd.Series can be passed to xgb as it is
|
||||
|
||||
return label
|
||||
|
||||
class DMatrix(object):
|
||||
"""Data Matrix used in XGBoost.
|
||||
|
||||
@ -192,20 +205,19 @@ class DMatrix(object):
|
||||
silent : boolean, optional
|
||||
Whether print messages during construction
|
||||
feature_names : list, optional
|
||||
Labels for features.
|
||||
Set names for features.
|
||||
feature_types : list, optional
|
||||
Labels for features.
|
||||
Set types for features.
|
||||
"""
|
||||
# force into void_p, mac need to pass things in as void_p
|
||||
if data is None:
|
||||
self.handle = None
|
||||
return
|
||||
|
||||
klass = getattr(getattr(data, '__class__', None), '__name__', None)
|
||||
if klass == 'DataFrame':
|
||||
# once check class name to avoid unnecessary pandas import
|
||||
data, feature_names, feature_types = _maybe_from_pandas(data, feature_names,
|
||||
feature_types)
|
||||
data, feature_names, feature_types = _maybe_pandas_data(data,
|
||||
feature_names,
|
||||
feature_types)
|
||||
label = _maybe_pandas_label(label)
|
||||
|
||||
if isinstance(data, STRING_TYPES):
|
||||
self.handle = ctypes.c_void_p()
|
||||
@ -223,7 +235,7 @@ class DMatrix(object):
|
||||
csr = scipy.sparse.csr_matrix(data)
|
||||
self._init_from_csr(csr)
|
||||
except:
|
||||
raise TypeError('can not intialize DMatrix from {}'.format(type(data).__name__))
|
||||
raise TypeError('can not initialize DMatrix from {}'.format(type(data).__name__))
|
||||
if label is not None:
|
||||
self.set_label(label)
|
||||
if weight is not None:
|
||||
@ -511,7 +523,7 @@ class DMatrix(object):
|
||||
feature_names : list or None
|
||||
Labels for features. None will reset existing feature names
|
||||
"""
|
||||
if not feature_names is None:
|
||||
if feature_names is not None:
|
||||
# validate feature name
|
||||
if not isinstance(feature_names, list):
|
||||
feature_names = list(feature_names)
|
||||
@ -520,10 +532,11 @@ class DMatrix(object):
|
||||
if len(feature_names) != self.num_col():
|
||||
msg = 'feature_names must have the same length as data'
|
||||
raise ValueError(msg)
|
||||
# prohibit to use symbols may affect to parse. e.g. ``[]=.``
|
||||
if not all(isinstance(f, STRING_TYPES) and f.isalnum()
|
||||
# prohibit to use symbols may affect to parse. e.g. []<
|
||||
if not all(isinstance(f, STRING_TYPES) and
|
||||
not any(x in f for x in set(('[', ']', '<')))
|
||||
for f in feature_names):
|
||||
raise ValueError('all feature_names must be alphanumerics')
|
||||
raise ValueError('feature_names may not contain [, ] or <')
|
||||
else:
|
||||
# reset feature_types also
|
||||
self.feature_types = None
|
||||
@ -541,7 +554,7 @@ class DMatrix(object):
|
||||
feature_types : list or None
|
||||
Labels for features. None will reset existing feature names
|
||||
"""
|
||||
if not feature_types is None:
|
||||
if feature_types is not None:
|
||||
|
||||
if self.feature_names is None:
|
||||
msg = 'Unable to set feature types before setting names'
|
||||
@ -556,12 +569,11 @@ class DMatrix(object):
|
||||
if len(feature_types) != self.num_col():
|
||||
msg = 'feature_types must have the same length as data'
|
||||
raise ValueError(msg)
|
||||
# prohibit to use symbols may affect to parse. e.g. ``[]=.``
|
||||
|
||||
valid = ('q', 'i', 'int', 'float')
|
||||
valid = ('int', 'float', 'i', 'q')
|
||||
if not all(isinstance(f, STRING_TYPES) and f in valid
|
||||
for f in feature_types):
|
||||
raise ValueError('all feature_names must be {i, q, int, float}')
|
||||
raise ValueError('All feature_names must be {int, float, i, q}')
|
||||
self._feature_types = feature_types
|
||||
|
||||
|
||||
@ -745,8 +757,13 @@ class Booster(object):
|
||||
else:
|
||||
res = '[%d]' % iteration
|
||||
for dmat, evname in evals:
|
||||
name, val = feval(self.predict(dmat), dmat)
|
||||
res += '\t%s-%s:%f' % (evname, name, val)
|
||||
feval_ret = feval(self.predict(dmat), dmat)
|
||||
if isinstance(feval_ret, list):
|
||||
for name, val in feval_ret:
|
||||
res += '\t%s-%s:%f' % (evname, name, val)
|
||||
else:
|
||||
name, val = feval_ret
|
||||
res += '\t%s-%s:%f' % (evname, name, val)
|
||||
return res
|
||||
|
||||
def eval(self, data, name='eval', iteration=0):
|
||||
@ -873,6 +890,7 @@ class Booster(object):
|
||||
_check_call(_LIB.XGBoosterLoadModelFromBuffer(self.handle, ptr, length))
|
||||
|
||||
def dump_model(self, fout, fmap='', with_stats=False):
|
||||
# pylint: disable=consider-using-enumerate
|
||||
"""
|
||||
Dump model into a text file.
|
||||
|
||||
|
||||
@ -36,9 +36,10 @@ def find_lib_path():
|
||||
else:
|
||||
dll_path = [os.path.join(p, 'libxgboostwrapper.so') for p in dll_path]
|
||||
lib_path = [p for p in dll_path if os.path.exists(p) and os.path.isfile(p)]
|
||||
#From github issues, most of installation errors come from machines w/o compilers
|
||||
if len(lib_path) == 0 and not os.environ.get('XGBOOST_BUILD_DOC', False):
|
||||
raise XGBoostLibraryNotFound(
|
||||
'Cannot find XGBoost Libarary in the candicate path, ' +
|
||||
'did you run build.sh in root path?\n'
|
||||
'did you install compilers and run build.sh in root path?\n'
|
||||
'List of candidates:\n' + ('\n'.join(dll_path)))
|
||||
return lib_path
|
||||
|
||||
@ -5,13 +5,13 @@
|
||||
from __future__ import absolute_import
|
||||
|
||||
import re
|
||||
from io import BytesIO
|
||||
import numpy as np
|
||||
from .core import Booster
|
||||
|
||||
from io import BytesIO
|
||||
from .sklearn import XGBModel
|
||||
|
||||
def plot_importance(booster, ax=None, height=0.2,
|
||||
xlim=None, title='Feature importance',
|
||||
xlim=None, ylim=None, title='Feature importance',
|
||||
xlabel='F score', ylabel='Features',
|
||||
grid=True, **kwargs):
|
||||
|
||||
@ -19,14 +19,16 @@ def plot_importance(booster, ax=None, height=0.2,
|
||||
|
||||
Parameters
|
||||
----------
|
||||
booster : Booster or dict
|
||||
Booster instance, or dict taken by Booster.get_fscore()
|
||||
booster : Booster, XGBModel or dict
|
||||
Booster or XGBModel instance, or dict taken by Booster.get_fscore()
|
||||
ax : matplotlib Axes, default None
|
||||
Target axes instance. If None, new figure and axes will be created.
|
||||
height : float, default 0.2
|
||||
Bar height, passed to ax.barh()
|
||||
xlim : tuple, default None
|
||||
Tuple passed to axes.xlim()
|
||||
ylim : tuple, default None
|
||||
Tuple passed to axes.ylim()
|
||||
title : str, default "Feature importance"
|
||||
Axes title. To disable, pass None.
|
||||
xlabel : str, default "F score"
|
||||
@ -46,12 +48,14 @@ def plot_importance(booster, ax=None, height=0.2,
|
||||
except ImportError:
|
||||
raise ImportError('You must install matplotlib to plot importance')
|
||||
|
||||
if isinstance(booster, Booster):
|
||||
if isinstance(booster, XGBModel):
|
||||
importance = booster.booster().get_fscore()
|
||||
elif isinstance(booster, Booster):
|
||||
importance = booster.get_fscore()
|
||||
elif isinstance(booster, dict):
|
||||
importance = booster
|
||||
else:
|
||||
raise ValueError('tree must be Booster or dict instance')
|
||||
raise ValueError('tree must be Booster, XGBModel or dict instance')
|
||||
|
||||
if len(importance) == 0:
|
||||
raise ValueError('Booster.get_fscore() results in empty')
|
||||
@ -73,12 +77,19 @@ def plot_importance(booster, ax=None, height=0.2,
|
||||
ax.set_yticklabels(labels)
|
||||
|
||||
if xlim is not None:
|
||||
if not isinstance(xlim, tuple) or len(xlim, 2):
|
||||
if not isinstance(xlim, tuple) or len(xlim) != 2:
|
||||
raise ValueError('xlim must be a tuple of 2 elements')
|
||||
else:
|
||||
xlim = (0, max(values) * 1.1)
|
||||
ax.set_xlim(xlim)
|
||||
|
||||
if ylim is not None:
|
||||
if not isinstance(ylim, tuple) or len(ylim) != 2:
|
||||
raise ValueError('ylim must be a tuple of 2 elements')
|
||||
else:
|
||||
ylim = (-1, len(importance))
|
||||
ax.set_ylim(ylim)
|
||||
|
||||
if title is not None:
|
||||
ax.set_title(title)
|
||||
if xlabel is not None:
|
||||
@ -142,8 +153,8 @@ def to_graphviz(booster, num_trees=0, rankdir='UT',
|
||||
|
||||
Parameters
|
||||
----------
|
||||
booster : Booster
|
||||
Booster instance
|
||||
booster : Booster, XGBModel
|
||||
Booster or XGBModel instance
|
||||
num_trees : int, default 0
|
||||
Specify the ordinal number of target tree
|
||||
rankdir : str, default "UT"
|
||||
@ -165,8 +176,11 @@ def to_graphviz(booster, num_trees=0, rankdir='UT',
|
||||
except ImportError:
|
||||
raise ImportError('You must install graphviz to plot tree')
|
||||
|
||||
if not isinstance(booster, Booster):
|
||||
raise ValueError('booster must be Booster instance')
|
||||
if not isinstance(booster, (Booster, XGBModel)):
|
||||
raise ValueError('booster must be Booster or XGBModel instance')
|
||||
|
||||
if isinstance(booster, XGBModel):
|
||||
booster = booster.booster()
|
||||
|
||||
tree = booster.get_dump()[num_trees]
|
||||
tree = tree.split()
|
||||
@ -193,8 +207,8 @@ def plot_tree(booster, num_trees=0, rankdir='UT', ax=None, **kwargs):
|
||||
|
||||
Parameters
|
||||
----------
|
||||
booster : Booster
|
||||
Booster instance
|
||||
booster : Booster, XGBModel
|
||||
Booster or XGBModel instance
|
||||
num_trees : int, default 0
|
||||
Specify the ordinal number of target tree
|
||||
rankdir : str, default "UT"
|
||||
@ -216,7 +230,6 @@ def plot_tree(booster, num_trees=0, rankdir='UT', ax=None, **kwargs):
|
||||
except ImportError:
|
||||
raise ImportError('You must install matplotlib to plot tree')
|
||||
|
||||
|
||||
if ax is None:
|
||||
_, ax = plt.subplots(1, 1)
|
||||
|
||||
|
||||
@ -7,23 +7,9 @@ import numpy as np
|
||||
from .core import Booster, DMatrix, XGBoostError
|
||||
from .training import train
|
||||
|
||||
try:
|
||||
from sklearn.base import BaseEstimator
|
||||
from sklearn.base import RegressorMixin, ClassifierMixin
|
||||
from sklearn.preprocessing import LabelEncoder
|
||||
SKLEARN_INSTALLED = True
|
||||
except ImportError:
|
||||
SKLEARN_INSTALLED = False
|
||||
from .compat import (SKLEARN_INSTALLED, XGBModelBase,
|
||||
XGBClassifierBase, XGBRegressorBase, LabelEncoder)
|
||||
|
||||
# used for compatiblity without sklearn
|
||||
XGBModelBase = object
|
||||
XGBClassifierBase = object
|
||||
XGBRegressorBase = object
|
||||
|
||||
if SKLEARN_INSTALLED:
|
||||
XGBModelBase = BaseEstimator
|
||||
XGBRegressorBase = RegressorMixin
|
||||
XGBClassifierBase = ClassifierMixin
|
||||
|
||||
class XGBModel(XGBModelBase):
|
||||
# pylint: disable=too-many-arguments, too-many-instance-attributes, invalid-name
|
||||
@ -54,6 +40,14 @@ class XGBModel(XGBModelBase):
|
||||
Subsample ratio of the training instance.
|
||||
colsample_bytree : float
|
||||
Subsample ratio of columns when constructing each tree.
|
||||
colsample_bylevel : float
|
||||
Subsample ratio of columns for each split, in each level.
|
||||
reg_alpha : float (xgb's alpha)
|
||||
L2 regularization term on weights
|
||||
reg_lambda : float (xgb's lambda)
|
||||
L1 regularization term on weights
|
||||
scale_pos_weight : float
|
||||
Balancing of positive and negative weights.
|
||||
|
||||
base_score:
|
||||
The initial prediction score of all instances, global bias.
|
||||
@ -66,7 +60,8 @@ class XGBModel(XGBModelBase):
|
||||
def __init__(self, max_depth=3, learning_rate=0.1, n_estimators=100,
|
||||
silent=True, objective="reg:linear",
|
||||
nthread=-1, gamma=0, min_child_weight=1, max_delta_step=0,
|
||||
subsample=1, colsample_bytree=1,
|
||||
subsample=1, colsample_bytree=1, colsample_bylevel=1,
|
||||
reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
|
||||
base_score=0.5, seed=0, missing=None):
|
||||
if not SKLEARN_INSTALLED:
|
||||
raise XGBoostError('sklearn needs to be installed in order to use this module')
|
||||
@ -82,6 +77,10 @@ class XGBModel(XGBModelBase):
|
||||
self.max_delta_step = max_delta_step
|
||||
self.subsample = subsample
|
||||
self.colsample_bytree = colsample_bytree
|
||||
self.colsample_bylevel = colsample_bylevel
|
||||
self.reg_alpha = reg_alpha
|
||||
self.reg_lambda = reg_lambda
|
||||
self.scale_pos_weight = scale_pos_weight
|
||||
|
||||
self.base_score = base_score
|
||||
self.seed = seed
|
||||
@ -131,7 +130,7 @@ class XGBModel(XGBModelBase):
|
||||
|
||||
def fit(self, X, y, eval_set=None, eval_metric=None,
|
||||
early_stopping_rounds=None, verbose=True):
|
||||
# pylint: disable=missing-docstring,invalid-name,attribute-defined-outside-init
|
||||
# pylint: disable=missing-docstring,invalid-name,attribute-defined-outside-init, redefined-variable-type
|
||||
"""
|
||||
Fit the gradient boosting model
|
||||
|
||||
@ -165,7 +164,7 @@ class XGBModel(XGBModelBase):
|
||||
"""
|
||||
trainDmatrix = DMatrix(X, label=y, missing=self.missing)
|
||||
|
||||
eval_results = {}
|
||||
evals_result = {}
|
||||
if eval_set is not None:
|
||||
evals = list(DMatrix(x[0], label=x[1]) for x in eval_set)
|
||||
evals = list(zip(evals, ["validation_{}".format(i) for i in
|
||||
@ -185,23 +184,62 @@ class XGBModel(XGBModelBase):
|
||||
self._Booster = train(params, trainDmatrix,
|
||||
self.n_estimators, evals=evals,
|
||||
early_stopping_rounds=early_stopping_rounds,
|
||||
evals_result=eval_results, feval=feval,
|
||||
evals_result=evals_result, feval=feval,
|
||||
verbose_eval=verbose)
|
||||
if eval_results:
|
||||
eval_results = {k: np.array(v, dtype=float)
|
||||
for k, v in eval_results.items()}
|
||||
eval_results = {k: np.array(v) for k, v in eval_results.items()}
|
||||
self.eval_results = eval_results
|
||||
|
||||
if evals_result:
|
||||
for val in evals_result.items():
|
||||
evals_result_key = list(val[1].keys())[0]
|
||||
evals_result[val[0]][evals_result_key] = val[1][evals_result_key]
|
||||
self.evals_result_ = evals_result
|
||||
|
||||
if early_stopping_rounds is not None:
|
||||
self.best_score = self._Booster.best_score
|
||||
self.best_iteration = self._Booster.best_iteration
|
||||
return self
|
||||
|
||||
def predict(self, data):
|
||||
def predict(self, data, output_margin=False, ntree_limit=0):
|
||||
# pylint: disable=missing-docstring,invalid-name
|
||||
test_dmatrix = DMatrix(data, missing=self.missing)
|
||||
return self.booster().predict(test_dmatrix)
|
||||
return self.booster().predict(test_dmatrix,
|
||||
output_margin=output_margin,
|
||||
ntree_limit=ntree_limit)
|
||||
|
||||
def evals_result(self):
|
||||
"""Return the evaluation results.
|
||||
|
||||
If eval_set is passed to the `fit` function, you can call evals_result() to
|
||||
get evaluation results for all passed eval_sets. When eval_metric is also
|
||||
passed to the `fit` function, the evals_result will contain the eval_metrics
|
||||
passed to the `fit` function
|
||||
|
||||
Returns
|
||||
-------
|
||||
evals_result : dictionary
|
||||
|
||||
Example
|
||||
-------
|
||||
param_dist = {'objective':'binary:logistic', 'n_estimators':2}
|
||||
|
||||
clf = xgb.XGBModel(**param_dist)
|
||||
|
||||
clf.fit(X_train, y_train,
|
||||
eval_set=[(X_train, y_train), (X_test, y_test)],
|
||||
eval_metric='logloss',
|
||||
verbose=True)
|
||||
|
||||
evals_result = clf.evals_result()
|
||||
|
||||
The variable evals_result will contain:
|
||||
{'validation_0': {'logloss': ['0.604835', '0.531479']},
|
||||
'validation_1': {'logloss': ['0.41965', '0.17686']}}
|
||||
"""
|
||||
if self.evals_result_:
|
||||
evals_result = self.evals_result_
|
||||
else:
|
||||
raise XGBoostError('No results.')
|
||||
|
||||
return evals_result
|
||||
|
||||
|
||||
class XGBClassifier(XGBModel, XGBClassifierBase):
|
||||
@ -214,18 +252,20 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
|
||||
n_estimators=100, silent=True,
|
||||
objective="binary:logistic",
|
||||
nthread=-1, gamma=0, min_child_weight=1,
|
||||
max_delta_step=0, subsample=1, colsample_bytree=1,
|
||||
max_delta_step=0, subsample=1, colsample_bytree=1, colsample_bylevel=1,
|
||||
reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
|
||||
base_score=0.5, seed=0, missing=None):
|
||||
super(XGBClassifier, self).__init__(max_depth, learning_rate,
|
||||
n_estimators, silent, objective,
|
||||
nthread, gamma, min_child_weight,
|
||||
max_delta_step, subsample,
|
||||
colsample_bytree,
|
||||
base_score, seed, missing)
|
||||
colsample_bytree, colsample_bylevel,
|
||||
reg_alpha, reg_lambda,
|
||||
scale_pos_weight, base_score, seed, missing)
|
||||
|
||||
def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
|
||||
early_stopping_rounds=None, verbose=True):
|
||||
# pylint: disable = attribute-defined-outside-init,arguments-differ
|
||||
# pylint: disable = attribute-defined-outside-init,arguments-differ, redefined-variable-type
|
||||
"""
|
||||
Fit gradient boosting classifier
|
||||
|
||||
@ -259,7 +299,7 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
|
||||
If `verbose` and an evaluation set is used, writes the evaluation
|
||||
metric measured on the validation set to stderr.
|
||||
"""
|
||||
eval_results = {}
|
||||
evals_result = {}
|
||||
self.classes_ = list(np.unique(y))
|
||||
self.n_classes_ = len(self.classes_)
|
||||
if self.n_classes_ > 2:
|
||||
@ -299,13 +339,14 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
|
||||
self._Booster = train(xgb_options, train_dmatrix, self.n_estimators,
|
||||
evals=evals,
|
||||
early_stopping_rounds=early_stopping_rounds,
|
||||
evals_result=eval_results, feval=feval,
|
||||
evals_result=evals_result, feval=feval,
|
||||
verbose_eval=verbose)
|
||||
|
||||
if eval_results:
|
||||
eval_results = {k: np.array(v, dtype=float)
|
||||
for k, v in eval_results.items()}
|
||||
self.eval_results = eval_results
|
||||
if evals_result:
|
||||
for val in evals_result.items():
|
||||
evals_result_key = list(val[1].keys())[0]
|
||||
evals_result[val[0]][evals_result_key] = val[1][evals_result_key]
|
||||
self.evals_result_ = evals_result
|
||||
|
||||
if early_stopping_rounds is not None:
|
||||
self.best_score = self._Booster.best_score
|
||||
@ -313,9 +354,11 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
|
||||
|
||||
return self
|
||||
|
||||
def predict(self, data):
|
||||
def predict(self, data, output_margin=False, ntree_limit=0):
|
||||
test_dmatrix = DMatrix(data, missing=self.missing)
|
||||
class_probs = self.booster().predict(test_dmatrix)
|
||||
class_probs = self.booster().predict(test_dmatrix,
|
||||
output_margin=output_margin,
|
||||
ntree_limit=ntree_limit)
|
||||
if len(class_probs.shape) > 1:
|
||||
column_indexes = np.argmax(class_probs, axis=1)
|
||||
else:
|
||||
@ -323,9 +366,11 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
|
||||
column_indexes[class_probs > 0.5] = 1
|
||||
return self._le.inverse_transform(column_indexes)
|
||||
|
||||
def predict_proba(self, data):
|
||||
def predict_proba(self, data, output_margin=False, ntree_limit=0):
|
||||
test_dmatrix = DMatrix(data, missing=self.missing)
|
||||
class_probs = self.booster().predict(test_dmatrix)
|
||||
class_probs = self.booster().predict(test_dmatrix,
|
||||
output_margin=output_margin,
|
||||
ntree_limit=ntree_limit)
|
||||
if self.objective == "multi:softprob":
|
||||
return class_probs
|
||||
else:
|
||||
@ -333,6 +378,42 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
|
||||
classzero_probs = 1.0 - classone_probs
|
||||
return np.vstack((classzero_probs, classone_probs)).transpose()
|
||||
|
||||
def evals_result(self):
|
||||
"""Return the evaluation results.
|
||||
|
||||
If eval_set is passed to the `fit` function, you can call evals_result() to
|
||||
get evaluation results for all passed eval_sets. When eval_metric is also
|
||||
passed to the `fit` function, the evals_result will contain the eval_metrics
|
||||
passed to the `fit` function
|
||||
|
||||
Returns
|
||||
-------
|
||||
evals_result : dictionary
|
||||
|
||||
Example
|
||||
-------
|
||||
param_dist = {'objective':'binary:logistic', 'n_estimators':2}
|
||||
|
||||
clf = xgb.XGBClassifier(**param_dist)
|
||||
|
||||
clf.fit(X_train, y_train,
|
||||
eval_set=[(X_train, y_train), (X_test, y_test)],
|
||||
eval_metric='logloss',
|
||||
verbose=True)
|
||||
|
||||
evals_result = clf.evals_result()
|
||||
|
||||
The variable evals_result will contain:
|
||||
{'validation_0': {'logloss': ['0.604835', '0.531479']},
|
||||
'validation_1': {'logloss': ['0.41965', '0.17686']}}
|
||||
"""
|
||||
if self.evals_result_:
|
||||
evals_result = self.evals_result_
|
||||
else:
|
||||
raise XGBoostError('No results.')
|
||||
|
||||
return evals_result
|
||||
|
||||
class XGBRegressor(XGBModel, XGBRegressorBase):
|
||||
# pylint: disable=missing-docstring
|
||||
__doc__ = """Implementation of the scikit-learn API for XGBoost regression.
|
||||
|
||||
@ -10,7 +10,8 @@ import numpy as np
|
||||
from .core import Booster, STRING_TYPES
|
||||
|
||||
def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
|
||||
early_stopping_rounds=None, evals_result=None, verbose_eval=True):
|
||||
maximize=False, early_stopping_rounds=None, evals_result=None,
|
||||
verbose_eval=True, learning_rates=None, xgb_model=None):
|
||||
# pylint: disable=too-many-statements,too-many-branches, attribute-defined-outside-init
|
||||
"""Train a booster with given parameters.
|
||||
|
||||
@ -29,26 +30,83 @@ def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
|
||||
Customized objective function.
|
||||
feval : function
|
||||
Customized evaluation function.
|
||||
maximize : bool
|
||||
Whether to maximize feval.
|
||||
early_stopping_rounds: int
|
||||
Activates early stopping. Validation error needs to decrease at least
|
||||
every <early_stopping_rounds> round(s) to continue training.
|
||||
Requires at least one item in evals.
|
||||
If there's more than one, will use the last.
|
||||
Returns the model from the last iteration (not the best one).
|
||||
If early stopping occurs, the model will have two additional fields:
|
||||
bst.best_score and bst.best_iteration.
|
||||
If early stopping occurs, the model will have three additional fields:
|
||||
bst.best_score, bst.best_iteration and bst.best_ntree_limit.
|
||||
(Use bst.best_ntree_limit to get the correct value if num_parallel_tree
|
||||
and/or num_class appears in the parameters)
|
||||
evals_result: dict
|
||||
This dictionary stores the evaluation results of all the items in watchlist
|
||||
verbose_eval : bool
|
||||
If `verbose_eval` then the evaluation metric on the validation set, if
|
||||
given, is printed at each boosting stage.
|
||||
This dictionary stores the evaluation results of all the items in watchlist.
|
||||
Example: with a watchlist containing [(dtest,'eval'), (dtrain,'train')] and
|
||||
and a paramater containing ('eval_metric', 'logloss')
|
||||
Returns: {'train': {'logloss': ['0.48253', '0.35953']},
|
||||
'eval': {'logloss': ['0.480385', '0.357756']}}
|
||||
verbose_eval : bool or int
|
||||
Requires at least one item in evals.
|
||||
If `verbose_eval` is True then the evaluation metric on the validation set is
|
||||
printed at each boosting stage.
|
||||
If `verbose_eval` is an integer then the evaluation metric on the validation set
|
||||
is printed at every given `verbose_eval` boosting stage. The last boosting stage
|
||||
/ the boosting stage found by using `early_stopping_rounds` is also printed.
|
||||
Example: with verbose_eval=4 and at least one item in evals, an evaluation metric
|
||||
is printed every 4 boosting stages, instead of every boosting stage.
|
||||
learning_rates: list or function
|
||||
List of learning rate for each boosting round
|
||||
or a customized function that calculates eta in terms of
|
||||
current number of round and the total number of boosting round (e.g. yields
|
||||
learning rate decay)
|
||||
- list l: eta = l[boosting round]
|
||||
- function f: eta = f(boosting round, num_boost_round)
|
||||
xgb_model : file name of stored xgb model or 'Booster' instance
|
||||
Xgb model to be loaded before training (allows training continuation).
|
||||
|
||||
Returns
|
||||
-------
|
||||
booster : a trained booster model
|
||||
"""
|
||||
evals = list(evals)
|
||||
if isinstance(params, dict) \
|
||||
and 'eval_metric' in params \
|
||||
and isinstance(params['eval_metric'], list):
|
||||
params = dict((k, v) for k, v in params.items())
|
||||
eval_metrics = params['eval_metric']
|
||||
params.pop("eval_metric", None)
|
||||
params = list(params.items())
|
||||
for eval_metric in eval_metrics:
|
||||
params += [('eval_metric', eval_metric)]
|
||||
|
||||
bst = Booster(params, [dtrain] + [d[0] for d in evals])
|
||||
nboost = 0
|
||||
num_parallel_tree = 1
|
||||
|
||||
if isinstance(verbose_eval, bool):
|
||||
verbose_eval_every_line = False
|
||||
else:
|
||||
if isinstance(verbose_eval, int):
|
||||
verbose_eval_every_line = verbose_eval
|
||||
verbose_eval = True if verbose_eval_every_line > 0 else False
|
||||
|
||||
if xgb_model is not None:
|
||||
if not isinstance(xgb_model, STRING_TYPES):
|
||||
xgb_model = xgb_model.save_raw()
|
||||
bst = Booster(params, [dtrain] + [d[0] for d in evals], model_file=xgb_model)
|
||||
nboost = len(bst.get_dump())
|
||||
else:
|
||||
bst = Booster(params, [dtrain] + [d[0] for d in evals])
|
||||
|
||||
_params = dict(params) if isinstance(params, list) else params
|
||||
if 'num_parallel_tree' in _params:
|
||||
num_parallel_tree = _params['num_parallel_tree']
|
||||
nboost //= num_parallel_tree
|
||||
if 'num_class' in _params:
|
||||
nboost //= _params['num_class']
|
||||
|
||||
if evals_result is not None:
|
||||
if not isinstance(evals_result, dict):
|
||||
@ -56,11 +114,12 @@ def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
|
||||
else:
|
||||
evals_name = [d[1] for d in evals]
|
||||
evals_result.clear()
|
||||
evals_result.update({key: [] for key in evals_name})
|
||||
evals_result.update(dict([(key, {}) for key in evals_name]))
|
||||
|
||||
if not early_stopping_rounds:
|
||||
for i in range(num_boost_round):
|
||||
bst.update(dtrain, i, obj)
|
||||
nboost += 1
|
||||
if len(evals) != 0:
|
||||
bst_eval_set = bst.eval_set(evals, i, feval)
|
||||
if isinstance(bst_eval_set, STRING_TYPES):
|
||||
@ -69,11 +128,27 @@ def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
|
||||
msg = bst_eval_set.decode()
|
||||
|
||||
if verbose_eval:
|
||||
sys.stderr.write(msg + '\n')
|
||||
if verbose_eval_every_line:
|
||||
if i % verbose_eval_every_line == 0 or i == num_boost_round - 1:
|
||||
sys.stderr.write(msg + '\n')
|
||||
else:
|
||||
sys.stderr.write(msg + '\n')
|
||||
|
||||
if evals_result is not None:
|
||||
res = re.findall(":-?([0-9.]+).", msg)
|
||||
for key, val in zip(evals_name, res):
|
||||
evals_result[key].append(val)
|
||||
res = re.findall("([0-9a-zA-Z@]+[-]*):-?([0-9.]+).", msg)
|
||||
for key in evals_name:
|
||||
evals_idx = evals_name.index(key)
|
||||
res_per_eval = len(res) // len(evals_name)
|
||||
for r in range(res_per_eval):
|
||||
res_item = res[(evals_idx*res_per_eval) + r]
|
||||
res_key = res_item[0]
|
||||
res_val = res_item[1]
|
||||
if res_key in evals_result[key]:
|
||||
evals_result[key][res_key].append(res_val)
|
||||
else:
|
||||
evals_result[key][res_key] = [res_val]
|
||||
bst.best_iteration = (nboost - 1)
|
||||
bst.best_ntree_limit = nboost * num_parallel_tree
|
||||
return bst
|
||||
|
||||
else:
|
||||
@ -81,15 +156,18 @@ def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
|
||||
if len(evals) < 1:
|
||||
raise ValueError('For early stopping you need at least one set in evals.')
|
||||
|
||||
sys.stderr.write("Will train until {} error hasn't decreased in {} rounds.\n".format(\
|
||||
if verbose_eval:
|
||||
sys.stderr.write("Will train until {} error hasn't decreased in {} rounds.\n".format(\
|
||||
evals[-1][1], early_stopping_rounds))
|
||||
|
||||
# is params a list of tuples? are we using multiple eval metrics?
|
||||
if isinstance(params, list):
|
||||
if len(params) != len(dict(params).items()):
|
||||
raise ValueError('Check your params.'\
|
||||
'Early stopping works with single eval metric only.')
|
||||
params = dict(params)
|
||||
params = dict(params)
|
||||
sys.stderr.write("Multiple eval metrics have been passed: " \
|
||||
"'{0}' will be used for early stopping.\n\n".format(params['eval_metric']))
|
||||
else:
|
||||
params = dict(params)
|
||||
|
||||
# either minimize loss or maximize AUC/MAP/NDCG
|
||||
maximize_score = False
|
||||
@ -97,6 +175,8 @@ def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
|
||||
maximize_metrics = ('auc', 'map', 'ndcg')
|
||||
if any(params['eval_metric'].startswith(x) for x in maximize_metrics):
|
||||
maximize_score = True
|
||||
if feval is not None:
|
||||
maximize_score = maximize
|
||||
|
||||
if maximize_score:
|
||||
best_score = 0.0
|
||||
@ -104,10 +184,19 @@ def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
|
||||
best_score = float('inf')
|
||||
|
||||
best_msg = ''
|
||||
best_score_i = 0
|
||||
best_score_i = (nboost - 1)
|
||||
|
||||
if isinstance(learning_rates, list) and len(learning_rates) != num_boost_round:
|
||||
raise ValueError("Length of list 'learning_rates' has to equal 'num_boost_round'.")
|
||||
|
||||
for i in range(num_boost_round):
|
||||
if learning_rates is not None:
|
||||
if isinstance(learning_rates, list):
|
||||
bst.set_param({'eta': learning_rates[i]})
|
||||
else:
|
||||
bst.set_param({'eta': learning_rates(i, num_boost_round)})
|
||||
bst.update(dtrain, i, obj)
|
||||
nboost += 1
|
||||
bst_eval_set = bst.eval_set(evals, i, feval)
|
||||
|
||||
if isinstance(bst_eval_set, STRING_TYPES):
|
||||
@ -116,26 +205,41 @@ def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
|
||||
msg = bst_eval_set.decode()
|
||||
|
||||
if verbose_eval:
|
||||
sys.stderr.write(msg + '\n')
|
||||
if verbose_eval_every_line:
|
||||
if i % verbose_eval_every_line == 0 or i == num_boost_round - 1:
|
||||
sys.stderr.write(msg + '\n')
|
||||
else:
|
||||
sys.stderr.write(msg + '\n')
|
||||
|
||||
if evals_result is not None:
|
||||
res = re.findall(":-?([0-9.]+).", msg)
|
||||
for key, val in zip(evals_name, res):
|
||||
evals_result[key].append(val)
|
||||
res = re.findall("([0-9a-zA-Z@]+[-]*):-?([0-9.]+).", msg)
|
||||
for key in evals_name:
|
||||
evals_idx = evals_name.index(key)
|
||||
res_per_eval = len(res) // len(evals_name)
|
||||
for r in range(res_per_eval):
|
||||
res_item = res[(evals_idx*res_per_eval) + r]
|
||||
res_key = res_item[0]
|
||||
res_val = res_item[1]
|
||||
if res_key in evals_result[key]:
|
||||
evals_result[key][res_key].append(res_val)
|
||||
else:
|
||||
evals_result[key][res_key] = [res_val]
|
||||
|
||||
score = float(msg.rsplit(':', 1)[1])
|
||||
if (maximize_score and score > best_score) or \
|
||||
(not maximize_score and score < best_score):
|
||||
best_score = score
|
||||
best_score_i = i
|
||||
best_score_i = (nboost - 1)
|
||||
best_msg = msg
|
||||
elif i - best_score_i >= early_stopping_rounds:
|
||||
sys.stderr.write("Stopping. Best iteration:\n{}\n\n".format(best_msg))
|
||||
if verbose_eval:
|
||||
sys.stderr.write("Stopping. Best iteration:\n{}\n\n".format(best_msg))
|
||||
bst.best_score = best_score
|
||||
bst.best_iteration = best_score_i
|
||||
break
|
||||
bst.best_score = best_score
|
||||
bst.best_iteration = best_score_i
|
||||
bst.best_ntree_limit = (bst.best_iteration + 1) * num_parallel_tree
|
||||
return bst
|
||||
|
||||
|
||||
@ -179,11 +283,14 @@ def mknfold(dall, nfold, param, seed, evals=(), fpreproc=None):
|
||||
ret.append(CVPack(dtrain, dtest, plst))
|
||||
return ret
|
||||
|
||||
|
||||
def aggcv(rlist, show_stdv=True, show_progress=None, as_pandas=True):
|
||||
def aggcv(rlist, show_stdv=True, show_progress=None, as_pandas=True, trial=0):
|
||||
# pylint: disable=invalid-name
|
||||
"""
|
||||
Aggregate cross-validation results.
|
||||
|
||||
If show_progress is true, progress is displayed in every call. If
|
||||
show_progress is an integer, progress will only be displayed every
|
||||
`show_progress` trees, tracked via trial.
|
||||
"""
|
||||
cvmap = {}
|
||||
idx = rlist[0].split()[0]
|
||||
@ -217,8 +324,6 @@ def aggcv(rlist, show_stdv=True, show_progress=None, as_pandas=True):
|
||||
index.extend([k + '-mean', k + '-std'])
|
||||
results.extend([mean, std])
|
||||
|
||||
|
||||
|
||||
if as_pandas:
|
||||
try:
|
||||
import pandas as pd
|
||||
@ -232,15 +337,16 @@ def aggcv(rlist, show_stdv=True, show_progress=None, as_pandas=True):
|
||||
if show_progress is None:
|
||||
show_progress = True
|
||||
|
||||
if show_progress:
|
||||
if (isinstance(show_progress, int) and trial % show_progress == 0) or (isinstance(show_progress, bool) and show_progress):
|
||||
sys.stderr.write(msg + '\n')
|
||||
sys.stderr.flush()
|
||||
|
||||
return results
|
||||
|
||||
|
||||
def cv(params, dtrain, num_boost_round=10, nfold=3, metrics=(),
|
||||
obj=None, feval=None, fpreproc=None, as_pandas=True,
|
||||
show_progress=None, show_stdv=True, seed=0):
|
||||
obj=None, feval=None, maximize=False, early_stopping_rounds=None,
|
||||
fpreproc=None, as_pandas=True, show_progress=None, show_stdv=True, seed=0):
|
||||
# pylint: disable = invalid-name
|
||||
"""Cross-validation with given paramaters.
|
||||
|
||||
@ -260,15 +366,23 @@ def cv(params, dtrain, num_boost_round=10, nfold=3, metrics=(),
|
||||
Custom objective function.
|
||||
feval : function
|
||||
Custom evaluation function.
|
||||
maximize : bool
|
||||
Whether to maximize feval.
|
||||
early_stopping_rounds: int
|
||||
Activates early stopping. CV error needs to decrease at least
|
||||
every <early_stopping_rounds> round(s) to continue.
|
||||
Last entry in evaluation history is the one from best iteration.
|
||||
fpreproc : function
|
||||
Preprocessing function that takes (dtrain, dtest, param) and returns
|
||||
transformed versions of those.
|
||||
as_pandas : bool, default True
|
||||
Return pd.DataFrame when pandas is installed.
|
||||
If False or pandas is not installed, return np.ndarray
|
||||
show_progress : bool or None, default None
|
||||
show_progress : bool, int, or None, default None
|
||||
Whether to display the progress. If None, progress will be displayed
|
||||
when np.ndarray is returned.
|
||||
when np.ndarray is returned. If True, progress will be displayed at
|
||||
boosting stage. If an integer is given, progress will be displayed
|
||||
at every given `show_progress` boosting stage.
|
||||
show_stdv : bool, default True
|
||||
Whether to display the standard deviation in progress.
|
||||
Results are not affected, and always contains std.
|
||||
@ -279,6 +393,28 @@ def cv(params, dtrain, num_boost_round=10, nfold=3, metrics=(),
|
||||
-------
|
||||
evaluation history : list(string)
|
||||
"""
|
||||
if early_stopping_rounds is not None:
|
||||
if len(metrics) > 1:
|
||||
raise ValueError('Check your params.'\
|
||||
'Early stopping works with single eval metric only.')
|
||||
|
||||
sys.stderr.write("Will train until cv error hasn't decreased in {} rounds.\n".format(\
|
||||
early_stopping_rounds))
|
||||
|
||||
maximize_score = False
|
||||
if len(metrics) == 1:
|
||||
maximize_metrics = ('auc', 'map', 'ndcg')
|
||||
if any(metrics[0].startswith(x) for x in maximize_metrics):
|
||||
maximize_score = True
|
||||
if feval is not None:
|
||||
maximize_score = maximize
|
||||
|
||||
if maximize_score:
|
||||
best_score = 0.0
|
||||
else:
|
||||
best_score = float('inf')
|
||||
|
||||
best_score_i = 0
|
||||
results = []
|
||||
cvfolds = mknfold(dtrain, nfold, params, seed, metrics, fpreproc)
|
||||
for i in range(num_boost_round):
|
||||
@ -286,9 +422,20 @@ def cv(params, dtrain, num_boost_round=10, nfold=3, metrics=(),
|
||||
fold.update(i, obj)
|
||||
res = aggcv([f.eval(i, feval) for f in cvfolds],
|
||||
show_stdv=show_stdv, show_progress=show_progress,
|
||||
as_pandas=as_pandas)
|
||||
as_pandas=as_pandas, trial=i)
|
||||
results.append(res)
|
||||
|
||||
if early_stopping_rounds is not None:
|
||||
score = res[0]
|
||||
if (maximize_score and score > best_score) or \
|
||||
(not maximize_score and score < best_score):
|
||||
best_score = score
|
||||
best_score_i = i
|
||||
elif i - best_score_i >= early_stopping_rounds:
|
||||
sys.stderr.write("Stopping. Best iteration: {}\n".format(best_score_i))
|
||||
results = results[:best_score_i+1]
|
||||
break
|
||||
|
||||
if as_pandas:
|
||||
try:
|
||||
import pandas as pd
|
||||
@ -299,4 +446,3 @@ def cv(params, dtrain, num_boost_round=10, nfold=3, metrics=(),
|
||||
results = np.array(results)
|
||||
|
||||
return results
|
||||
|
||||
|
||||
@ -64,7 +64,7 @@ if [ ${TASK} == "python-package" -o ${TASK} == "python-package3" ]; then
|
||||
conda create -n myenv python=2.7
|
||||
fi
|
||||
source activate myenv
|
||||
conda install numpy scipy pandas matplotlib nose
|
||||
conda install numpy scipy pandas matplotlib nose scikit-learn
|
||||
python -m pip install graphviz
|
||||
|
||||
make all CXX=${CXX} || exit -1
|
||||
|
||||
18
src/data.h
@ -14,7 +14,7 @@
|
||||
|
||||
namespace xgboost {
|
||||
/*!
|
||||
* \brief unsigned interger type used in boost,
|
||||
* \brief unsigned integer type used in boost,
|
||||
* used for feature index and row index
|
||||
*/
|
||||
typedef unsigned bst_uint;
|
||||
@ -35,8 +35,8 @@ struct bst_gpair {
|
||||
};
|
||||
|
||||
/*!
|
||||
* \brief extra information that might needed by gbm and tree module
|
||||
* these information are not necessarily presented, and can be empty
|
||||
* \brief extra information that might be needed by gbm and tree module
|
||||
* this information is not necessarily present, and can be empty
|
||||
*/
|
||||
struct BoosterInfo {
|
||||
/*! \brief number of rows in the data */
|
||||
@ -53,7 +53,7 @@ struct BoosterInfo {
|
||||
/*! \brief number of rows, number of columns */
|
||||
BoosterInfo(void) : num_row(0), num_col(0) {
|
||||
}
|
||||
/*! \brief get root of ith instance */
|
||||
/*! \brief get root of i-th instance */
|
||||
inline unsigned GetRoot(size_t i) const {
|
||||
return root_index.size() == 0 ? 0 : root_index[i];
|
||||
}
|
||||
@ -120,13 +120,13 @@ struct ColBatch : public SparseBatch {
|
||||
};
|
||||
/**
|
||||
* \brief interface of feature matrix, needed for tree construction
|
||||
* this interface defines two way to access features,
|
||||
* row access is defined by iterator of RowBatch
|
||||
* col access is optional, checked by HaveColAccess, and defined by iterator of ColBatch
|
||||
* this interface defines two ways to access features:
|
||||
* row access is defined by iterator of RowBatch
|
||||
* col access is optional, checked by HaveColAccess, and defined by iterator of ColBatch
|
||||
*/
|
||||
class IFMatrix {
|
||||
public:
|
||||
// the interface only need to ganrantee row iter
|
||||
// the interface only need to guarantee row iter
|
||||
// column iter is active, when ColIterator is called, row_iter can be disabled
|
||||
/*! \brief get the row iterator associated with FMatrix */
|
||||
virtual utils::IIterator<RowBatch> *RowIterator(void) = 0;
|
||||
@ -142,7 +142,7 @@ class IFMatrix {
|
||||
* \brief check if column access is supported, if not, initialize column access
|
||||
* \param enabled whether certain feature should be included in column access
|
||||
* \param subsample subsample ratio when generating column access
|
||||
* \param max_row_perbatch auxilary information, maximum row used in each column batch
|
||||
* \param max_row_perbatch auxiliary information, maximum row used in each column batch
|
||||
* this is a hint information that can be ignored by the implementation
|
||||
*/
|
||||
virtual void InitColAccess(const std::vector<bool> &enabled,
|
||||
|
||||
@ -58,7 +58,7 @@ class IGradBooster {
|
||||
return false;
|
||||
}
|
||||
/*!
|
||||
* \brief peform update to the model(boosting)
|
||||
* \brief perform update to the model(boosting)
|
||||
* \param p_fmat feature matrix that provide access to features
|
||||
* \param buffer_offset buffer index offset of these instances, if equals -1
|
||||
* this means we do not have buffer index allocated to the gbm
|
||||
@ -88,7 +88,7 @@ class IGradBooster {
|
||||
std::vector<float> *out_preds,
|
||||
unsigned ntree_limit = 0) = 0;
|
||||
/*!
|
||||
* \brief online prediction funciton, predict score for one instance at a time
|
||||
* \brief online prediction function, predict score for one instance at a time
|
||||
* NOTE: use the batch prediction interface if possible, batch prediction is usually
|
||||
* more efficient than online prediction
|
||||
* This function is NOT threadsafe, make sure you only call from one thread
|
||||
@ -119,7 +119,7 @@ class IGradBooster {
|
||||
/*!
|
||||
* \brief dump the model in text format
|
||||
* \param fmap feature map that may help give interpretations of feature
|
||||
* \param option extra option of the dumo model
|
||||
* \param option extra option of the dump model
|
||||
* \return a vector of dump for boosters
|
||||
*/
|
||||
virtual std::vector<std::string> DumpModel(const utils::FeatMap& fmap, int option) = 0;
|
||||
|
||||
@ -31,7 +31,7 @@ class GBTree : public IGradBooster {
|
||||
using namespace std;
|
||||
if (!strncmp(name, "bst:", 4)) {
|
||||
cfg.push_back(std::make_pair(std::string(name+4), std::string(val)));
|
||||
// set into updaters, if already intialized
|
||||
// set into updaters, if already initialized
|
||||
for (size_t i = 0; i < updaters.size(); ++i) {
|
||||
updaters[i]->SetParam(name+4, val);
|
||||
}
|
||||
@ -85,7 +85,7 @@ class GBTree : public IGradBooster {
|
||||
fo.Write(BeginPtr(pred_counter), pred_counter.size() * sizeof(unsigned));
|
||||
}
|
||||
}
|
||||
// initialize the predic buffer
|
||||
// initialize the predict buffer
|
||||
virtual void InitModel(void) {
|
||||
pred_buffer.clear(); pred_counter.clear();
|
||||
pred_buffer.resize(mparam.PredBufferSize(), 0.0f);
|
||||
@ -138,10 +138,7 @@ class GBTree : public IGradBooster {
|
||||
{
|
||||
nthread = omp_get_num_threads();
|
||||
}
|
||||
thread_temp.resize(nthread, tree::RegTree::FVec());
|
||||
for (int i = 0; i < nthread; ++i) {
|
||||
thread_temp[i].Init(mparam.num_feature);
|
||||
}
|
||||
InitThreadTemp(nthread);
|
||||
std::vector<float> &preds = *out_preds;
|
||||
const size_t stride = info.num_row * mparam.num_output_group;
|
||||
preds.resize(stride * (mparam.size_leaf_vector+1));
|
||||
@ -194,10 +191,7 @@ class GBTree : public IGradBooster {
|
||||
{
|
||||
nthread = omp_get_num_threads();
|
||||
}
|
||||
thread_temp.resize(nthread, tree::RegTree::FVec());
|
||||
for (int i = 0; i < nthread; ++i) {
|
||||
thread_temp[i].Init(mparam.num_feature);
|
||||
}
|
||||
InitThreadTemp(nthread);
|
||||
this->PredPath(p_fmat, info, out_preds, ntree_limit);
|
||||
}
|
||||
virtual std::vector<std::string> DumpModel(const utils::FeatMap& fmap, int option) {
|
||||
@ -391,6 +385,16 @@ class GBTree : public IGradBooster {
|
||||
}
|
||||
}
|
||||
}
|
||||
// init thread buffers
|
||||
inline void InitThreadTemp(int nthread) {
|
||||
int prev_thread_temp_size = thread_temp.size();
|
||||
if (prev_thread_temp_size < nthread) {
|
||||
thread_temp.resize(nthread, tree::RegTree::FVec());
|
||||
for (int i = prev_thread_temp_size; i < nthread; ++i) {
|
||||
thread_temp[i].Init(mparam.num_feature);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// --- data structure ---
|
||||
/*! \brief training parameters */
|
||||
@ -442,7 +446,7 @@ class GBTree : public IGradBooster {
|
||||
int num_roots;
|
||||
/*! \brief number of features to be used by trees */
|
||||
int num_feature;
|
||||
/*! \brief size of predicton buffer allocated used for buffering */
|
||||
/*! \brief size of prediction buffer allocated used for buffering */
|
||||
int64_t num_pbuffer;
|
||||
/*!
|
||||
* \brief how many output group a single instance can produce
|
||||
|
||||
@ -22,7 +22,7 @@ typedef learner::DMatrix DataMatrix;
|
||||
* \param silent whether print message during loading
|
||||
* \param savebuffer whether temporal buffer the file if the file is in text format
|
||||
* \param loadsplit whether we only load a split of input files
|
||||
* such that each worker node get a split of the data
|
||||
* such that each worker node get a split of the data
|
||||
* \param cache_file name of cache_file, used by external memory version
|
||||
* can be NULL, if cache_file is specified, this will be the temporal
|
||||
* space that can be re-used to store intermediate data
|
||||
@ -38,7 +38,7 @@ DataMatrix* LoadDataMatrix(const char *fname,
|
||||
* note: the saved dmatrix format may not be in exactly same as input
|
||||
* SaveDMatrix will choose the best way to materialize the dmatrix.
|
||||
* \param dmat the dmatrix to be saved
|
||||
* \param fname file name to be savd
|
||||
* \param fname file name to be saved
|
||||
* \param silent whether print message during saving
|
||||
*/
|
||||
void SaveDataMatrix(const DataMatrix &dmat, const char *fname, bool silent = false);
|
||||
|
||||
@ -31,7 +31,7 @@ struct LibSVMPage : public SparsePage {
|
||||
/*!
|
||||
* \brief libsvm parser that parses the input lines
|
||||
* and returns rows in input data
|
||||
* factry that was used by threadbuffer template
|
||||
* factory that was used by threadbuffer template
|
||||
*/
|
||||
class LibSVMPageFactory {
|
||||
public:
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.