- Limit the number of used CPU cores in examples. - Add a note for the constraint. - Bring back the cleanup script.
97 lines
3.6 KiB
R
97 lines
3.6 KiB
R
% Generated by roxygen2: do not edit by hand
|
|
% Please edit documentation in R/xgb.create.features.R
|
|
\name{xgb.create.features}
|
|
\alias{xgb.create.features}
|
|
\title{Create new features from a previously learned model}
|
|
\usage{
|
|
xgb.create.features(model, data, ...)
|
|
}
|
|
\arguments{
|
|
\item{model}{decision tree boosting model learned on the original data}
|
|
|
|
\item{data}{original data (usually provided as a \code{dgCMatrix} matrix)}
|
|
|
|
\item{...}{currently not used}
|
|
}
|
|
\value{
|
|
\code{dgCMatrix} matrix including both the original data and the new features.
|
|
}
|
|
\description{
|
|
May improve the learning by adding new features to the training data based on the decision trees from a previously learned model.
|
|
}
|
|
\details{
|
|
This is the function inspired from the paragraph 3.1 of the paper:
|
|
|
|
\strong{Practical Lessons from Predicting Clicks on Ads at Facebook}
|
|
|
|
\emph{(Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yan, xin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers,
|
|
Joaquin Quinonero Candela)}
|
|
|
|
International Workshop on Data Mining for Online Advertising (ADKDD) - August 24, 2014
|
|
|
|
\url{https://research.facebook.com/publications/practical-lessons-from-predicting-clicks-on-ads-at-facebook/}.
|
|
|
|
Extract explaining the method:
|
|
|
|
"We found that boosted decision trees are a powerful and very
|
|
convenient way to implement non-linear and tuple transformations
|
|
of the kind we just described. We treat each individual
|
|
tree as a categorical feature that takes as value the
|
|
index of the leaf an instance ends up falling in. We use
|
|
1-of-K coding of this type of features.
|
|
|
|
For example, consider the boosted tree model in Figure 1 with 2 subtrees,
|
|
where the first subtree has 3 leafs and the second 2 leafs. If an
|
|
instance ends up in leaf 2 in the first subtree and leaf 1 in
|
|
second subtree, the overall input to the linear classifier will
|
|
be the binary vector \code{[0, 1, 0, 1, 0]}, where the first 3 entries
|
|
correspond to the leaves of the first subtree and last 2 to
|
|
those of the second subtree.
|
|
|
|
[...]
|
|
|
|
We can understand boosted decision tree
|
|
based transformation as a supervised feature encoding that
|
|
converts a real-valued vector into a compact binary-valued
|
|
vector. A traversal from root node to a leaf node represents
|
|
a rule on certain features."
|
|
}
|
|
\examples{
|
|
data(agaricus.train, package='xgboost')
|
|
data(agaricus.test, package='xgboost')
|
|
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
|
dtest <- with(agaricus.test, xgb.DMatrix(data, label = label, nthread = 2))
|
|
|
|
param <- list(max_depth=2, eta=1, silent=1, objective='binary:logistic')
|
|
nrounds = 4
|
|
|
|
bst = xgb.train(params = param, data = dtrain, nrounds = nrounds, nthread = 2)
|
|
|
|
# Model accuracy without new features
|
|
accuracy.before <- sum((predict(bst, agaricus.test$data) >= 0.5) == agaricus.test$label) /
|
|
length(agaricus.test$label)
|
|
|
|
# Convert previous features to one hot encoding
|
|
new.features.train <- xgb.create.features(model = bst, agaricus.train$data)
|
|
new.features.test <- xgb.create.features(model = bst, agaricus.test$data)
|
|
|
|
# learning with new features
|
|
new.dtrain <- xgb.DMatrix(
|
|
data = new.features.train, label = agaricus.train$label, nthread = 2
|
|
)
|
|
new.dtest <- xgb.DMatrix(
|
|
data = new.features.test, label = agaricus.test$label, nthread = 2
|
|
)
|
|
watchlist <- list(train = new.dtrain)
|
|
bst <- xgb.train(params = param, data = new.dtrain, nrounds = nrounds, nthread = 2)
|
|
|
|
# Model accuracy with new features
|
|
accuracy.after <- sum((predict(bst, new.dtest) >= 0.5) == agaricus.test$label) /
|
|
length(agaricus.test$label)
|
|
|
|
# Here the accuracy was already good and is now perfect.
|
|
cat(paste("The accuracy was", accuracy.before, "before adding leaf features and it is now",
|
|
accuracy.after, "!\n"))
|
|
|
|
}
|