commit 46cddb80f4
Merge branch 'mastet push origin unityr' into unity
@@ -1,18 +1,18 @@
 Package: xgboost
 Type: Package
 Title: eXtreme Gradient Boosting
-Version: 0.3-0
+Version: 0.3-1
 Date: 2014-08-23
 Author: Tianqi Chen <tianqi.tchen@gmail.com>, Tong He <hetong007@gmail.com>
 Maintainer: Tong He <hetong007@gmail.com>
 Description: This package is a R wrapper of xgboost, which is short for eXtreme
     Gradient Boosting. It is an efficient and scalable implementation of
     gradient boosting framework. The package includes efficient linear model
-    solver and tree learning algorithm. The package can automatically do
+    solver and tree learning algorithms. The package can automatically do
     parallel computation with OpenMP, and it can be more than 10 times faster
     than existing gradient boosting packages such as gbm. It supports various
     objective functions, including regression, classification and ranking. The
-    package is made to be extensible, so that user are also allowed to define
+    package is made to be extensible, so that users are also allowed to define
     their own objectives easily.
 License: Apache License (== 2.0) | file LICENSE
 URL: https://github.com/tqchen/xgboost
@@ -52,8 +52,7 @@ This is an introductory document of using the \verb@xgboost@ package in R.
 and scalable implementation of gradient boosting framework by \citep{friedman2001greedy}.
 The package includes efficient linear model solver and tree learning algorithm.
 It supports various objective functions, including regression, classification
-and ranking. The package is made to be extendible, so that user are also allowed
-to define there own objectives easily. It has several features:
+and ranking. The package is made to be extendible, so that users are also allowed to define their own objectives easily. It has several features:
 \begin{enumerate}
 \item{Speed: }{\verb@xgboost@ can automatically do parallel computation on
 Windows and Linux, with openmp. It is generally over 10 times faster than
@@ -137,13 +136,10 @@ diris = xgb.DMatrix('iris.xgb.DMatrix')
 
 \section{Advanced Examples}
 
-The function \verb@xgboost@ is a simple function with less parameters, in order
-to be R-friendly. The core training function is wrapped in \verb@xgb.train@. It
-is more flexible than \verb@xgboost@, but it requires users to read the document
-a bit more carefully.
+The function \verb@xgboost@ is a simple function with less parameter, in order
+to be R-friendly. The core training function is wrapped in \verb@xgb.train@. It is more flexible than \verb@xgboost@, but it requires users to read the document a bit more carefully.
 
-\verb@xgb.train@ only accept a \verb@xgb.DMatrix@ object as its input, while it
-supports advanced features as custom objective and evaluation functions.
+\verb@xgb.train@ only accept a \verb@xgb.DMatrix@ object as its input, while it supports advanced features as custom objective and evaluation functions.
 
 <<Customized loss function>>=
 logregobj <- function(preds, dtrain) {
@@ -213,3 +209,4 @@ competition.
 \bibliography{xgboost}
 
 \end{document}
+
@@ -8,6 +8,8 @@ Turorial and Documentation: https://github.com/tqchen/xgboost/wiki
 
 Questions and Issues: [https://github.com/tqchen/xgboost/issues](https://github.com/tqchen/xgboost/issues?q=is%3Aissue+label%3Aquestion)
 
+Examples Code: [demo folder](demo)
+
 Notes on the Code: [Code Guide](src)
 
 Features
new file: demo/README.md (25 lines)
@@ -0,0 +1,25 @@
+XGBoost Examples
+====
+This folder contains the all example codes using xgboost.
+Contribution of exampls, benchmarks is more than welcomed!
+If you like to share how you use xgboost to solve your problem, send a pull request:)
+
+Features Walkthrough
+====
+This is a list of short codes introducing different functionalities of xgboost and its wrapper.
+* Basic walkthrough of wrappers. [python](guide-python/basic_walkthrough.py)
+* Cutomize loss function, and evaluation metric. [python](guide-python/custom_objective.py)
+* Boosting from existing prediction. [python](guide-python/boost_from_prediction.py)
+* Predicting using first n trees. [python](guide-python/predict_first_ntree.py)
+* Cross validation(to come)
+
+Basic Examples by Tasks
+====
+* [Binary classification](binary_classification)
+* [Multiclass classification](multiclass_classification)
+* [Regression](regression)
+* [Learning to Rank](rank)
+
+Benchmarks
+====
+* [Starter script for Kaggle Higgs Boson](kaggle-higgs)
new file: demo/data/README.md (2 lines)
@@ -0,0 +1,2 @@
+This folder contains processed example dataset used by the demos.
+Copyright of the dataset belongs to the original copyright holder
new file: demo/guide-R/README.md (3 lines)
@@ -0,0 +1,3 @@
+XGBoost R Feature Walkthrough
+====
+To be finished
new executable file: demo/guide-R/runall.sh (5 lines)
@@ -0,0 +1,5 @@
+#!/bin/bash
+# todo
+Rscript basic_walkthrough.R
+Rscript custom_objective.R
+Rscript boost_from_prediction.R
new file: demo/guide-python/README.md (6 lines)
@@ -0,0 +1,6 @@
+XGBoost Python Feature Walkthrough
+====
+* [Basic walkthrough of wrappers](basic_walkthrough.py)
+* [Cutomize loss function, and evaluation metric](custom_objective.py)
+* [Boosting from existing prediction](boost_from_prediction.py)
+* [Predicting using first n trees](predict_first_ntree.py)
new executable file: demo/guide-python/basic_walkthrough.py (70 lines)
@@ -0,0 +1,70 @@
+#!/usr/bin/python
+import sys
+import numpy as np
+import scipy.sparse
+# append the path to xgboost, you may need to change the following line
+# alternatively, you can add the path to PYTHONPATH environment variable
+sys.path.append('../../wrapper')
+import xgboost as xgb
+
+### simple example
+# load file from text file, also binary buffer generated by xgboost
+dtrain = xgb.DMatrix('../data/agaricus.txt.train')
+dtest = xgb.DMatrix('../data/agaricus.txt.test')
+
+# specify parameters via map, definition are same as c++ version
+param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
+
+# specify validations set to watch performance
+watchlist = [(dtest,'eval'), (dtrain,'train')]
+num_round = 2
+bst = xgb.train(param, dtrain, num_round, watchlist)
+
+# this is prediction
+preds = bst.predict(dtest)
+labels = dtest.get_label()
+print ('error=%f' % ( sum(1 for i in range(len(preds)) if int(preds[i]>0.5)!=labels[i]) /float(len(preds))))
+bst.save_model('0001.model')
+# dump model
+bst.dump_model('dump.raw.txt')
+# dump model with feature map
+bst.dump_model('dump.nice.txt','../data/featmap.txt')
+
+# save dmatrix into binary buffer
+dtest.save_binary('dtest.buffer')
+bst.save_model('xgb.model')
+# load model and data in
+bst2 = xgb.Booster(model_file='xgb.model')
+dtest2 = xgb.DMatrix('dtest.buffer')
+preds2 = bst2.predict(dtest2)
+# assert they are the same
+assert np.sum(np.abs(preds2-preds)) == 0
+
+###
+# build dmatrix from scipy.sparse
+print ('start running example of build DMatrix from scipy.sparse')
+labels = []
+row = []; col = []; dat = []
+i = 0
+for l in open('../data/agaricus.txt.train'):
+    arr = l.split()
+    labels.append( int(arr[0]))
+    for it in arr[1:]:
+        k,v = it.split(':')
+        row.append(i); col.append(int(k)); dat.append(float(v))
+    i += 1
+csr = scipy.sparse.csr_matrix( (dat, (row,col)) )
+dtrain = xgb.DMatrix( csr )
+dtrain.set_label(labels)
+watchlist = [(dtest,'eval'), (dtrain,'train')]
+bst = xgb.train( param, dtrain, num_round, watchlist )
+
+print ('start running example of build DMatrix from numpy array')
+# NOTE: npymat is numpy array, we will convert it into scipy.sparse.csr_matrix in internal implementation,then convert to DMatrix
+npymat = csr.todense()
+dtrain = xgb.DMatrix( npymat)
+dtrain.set_label(labels)
+watchlist = [(dtest,'eval'), (dtrain,'train')]
+bst = xgb.train( param, dtrain, num_round, watchlist )
+
+
new executable file: demo/guide-python/boost_from_prediction.py (26 lines)
@@ -0,0 +1,26 @@
+#!/usr/bin/python
+import sys
+import numpy as np
+sys.path.append('../../wrapper')
+import xgboost as xgb
+
+dtrain = xgb.DMatrix('../data/agaricus.txt.train')
+dtest = xgb.DMatrix('../data/agaricus.txt.test')
+watchlist = [(dtest,'eval'), (dtrain,'train')]
+###
+# advanced: start from a initial base prediction
+#
+print ('start running example to start from a initial prediction')
+# specify parameters via map, definition are same as c++ version
+param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
+# train xgboost for 1 round
+bst = xgb.train( param, dtrain, 1, watchlist )
+# Note: we need the margin value instead of transformed prediction in set_base_margin
+# do predict with output_margin=True, will always give you margin values before logistic transformation
+ptrain = bst.predict(dtrain, output_margin=True)
+ptest = bst.predict(dtest, output_margin=True)
+dtrain.set_base_margin(ptrain)
+dtest.set_base_margin(ptest)
+
+print ('this is result of running from initial prediction')
+bst = xgb.train( param, dtrain, 1, watchlist )
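A note on what this demo leans on: for the `binary:logistic` objective, the values returned by `predict(..., output_margin=True)` are raw log-odds scores, and the ordinary prediction is their logistic transform; `set_base_margin` makes the next round of boosting start from those scores instead of the global `base_score`. A minimal sketch of the margin/probability relationship, using only numpy (the margin values here are made up for illustration):

```python
import numpy as np

def sigmoid(margin):
    # logistic transform: maps a raw log-odds score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-margin))

# hypothetical margin scores, like what predict(dtrain, output_margin=True) returns
margin = np.array([-2.0, 0.0, 1.5])
print(sigmoid(margin))  # [0.1192 0.5    0.8176], rounded
```

This is why the demo feeds `output_margin=True` predictions, not transformed probabilities, into `set_base_margin`.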
new executable file: demo/guide-python/custom_objective.py (44 lines)
@@ -0,0 +1,44 @@
+#!/usr/bin/python
+import sys
+import numpy as np
+sys.path.append('../../wrapper')
+import xgboost as xgb
+###
+# advanced: cutomsized loss function
+#
+print ('start running example to used cutomized objective function')
+
+dtrain = xgb.DMatrix('../data/agaricus.txt.train')
+dtest = xgb.DMatrix('../data/agaricus.txt.test')
+
+# note: for customized objective function, we leave objective as default
+# note: what we are getting is margin value in prediction
+# you must know what you are doing
+param = {'max_depth':2, 'eta':1, 'silent':1 }
+watchlist = [(dtest,'eval'), (dtrain,'train')]
+num_round = 2
+
+# user define objective function, given prediction, return gradient and second order gradient
+# this is loglikelihood loss
+def logregobj(preds, dtrain):
+    labels = dtrain.get_label()
+    preds = 1.0 / (1.0 + np.exp(-preds))
+    grad = preds - labels
+    hess = preds * (1.0-preds)
+    return grad, hess
+
+# user defined evaluation function, return a pair metric_name, result
+# NOTE: when you do customized loss function, the default prediction value is margin
+# this may make buildin evalution metric not function properly
+# for example, we are doing logistic loss, the prediction is score before logistic transformation
+# the buildin evaluation error assumes input is after logistic transformation
+# Take this in mind when you use the customization, and maybe you need write customized evaluation function
+def evalerror(preds, dtrain):
+    labels = dtrain.get_label()
+    # return a pair metric_name, result
+    # since preds are margin(before logistic transformation, cutoff at 0)
+    return 'error', float(sum(labels != (preds > 0.0))) / len(labels)
+
+# training with customized objective, we can also do step by step training
+# simply look at xgboost.py's implementation of train
+bst = xgb.train(param, dtrain, num_round, watchlist, logregobj, evalerror)
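For context on the grad/hess pair that `logregobj` returns: with margin x, label y, and p = sigmoid(x), the logistic loss is -(y*log(p) + (1-y)*log(1-p)); its first derivative in x is p - y and its second is p*(1-p), which is exactly what the demo computes. A self-contained finite-difference check of that derivation (the test point x = 0.7, y = 1 is arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logloss(x, y):
    # negative log-likelihood of label y under probability sigmoid(x)
    p = sigmoid(x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

x, y, eps = 0.7, 1.0, 1e-5
p = sigmoid(x)
grad, hess = p - y, p * (1 - p)  # what logregobj computes per example
# numerical first and second derivatives of the loss agree with the closed forms
num_grad = (logloss(x + eps, y) - logloss(x - eps, y)) / (2 * eps)
num_hess = (logloss(x + eps, y) - 2 * logloss(x, y) + logloss(x - eps, y)) / eps ** 2
assert abs(grad - num_grad) < 1e-6
assert abs(hess - num_hess) < 1e-4
```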
new executable file: demo/guide-python/predict_first_ntree.py (22 lines)
@@ -0,0 +1,22 @@
+#!/usr/bin/python
+import sys
+import numpy as np
+sys.path.append('../../wrapper')
+import xgboost as xgb
+
+### load data in do training
+dtrain = xgb.DMatrix('../data/agaricus.txt.train')
+dtest = xgb.DMatrix('../data/agaricus.txt.test')
+param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
+watchlist = [(dtest,'eval'), (dtrain,'train')]
+num_round = 3
+bst = xgb.train(param, dtrain, num_round, watchlist)
+
+print ('start testing prediction from first n trees')
+### predict using first 1 tree
+label = dtest.get_label()
+ypred1 = bst.predict(dtest, ntree_limit=1)
+# by default, we predict using all the trees
+ypred2 = bst.predict(dtest)
+print ('error of ypred1=%f' % (np.sum((ypred1>0.5)!=label) /float(len(label))))
+print ('error of ypred2=%f' % (np.sum((ypred2>0.5)!=label) /float(len(label))))
new executable file: demo/guide-python/runall.sh (5 lines)
@@ -0,0 +1,5 @@
+#!/bin/bash
+python basic_walkthrough.py
+python custom_objective.py
+python boost_from_prediction.py
+rm *~ *.model *.buffer
@@ -24,6 +24,7 @@ class GBLinear : public IGradBooster {
   }
   // set model parameters
   virtual void SetParam(const char *name, const char *val) {
+    using namespace std;
     if (!strncmp(name, "bst:", 4)) {
       param.SetParam(name + 4, val);
     }
@@ -166,6 +167,7 @@ class GBLinear : public IGradBooster {
      learning_rate = 1.0f;
    }
    inline void SetParam(const char *name, const char *val) {
+     using namespace std;
      // sync-names
      if (!strcmp("eta", name)) learning_rate = static_cast<float>(atof(val));
      if (!strcmp("lambda", name)) reg_lambda = static_cast<float>(atof(val));
@@ -207,9 +209,10 @@ class GBLinear : public IGradBooster {
    Param(void) {
      num_feature = 0;
      num_output_group = 1;
-     memset(reserved, 0, sizeof(reserved));
+     std::memset(reserved, 0, sizeof(reserved));
    }
    inline void SetParam(const char *name, const char *val) {
+     using namespace std;
      if (!strcmp(name, "bst:num_feature")) num_feature = atoi(val);
      if (!strcmp(name, "num_output_group")) num_output_group = atoi(val);
    }
@@ -1,7 +1,6 @@
 #define _CRT_SECURE_NO_WARNINGS
 #define _CRT_SECURE_NO_DEPRECATE
 #include <cstring>
-using namespace std;
 #include "./gbm.h"
 #include "./gbtree-inl.hpp"
 #include "./gblinear-inl.hpp"
@@ -9,6 +8,7 @@ using namespace std;
 namespace xgboost {
 namespace gbm {
 IGradBooster* CreateGradBooster(const char *name) {
+  using namespace std;
   if (!strcmp("gbtree", name)) return new GBTree();
   if (!strcmp("gblinear", name)) return new GBLinear();
   utils::Error("unknown booster type: %s", name);
@@ -23,6 +23,7 @@ class GBTree : public IGradBooster {
     this->Clear();
   }
   virtual void SetParam(const char *name, const char *val) {
+    using namespace std;
     if (!strncmp(name, "bst:", 4)) {
       cfg.push_back(std::make_pair(std::string(name+4), std::string(val)));
       // set into updaters, if already intialized
@@ -171,14 +172,14 @@ class GBTree : public IGradBooster {
     updaters.clear();
     std::string tval = tparam.updater_seq;
     char *pstr;
-    pstr = strtok(&tval[0], ",");
+    pstr = std::strtok(&tval[0], ",");
     while (pstr != NULL) {
       updaters.push_back(tree::CreateUpdater(pstr));
       for (size_t j = 0; j < cfg.size(); ++j) {
         // set parameters
         updaters.back()->SetParam(cfg[j].first.c_str(), cfg[j].second.c_str());
       }
-      pstr = strtok(NULL, ",");
+      pstr = std::strtok(NULL, ",");
     }
     tparam.updater_initialized = 1;
   }
@@ -279,6 +280,7 @@ class GBTree : public IGradBooster {
      updater_initialized = 0;
    }
    inline void SetParam(const char *name, const char *val){
+     using namespace std;
      if (!strcmp(name, "updater") &&
          strcmp(updater_seq.c_str(), val) != 0) {
        updater_seq = val;
@@ -319,7 +321,7 @@ class GBTree : public IGradBooster {
      num_pbuffer = 0;
      num_output_group = 1;
      size_leaf_vector = 0;
-     memset(reserved, 0, sizeof(reserved));
+     std::memset(reserved, 0, sizeof(reserved));
    }
    /*!
     * \brief set parameters from outside
@@ -327,6 +329,7 @@ class GBTree : public IGradBooster {
     * \param val value of the parameter
     */
    inline void SetParam(const char *name, const char *val) {
+     using namespace std;
      if (!strcmp("num_pbuffer", name)) num_pbuffer = atol(val);
      if (!strcmp("num_output_group", name)) num_output_group = atol(val);
      if (!strcmp("bst:num_roots", name)) num_roots = atoi(val);
@@ -1,7 +1,6 @@
 #define _CRT_SECURE_NO_WARNINGS
 #define _CRT_SECURE_NO_DEPRECATE
 #include <string>
-using namespace std;
 #include "./io.h"
 #include "../utils/io.h"
 #include "../utils/utils.h"
@@ -55,7 +55,7 @@ class DMatrixSimple : public DataMatrix {
       RowBatch::Inst inst = batch[i];
       row_data_.resize(row_data_.size() + inst.length);
       if (inst.length != 0) {
-        memcpy(&row_data_[row_ptr_.back()], inst.data,
+        std::memcpy(&row_data_[row_ptr_.back()], inst.data,
                sizeof(RowBatch::Entry) * inst.length);
       }
       row_ptr_.push_back(row_ptr_.back() + inst.length);
@@ -82,6 +82,7 @@ class DMatrixSimple : public DataMatrix {
   * \param silent whether print information or not
   */
  inline void LoadText(const char* fname, bool silent = false) {
+    using namespace std;
    this->Clear();
    FILE* file = utils::FopenCheck(fname, "r");
    float label; bool init = true;
@@ -135,7 +136,7 @@ class DMatrixSimple : public DataMatrix {
   * \return whether loading is success
   */
  inline bool LoadBinary(const char* fname, bool silent = false) {
-   FILE *fp = fopen64(fname, "rb");
+   std::FILE *fp = fopen64(fname, "rb");
    if (fp == NULL) return false;
    utils::FileStream fs(fp);
    this->LoadBinary(fs, silent, fname);
@@ -208,6 +209,7 @@ class DMatrixSimple : public DataMatrix {
   * \param savebuffer whether do save binary buffer if it is text
   */
  inline void CacheLoad(const char *fname, bool silent = false, bool savebuffer = true) {
+   using namespace std;
    size_t len = strlen(fname);
    if (len > 8 && !strcmp(fname + len - 7, ".buffer")) {
      if (!this->LoadBinary(fname, silent)) {
@@ -216,7 +218,7 @@ class DMatrixSimple : public DataMatrix {
      return;
    }
    char bname[1024];
-   snprintf(bname, sizeof(bname), "%s.buffer", fname);
+   utils::SPrintf(bname, sizeof(bname), "%s.buffer", fname);
    if (!this->LoadBinary(bname, silent)) {
      this->LoadText(fname, silent);
      if (savebuffer) this->SaveBinary(bname, silent);
@@ -90,6 +90,7 @@ struct MetaInfo {
   }
   // try to load group information from file, if exists
   inline bool TryLoadGroup(const char* fname, bool silent = false) {
+    using namespace std;
     FILE *fi = fopen64(fname, "r");
     if (fi == NULL) return false;
     group_ptr.push_back(0);
@@ -105,6 +106,7 @@ struct MetaInfo {
     return true;
   }
   inline std::vector<float>& GetFloatInfo(const char *field) {
+    using namespace std;
     if (!strcmp(field, "label")) return labels;
     if (!strcmp(field, "weight")) return weights;
     if (!strcmp(field, "base_margin")) return base_margin;
@@ -115,6 +117,7 @@ struct MetaInfo {
     return ((MetaInfo*)this)->GetFloatInfo(field);
   }
   inline std::vector<unsigned> &GetUIntInfo(const char *field) {
+    using namespace std;
     if (!strcmp(field, "root_index")) return info.root_index;
     if (!strcmp(field, "fold_index")) return info.fold_index;
     utils::Error("unknown field %s", field);
@@ -125,6 +128,7 @@ struct MetaInfo {
   }
   // try to load weight information from file, if exists
   inline bool TryLoadFloatInfo(const char *field, const char* fname, bool silent = false) {
+    using namespace std;
     std::vector<float> &data = this->GetFloatInfo(field);
     FILE *fi = fopen64(fname, "r");
     if (fi == NULL) return false;
@@ -147,10 +147,11 @@ struct EvalAMS : public IEvaluator {
   explicit EvalAMS(const char *name) {
     name_ = name;
     // note: ams@0 will automatically select which ratio to go
-    utils::Check(sscanf(name, "ams@%f", &ratio_) == 1, "invalid ams format");
+    utils::Check(std::sscanf(name, "ams@%f", &ratio_) == 1, "invalid ams format");
   }
   virtual float Eval(const std::vector<float> &preds,
                      const MetaInfo &info) const {
+    using namespace std;
     const bst_omp_uint ndata = static_cast<bst_omp_uint>(info.labels.size());
 
     utils::Check(info.weights.size() == ndata, "we need weight to evaluate ams");
@@ -202,6 +203,7 @@ struct EvalAMS : public IEvaluator {
 struct EvalPrecisionRatio : public IEvaluator{
  public:
  explicit EvalPrecisionRatio(const char *name) : name_(name) {
+    using namespace std;
    if (sscanf(name, "apratio@%f", &ratio_) == 1) {
      use_ap = 1;
    } else {
@@ -342,6 +344,7 @@ struct EvalRankList : public IEvaluator {
 
  protected:
  explicit EvalRankList(const char *name) {
+    using namespace std;
    name_ = name;
    minus_ = false;
    if (sscanf(name, "%*[^@]@%u[-]?", &topn_) != 1) {
@@ -388,7 +391,7 @@ struct EvalNDCG : public EvalRankList{
    for (size_t i = 0; i < rec.size() && i < this->topn_; ++i) {
      const unsigned rel = rec[i].second;
      if (rel != 0) {
-       sumdcg += ((1 << rel) - 1) / log(i + 2.0);
+       sumdcg += ((1 << rel) - 1) / std::log(i + 2.0);
      }
    }
    return static_cast<float>(sumdcg);
@@ -36,6 +36,7 @@ struct IEvaluator{
 namespace xgboost {
 namespace learner {
 inline IEvaluator* CreateEvaluator(const char *name) {
+  using namespace std;
   if (!strcmp(name, "rmse")) return new EvalRMSE();
   if (!strcmp(name, "error")) return new EvalError();
   if (!strcmp(name, "merror")) return new EvalMatchError();
@@ -56,6 +57,7 @@ inline IEvaluator* CreateEvaluator(const char *name) {
 class EvalSet{
  public:
  inline void AddEval(const char *name) {
+    using namespace std;
    for (size_t i = 0; i < evals_.size(); ++i) {
      if (!strcmp(name, evals_[i]->Name())) return;
    }
@@ -79,6 +79,7 @@ class BoostLearner {
   * \param val value of the parameter
   */
  inline void SetParam(const char *name, const char *val) {
+   using namespace std;
    // in this version, bst: prefix is no longer required
    if (strncmp(name, "bst:", 4) != 0) {
      std::string n = "bst:"; n += name;
@@ -290,7 +291,7 @@ class BoostLearner {
      base_score = 0.5f;
      num_feature = 0;
      num_class = 0;
-     memset(reserved, 0, sizeof(reserved));
+     std::memset(reserved, 0, sizeof(reserved));
    }
    /*!
     * \brief set parameters from outside
@@ -298,6 +299,7 @@ class BoostLearner {
     * \param val value of the parameter
     */
    inline void SetParam(const char *name, const char *val) {
+     using namespace std;
      if (!strcmp("base_score", name)) base_score = static_cast<float>(atof(val));
      if (!strcmp("num_class", name)) num_class = atoi(val);
      if (!strcmp("bst:num_feature", name)) num_feature = atoi(val);
@@ -101,6 +101,7 @@ class RegLossObj : public IObjFunction{
   }
   virtual ~RegLossObj(void) {}
   virtual void SetParam(const char *name, const char *val) {
+    using namespace std;
     if (!strcmp("scale_pos_weight", name)) {
       scale_pos_weight = static_cast<float>(atof(val));
     }
@@ -156,6 +157,7 @@ class SoftmaxMultiClassObj : public IObjFunction {
   }
   virtual ~SoftmaxMultiClassObj(void) {}
   virtual void SetParam(const char *name, const char *val) {
+    using namespace std;
     if (!strcmp( "num_class", name )) nclass = atoi(val);
   }
   virtual void GetGradient(const std::vector<float> &preds,
@@ -247,6 +249,7 @@ class LambdaRankObj : public IObjFunction {
   }
   virtual ~LambdaRankObj(void) {}
   virtual void SetParam(const char *name, const char *val) {
+    using namespace std;
     if (!strcmp( "loss_type", name )) loss.loss_type = atoi(val);
     if (!strcmp( "fix_list_weight", name)) fix_list_weight = static_cast<float>(atof(val));
     if (!strcmp( "num_pairsample", name)) num_pairsample = atoi(val);
@@ -67,6 +67,7 @@ namespace xgboost {
 namespace learner {
 /*! \brief factory funciton to create objective function by name */
 inline IObjFunction* CreateObjFunction(const char *name) {
+  using namespace std;
   if (!strcmp("reg:linear", name)) return new RegLossObj(LossType::kLinearSquare);
   if (!strcmp("reg:logistic", name)) return new RegLossObj(LossType::kLogisticNeglik);
   if (!strcmp("binary:logistic", name)) return new RegLossObj(LossType::kLogisticClassify);
@@ -53,7 +53,7 @@ class TreeModel {
   Param(void) {
     max_depth = 0;
     size_leaf_vector = 0;
-    memset(reserved, 0, sizeof(reserved));
+    std::memset(reserved, 0, sizeof(reserved));
   }
   /*!
    * \brief set parameters from outside
@@ -61,6 +61,7 @@ class TreeModel {
    * \param val value of the parameter
    */
   inline void SetParam(const char *name, const char *val) {
+    using namespace std;
     if (!strcmp("num_roots", name)) num_roots = atoi(val);
     if (!strcmp("num_feature", name)) num_feature = atoi(val);
     if (!strcmp("size_leaf_vector", name)) size_leaf_vector = atoi(val);
|||||||
@ -65,6 +65,7 @@ struct TrainParam{
|
|||||||
* \param val value of the parameter
|
* \param val value of the parameter
|
||||||
*/
|
*/
|
||||||
inline void SetParam(const char *name, const char *val) {
|
inline void SetParam(const char *name, const char *val) {
|
||||||
|
using namespace std;
|
||||||
// sync-names
|
// sync-names
|
||||||
if (!strcmp(name, "gamma")) min_split_loss = static_cast<float>(atof(val));
|
if (!strcmp(name, "gamma")) min_split_loss = static_cast<float>(atof(val));
|
||||||
if (!strcmp(name, "eta")) learning_rate = static_cast<float>(atof(val));
|
if (!strcmp(name, "eta")) learning_rate = static_cast<float>(atof(val));
|
||||||
|
|||||||
@@ -1,7 +1,6 @@
 #define _CRT_SECURE_NO_WARNINGS
 #define _CRT_SECURE_NO_DEPRECATE
 #include <cstring>
-using namespace std;
 #include "./updater.h"
 #include "./updater_prune-inl.hpp"
 #include "./updater_refresh-inl.hpp"
@@ -10,6 +9,7 @@ using namespace std;
 namespace xgboost {
 namespace tree {
 IUpdater* CreateUpdater(const char *name) {
+  using namespace std;
   if (!strcmp(name, "prune")) return new TreePruner();
   if (!strcmp(name, "refresh")) return new TreeRefresher<GradStats>();
   if (!strcmp(name, "grow_colmaker")) return new ColMaker<GradStats>();
@@ -85,18 +85,18 @@ class ColMaker: public IUpdater {
                       const BoosterInfo &info,
                       RegTree *p_tree) {
     this->InitData(gpair, *p_fmat, info.root_index, *p_tree);
-    this->InitNewNode(qexpand, gpair, *p_fmat, info, *p_tree);
+    this->InitNewNode(qexpand_, gpair, *p_fmat, info, *p_tree);
     for (int depth = 0; depth < param.max_depth; ++depth) {
-      this->FindSplit(depth, this->qexpand, gpair, p_fmat, info, p_tree);
-      this->ResetPosition(this->qexpand, p_fmat, *p_tree);
-      this->UpdateQueueExpand(*p_tree, &this->qexpand);
-      this->InitNewNode(qexpand, gpair, *p_fmat, info, *p_tree);
+      this->FindSplit(depth, qexpand_, gpair, p_fmat, info, p_tree);
+      this->ResetPosition(qexpand_, p_fmat, *p_tree);
+      this->UpdateQueueExpand(*p_tree, &qexpand_);
+      this->InitNewNode(qexpand_, gpair, *p_fmat, info, *p_tree);
       // if nothing left to be expand, break
-      if (qexpand.size() == 0) break;
+      if (qexpand_.size() == 0) break;
     }
     // set all the rest expanding nodes to leaf
-    for (size_t i = 0; i < qexpand.size(); ++i) {
-      const int nid = qexpand[i];
+    for (size_t i = 0; i < qexpand_.size(); ++i) {
+      const int nid = qexpand_[i];
       (*p_tree)[nid].set_leaf(snode[nid].weight * param.learning_rate);
     }
     // remember auxiliary statistics in the tree node
@@ -169,9 +169,9 @@ class ColMaker: public IUpdater {
      snode.reserve(256);
    }
    {// expand query
-     qexpand.reserve(256); qexpand.clear();
+     qexpand_.reserve(256); qexpand_.clear();
      for (int i = 0; i < tree.param.num_roots; ++i) {
-       qexpand.push_back(i);
+       qexpand_.push_back(i);
      }
    }
  }
@@ -233,6 +233,7 @@ class ColMaker: public IUpdater {
                     const BoosterInfo &info) {
    bool need_forward = param.need_forward_search(fmat.GetColDensity(fid));
    bool need_backward = param.need_backward_search(fmat.GetColDensity(fid));
+   const std::vector<int> &qexpand = qexpand_;
    int nthread;
    #pragma omp parallel
    {
@@ -362,6 +363,7 @@ class ColMaker: public IUpdater {
                        const std::vector<bst_gpair> &gpair,
                        const BoosterInfo &info,
                        std::vector<ThreadEntry> &temp) {
+   const std::vector<int> &qexpand = qexpand_;
    // clear all the temp statistics
    for (size_t j = 0; j < qexpand.size(); ++j) {
      temp[qexpand[j]].stats.Clear();
@@ -382,7 +384,7 @@ class ColMaker: public IUpdater {
        e.last_fvalue = fvalue;
      } else {
        // try to find a split
-       if (fabsf(fvalue - e.last_fvalue) > rt_2eps && e.stats.sum_hess >= param.min_child_weight) {
+       if (std::abs(fvalue - e.last_fvalue) > rt_2eps && e.stats.sum_hess >= param.min_child_weight) {
         c.SetSubstract(snode[nid].stats, e.stats);
         if (c.sum_hess >= param.min_child_weight) {
           bst_float loss_chg = static_cast<bst_float>(e.stats.CalcGain(param) + c.CalcGain(param) - snode[nid].root_gain);
@@ -539,7 +541,7 @@ class ColMaker: public IUpdater {
   /*! \brief TreeNode Data: statistics for each constructed node */
   std::vector<NodeEntry> snode;
   /*! \brief queue of nodes to be expanded */
-  std::vector<int> qexpand;
+  std::vector<int> qexpand_;
 };
 };
 
@@ -17,6 +17,7 @@ class TreePruner: public IUpdater {
   virtual ~TreePruner(void) {}
   // set training parameter
   virtual void SetParam(const char *name, const char *val) {
+    using namespace std;
     param.SetParam(name, val);
     if (!strcmp(name, "silent")) silent = atoi(val);
   }
@@ -24,15 +24,15 @@ class FeatMap {
   // function definitions
   /*! \brief load feature map from text format */
   inline void LoadText(const char *fname) {
-    FILE *fi = utils::FopenCheck(fname, "r");
+    std::FILE *fi = utils::FopenCheck(fname, "r");
     this->LoadText(fi);
-    fclose(fi);
+    std::fclose(fi);
   }
   /*! \brief load feature map from text format */
-  inline void LoadText(FILE *fi) {
+  inline void LoadText(std::FILE *fi) {
     int fid;
     char fname[1256], ftype[1256];
-    while (fscanf(fi, "%d\t%[^\t]\t%s\n", &fid, fname, ftype) == 3) {
+    while (std::fscanf(fi, "%d\t%[^\t]\t%s\n", &fid, fname, ftype) == 3) {
       this->PushBack(fid, fname, ftype);
     }
   }
@@ -62,6 +62,7 @@ class FeatMap {
 
  private:
  inline static Type GetType(const char *tname) {
+    using namespace std;
    if (!strcmp("i", tname)) return kIndicator;
    if (!strcmp("q", tname)) return kQuantitive;
    if (!strcmp("int", tname)) return kInteger;
@@ -105,20 +105,20 @@ class FileStream : public ISeekStream {
     this->fp = NULL;
   }
   virtual size_t Read(void *ptr, size_t size) {
-    return fread(ptr, size, 1, fp);
+    return std::fread(ptr, size, 1, fp);
   }
   virtual void Write(const void *ptr, size_t size) {
-    fwrite(ptr, size, 1, fp);
+    std::fwrite(ptr, size, 1, fp);
   }
   virtual void Seek(long pos) {
-    fseek(fp, pos, SEEK_SET);
+    std::fseek(fp, pos, SEEK_SET);
   }
   virtual long Tell(void) {
-    return ftell(fp);
+    return std::ftell(fp);
   }
   inline void Close(void) {
     if (fp != NULL){
-      fclose(fp); fp = NULL;
+      std::fclose(fp); fp = NULL;
     }
   }
 
@@ -53,7 +53,7 @@ inline double NextDouble(void) {
 }
 /*! \brief return a random number in n */
 inline uint32_t NextUInt32(uint32_t n) {
-  return (uint32_t)floor(NextDouble() * n);
+  return (uint32_t)std::floor(NextDouble() * n);
 }
 /*! \brief return x~N(mu,sigma^2) */
 inline double SampleNormal(double mu, double sigma) {
@@ -149,8 +149,8 @@ inline void Error(const char *fmt, ...) {
 #endif
 
 /*! \brief replace fopen, report error when the file open fails */
-inline FILE *FopenCheck(const char *fname, const char *flag) {
-  FILE *fp = fopen64(fname, flag);
+inline std::FILE *FopenCheck(const char *fname, const char *flag) {
+  std::FILE *fp = fopen64(fname, flag);
   Check(fp != NULL, "can not open file \"%s\"\n", fname);
   return fp;
 }
@@ -2,11 +2,10 @@ Wrapper of XGBoost
 =====
 This folder provides wrapper of xgboost to other languages
 
-
 Python
 =====
 * To make the python module, type ```make``` in the root directory of project
-* Refer to the walk through example in [python-example/demo.py](python-example/demo.py)
+* Refer also to the walk through example in [demo folder](../demo/guide-python)
 
 R
 =====
deleted file (3 lines)
@@ -1,3 +0,0 @@
-example to use python xgboost, the data is generated from demo/binary_classification, in libsvm format
-
-for usage: see demo.py and comments in demo.py
deleted file (121 lines)
@@ -1,121 +0,0 @@
-#!/usr/bin/python
-import sys
-import numpy as np
-import scipy.sparse
-# append the path to xgboost, you may need to change the following line
-# alternatively, you can add the path to PYTHONPATH environment variable
-sys.path.append('../')
-import xgboost as xgb
-
-### simple example
-# load file from text file, also binary buffer generated by xgboost
-dtrain = xgb.DMatrix('agaricus.txt.train')
-dtest = xgb.DMatrix('agaricus.txt.test')
-
-# specify parameters via map, definition are same as c++ version
-param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
-
-# specify validations set to watch performance
-evallist = [(dtest,'eval'), (dtrain,'train')]
-num_round = 2
-bst = xgb.train(param, dtrain, num_round, evallist)
-
-# this is prediction
-preds = bst.predict(dtest)
-labels = dtest.get_label()
-print ('error=%f' % ( sum(1 for i in range(len(preds)) if int(preds[i]>0.5)!=labels[i]) /float(len(preds))))
-bst.save_model('0001.model')
-# dump model
-bst.dump_model('dump.raw.txt')
-# dump model with feature map
-bst.dump_model('dump.nice.txt','featmap.txt')
-
-# save dmatrix into binary buffer
-dtest.save_binary('dtest.buffer')
-bst.save_model('xgb.model')
-# load model and data in
-bst2 = xgb.Booster(model_file='xgb.model')
-dtest2 = xgb.DMatrix('dtest.buffer')
-preds2 = bst2.predict(dtest2)
-# assert they are the same
-assert np.sum(np.abs(preds2-preds)) == 0
-
-###
-# build dmatrix from scipy.sparse
-print ('start running example of build DMatrix from scipy.sparse')
-labels = []
-row = []; col = []; dat = []
-i = 0
-for l in open('agaricus.txt.train'):
-    arr = l.split()
-    labels.append( int(arr[0]))
-    for it in arr[1:]:
-        k,v = it.split(':')
-        row.append(i); col.append(int(k)); dat.append(float(v))
-    i += 1
-csr = scipy.sparse.csr_matrix( (dat, (row,col)) )
-dtrain = xgb.DMatrix( csr )
-dtrain.set_label(labels)
-evallist = [(dtest,'eval'), (dtrain,'train')]
-bst = xgb.train( param, dtrain, num_round, evallist )
-
-print ('start running example of build DMatrix from numpy array')
-# NOTE: npymat is numpy array, we will convert it into scipy.sparse.csr_matrix in internal implementation,then convert to DMatrix
-npymat = csr.todense()
-dtrain = xgb.DMatrix( npymat)
-dtrain.set_label(labels)
-evallist = [(dtest,'eval'), (dtrain,'train')]
-bst = xgb.train( param, dtrain, num_round, evallist )
-
-###
-# advanced: cutomsized loss function
-#
-print ('start running example to used cutomized objective function')
-
-# note: for customized objective function, we leave objective as default
-# note: what we are getting is margin value in prediction
-# you must know what you are doing
-param = {'max_depth':2, 'eta':1, 'silent':1 }
-
-# user define objective function, given prediction, return gradient and second order gradient
-# this is loglikelihood loss
-def logregobj(preds, dtrain):
-    labels = dtrain.get_label()
-    preds = 1.0 / (1.0 + np.exp(-preds))
-    grad = preds - labels
-    hess = preds * (1.0-preds)
-    return grad, hess
-
-# user defined evaluation function, return a pair metric_name, result
-# NOTE: when you do customized loss function, the default prediction value is margin
-# this may make buildin evalution metric not function properly
-# for example, we are doing logistic loss, the prediction is score before logistic transformation
-# the buildin evaluation error assumes input is after logistic transformation
-# Take this in mind when you use the customization, and maybe you need write customized evaluation function
-def evalerror(preds, dtrain):
-    labels = dtrain.get_label()
-    # return a pair metric_name, result
-    # since preds are margin(before logistic transformation, cutoff at 0)
-    return 'error', float(sum(labels != (preds > 0.0))) / len(labels)
-
-# training with customized objective, we can also do step by step training
-# simply look at xgboost.py's implementation of train
-bst = xgb.train(param, dtrain, num_round, evallist, logregobj, evalerror)
-
-###
-# advanced: start from a initial base prediction
-#
-print ('start running example to start from a initial prediction')
-# specify parameters via map, definition are same as c++ version
-param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
-# train xgboost for 1 round
-bst = xgb.train( param, dtrain, 1, evallist )
-# Note: we need the margin value instead of transformed prediction in set_base_margin
-# do predict with output_margin=True, will always give you margin values before logistic transformation
-ptrain = bst.predict(dtrain, output_margin=True)
-ptest = bst.predict(dtest, output_margin=True)
-dtrain.set_base_margin(ptrain)
-dtest.set_base_margin(ptest)
-
-print ('this is result of running from initial prediction')
-bst = xgb.train( param, dtrain, 1, evallist )
@ -318,7 +318,7 @@ class Booster:
|
|||||||
self.handle, ctypes.c_char_p(k.encode('utf-8')),
|
self.handle, ctypes.c_char_p(k.encode('utf-8')),
|
||||||
ctypes.c_char_p(str(v).encode('utf-8')))
|
ctypes.c_char_p(str(v).encode('utf-8')))
|
||||||
|
|
||||||
def update(self, dtrain, it):
|
def update(self, dtrain, it, fobj=None):
|
||||||
"""
|
"""
|
||||||
update
|
update
|
||||||
Args:
|
Args:
|
||||||
@@ -326,11 +326,19 @@ class Booster:
             the training DMatrix
         it: int
             current iteration number
+        fobj: function
+            customized objective function
         Returns:
             None
         """
         assert isinstance(dtrain, DMatrix)
-        xglib.XGBoosterUpdateOneIter(self.handle, it, dtrain.handle)
+        if fobj is None:
+            xglib.XGBoosterUpdateOneIter(self.handle, it, dtrain.handle)
+        else:
+            pred = self.predict( dtrain )
+            grad, hess = fobj( pred, dtrain )
+            self.boost( dtrain, grad, hess )

     def boost(self, dtrain, grad, hess):
         """ update
         Args:
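With the new fobj argument, a custom-objective round reduces to predict, then fobj, then boost inside update. A minimal usage sketch, assuming the bst, dtrain, num_round and logregobj objects from the demo above:

    for i in range(num_round):
        bst.update(dtrain, i, logregobj)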
@@ -347,15 +355,19 @@ class Booster:
                                (ctypes.c_float*len(grad))(*grad),
                                (ctypes.c_float*len(hess))(*hess),
                                len(grad))
-    def eval_set(self, evals, it = 0):
+    def eval_set(self, evals, it = 0, feval = None):
         """evaluates by metric
         Args:
             evals: list of tuple (DMatrix, string)
                 lists of items to be evaluated
             it: int
+            feval: function
+                custom evaluation function
         Returns:
             evals result
         """
-        for d in evals:
-            assert isinstance(d[0], DMatrix)
-            assert isinstance(d[1], str)
+        if feval is None:
+            for d in evals:
+                assert isinstance(d[0], DMatrix)
+                assert isinstance(d[1], str)
@@ -363,6 +375,12 @@ class Booster:
-        evnames = (ctypes.c_char_p * len(evals))(
-            * [ctypes.c_char_p(d[1].encode('utf-8')) for d in evals])
-        return xglib.XGBoosterEvalOneIter(self.handle, it, dmats, evnames, len(evals))
+            evnames = (ctypes.c_char_p * len(evals))(
+                * [ctypes.c_char_p(d[1].encode('utf-8')) for d in evals])
+            return xglib.XGBoosterEvalOneIter(self.handle, it, dmats, evnames, len(evals))
+        else:
+            res = '[%d]' % it
+            for dm, evname in evals:
+                name, val = feval(self.predict(dm), dm)
+                res += '\t%s-%s:%f' % (evname, name, val)
+            return res
     def eval(self, mat, name = 'eval', it = 0):
         return self.eval_set( [(mat,name)], it)
     def predict(self, data, output_margin=False, ntree_limit=0):
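When feval is supplied, eval_set bypasses the native evaluator and formats one tab-separated evname-name:value entry per watchlist item, exactly as the added branch above does. A minimal sketch, assuming the bst, dtrain, dtest and evalerror objects from the demo above (numbers illustrative):

    print(bst.eval_set([(dtrain, 'train'), (dtest, 'test')], 0, evalerror))
    # e.g. [0]	train-error:0.046522	test-error:0.042831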
@@ -373,7 +391,6 @@ class Booster:
             the dmatrix storing the input
         output_margin: bool
             whether to output the raw, untransformed margin value
-
         ntree_limit: limit number of trees in prediction, default to 0, 0 means using all the trees
         Returns:
             numpy array of prediction
@@ -447,30 +464,6 @@ class Booster:
                 fmap[fid]+= 1
         return fmap

-def evaluate(bst, evals, it, feval = None):
-    """evaluation on eval set
-    Args:
-        bst: XGBoost object
-            object of XGBoost model
-        evals: list of tuple (DMatrix, string)
-            obj need to be evaluated
-        it: int
-        feval: optional
-    Returns:
-        eval result
-    """
-    if feval != None:
-        res = '[%d]' % it
-        for dm, evname in evals:
-            name, val = feval(bst.predict(dm), dm)
-            res += '\t%s-%s:%f' % (evname, name, val)
-    else:
-        res = bst.eval_set(evals, it)
-
-    return res
-
-
-
 def train(params, dtrain, num_boost_round = 10, evals = [], obj=None, feval=None):
     """ train a booster with given parameters
     Args:
@@ -482,26 +475,69 @@ def train(params, dtrain, num_boost_round = 10, evals = [], obj=None, feval=None):
             num of round to be boosted
         evals: list
             list of items to be evaluated
-        obj:
-        feval:
+        obj: function
+            customized objective function
+        feval: function
+            customized evaluation function
     """
     bst = Booster(params, [dtrain]+[ d[0] for d in evals ] )
-    if obj is None:
-        for i in range(num_boost_round):
-            bst.update( dtrain, i )
-            if len(evals) != 0:
-                sys.stderr.write(evaluate(bst, evals, i, feval).decode()+'\n')
-    else:
-        # try customized objective function
-        for i in range(num_boost_round):
-            pred = bst.predict( dtrain )
-            grad, hess = obj( pred, dtrain )
-            bst.boost( dtrain, grad, hess )
-            if len(evals) != 0:
-                sys.stderr.write(evaluate(bst, evals, i, feval)+'\n')
+    for i in range(num_boost_round):
+        bst.update( dtrain, i, obj )
+        if len(evals) != 0:
+            sys.stderr.write(bst.eval_set(evals, i, feval).decode()+'\n')
     return bst

-def cv(params, dtrain, num_boost_round = 10, nfold=3, evals = [], obj=None, feval=None):
+class CVPack:
+    def __init__(self, dtrain, dtest, param):
+        self.dtrain = dtrain
+        self.dtest = dtest
+        self.watchlist = [ (dtrain, 'train'), (dtest, 'test') ]
+        self.bst = Booster(param, [dtrain, dtest])
+    def update(self, r, fobj):
+        self.bst.update(self.dtrain, r, fobj)
+    def eval(self, r, feval):
+        return self.bst.eval_set(self.watchlist, r, feval)
+
+def mknfold(dall, nfold, param, seed, evals=[], fpreproc = None):
+    """
+    make an nfold list of CVPacks from a random index set
+    """
+    np.random.seed(seed)
+    randidx = np.random.permutation(dall.num_row())
+    kstep = len(randidx) / nfold
+    idset = [randidx[ (i*kstep) : min(len(randidx),(i+1)*kstep) ] for i in range(nfold)]
+    ret = []
+    for k in range(nfold):
+        dtrain = dall.slice(np.concatenate([idset[i] for i in range(nfold) if k != i]))
+        dtest = dall.slice(idset[k])
+        # run preprocessing on the data set if needed
+        if fpreproc is not None:
+            dtrain, dtest, tparam = fpreproc(dtrain, dtest, param.copy())
+        else:
+            tparam = param
+        plst = tparam.items() + [('eval_metric', itm) for itm in evals]
+        ret.append(CVPack(dtrain, dtest, plst))
+    return ret
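The fpreproc hook lets each fold recompute data-dependent parameters before its Booster is built. A hedged sketch of such a callback (scale_pos_weight is a real xgboost parameter; the recipe itself is an illustration, not part of this commit):

    def fpreproc(dtrain, dtest, param):
        # re-weight positives so each fold sees a balanced objective
        label = dtrain.get_label()
        ratio = float(np.sum(label == 0)) / np.sum(label == 1)
        param['scale_pos_weight'] = ratio
        return (dtrain, dtest, param)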
+def aggcv(rlist):
+    """
+    aggregate cross-validation results
+    """
+    cvmap = {}
+    ret = rlist[0].split()[0]
+    for line in rlist:
+        arr = line.split()
+        assert ret == arr[0]
+        for it in arr[1:]:
+            k, v = it.split(':')
+            if k not in cvmap:
+                cvmap[k] = []
+            cvmap[k].append(float(v))
+    for k, v in sorted(cvmap.items(), key = lambda x: x[0]):
+        v = np.array(v)
+        ret += '\t%s:%f+%f' % (k, np.mean(v), np.std(v))
+    return ret
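aggcv consumes the per-fold strings produced by CVPack.eval and reports mean+std per metric. A small standalone illustration with invented inputs:

    rlist = ['[0]\ttest-error:0.20', '[0]\ttest-error:0.24', '[0]\ttest-error:0.22']
    print(aggcv(rlist))   # -> '[0]\ttest-error:0.220000+0.016330'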
+def cv(params, dtrain, num_boost_round = 10, nfold=3, eval_metric = [], \
+       obj = None, feval = None, fpreproc = None):
     """ cross validation with given parameters
     Args:
         params: dict
@@ -512,15 +548,16 @@ def cv(params, dtrain, num_boost_round = 10, nfold=3, evals = [], obj=None, feval=None):
             num of round to be boosted
         nfold: int
             folds to do cv
-        evals: list
+        eval_metric: list
             list of items to be evaluated
         obj:
         feval:
+        fpreproc: preprocessing function that takes dtrain, dtest and
+            param, and returns transformed versions of dtrain, dtest and param
     """
-    plst = list(params.items())+[('eval_metric', itm) for itm in evals]
-    cvfolds = mknfold(dtrain, nfold, plst, 0)
+    cvfolds = mknfold(dtrain, nfold, params, 0, eval_metric, fpreproc)
     for i in range(num_boost_round):
         for f in cvfolds:
-            f.update(i)
-        res = aggcv([f.eval(i) for f in cvfolds])
+            f.update(i, obj)
+        res = aggcv([f.eval(i, feval) for f in cvfolds])
         sys.stderr.write(res+'\n')
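Putting it together, a minimal cv call, assuming dtrain from the demo above and the module imported as xgb there (each round writes one aggregated line to stderr; values depend on the data):

    param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic'}
    xgb.cv(param, dtrain, num_boost_round=5, nfold=3, eval_metric=['error'])
    # per round: '[i]\ttest-error:<mean>+<std>\ttrain-error:<mean>+<std>'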