Compare commits


56 Commits
v0.7 ... v0.71

Author SHA1 Message Date
Philip Hyunsu Cho
230cb9b787 Release version 0.71 (#3200) 2018-04-11 21:43:32 +09:00
Nan Zhu
4109818b32 [jvm-packages] add back libsvm notes (#3232)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* add back libsvm notes
2018-04-10 09:00:58 -07:00
Rory Mitchell
443ff746e9 Fix logic in GPU predictor cache lookup (#3217)
* Fix logic in GPU predictor cache lookup

* Add sklearn test for GPU prediction
2018-04-04 15:08:22 +12:00
Rory Mitchell
a1ec7b1716 Change reduce operation from thrust to cub. Fix for cuda 9.1 error (#3218)
* Change reduce operation from thrust to cub. Fix for cuda 9.1 runtime error

* Unit test sum reduce
2018-04-04 14:21:48 +12:00
Philip Hyunsu Cho
017acf54d9 Fix up make pippack command for building source package for PyPI (#3199)
* Now `make pippack` works without any manual action: it will produce
  xgboost-[version].tar.gz, which one can use by typing
  `pip3 install xgboost-[version].tar.gz`.
* Detect OpenMP-capable compilers (clang, gcc-5, gcc-7) on MacOS
2018-03-28 10:32:52 -07:00
Tong He
ace4016c36 Replace cBind by cbind (#3203)
* modify test_helper.R

* fix noLD

* update desc

* fix solaris test

* fix desc

* improve fix

* fix url

* change Matrix cBind to cbind

* fix

* fix error in demo

* fix examples
2018-03-28 10:05:47 -07:00
Philip Hyunsu Cho
b087620661 Condense MinGW installation instruction (#3201) 2018-03-25 03:05:11 -07:00
Yuan (Terry) Tang
92782a8406 Change DESCRIPTION to more modern look (#3179)
So other things can be added in comment field, such as ORCID.
2018-03-23 10:45:10 -04:00
Arjan van der Velde
04221a7469 rank_metric: add AUC-PR (#3172)
* rank_metric: add AUC-PR

Implementation of the AUC-PR calculation for weighted data, proposed by Keilwagen, Grosse and Grau (https://doi.org/10.1371/journal.pone.0092209)

* rank_metric: fix lint warnings

* Implement tests for AUC-PR and fix implementation

* add aucpr to documentation for other languages
2018-03-23 10:43:47 -04:00
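The weighted AUC-PR added in this commit can be illustrated with a small self-contained sketch. This is a simplified step-wise integration (ties in scores are ignored), not the exact implementation from Keilwagen, Grosse and Grau that the commit follows; `weighted_aucpr` is a hypothetical name for illustration:

```python
def weighted_aucpr(labels, scores, weights):
    """Area under the precision-recall curve for weighted examples.

    Walk examples in decreasing score order, accumulating weighted
    true/false positives, and add precision * recall-gain at each step.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(w for y, w in zip(labels, weights) if y == 1)
    tp = fp = 0.0
    area = 0.0
    prev_recall = 0.0
    for i in order:
        if labels[i] == 1:
            tp += weights[i]
        else:
            fp += weights[i]
        recall = tp / total_pos
        precision = tp / (tp + fp)
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area

# Perfectly ranked positives give the maximum score of 1.0:
print(weighted_aucpr([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1], [1.0, 1.0, 1.0, 1.0]))  # → 1.0
```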
zhaocc
8fb3388af2 fix typo (#3188) 2018-03-21 19:24:29 -04:00
Will Storey
00d9728e4b Fix memory leak in XGDMatrixCreateFromMat_omp() (#3182)
* Fix memory leak in XGDMatrixCreateFromMat_omp()

This replaces the array allocated by new with a std::vector.

Fixes #3161
2018-03-18 15:03:27 +13:00
Will Storey
c85995952f Allow compilation with -Werror=strict-prototypes (#3183) 2018-03-18 12:25:42 +13:00
Rory Mitchell
9fa45d3a9c Fix bug with gpu_predictor caching behaviour (#3177)
* Fixes #3162
2018-03-18 10:35:10 +13:00
Ray Kim
cdc036b752 Fixed performance bug (#3171)
Minor performance improvements to gpu predictor
2018-03-15 09:40:24 +13:00
Rory Mitchell
7a81c87dfa Fix incorrect minimum value in quantile generation (#3167) 2018-03-14 08:21:18 -07:00
Vadim Khotilovich
706be4e5d4 Additional improvements for gblinear (#3134)
* fix rebase conflict

* [core] additional gblinear improvements

* [R] callback for gblinear coefficients history

* force eta=1 for gblinear python tests

* add top_k to GreedyFeatureSelector

* set eta=1 in shotgun test

* [core] fix SparsePage processing in gblinear; col-wise multithreading in greedy updater

* set sorted flag within TryInitColData

* gblinear tests: use scale, add external memory test

* fix multiclass for greedy updater

* fix whitespace

* fix typo
2018-03-13 01:27:13 -05:00
Andrew V. Adinetz
a1b48afa41 Added back UpdatePredictionCache() in updater_gpu_hist.cu. (#3120)
* Added back UpdatePredictionCache() in updater_gpu_hist.cu.

- it had been there before, but wasn't ported to the new version
  of updater_gpu_hist.cu
2018-03-09 15:06:45 +13:00
redditur
d5f1b74ef5 'hist': Monotonic Constraints (#3085)
* Extended monotonic constraints support to 'hist' tree method.

* Added monotonic constraints tests.

* Fix the signature of NoConstraint::CalcSplitGain()

* Document monotonic constraint support in 'hist'

* Update signature of Update to account for latest refactor
2018-03-05 16:45:49 -08:00
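The guarantee that monotonic constraints provide can be sanity-checked with a helper like the following. With `monotone_constraints = "(1)"` on a feature, sweeping that feature upward while holding the others fixed must never decrease the prediction; `violates_increasing` is a hypothetical name for illustration, not part of XGBoost:

```python
def violates_increasing(predictions):
    """Return indices where a supposedly non-decreasing prediction path dips.

    `predictions` is the model output along an increasing grid of one
    feature's values, all other features held fixed. An empty result means
    the increasing monotone constraint holds along this path.
    """
    return [i for i in range(1, len(predictions))
            if predictions[i] < predictions[i - 1]]

print(violates_increasing([0.1, 0.2, 0.2, 0.5]))  # → []
```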
Andrea Bergonzo
8937134015 Update build_trouble_shooting.md (#3144) 2018-03-02 16:23:45 -08:00
Philip Hyunsu Cho
32ea70c1c9 Documenting CSV loading into DMatrix (#3137)
* Support CSV file in DMatrix

We'd just need to expose the CSV parser in dmlc-core to the Python wrapper

* Revert extra code; document existing CSV support

CSV support is already there but undocumented

* Add notice about categorical features
2018-02-28 18:41:10 -08:00
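The CSV support documented here relies on dmlc-core's URI syntax, where parser options are passed as query parameters. A minimal sketch of preparing such a file and its URI (the actual `xgboost.DMatrix` call is commented out so the snippet stands alone without xgboost installed):

```python
import csv
import os
import tempfile

# Write a tiny CSV: first column is the label, the rest are features.
rows = [[1, 0.5, 2.0], [0, 1.5, 0.0], [1, 0.7, 3.1]]
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# dmlc-core's parser is selected via URI query parameters:
uri = path + "?format=csv&label_column=0"
# With xgboost installed, this would load it:
# import xgboost; dtrain = xgboost.DMatrix(uri)
print(uri)
```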
Andrew V. Adinetz
d5992dd881 Replaced std::vector-based interfaces with HostDeviceVector-based interfaces. (#3116)
* Replaced std::vector-based interfaces with HostDeviceVector-based interfaces.

- replacement was performed in the learner, boosters, predictors,
  updaters, and objective functions
- only interfaces used in training were replaced;
  interfaces like PredictInstance() still use std::vector
- refactoring necessary for replacement of interfaces was also performed,
  such as using HostDeviceVector in prediction cache

* HostDeviceVector-based interfaces for custom objective function example plugin.
2018-02-28 13:00:04 +13:00
Yuan (Terry) Tang
11bfa8584d Remove unnecessary dependencies in distributed test (#3132) 2018-02-24 20:24:34 -05:00
Yuan (Terry) Tang
cf89fa7139 Remove additional "/" in external memory doc (#3131) 2018-02-24 14:27:03 -05:00
Yuan (Terry) Tang
5d4cc49080 Update GPU plug-in documentation link (#3130) 2018-02-24 13:37:12 -05:00
Philip Hyunsu Cho
3d7aff5697 Fix doc build (#3126)
* Fix doc build

ReadTheDocs build has been broken for a while due to incompatibilities between
commonmark, recommonmark, and sphinx. See:
* "Recommonmark not working with Sphinx 1.6"
  https://github.com/rtfd/recommonmark/issues/73
* "CommonMark 0.6.0 breaks compatibility"
  https://github.com/rtfd/recommonmark/issues/24
For now, we fix the versions to get the build working again

* Fix search bar
2018-02-21 16:57:30 -08:00
Dmitry Mottl
eb9e30bb30 Minor: fixed dropdown <li> width in xgboost.css (#3121) 2018-02-20 07:24:38 -08:00
Dmitry Mottl
20b733e1a0 Minor: removed extra parenthesis in doc (#3119) 2018-02-20 02:55:29 -08:00
tomisuker
8153ba6fe7 modify build guide from source on macOS (#2993)
* modify build guide from source on macOS

* fix; installation for macOS
2018-02-19 12:20:00 -08:00
Rory Mitchell
dd82b28e20 Update GPU code with dmatrix changes (#3117) 2018-02-17 12:11:48 +13:00
Rory Mitchell
10eb05a63a Refactor linear modelling and add new coordinate descent updater (#3103)
* Refactor linear modelling and add new coordinate descent updater

* Allow unsorted column iterator

* Add prediction cacheing to gblinear
2018-02-17 09:17:01 +13:00
Vadim Khotilovich
9ffe8596f2 [core] fix slow predict-caching with many classes (#3109)
* fix prediction caching inefficiency for multiclass

* silence some warnings

* redundant if

* workaround for R v3.4.3 bug; fixes #3081
2018-02-15 18:31:42 -06:00
Oleg Panichev
cf19caa46a Fix for ZeroDivisionError when verbose_eval equals to 0. (#3115) 2018-02-15 17:58:06 -06:00
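The `verbose_eval=0` fix is about not using the value as a modulus. A hedged sketch of the pattern (a hypothetical helper, not the actual xgboost code):

```python
def should_log(iteration, verbose_eval):
    """Decide whether to print evaluation at this iteration without ever
    using verbose_eval == 0 as a modulus (the ZeroDivisionError case)."""
    if isinstance(verbose_eval, bool):
        return verbose_eval
    if not verbose_eval:  # 0 or None: logging disabled
        return False
    return iteration % verbose_eval == 0

print(should_log(10, 0))  # → False
```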
Philip Hyunsu Cho
375d75304d Fix typos, addressing issues #2212 and #3090 (#3105) 2018-02-09 11:16:44 -08:00
Felipe Arruda Pontes
81d1b17f9c adding some docs based on core.Boost.predict (#1865) 2018-02-09 06:38:38 -08:00
cinqS
b99f56e386 added mingw64 installation instruction, and library file copy. (#2977)
* added mingw64 installation instruction, and library file copy.

* Change all `libxgboost.dll` to `xgboost.dll`

On Windows, the library file is called `xgboost.dll`, not `libxgboost.dll` as in the build doc previously
2018-02-09 01:54:15 -08:00
Abraham Zhan
874525c152 c_api.cc variable declared inappropriately (#3044)
In line 461, "size_t offset = 0;" should be declared before any calculation; otherwise it causes a compilation error.

```
I:\Libraries\xgboost\src\c_api\c_api.cc(416): error C2146: Missing ";" before "offset" [I:\Libraries\xgboost\build\objxgboost.vcxproj]
```
2018-02-09 01:32:01 -08:00
Scott Lundberg
d878c36c84 Add SHAP interaction effects, fix minor bug, and add cox loss (#3043)
* Add interaction effects and cox loss

* Minimize whitespace changes

* Cox loss now no longer needs a pre-sorted dataset.

* Address code review comments

* Remove mem check, rename to pred_interactions, include bias

* Make lint happy

* More lint fixes

* Fix cox loss indexing

* Fix main effects and tests

* Fix lint

* Use half interaction values on the off-diagonals

* Fix lint again
2018-02-07 20:38:01 -06:00
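The interaction-effects output described above satisfies an additivity property: each row of the per-sample interaction matrix sums to that feature's SHAP contribution, and the whole matrix sums to the raw margin prediction, with off-diagonal cells holding half of each pairwise effect (per the last commit note). A toy illustration with made-up numbers:

```python
# Toy 3x3 interaction matrix for one sample: 2 features plus the bias term.
# Off-diagonal entries carry half of each pairwise interaction, so the
# symmetric pair (i, j) and (j, i) together give the full effect.
inter = [
    [0.5, 0.1, 0.0],   # feature 0: main effect 0.5, half-interaction 0.1 with f1
    [0.1, -0.3, 0.0],  # feature 1: main effect -0.3
    [0.0, 0.0, 2.0],   # bias on the diagonal
]
per_feature = [sum(row) for row in inter]  # row sums ~ per-feature SHAP values
margin = sum(per_feature)                  # ~ raw (untransformed) prediction
print(per_feature, margin)
```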
Jonas
077abb35cd fix relative link to demo (#3066) 2018-02-07 01:09:03 -06:00
Vadim Khotilovich
94e655329f Replacing cout with LOG (#3076)
* change cout to LOG

* lint fix
2018-02-06 02:00:34 -06:00
Sergei Lebedev
7c99e90ecd [jvm-packages] Declared Spark as provided in the POM (#3093)
* [jvm-packages] Explicitly declared Spark dependencies as provided

* Removed noop spark-2.x profile
2018-02-05 10:06:06 -08:00
Peter M. Landwehr
86bf930497 Fix typo: cutomize -> customize (#3073) 2018-02-04 22:56:04 +01:00
Andrew V. Adinetz
24c2e41287 Fixed the bug with illegal memory access in test_large_sizes.py with 4 GPUs. (#3068)
- thrust::copy() called from dvec::copy() for gpairs invoked a GPU kernel instead of
  cudaMemcpy()
- this resulted in illegal memory access if the GPU running the kernel could not access
  the data being copied
- new version of dvec::copy() for thrust::device_ptr iterators calls cudaMemcpy(),
  avoiding the problem.
2018-02-01 16:54:46 +13:00
Tong He
98be9aef9a A fix for CRAN submission of version 0.7-0 (#3061)
* modify test_helper.R

* fix noLD

* update desc

* fix solaris test

* fix desc

* improve fix

* fix url
2018-01-27 17:06:28 -08:00
Vadim Khotilovich
c88bae112e change cmd to cmd.exe in appveyor (#3071) 2018-01-26 12:27:33 -06:00
tomasatdatabricks
5ef684641b Fixed SparkParallelTracker to work with Spark2.3 (#3062) 2018-01-25 04:31:38 +01:00
Rory Mitchell
f87802f00c Fix GPU bugs (#3051)
* Change uint to unsigned int

* Fix no root predictions bug

* Remove redundant splitting due to numerical instability
2018-01-23 13:14:15 +13:00
Yun Ni
8b2f4e2d39 [jvm-packages] Move cache files to TempDirectory and delete this directory after XGBoost job finishes (#3022)
* [jvm-packages] Move cache files to tmp dir and delete on exit

* Delete the cache dir when watches are deleted
2018-01-20 21:13:25 -08:00
Yun Ni
3f3f54bcad [jvm-packages] Update docs and unify the terminology (#3024)
* [jvm-packages] Move cache files to tmp dir and delete on exit

* [jvm-packages] Update docs and unify terminology

* Address CR Comments
2018-01-16 17:16:55 +01:00
Thejaswi
84ab74f3a5 Objective function evaluation on GPU with minimal PCIe transfers (#2935)
* Added GPU objective function and no-copy interface.

- xgboost::HostDeviceVector<T> syncs automatically between host and device
- no-copy interfaces have been added
- default implementations just sync the data to host
  and call the implementations with std::vector
- GPU objective function, predictor, histogram updater process data
  directly on GPU
2018-01-12 21:33:39 +13:00
Nan Zhu
a187ed6c8f [jvm-packages] tiny fix for empty partition in predict (#3014)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* tiny fix for empty partition in predict

* further fix
2018-01-07 08:34:18 -08:00
Yun Ni
740eba42f7 [jvm-packages] Add back the overriden finalize() method for SBooster (#3011)
* Convert SIGSEGV to XGBoostError

* Address CR Comments

* Address CR Comments
2018-01-06 14:07:37 -08:00
Yun Ni
65fb4e3f5c [jvm-packages] Prevent dispose being called on unfinalized JBooster (#3005)
* [jvm-packages] Prevent dispose being called twice when finalize

* Convert SIGSEGV to XGBoostError

* Avoid creating a new SBooster with the same JBooster

* Address CR Comments
2018-01-06 09:46:52 -08:00
Nan Zhu
9747ea2acb [jvm-packages] fix the pattern in dev script and version mismatch (#3009)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix the pattern in dev script and version mismatch
2018-01-06 06:59:38 -08:00
Zhirui Wang
bf43671841 update macOS gcc@5 installation guide (#3003)
After installing ``gcc@5``, ``CMAKE_C_COMPILER`` is not automatically set to gcc-5 in some macOS environments, and the installation of xgboost will still fail. Manually setting the compiler solves the problem.
2018-01-04 11:28:26 -08:00
Nan Zhu
14c6392381 [jvm-packages] add dev script to update version and update versions (#2998)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* add dev script to update version and update versions
2018-01-01 21:28:53 -08:00
Vadim Khotilovich
526801cdb3 [R] fix for the 32 bit windows issue (#2994)
* [R] disable thred_local for 32bit windows

* [R] require C++11 and GNU make in DESCRIPTION

* [R] enable 32+64 build and check in appveyor
2017-12-31 14:18:50 -08:00
139 changed files with 4557 additions and 1039 deletions

Makefile

@@ -198,7 +198,11 @@ endif
 clean:
 	$(RM) -rf build build_plugin lib bin *~ */*~ */*/*~ */*/*/*~ */*.o */*/*.o */*/*/*.o #xgboost
 	$(RM) -rf build_tests *.gcov tests/cpp/xgboost_test
-	cd R-package/src; $(RM) -rf rabit src include dmlc-core amalgamation *.so *.dll; cd $(ROOTDIR)
+	if [ -d "R-package/src" ]; then \
+		cd R-package/src; \
+		$(RM) -rf rabit src include dmlc-core amalgamation *.so *.dll; \
+		cd $(ROOTDIR); \
+	fi
 
 clean_all: clean
 	cd $(DMLC_CORE); "$(MAKE)" clean; cd $(ROOTDIR)
@@ -212,16 +216,28 @@ pypack: ${XGBOOST_DYLIB}
 	cp ${XGBOOST_DYLIB} python-package/xgboost
 	cd python-package; tar cf xgboost.tar xgboost; cd ..
 
-# create pip installation pack for PyPI
+# create pip source dist (sdist) pack for PyPI
 pippack: clean_all
 	rm -rf xgboost-python
+	# remove symlinked directories in python-package/xgboost
+	rm -rf python-package/xgboost/lib
+	rm -rf python-package/xgboost/dmlc-core
+	rm -rf python-package/xgboost/include
+	rm -rf python-package/xgboost/make
+	rm -rf python-package/xgboost/rabit
+	rm -rf python-package/xgboost/src
 	cp -r python-package xgboost-python
 	cp -r Makefile xgboost-python/xgboost/
 	cp -r make xgboost-python/xgboost/
 	cp -r src xgboost-python/xgboost/
+	cp -r tests xgboost-python/xgboost/
 	cp -r include xgboost-python/xgboost/
 	cp -r dmlc-core xgboost-python/xgboost/
 	cp -r rabit xgboost-python/xgboost/
+	# Use setup_pip.py instead of setup.py
+	mv xgboost-python/setup_pip.py xgboost-python/setup.py
+	# Build sdist tarball
+	cd xgboost-python; python setup.py sdist; mv dist/*.tar.gz ..; cd ..
 
 # Script to make a clean installable R package.
 Rpack: clean_all

NEWS.md

@@ -3,6 +3,35 @@ XGBoost Change Log
 
 This file records the changes in xgboost library in reverse chronological order.
 
+## v0.71 (2018.04.11)
+* This is a minor release, mainly motivated by issues concerning `pip install`, e.g. #2426, #3189, #3118, and #3194.
+  With this release, users of Linux and MacOS will be able to run `pip install` for the most part.
+* Refactored linear booster class (`gblinear`), so as to support multiple coordinate descent updaters (#3103, #3134). See BREAKING CHANGES below.
+* Fix slow training for multiclass classification with high number of classes (#3109)
+* Fix a corner case in approximate quantile sketch (#3167). Applicable for 'hist' and 'gpu_hist' algorithms
+* Fix memory leak in DMatrix (#3182)
+* New functionality
+  - Better linear booster class (#3103, #3134)
+  - Pairwise SHAP interaction effects (#3043)
+  - Cox loss (#3043)
+  - AUC-PR metric for ranking task (#3172)
+  - Monotonic constraints for 'hist' algorithm (#3085)
+* GPU support
+  - Create an abstract 1D vector class that moves data seamlessly between the main and GPU memory (#2935, #3116, #3068). This eliminates unnecessary PCIe data transfer during training time.
+  - Fix minor bugs (#3051, #3217)
+  - Fix compatibility error for CUDA 9.1 (#3218)
+* Python package:
+  - Correctly handle parameter `verbose_eval=0` (#3115)
+* R package:
+  - Eliminate segmentation fault on 32-bit Windows platform (#2994)
+* JVM packages
+  - Fix a memory bug involving double-freeing Booster objects (#3005, #3011)
+  - Handle empty partition in predict (#3014)
+  - Update docs and unify terminology (#3024)
+  - Delete cache files after job finishes (#3022)
+  - Compatibility fixes for latest Spark versions (#3062, #3093)
+* BREAKING CHANGES: Updated linear modelling algorithms. In particular L1/L2 regularisation penalties are now normalised to the number of training examples. This makes the implementation consistent with sklearn/glmnet. L2 regularisation has also been removed from the intercept. To produce linear models with the old regularisation behaviour, the alpha/lambda regularisation parameters can be manually scaled by dividing them by the number of training examples.
+
 ## v0.7 (2017.12.30)
 * **This version represents a major change from the last release (v0.6), which was released one year and half ago.**
 * Updated Sklearn API
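The BREAKING CHANGES note on gblinear regularisation implies a simple migration rule: to reproduce the pre-0.71 penalty strength, divide the old alpha/lambda by the number of training rows. A sketch (`rescale_regularization` is a hypothetical helper name):

```python
def rescale_regularization(alpha_old, lambda_old, n_train):
    """Map pre-0.71 gblinear penalties to the new normalised scale.

    From v0.71 the L1/L2 penalties are normalised to the number of training
    examples, so reproducing the old behaviour means dividing the old
    parameter values by n_train.
    """
    return alpha_old / n_train, lambda_old / n_train

# e.g. params = {"booster": "gblinear", "alpha": a, "lambda": l}
a, l = rescale_regularization(1.0, 2.0, 100)
print(a, l)  # → 0.01 0.02
```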

R-package/DESCRIPTION

@@ -1,12 +1,21 @@
 Package: xgboost
 Type: Package
 Title: Extreme Gradient Boosting
-Version: 0.6.4.8
-Date: 2017-12-05
-Author: Tianqi Chen <tianqi.tchen@gmail.com>, Tong He <hetong007@gmail.com>,
-    Michael Benesty <michael@benesty.fr>, Vadim Khotilovich <khotilovich@gmail.com>,
-    Yuan Tang <terrytangyuan@gmail.com>
-Maintainer: Tong He <hetong007@gmail.com>
+Version: 0.71.1
+Date: 2018-04-11
+Authors@R: c(
+  person("Tianqi", "Chen", role = c("aut"),
+         email = "tianqi.tchen@gmail.com"),
+  person("Tong", "He", role = c("aut", "cre"),
+         email = "hetong007@gmail.com"),
+  person("Michael", "Benesty", role = c("aut"),
+         email = "michael@benesty.fr"),
+  person("Vadim", "Khotilovich", role = c("aut"),
+         email = "khotilovich@gmail.com"),
+  person("Yuan", "Tang", role = c("aut"),
+         email = "terrytangyuan@gmail.com",
+         comment = c(ORCID = "0000-0001-5243-233X"))
+)
 Description: Extreme Gradient Boosting, which is an efficient implementation
     of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>.
     This package is its R interface. The package includes efficient linear
@@ -38,3 +47,4 @@ Imports:
     magrittr (>= 1.5),
     stringi (>= 0.5.2)
 RoxygenNote: 6.0.1
+SystemRequirements: GNU make, C++11

R-package/NAMESPACE

@@ -18,6 +18,7 @@ export("xgb.parameters<-")
 export(cb.cv.predict)
 export(cb.early.stop)
 export(cb.evaluation.log)
+export(cb.gblinear.history)
 export(cb.print.evaluation)
 export(cb.reset.parameters)
 export(cb.save.model)
@@ -32,6 +33,7 @@ export(xgb.attributes)
 export(xgb.create.features)
 export(xgb.cv)
 export(xgb.dump)
+export(xgb.gblinear.history)
 export(xgb.ggplot.deepness)
 export(xgb.ggplot.importance)
 export(xgb.importance)
@@ -49,10 +51,11 @@ export(xgboost)
 import(methods)
 importClassesFrom(Matrix,dgCMatrix)
 importClassesFrom(Matrix,dgeMatrix)
-importFrom(Matrix,cBind)
 importFrom(Matrix,colSums)
 importFrom(Matrix,sparse.model.matrix)
+importFrom(Matrix,sparseMatrix)
 importFrom(Matrix,sparseVector)
+importFrom(Matrix,t)
 importFrom(data.table,":=")
 importFrom(data.table,as.data.table)
 importFrom(data.table,data.table)

R-package/R/callbacks.R

@@ -524,6 +524,228 @@ cb.cv.predict <- function(save_models = FALSE) {
}
#' Callback closure for collecting the model coefficients history of a gblinear booster
#' during its training.
#'
#' @param sparse when set to FALSE/TRUE, a dense/sparse matrix is used to store the result.
#' Sparse format is useful when one expects only a subset of coefficients to be non-zero,
#' when using the "thrifty" feature selector with fairly small number of top features
#' selected per iteration.
#'
#' @details
#' To keep things fast and simple, gblinear booster does not internally store the history of linear
#' model coefficients at each boosting iteration. This callback provides a workaround for storing
#' the coefficients' path, by extracting them after each training iteration.
#'
#' Callback function expects the following values to be set in its calling frame:
#' \code{bst} (or \code{bst_folds}).
#'
#' @return
#' Results are stored in the \code{coefs} element of the closure.
#' The \code{\link{xgb.gblinear.history}} convenience function provides an easy way to access it.
#' With \code{xgb.train}, it is either a dense or a sparse matrix.
#' While with \code{xgb.cv}, it is a list (an element per each fold) of such matrices.
#'
#' @seealso
#' \code{\link{callbacks}}, \code{\link{xgb.gblinear.history}}.
#'
#' @examples
#' #### Binary classification:
#' #
#' # In the iris dataset, it is hard to linearly separate Versicolor class from the rest
#' # without considering the 2nd order interactions:
#' require(magrittr)
#' x <- model.matrix(Species ~ .^2, iris)[,-1]
#' colnames(x)
#' dtrain <- xgb.DMatrix(scale(x), label = 1*(iris$Species == "versicolor"))
#' param <- list(booster = "gblinear", objective = "reg:logistic", eval_metric = "auc",
#' lambda = 0.0003, alpha = 0.0003, nthread = 2)
#' # For 'shotgun', which is a default linear updater, using high eta values may result in
#' # unstable behaviour in some datasets. With this simple dataset, however, the high learning
#' # rate does not break the convergence, but allows us to illustrate the typical pattern of
#' # "stochastic explosion" behaviour of this lock-free algorithm at early boosting iterations.
#' bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 1.,
#' callbacks = list(cb.gblinear.history()))
#' # Extract the coefficients' path and plot them vs boosting iteration number:
#' coef_path <- xgb.gblinear.history(bst)
#' matplot(coef_path, type = 'l')
#'
#' # With the deterministic coordinate descent updater, it is safer to use higher learning rates.
#' # Will try the classical componentwise boosting which selects a single best feature per round:
#' bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 0.8,
#' updater = 'coord_descent', feature_selector = 'thrifty', top_k = 1,
#' callbacks = list(cb.gblinear.history()))
#' xgb.gblinear.history(bst) %>% matplot(type = 'l')
#' # Componentwise boosting is known to have similar effect to Lasso regularization.
#' # Try experimenting with various values of top_k, eta, nrounds,
#' # as well as different feature_selectors.
#'
#' # For xgb.cv:
#' bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 100, eta = 0.8,
#' callbacks = list(cb.gblinear.history()))
#' # coefficients in the CV fold #3
#' xgb.gblinear.history(bst)[[3]] %>% matplot(type = 'l')
#'
#'
#' #### Multiclass classification:
#' #
#' dtrain <- xgb.DMatrix(scale(x), label = as.numeric(iris$Species) - 1)
#' param <- list(booster = "gblinear", objective = "multi:softprob", num_class = 3,
#' lambda = 0.0003, alpha = 0.0003, nthread = 2)
#' # For the default linear updater 'shotgun' it sometimes is helpful
#' # to use smaller eta to reduce instability
#' bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 70, eta = 0.5,
#' callbacks = list(cb.gblinear.history()))
#' # Will plot the coefficient paths separately for each class:
#' xgb.gblinear.history(bst, class_index = 0) %>% matplot(type = 'l')
#' xgb.gblinear.history(bst, class_index = 1) %>% matplot(type = 'l')
#' xgb.gblinear.history(bst, class_index = 2) %>% matplot(type = 'l')
#'
#' # CV:
#' bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 70, eta = 0.5,
#' callbacks = list(cb.gblinear.history(FALSE)))
#' # 1st fold of 1st class
#' xgb.gblinear.history(bst, class_index = 0)[[1]] %>% matplot(type = 'l')
#'
#' @export
cb.gblinear.history <- function(sparse=FALSE) {
coefs <- NULL
init <- function(env) {
if (!is.null(env$bst)) { # xgb.train:
coef_path <- list()
} else if (!is.null(env$bst_folds)) { # xgb.cv:
coef_path <- rep(list(), length(env$bst_folds))
} else stop("Parent frame has neither 'bst' nor 'bst_folds'")
}
# convert from list to (sparse) matrix
list2mat <- function(coef_list) {
if (sparse) {
coef_mat <- sparseMatrix(x = unlist(lapply(coef_list, slot, "x")),
i = unlist(lapply(coef_list, slot, "i")),
p = c(0, cumsum(sapply(coef_list, function(x) length(x@x)))),
dims = c(length(coef_list[[1]]), length(coef_list)))
return(t(coef_mat))
} else {
return(do.call(rbind, coef_list))
}
}
finalizer <- function(env) {
if (length(coefs) == 0)
return()
if (!is.null(env$bst)) { # xgb.train:
coefs <<- list2mat(coefs)
} else { # xgb.cv:
# first lapply transposes the list
coefs <<- lapply(seq_along(coefs[[1]]), function(i) lapply(coefs, "[[", i)) %>%
lapply(function(x) list2mat(x))
}
}
extract.coef <- function(env) {
if (!is.null(env$bst)) { # xgb.train:
cf <- as.numeric(grep('(booster|bias|weigh)', xgb.dump(env$bst), invert = TRUE, value = TRUE))
if (sparse) cf <- as(cf, "sparseVector")
} else { # xgb.cv:
cf <- vector("list", length(env$bst_folds))
for (i in seq_along(env$bst_folds)) {
dmp <- xgb.dump(xgb.handleToBooster(env$bst_folds[[i]]$bst))
cf[[i]] <- as.numeric(grep('(booster|bias|weigh)', dmp, invert = TRUE, value = TRUE))
if (sparse) cf[[i]] <- as(cf[[i]], "sparseVector")
}
}
cf
}
callback <- function(env = parent.frame(), finalize = FALSE) {
if (is.null(coefs)) init(env)
if (finalize) return(finalizer(env))
cf <- extract.coef(env)
coefs <<- c(coefs, list(cf))
}
attr(callback, 'call') <- match.call()
attr(callback, 'name') <- 'cb.gblinear.history'
callback
}
#' Extract gblinear coefficients history.
#'
#' A helper function to extract the matrix of linear coefficients' history
#' from a gblinear model created while using the \code{cb.gblinear.history()}
#' callback.
#'
#' @param model either an \code{xgb.Booster} or a result of \code{xgb.cv()}, trained
#' using the \code{cb.gblinear.history()} callback.
#' @param class_index zero-based class index to extract the coefficients for only that
#' specific class in a multinomial multiclass model. When it is NULL, all the
#' coefficients are returned. Has no effect in non-multiclass models.
#'
#' @return
#' For an \code{xgb.train} result, a matrix (either dense or sparse) with the columns
#' corresponding to iteration's coefficients (in the order as \code{xgb.dump()} would
#' return) and the rows corresponding to boosting iterations.
#'
#' For an \code{xgb.cv} result, a list of such matrices is returned with the elements
#' corresponding to CV folds.
#'
#' @examples
#' \dontrun{
#' See \code{\link{cb.gblinear.history}}
#' }
#'
#' @export
xgb.gblinear.history <- function(model, class_index = NULL) {
if (!(inherits(model, "xgb.Booster") ||
inherits(model, "xgb.cv.synchronous")))
stop("model must be an object of either xgb.Booster or xgb.cv.synchronous class")
is_cv <- inherits(model, "xgb.cv.synchronous")
if (is.null(model[["callbacks"]]) || is.null(model$callbacks[["cb.gblinear.history"]]))
stop("model must be trained while using the cb.gblinear.history() callback")
if (!is_cv) {
# extract num_class & num_feat from the internal model
dmp <- xgb.dump(model)
if(length(dmp) < 2 || dmp[2] != "bias:")
stop("It does not appear to be a gblinear model")
dmp <- dmp[-c(1,2)]
n <- which(dmp == 'weight:')
if(length(n) != 1)
stop("It does not appear to be a gblinear model")
num_class <- n - 1
num_feat <- (length(dmp) - 4) / num_class
} else {
# in case of CV, the object is expected to have this info
if (model$params$booster != "gblinear")
stop("It does not appear to be a gblinear model")
num_class <- NVL(model$params$num_class, 1)
num_feat <- model$nfeatures
if (is.null(num_feat))
stop("This xgb.cv result does not have nfeatures info")
}
if (!is.null(class_index) &&
num_class > 1 &&
(class_index[1] < 0 || class_index[1] >= num_class))
stop("class_index has to be within [0,", num_class - 1, "]")
coef_path <- environment(model$callbacks$cb.gblinear.history)[["coefs"]]
if (!is.null(class_index) && num_class > 1) {
coef_path <- if (is.list(coef_path)) {
lapply(coef_path,
function(x) x[, seq(1 + class_index, by=num_class, length.out=num_feat)])
} else {
coef_path <- coef_path[, seq(1 + class_index, by=num_class, length.out=num_feat)]
}
}
coef_path
}
#
# Internal utility functions for callbacks ------------------------------------
#

R-package/R/xgb.create.features.R

@@ -83,5 +83,5 @@ xgb.create.features <- function(model, data, ...){
   check.deprecation(...)
   pred_with_leaf <- predict(model, data, predleaf = TRUE)
   cols <- lapply(as.data.frame(pred_with_leaf), factor)
-  cBind(data, sparse.model.matrix( ~ . -1, cols))
+  cbind(data, sparse.model.matrix( ~ . -1, cols))
 }

R-package/R/xgb.cv.R

@@ -34,6 +34,7 @@
 #'   \item \code{rmse} Rooted mean square error
 #'   \item \code{logloss} negative log-likelihood function
 #'   \item \code{auc} Area under curve
+#'   \item \code{aucpr} Area under PR curve
 #'   \item \code{merror} Exact matching error, used to evaluate multi-class classification
 #' }
 #' @param obj customized objective function. Returns gradient and second order
@@ -88,6 +89,7 @@
 #'   CV-based evaluation means and standard deviations for the training and test CV-sets.
 #'   It is created by the \code{\link{cb.evaluation.log}} callback.
 #' \item \code{niter} number of boosting iterations.
+#' \item \code{nfeatures} number of features in training data.
 #' \item \code{folds} the list of CV folds' indices - either those passed through the \code{folds}
 #'   parameter or randomly generated.
 #' \item \code{best_iteration} iteration number with the best evaluation metric value
@@ -184,6 +186,7 @@ xgb.cv <- function(params=list(), data, nrounds, nfold, label = NULL, missing =
     handle <- xgb.Booster.handle(params, list(dtrain, dtest))
     list(dtrain = dtrain, bst = handle, watchlist = list(train = dtrain, test=dtest), index = folds[[k]])
   })
+  rm(dall)
 
   # a "basket" to collect some results from callbacks
   basket <- list()
@@ -221,6 +224,7 @@
     callbacks = callbacks,
     evaluation_log = evaluation_log,
     niter = end_iteration,
+    nfeatures = ncol(data),
     folds = folds
   )
   ret <- c(ret, basket)

View File
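The two `xgb.cv()` additions above (the `aucpr` eval metric and the `nfeatures` element of the result) can be exercised together. A minimal sketch using the bundled `agaricus.train` data; hyperparameter values are illustrative, not taken from the diff:

```r
# Sketch: "aucpr" as an eval metric plus the new `nfeatures` result element.
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

cv <- xgb.cv(params = list(objective = "binary:logistic",
                           eval_metric = "aucpr"),
             data = dtrain, nrounds = 3, nfold = 3, verbose = 0)

cv$nfeatures              # equals ncol() of the training matrix
print(cv$evaluation_log)  # per-iteration train/test aucpr statistics
```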

@@ -121,12 +121,13 @@
 #' \itemize{
 #' \item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
 #' \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
-#' \item \code{mlogloss} multiclass logloss. \url{https://www.kaggle.com/wiki/MultiClassLogLoss/}
+#' \item \code{mlogloss} multiclass logloss. \url{http://wiki.fast.ai/index.php/Log_Loss}
 #' \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
 #' By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
 #' Different threshold (e.g., 0.) could be specified as "error@0."
 #' \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
 #' \item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
+#' \item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
 #' \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{http://en.wikipedia.org/wiki/NDCG}
 #' }
 #'
@@ -162,6 +163,7 @@
 #' (only available with early stopping).
 #' \item \code{feature_names} names of the training dataset features
 #' (only when comun names were defined in training data).
+#' \item \code{nfeatures} number of features in training data.
 #' }
 #'
 #' @seealso
@@ -351,8 +353,8 @@ xgb.train <- function(params = list(), data, nrounds, watchlist = list(),
 if (inherits(xgb_model, 'xgb.Booster') &&
 !is_update &&
 !is.null(xgb_model$evaluation_log) &&
-all.equal(colnames(evaluation_log),
-colnames(xgb_model$evaluation_log))) {
+isTRUE(all.equal(colnames(evaluation_log),
+colnames(xgb_model$evaluation_log)))) {
 evaluation_log <- rbindlist(list(xgb_model$evaluation_log, evaluation_log))
 }
 bst$evaluation_log <- evaluation_log
@@ -363,6 +365,7 @@ xgb.train <- function(params = list(), data, nrounds, watchlist = list(),
 bst$callbacks <- callbacks
 if (!is.null(colnames(dtrain)))
 bst$feature_names <- colnames(dtrain)
+bst$nfeatures <- ncol(dtrain)
 return(bst)
 }

@@ -77,10 +77,11 @@ NULL
 # Various imports
 #' @importClassesFrom Matrix dgCMatrix dgeMatrix
-#' @importFrom Matrix cBind
 #' @importFrom Matrix colSums
 #' @importFrom Matrix sparse.model.matrix
 #' @importFrom Matrix sparseVector
+#' @importFrom Matrix sparseMatrix
+#' @importFrom Matrix t
 #' @importFrom data.table data.table
 #' @importFrom data.table is.data.table
 #' @importFrom data.table as.data.table

R-package/configure.win (new file)
@@ -32,7 +32,7 @@ create.new.tree.features <- function(model, original.features){
 leaf.id <- sort(unique(pred_with_leaf[,i]))
 cols[[i]] <- factor(x = pred_with_leaf[,i], level = leaf.id)
 }
-cBind(original.features, sparse.model.matrix( ~ . -1, as.data.frame(cols)))
+cbind(original.features, sparse.model.matrix( ~ . -1, as.data.frame(cols)))
 }
 # Convert previous features to one hot encoding

@@ -0,0 +1,95 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{cb.gblinear.history}
\alias{cb.gblinear.history}
\title{Callback closure for collecting the model coefficients history of a gblinear booster
during its training.}
\usage{
cb.gblinear.history(sparse = FALSE)
}
\arguments{
\item{sparse}{when set to FALSE/TURE, a dense/sparse matrix is used to store the result.
Sparse format is useful when one expects only a subset of coefficients to be non-zero,
when using the "thrifty" feature selector with fairly small number of top features
selected per iteration.}
}
\value{
Results are stored in the \code{coefs} element of the closure.
The \code{\link{xgb.gblinear.history}} convenience function provides an easy way to access it.
With \code{xgb.train}, it is either a dense of a sparse matrix.
While with \code{xgb.cv}, it is a list (an element per each fold) of such matrices.
}
\description{
Callback closure for collecting the model coefficients history of a gblinear booster
during its training.
}
\details{
To keep things fast and simple, gblinear booster does not internally store the history of linear
model coefficients at each boosting iteration. This callback provides a workaround for storing
the coefficients' path, by extracting them after each training iteration.
Callback function expects the following values to be set in its calling frame:
\code{bst} (or \code{bst_folds}).
}
\examples{
#### Binary classification:
#
# In the iris dataset, it is hard to linearly separate Versicolor class from the rest
# without considering the 2nd order interactions:
require(magrittr)
x <- model.matrix(Species ~ .^2, iris)[,-1]
colnames(x)
dtrain <- xgb.DMatrix(scale(x), label = 1*(iris$Species == "versicolor"))
param <- list(booster = "gblinear", objective = "reg:logistic", eval_metric = "auc",
lambda = 0.0003, alpha = 0.0003, nthread = 2)
# For 'shotgun', which is a default linear updater, using high eta values may result in
# unstable behaviour in some datasets. With this simple dataset, however, the high learning
# rate does not break the convergence, but allows us to illustrate the typical pattern of
# "stochastic explosion" behaviour of this lock-free algorithm at early boosting iterations.
bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 1.,
callbacks = list(cb.gblinear.history()))
# Extract the coefficients' path and plot them vs boosting iteration number:
coef_path <- xgb.gblinear.history(bst)
matplot(coef_path, type = 'l')
# With the deterministic coordinate descent updater, it is safer to use higher learning rates.
# Will try the classical componentwise boosting which selects a single best feature per round:
bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 0.8,
updater = 'coord_descent', feature_selector = 'thrifty', top_k = 1,
callbacks = list(cb.gblinear.history()))
xgb.gblinear.history(bst) \%>\% matplot(type = 'l')
# Componentwise boosting is known to have similar effect to Lasso regularization.
# Try experimenting with various values of top_k, eta, nrounds,
# as well as different feature_selectors.
# For xgb.cv:
bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 100, eta = 0.8,
callbacks = list(cb.gblinear.history()))
# coefficients in the CV fold #3
xgb.gblinear.history(bst)[[3]] \%>\% matplot(type = 'l')
#### Multiclass classification:
#
dtrain <- xgb.DMatrix(scale(x), label = as.numeric(iris$Species) - 1)
param <- list(booster = "gblinear", objective = "multi:softprob", num_class = 3,
lambda = 0.0003, alpha = 0.0003, nthread = 2)
# For the default linear updater 'shotgun' it sometimes is helpful
# to use smaller eta to reduce instability
bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 70, eta = 0.5,
callbacks = list(cb.gblinear.history()))
# Will plot the coefficient paths separately for each class:
xgb.gblinear.history(bst, class_index = 0) \%>\% matplot(type = 'l')
xgb.gblinear.history(bst, class_index = 1) \%>\% matplot(type = 'l')
xgb.gblinear.history(bst, class_index = 2) \%>\% matplot(type = 'l')
# CV:
bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 70, eta = 0.5,
callbacks = list(cb.gblinear.history(FALSE)))
# 1st forld of 1st class
xgb.gblinear.history(bst, class_index = 0)[[1]] \%>\% matplot(type = 'l')
}
\seealso{
\code{\link{callbacks}}, \code{\link{xgb.gblinear.history}}.
}

@@ -51,6 +51,7 @@ from each CV model. This parameter engages the \code{\link{cb.cv.predict}} callb
 \item \code{rmse} Rooted mean square error
 \item \code{logloss} negative log-likelihood function
 \item \code{auc} Area under curve
+\item \code{aucpr} Area under PR curve
 \item \code{merror} Exact matching error, used to evaluate multi-class classification
 }}
@@ -104,6 +105,7 @@ An object of class \code{xgb.cv.synchronous} with the following elements:
 CV-based evaluation means and standard deviations for the training and test CV-sets.
 It is created by the \code{\link{cb.evaluation.log}} callback.
 \item \code{niter} number of boosting iterations.
+\item \code{nfeatures} number of features in training data.
 \item \code{folds} the list of CV folds' indices - either those passed through the \code{folds}
 parameter or randomly generated.
 \item \code{best_iteration} iteration number with the best evaluation metric value

@@ -0,0 +1,35 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{xgb.gblinear.history}
\alias{xgb.gblinear.history}
\title{Extract gblinear coefficients history.}
\usage{
xgb.gblinear.history(model, class_index = NULL)
}
\arguments{
\item{model}{either an \code{xgb.Booster} or a result of \code{xgb.cv()}, trained
using the \code{cb.gblinear.history()} callback.}
\item{class_index}{zero-based class index to extract the coefficients for only that
specific class in a multinomial multiclass model. When it is NULL, all the
coeffients are returned. Has no effect in non-multiclass models.}
}
\value{
For an \code{xgb.train} result, a matrix (either dense or sparse) with the columns
corresponding to iteration's coefficients (in the order as \code{xgb.dump()} would
return) and the rows corresponding to boosting iterations.
For an \code{xgb.cv} result, a list of such matrices is returned with the elements
corresponding to CV folds.
}
\description{
A helper function to extract the matrix of linear coefficients' history
from a gblinear model created while using the \code{cb.gblinear.history()}
callback.
}
\examples{
\dontrun{
See \\code{\\link{cv.gblinear.history}}
}
}


@@ -155,6 +155,7 @@ An object of class \code{xgb.Booster} with the following elements:
 (only available with early stopping).
 \item \code{feature_names} names of the training dataset features
 (only when comun names were defined in training data).
+\item \code{nfeatures} number of features in training data.
 }
 }
 \description{
@@ -179,12 +180,13 @@ The folloiwing is the list of built-in metrics for which Xgboost provides optimi
 \itemize{
 \item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
 \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
-\item \code{mlogloss} multiclass logloss. \url{https://www.kaggle.com/wiki/MultiClassLogLoss/}
+\item \code{mlogloss} multiclass logloss. \url{http://wiki.fast.ai/index.php/Log_Loss}
 \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
 By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
 Different threshold (e.g., 0.) could be specified as "error@0."
 \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
 \item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
+\item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
 \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{http://en.wikipedia.org/wiki/NDCG}
 }

@@ -10,6 +10,12 @@ XGB_RFLAGS = -DXGBOOST_STRICT_R_MODE=1 -DDMLC_LOG_BEFORE_THROW=0\
 -DDMLC_LOG_CUSTOMIZE=1 -DXGBOOST_CUSTOMIZE_LOGGER=1\
 -DRABIT_CUSTOMIZE_MSG_ -DRABIT_STRICT_CXX98_
+# disable the use of thread_local for 32 bit windows:
+ifeq ($(R_OSTYPE)$(WIN),windows)
+XGB_RFLAGS += -DDMLC_CXX11_THREAD_LOCAL=0
+endif
+$(foreach v, $(XGB_RFLAGS), $(warning $(v)))
 PKG_CPPFLAGS= -I$(PKGROOT)/include -I$(PKGROOT)/dmlc-core/include -I$(PKGROOT)/rabit/include -I$(PKGROOT) $(XGB_RFLAGS)
 PKG_CXXFLAGS= @OPENMP_CXXFLAGS@ $(SHLIB_PTHREAD_FLAGS)
 PKG_LIBS = @OPENMP_CXXFLAGS@ $(SHLIB_PTHREAD_FLAGS)

@@ -4,7 +4,7 @@ ENABLE_STD_THREAD=0
 # _*_ mode: Makefile; _*_
 # This file is only used for windows compilation from github
-# It will be replaced by Makevars in CRAN version
+# It will be replaced with Makevars.in for the CRAN version
 .PHONY: all xgblib
 all: $(SHLIB)
 $(SHLIB): xgblib
@@ -22,6 +22,12 @@ XGB_RFLAGS = -DXGBOOST_STRICT_R_MODE=1 -DDMLC_LOG_BEFORE_THROW=0\
 -DDMLC_LOG_CUSTOMIZE=1 -DXGBOOST_CUSTOMIZE_LOGGER=1\
 -DRABIT_CUSTOMIZE_MSG_ -DRABIT_STRICT_CXX98_
+# disable the use of thread_local for 32 bit windows:
+ifeq ($(R_OSTYPE)$(WIN),windows)
+XGB_RFLAGS += -DDMLC_CXX11_THREAD_LOCAL=0
+endif
+$(foreach v, $(XGB_RFLAGS), $(warning $(v)))
 PKG_CPPFLAGS= -I$(PKGROOT)/include -I$(PKGROOT)/dmlc-core/include -I$(PKGROOT)/rabit/include -I$(PKGROOT) $(XGB_RFLAGS)
 PKG_CXXFLAGS= $(SHLIB_OPENMP_CFLAGS) $(SHLIB_PTHREAD_FLAGS)
 PKG_LIBS = $(SHLIB_OPENMP_CFLAGS) $(SHLIB_PTHREAD_FLAGS)

@@ -19,10 +19,10 @@ extern SEXP XGBoosterBoostOneIter_R(SEXP, SEXP, SEXP, SEXP);
 extern SEXP XGBoosterCreate_R(SEXP);
 extern SEXP XGBoosterDumpModel_R(SEXP, SEXP, SEXP, SEXP);
 extern SEXP XGBoosterEvalOneIter_R(SEXP, SEXP, SEXP, SEXP);
-extern SEXP XGBoosterGetAttr_R(SEXP, SEXP);
 extern SEXP XGBoosterGetAttrNames_R(SEXP);
-extern SEXP XGBoosterLoadModel_R(SEXP, SEXP);
+extern SEXP XGBoosterGetAttr_R(SEXP, SEXP);
 extern SEXP XGBoosterLoadModelFromRaw_R(SEXP, SEXP);
+extern SEXP XGBoosterLoadModel_R(SEXP, SEXP);
 extern SEXP XGBoosterModelToRaw_R(SEXP);
 extern SEXP XGBoosterPredict_R(SEXP, SEXP, SEXP, SEXP);
 extern SEXP XGBoosterSaveModel_R(SEXP, SEXP);
@@ -45,10 +45,10 @@ static const R_CallMethodDef CallEntries[] = {
 {"XGBoosterCreate_R", (DL_FUNC) &XGBoosterCreate_R, 1},
 {"XGBoosterDumpModel_R", (DL_FUNC) &XGBoosterDumpModel_R, 4},
 {"XGBoosterEvalOneIter_R", (DL_FUNC) &XGBoosterEvalOneIter_R, 4},
-{"XGBoosterGetAttr_R", (DL_FUNC) &XGBoosterGetAttr_R, 2},
 {"XGBoosterGetAttrNames_R", (DL_FUNC) &XGBoosterGetAttrNames_R, 1},
-{"XGBoosterLoadModel_R", (DL_FUNC) &XGBoosterLoadModel_R, 2},
+{"XGBoosterGetAttr_R", (DL_FUNC) &XGBoosterGetAttr_R, 2},
 {"XGBoosterLoadModelFromRaw_R", (DL_FUNC) &XGBoosterLoadModelFromRaw_R, 2},
+{"XGBoosterLoadModel_R", (DL_FUNC) &XGBoosterLoadModel_R, 2},
 {"XGBoosterModelToRaw_R", (DL_FUNC) &XGBoosterModelToRaw_R, 1},
 {"XGBoosterPredict_R", (DL_FUNC) &XGBoosterPredict_R, 4},
 {"XGBoosterSaveModel_R", (DL_FUNC) &XGBoosterSaveModel_R, 2},

@@ -11,6 +11,7 @@ set.seed(1994)
 # disable some tests for Win32
 windows_flag = .Platform$OS.type == "windows" &&
 .Machine$sizeof.pointer != 8
+solaris_flag = (Sys.info()['sysname'] == "SunOS")
 test_that("train and predict binary classification", {
 nrounds = 2
@@ -152,20 +153,20 @@ test_that("training continuation works", {
 bst1 <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0)
 # continue for two more:
 bst2 <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0, xgb_model = bst1)
-if (!windows_flag)
+if (!windows_flag && !solaris_flag)
 expect_equal(bst$raw, bst2$raw)
 expect_false(is.null(bst2$evaluation_log))
 expect_equal(dim(bst2$evaluation_log), c(4, 2))
 expect_equal(bst2$evaluation_log, bst$evaluation_log)
 # test continuing from raw model data
 bst2 <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0, xgb_model = bst1$raw)
-if (!windows_flag)
+if (!windows_flag && !solaris_flag)
 expect_equal(bst$raw, bst2$raw)
 expect_equal(dim(bst2$evaluation_log), c(2, 2))
 # test continuing from a model in file
 xgb.save(bst1, "xgboost.model")
 bst2 <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0, xgb_model = "xgboost.model")
-if (!windows_flag)
+if (!windows_flag && !solaris_flag)
 expect_equal(bst$raw, bst2$raw)
 expect_equal(dim(bst2$evaluation_log), c(2, 2))
 })

@@ -2,18 +2,47 @@ context('Test generalized linear models')
 require(xgboost)
-test_that("glm works", {
+test_that("gblinear works", {
 data(agaricus.train, package='xgboost')
 data(agaricus.test, package='xgboost')
 dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
 dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
+expect_equal(class(dtrain), "xgb.DMatrix")
+expect_equal(class(dtest), "xgb.DMatrix")
 param <- list(objective = "binary:logistic", booster = "gblinear",
-nthread = 2, alpha = 0.0001, lambda = 1)
+nthread = 2, eta = 0.8, alpha = 0.0001, lambda = 0.0001)
 watchlist <- list(eval = dtest, train = dtrain)
-num_round <- 2
-bst <- xgb.train(param, dtrain, num_round, watchlist)
+n <- 5 # iterations
+ERR_UL <- 0.005 # upper limit for the test set error
+VERB <- 0 # chatterbox switch
+param$updater = 'shotgun'
+bst <- xgb.train(param, dtrain, n, watchlist, verbose = VERB, feature_selector = 'shuffle')
 ypred <- predict(bst, dtest)
 expect_equal(length(getinfo(dtest, 'label')), 1611)
+expect_lt(bst$evaluation_log$eval_error[n], ERR_UL)
+bst <- xgb.train(param, dtrain, n, watchlist, verbose = VERB, feature_selector = 'cyclic',
+callbacks = list(cb.gblinear.history()))
+expect_lt(bst$evaluation_log$eval_error[n], ERR_UL)
+h <- xgb.gblinear.history(bst)
+expect_equal(dim(h), c(n, ncol(dtrain) + 1))
+expect_is(h, "matrix")
+param$updater = 'coord_descent'
+bst <- xgb.train(param, dtrain, n, watchlist, verbose = VERB, feature_selector = 'cyclic')
+expect_lt(bst$evaluation_log$eval_error[n], ERR_UL)
+bst <- xgb.train(param, dtrain, n, watchlist, verbose = VERB, feature_selector = 'shuffle')
+expect_lt(bst$evaluation_log$eval_error[n], ERR_UL)
+bst <- xgb.train(param, dtrain, 2, watchlist, verbose = VERB, feature_selector = 'greedy')
+expect_lt(bst$evaluation_log$eval_error[2], ERR_UL)
+bst <- xgb.train(param, dtrain, n, watchlist, verbose = VERB, feature_selector = 'thrifty',
+top_n = 50, callbacks = list(cb.gblinear.history(sparse = TRUE)))
+expect_lt(bst$evaluation_log$eval_error[n], ERR_UL)
+h <- xgb.gblinear.history(bst)
+expect_equal(dim(h), c(n, ncol(dtrain) + 1))
+expect_s4_class(h, "dgCMatrix")
 })
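The shape assertions in the gblinear test above rest on a simple invariant of `xgb.gblinear.history()`: one row per boosting iteration and one column per feature plus the intercept. A minimal sketch of that invariant; parameter values are illustrative:

```r
# Sketch: the coefficient history collected by cb.gblinear.history() has
# dimensions (iterations) x (features + intercept).
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

param <- list(objective = "binary:logistic", booster = "gblinear",
              eta = 0.8, alpha = 1e-4, lambda = 1e-4)
n <- 5
bst <- xgb.train(param, dtrain, nrounds = n,
                 watchlist = list(train = dtrain), verbose = 0,
                 callbacks = list(cb.gblinear.history()))

h <- xgb.gblinear.history(bst)
dim(h)  # c(n, ncol(dtrain) + 1): the extra column is the intercept
```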

@@ -5,6 +5,8 @@ require(data.table)
 require(Matrix)
 require(vcd, quietly = TRUE)
+float_tolerance = 5e-6
 set.seed(1982)
 data(Arthritis)
 df <- data.table(Arthritis, keep.rownames = F)
@@ -85,7 +87,8 @@ test_that("predict feature contributions works", {
 X <- sparse_matrix
 colnames(X) <- NULL
 expect_error(pred_contr_ <- predict(bst.Tree, X, predcontrib = TRUE), regexp = NA)
-expect_equal(pred_contr, pred_contr_, check.attributes = FALSE)
+expect_equal(pred_contr, pred_contr_, check.attributes = FALSE,
+tolerance = float_tolerance)
 # gbtree binary classifier (approximate method)
 expect_error(pred_contr <- predict(bst.Tree, sparse_matrix, predcontrib = TRUE, approxcontrib = TRUE), regexp = NA)
@@ -104,7 +107,8 @@ test_that("predict feature contributions works", {
 coefs <- xgb.dump(bst.GLM)[-c(1,2,4)] %>% as.numeric
 coefs <- c(coefs[-1], coefs[1]) # intercept must be the last
 pred_contr_manual <- sweep(cbind(sparse_matrix, 1), 2, coefs, FUN="*")
-expect_equal(as.numeric(pred_contr), as.numeric(pred_contr_manual), 1e-5)
+expect_equal(as.numeric(pred_contr), as.numeric(pred_contr_manual),
+tolerance = float_tolerance)
 # gbtree multiclass
 pred <- predict(mbst.Tree, as.matrix(iris[, -5]), outputmargin = TRUE, reshape = TRUE)
@@ -123,11 +127,12 @@ test_that("predict feature contributions works", {
 coefs_all <- xgb.dump(mbst.GLM)[-c(1,2,6)] %>% as.numeric %>% matrix(ncol = 3, byrow = TRUE)
 for (g in seq_along(pred_contr)) {
 expect_equal(colnames(pred_contr[[g]]), c(colnames(iris[, -5]), "BIAS"))
-expect_lt(max(abs(rowSums(pred_contr[[g]]) - pred[, g])), 2e-6)
+expect_lt(max(abs(rowSums(pred_contr[[g]]) - pred[, g])), float_tolerance)
 # manual calculation of linear terms
 coefs <- c(coefs_all[-1, g], coefs_all[1, g]) # intercept needs to be the last
 pred_contr_manual <- sweep(as.matrix(cbind(iris[,-5], 1)), 2, coefs, FUN="*")
-expect_equal(as.numeric(pred_contr[[g]]), as.numeric(pred_contr_manual), 2e-6)
+expect_equal(as.numeric(pred_contr[[g]]), as.numeric(pred_contr_manual),
+tolerance = float_tolerance)
 }
 })
@@ -171,14 +176,16 @@ if (grepl('Windows', Sys.info()[['sysname']]) ||
 # check that lossless conversion works with 17 digits
 # numeric -> character -> numeric
 X <- 10^runif(100, -20, 20)
+if (capabilities('long.double')) {
 X2X <- as.numeric(format(X, digits = 17))
 expect_identical(X, X2X)
+}
 # retrieved attributes to be the same as written
 for (x in X) {
 xgb.attr(bst.Tree, "x") <- x
-expect_identical(as.numeric(xgb.attr(bst.Tree, "x")), x)
+expect_equal(as.numeric(xgb.attr(bst.Tree, "x")), x, tolerance = float_tolerance)
 xgb.attributes(bst.Tree) <- list(a = "A", b = x)
-expect_identical(as.numeric(xgb.attr(bst.Tree, "b")), x)
+expect_equal(as.numeric(xgb.attr(bst.Tree, "b")), x, tolerance = float_tolerance)
 }
 })
 }
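The `capabilities('long.double')` guard added above protects a plain base-R fact that can be checked in isolation: a double generally survives the numeric -> character -> numeric round trip only when printed with 17 significant digits. A self-contained sketch, independent of xgboost:

```r
# 17 significant digits round-trip an IEEE-754 double through character form;
# fewer digits are generally lossy. (The guarded test above skips this check
# on builds without long-double arithmetic, where format() may round differently.)
set.seed(1)
X <- 10^runif(100, -20, 20)

X17 <- as.numeric(format(X, digits = 17))
X15 <- as.numeric(format(X, digits = 15))

identical(X, X17)
identical(X, X15)
```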
@@ -187,7 +194,7 @@ test_that("xgb.Booster serializing as R object works", {
 saveRDS(bst.Tree, 'xgb.model.rds')
 bst <- readRDS('xgb.model.rds')
 dtrain <- xgb.DMatrix(sparse_matrix, label = label)
-expect_equal(predict(bst.Tree, dtrain), predict(bst, dtrain))
+expect_equal(predict(bst.Tree, dtrain), predict(bst, dtrain), tolerance = float_tolerance)
 expect_equal(xgb.dump(bst.Tree), xgb.dump(bst))
 xgb.save(bst, 'xgb.model')
 nil_ptr <- new("externalptr")
@@ -195,7 +202,7 @@ test_that("xgb.Booster serializing as R object works", {
 expect_true(identical(bst$handle, nil_ptr))
 bst <- xgb.Booster.complete(bst)
 expect_true(!identical(bst$handle, nil_ptr))
-expect_equal(predict(bst.Tree, dtrain), predict(bst, dtrain))
+expect_equal(predict(bst.Tree, dtrain), predict(bst, dtrain), tolerance = float_tolerance)
 })
 test_that("xgb.model.dt.tree works with and without feature names", {
@@ -233,13 +240,14 @@ test_that("xgb.importance works with and without feature names", {
 expect_output(str(importance.Tree), 'Feature.*\\"Age\\"')
 importance.Tree.0 <- xgb.importance(model = bst.Tree)
-expect_equal(importance.Tree, importance.Tree.0)
+expect_equal(importance.Tree, importance.Tree.0, tolerance = float_tolerance)
 # when model contains no feature names:
 bst.Tree.x <- bst.Tree
 bst.Tree.x$feature_names <- NULL
 importance.Tree.x <- xgb.importance(model = bst.Tree)
-expect_equal(importance.Tree[, -1, with=FALSE], importance.Tree.x[, -1, with=FALSE])
+expect_equal(importance.Tree[, -1, with=FALSE], importance.Tree.x[, -1, with=FALSE],
+tolerance = float_tolerance)
 imp2plot <- xgb.plot.importance(importance_matrix = importance.Tree)
 expect_equal(colnames(imp2plot), c("Feature", "Gain", "Cover", "Frequency", "Importance"))

@@ -53,10 +53,16 @@
#include "../src/tree/updater_histmaker.cc" #include "../src/tree/updater_histmaker.cc"
#include "../src/tree/updater_skmaker.cc" #include "../src/tree/updater_skmaker.cc"
// linear
#include "../src/linear/linear_updater.cc"
#include "../src/linear/updater_coordinate.cc"
#include "../src/linear/updater_shotgun.cc"
// global // global
#include "../src/learner.cc" #include "../src/learner.cc"
#include "../src/logging.cc" #include "../src/logging.cc"
#include "../src/common/common.cc" #include "../src/common/common.cc"
#include "../src/common/host_device_vector.cc"
#include "../src/common/hist_util.cc" #include "../src/common/hist_util.cc"
// c_api // c_api


@@ -53,7 +53,7 @@ install:
Import-Module "$Env:TEMP\appveyor-tool.ps1" Import-Module "$Env:TEMP\appveyor-tool.ps1"
Bootstrap Bootstrap
$DEPS = "c('data.table','magrittr','stringi','ggplot2','DiagrammeR','Ckmeans.1d.dp','vcd','testthat','igraph','knitr','rmarkdown')" $DEPS = "c('data.table','magrittr','stringi','ggplot2','DiagrammeR','Ckmeans.1d.dp','vcd','testthat','igraph','knitr','rmarkdown')"
cmd /c "R.exe -q -e ""install.packages($DEPS, repos='$CRAN', type='win.binary')"" 2>&1" cmd.exe /c "R.exe -q -e ""install.packages($DEPS, repos='$CRAN', type='win.binary')"" 2>&1"
} }
build_script: build_script:
@@ -81,7 +81,7 @@ build_script:
- if /i "%target%" == "rmingw" ( - if /i "%target%" == "rmingw" (
make Rbuild && make Rbuild &&
ls -l && ls -l &&
R.exe CMD INSTALL --no-multiarch xgboost*.tar.gz R.exe CMD INSTALL xgboost*.tar.gz
) )
# R package: cmake + VC2015 # R package: cmake + VC2015
- if /i "%target%" == "rmsvc" ( - if /i "%target%" == "rmsvc" (
@@ -98,10 +98,9 @@ test_script:
# mingw R package: run the R check (which includes unit tests), and also keep the built binary package # mingw R package: run the R check (which includes unit tests), and also keep the built binary package
- if /i "%target%" == "rmingw" ( - if /i "%target%" == "rmingw" (
set _R_CHECK_CRAN_INCOMING_=FALSE&& set _R_CHECK_CRAN_INCOMING_=FALSE&&
R.exe CMD check xgboost*.tar.gz --no-manual --no-build-vignettes --as-cran --install-args=--build --no-multiarch R.exe CMD check xgboost*.tar.gz --no-manual --no-build-vignettes --as-cran --install-args=--build
) )
# MSVC R package: run only the unit tests # MSVC R package: run only the unit tests
# TODO: create a binary msvc-built package to keep as an artifact
- if /i "%target%" == "rmsvc" ( - if /i "%target%" == "rmsvc" (
cd build_rmsvc%ver%\R-package && cd build_rmsvc%ver%\R-package &&
R.exe -q -e "library(testthat); setwd('tests'); source('testthat.R')" R.exe -q -e "library(testthat); setwd('tests'); source('testthat.R')"


@@ -117,7 +117,7 @@ else()
# ask R for R_HOME # ask R for R_HOME
if(LIBR_EXECUTABLE) if(LIBR_EXECUTABLE)
execute_process( execute_process(
COMMAND ${LIBR_EXECUTABLE} "--slave" "--no-save" "-e" "cat(normalizePath(R.home(), winslash='/'))" COMMAND ${LIBR_EXECUTABLE} "--slave" "--no-save" "-e" "cat(normalizePath(R.home(),winslash='/'))"
OUTPUT_VARIABLE LIBR_HOME) OUTPUT_VARIABLE LIBR_HOME)
endif() endif()
# if R executable not available, query R_HOME path from registry # if R executable not available, query R_HOME path from registry


@@ -2,8 +2,6 @@
This demo shows how to train a model on the [forest cover type](https://archive.ics.uci.edu/ml/datasets/covertype) dataset using GPU acceleration. The forest cover type dataset has 581,012 rows and 54 features, making it time consuming to process. We compare the run-time and accuracy of the GPU and CPU histogram algorithms. This demo shows how to train a model on the [forest cover type](https://archive.ics.uci.edu/ml/datasets/covertype) dataset using GPU acceleration. The forest cover type dataset has 581,012 rows and 54 features, making it time consuming to process. We compare the run-time and accuracy of the GPU and CPU histogram algorithms.
This demo requires the [GPU plug-in](https://github.com/dmlc/xgboost/tree/master/plugin/updater_gpu) to be built and installed. This demo requires the [GPU plug-in](https://xgboost.readthedocs.io/en/latest/gpu/index.html) to be built and installed.
The dataset is automatically loaded via the sklearn script. The dataset is automatically loaded via the sklearn script.


@@ -1,7 +1,7 @@
XGBoost Python Feature Walkthrough XGBoost Python Feature Walkthrough
================================== ==================================
* [Basic walkthrough of wrappers](basic_walkthrough.py) * [Basic walkthrough of wrappers](basic_walkthrough.py)
* [Cutomize loss function, and evaluation metric](custom_objective.py) * [Customize loss function, and evaluation metric](custom_objective.py)
* [Boosting from existing prediction](boost_from_prediction.py) * [Boosting from existing prediction](boost_from_prediction.py)
* [Predicting using first n trees](predict_first_ntree.py) * [Predicting using first n trees](predict_first_ntree.py)
* [Generalized Linear Model](generalized_linear_model.py) * [Generalized Linear Model](generalized_linear_model.py)


@@ -42,7 +42,7 @@ xgb.cv(param, dtrain, num_round, nfold=5,
metrics={'auc'}, seed=0, fpreproc=fpreproc) metrics={'auc'}, seed=0, fpreproc=fpreproc)
### ###
# you can also do cross validation with cutomized loss function # you can also do cross validation with customized loss function
# See custom_objective.py # See custom_objective.py
## ##
print('running cross validation, with cutomsized loss function') print('running cross validation, with cutomsized loss function')


@@ -1,7 +1,5 @@
The documentation of xgboost is generated with recommonmark and sphinx. The documentation of xgboost is generated with recommonmark and sphinx.
You can build it locally by typing "make html" in this folder. You can build it locally by typing "make html" in this folder.
- clone https://github.com/tqchen/recommonmark to root
- type make html
Checkout https://recommonmark.readthedocs.org for guide on how to write markdown with extensions used in this doc, such as math formulas and table of content. Checkout https://recommonmark.readthedocs.org for guide on how to write markdown with extensions used in this doc, such as math formulas and table of content.


@@ -56,7 +56,7 @@
}; };
</script> </script>
{% for name in ['jquery.js', 'underscore.js', 'doctools.js', 'searchtools.js'] %} {% for name in ['jquery.js', 'underscore.js', 'doctools.js', 'searchtools-new.js'] %}
<script type="text/javascript" src="{{ pathto('_static/' + name, 1) }}"></script> <script type="text/javascript" src="{{ pathto('_static/' + name, 1) }}"></script>
{% endfor %} {% endfor %}


@@ -185,7 +185,7 @@ pre {
.dropdown-menu li { .dropdown-menu li {
padding: 0px 0px; padding: 0px 0px;
width: 120px; width: 100%;
} }
.dropdown-menu li a { .dropdown-menu li a {
color: #0079b2; color: #0079b2;


@@ -4,7 +4,7 @@ Installation Guide
This page gives instructions on how to build and install the xgboost package from This page gives instructions on how to build and install the xgboost package from
scratch on various systems. It consists of two steps: scratch on various systems. It consists of two steps:
1. First build the shared library from the C++ codes (`libxgboost.so` for linux/osx and `libxgboost.dll` for windows). 1. First build the shared library from the C++ codes (`libxgboost.so` for Linux/OSX and `xgboost.dll` for Windows).
- Exception: for R-package installation please directly refer to the R package section. - Exception: for R-package installation please directly refer to the R package section.
2. Then install the language packages (e.g. Python Package). 2. Then install the language packages (e.g. Python Package).
@@ -39,7 +39,7 @@ even better to send pull request if you can fix the problem.
Our goal is to build the shared library: Our goal is to build the shared library:
- On Linux/OSX the target library is `libxgboost.so` - On Linux/OSX the target library is `libxgboost.so`
- On Windows the target library is `libxgboost.dll` - On Windows the target library is `xgboost.dll`
The minimal building requirement is The minimal building requirement is
@@ -85,12 +85,33 @@ Now, clone the repository
```bash ```bash
git clone --recursive https://github.com/dmlc/xgboost git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; cp make/config.mk ./config.mk
```
Open config.mk and uncomment these two lines
```config.mk
export CC = gcc
export CXX = g++
```
and replace them with the versioned compiler names (use 5, 6, or 7, depending on your gcc version):
```config.mk
export CC = gcc-7
export CXX = g++-7
```
To find your gcc version, run
```bash
gcc --version
``` ```
and build using the following commands and build using the following commands
```bash ```bash
cd xgboost; cp make/config.mk ./config.mk; make -j4 make -j4
``` ```
head over to `Python Package Installation` for the next steps head over to `Python Package Installation` for the next steps
@@ -111,12 +132,13 @@ After installing [Git for Windows](https://git-for-windows.github.io/), you shou
All the following steps are in the `Git Bash`. All the following steps are in the `Git Bash`.
In MinGW, `make` command comes with the name `mingw32-make`. You can add the following line into the `.bashrc` file. In MinGW, `make` command comes with the name `mingw32-make`. You can add the following line into the `.bashrc` file.
```bash ```bash
alias make='mingw32-make' alias make='mingw32-make'
``` ```
(On 64-bit Windows, you should get [mingw64](https://sourceforge.net/projects/mingw-w64/) instead.) Make sure
that the path to MinGW is in the system PATH.
To build with MinGW To build with MinGW, type:
```bash ```bash
cp make/mingw64.mk config.mk; make -j4 cp make/mingw64.mk config.mk; make -j4
@@ -130,7 +152,7 @@ cd build
cmake .. -G"Visual Studio 12 2013 Win64" cmake .. -G"Visual Studio 12 2013 Win64"
``` ```
This specifies an out of source build using the MSVC 12 64 bit generator. Open the .sln file in the build directory and build with Visual Studio. To use the Python module you can copy libxgboost.dll into python-package\xgboost. This specifies an out of source build using the MSVC 12 64 bit generator. Open the .sln file in the build directory and build with Visual Studio. To use the Python module you can copy `xgboost.dll` into python-package\xgboost.
Other versions of Visual Studio may work but are untested. Other versions of Visual Studio may work but are untested.
@@ -169,6 +191,8 @@ If build seems to use only a single process, you might try to append an option l
### Windows Binaries ### Windows Binaries
After the build process successfully ends, you will find a `xgboost.dll` library file inside `./lib/` folder, copy this file to the the API package folder like `python-package/xgboost` if you are using *python* API. And you are good to follow the below instructions.
Unofficial windows binaries and instructions on how to use them are hosted on [Guido Tapia's blog](http://www.picnet.com.au/blogs/guido/post/2016/09/22/xgboost-windows-x64-binaries-for-download/) Unofficial windows binaries and instructions on how to use them are hosted on [Guido Tapia's blog](http://www.picnet.com.au/blogs/guido/post/2016/09/22/xgboost-windows-x64-binaries-for-download/)
### Customized Building ### Customized Building


@@ -14,7 +14,6 @@
import sys import sys
import os, subprocess import os, subprocess
import shlex import shlex
import urllib
# If extensions (or modules to document with autodoc) are in another directory, # If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the # add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here. # documentation root, use os.path.abspath to make it absolute, like shown here.
@@ -79,6 +78,8 @@ master_doc = 'index'
# Usually you set "language" from the command line for these cases. # Usually you set "language" from the command line for these cases.
language = None language = None
autoclass_content = 'both'
# There are two options for replacing |today|: either, you set today to some # There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used: # non-false value, then it is used:
#today = '' #today = ''
@@ -164,7 +165,13 @@ def setup(app):
# Add hook for building doxygen xml when needed # Add hook for building doxygen xml when needed
# no c++ API for now # no c++ API for now
# app.connect("builder-inited", generate_doxygen_xml) # app.connect("builder-inited", generate_doxygen_xml)
urllib.urlretrieve('https://code.jquery.com/jquery-2.2.4.min.js',
# urlretrieve got moved in Python 3.x
try:
from urllib import urlretrieve
except ImportError:
from urllib.request import urlretrieve
urlretrieve('https://code.jquery.com/jquery-2.2.4.min.js',
'_static/jquery.js') '_static/jquery.js')
app.add_config_value('recommonmark_config', { app.add_config_value('recommonmark_config', {
'url_resolver': lambda url: github_doc_root + url, 'url_resolver': lambda url: github_doc_root + url,


@@ -11,7 +11,7 @@ filename#cacheprefix
The ```filename``` is the normal path to libsvm file you want to load in, ```cacheprefix``` is a The ```filename``` is the normal path to libsvm file you want to load in, ```cacheprefix``` is a
path to a cache file that xgboost will use for external memory cache. path to a cache file that xgboost will use for external memory cache.
The following code was extracted from [../demo/guide-python/external_memory.py](../demo/guide-python/external_memory.py) The following code was extracted from [../../demo/guide-python/external_memory.py](../../demo/guide-python/external_memory.py)
```python ```python
dtrain = xgb.DMatrix('../data/agaricus.txt.train#dtrain.cache') dtrain = xgb.DMatrix('../data/agaricus.txt.train#dtrain.cache')
``` ```
@@ -28,7 +28,7 @@ Distributed Version
------------------- -------------------
The external memory mode naturally works on distributed version, you can simply set path like The external memory mode naturally works on distributed version, you can simply set path like
``` ```
data = "hdfs:///path-to-data/#dtrain.cache" data = "hdfs://path-to-data/#dtrain.cache"
``` ```
xgboost will cache the data to the local position. When you run on YARN, the current folder is temporal xgboost will cache the data to the local position. When you run on YARN, the current folder is temporal
so that you can directly use ```dtrain.cache``` to cache to current folder. so that you can directly use ```dtrain.cache``` to cache to current folder.


@@ -96,7 +96,7 @@ Parameters for Tree Booster
- A type of boosting process to run. - A type of boosting process to run.
- Choices: {'default', 'update'} - Choices: {'default', 'update'}
- 'default': the normal boosting process which creates new trees. - 'default': the normal boosting process which creates new trees.
- 'update': starts from an existing model and only updates its trees. In each boosting iteration, a tree from the initial model is taken, a specified sequence of updater plugins is run for that tree, and a modified tree is added to the new model. The new model would have either the same or smaller number of trees, depending on the number of boosting iteratons performed. Currently, the following built-in updater plugins could be meaningfully used with this process type: 'refresh', 'prune'. With 'update', one cannot use updater plugins that create new nrees. - 'update': starts from an existing model and only updates its trees. In each boosting iteration, a tree from the initial model is taken, a specified sequence of updater plugins is run for that tree, and a modified tree is added to the new model. The new model would have either the same or smaller number of trees, depending on the number of boosting iteratons performed. Currently, the following built-in updater plugins could be meaningfully used with this process type: 'refresh', 'prune'. With 'update', one cannot use updater plugins that create new trees.
* grow_policy, string [default='depthwise'] * grow_policy, string [default='depthwise']
- Controls a way new nodes are added to the tree. - Controls a way new nodes are added to the tree.
- Currently supported only if `tree_method` is set to 'hist'. - Currently supported only if `tree_method` is set to 'hist'.
@@ -142,11 +142,14 @@ Additional parameters for Dart Booster
Parameters for Linear Booster Parameters for Linear Booster
----------------------------- -----------------------------
* lambda [default=0, alias: reg_lambda] * lambda [default=0, alias: reg_lambda]
- L2 regularization term on weights, increase this value will make model more conservative. - L2 regularization term on weights, increase this value will make model more conservative. Normalised to number of training examples.
* alpha [default=0, alias: reg_alpha] * alpha [default=0, alias: reg_alpha]
- L1 regularization term on weights, increase this value will make model more conservative. - L1 regularization term on weights, increase this value will make model more conservative. Normalised to number of training examples.
* lambda_bias [default=0, alias: reg_lambda_bias] * updater [default='shotgun']
- L2 regularization term on bias (no L1 reg on bias because it is not important) - Linear model algorithm
- 'shotgun': Parallel coordinate descent algorithm based on shotgun algorithm. Uses 'hogwild' parallelism and therefore produces a nondeterministic solution on each run.
- 'coord_descent': Ordinary coordinate descent algorithm. Also multithreaded but still produces a deterministic solution.
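As a sketch of the options above, selecting the deterministic linear updater in a parameter dictionary might look like this (values are illustrative, not tuned):

```python
# Illustrative gblinear parameter set; keys follow the doc above.
params = {
    "booster": "gblinear",
    "updater": "coord_descent",  # deterministic; 'shotgun' is the parallel, nondeterministic default
    "lambda": 1.0,  # L2 penalty, normalised to the number of training examples
    "alpha": 0.1,   # L1 penalty, likewise normalised
}
print(params["updater"])
```

Passing such a dictionary to `xgb.train` would exercise the coordinate-descent path; with `'shotgun'`, repeated runs may yield slightly different weights.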
Parameters for Tweedie Regression Parameters for Tweedie Regression
--------------------------------- ---------------------------------
@@ -165,8 +168,13 @@ Specify the learning task and the corresponding learning objective. The objectiv
- "reg:logistic" --logistic regression - "reg:logistic" --logistic regression
- "binary:logistic" --logistic regression for binary classification, output probability - "binary:logistic" --logistic regression for binary classification, output probability
- "binary:logitraw" --logistic regression for binary classification, output score before logistic transformation - "binary:logitraw" --logistic regression for binary classification, output score before logistic transformation
- "gpu:reg:linear", "gpu:reg:logistic", "gpu:binary:logistic", gpu:binary:logitraw" --versions
of the corresponding objective functions evaluated on the GPU; note that like the GPU histogram algorithm,
they can only be used when the entire training session uses the same dataset
- "count:poisson" --poisson regression for count data, output mean of poisson distribution - "count:poisson" --poisson regression for count data, output mean of poisson distribution
- max_delta_step is set to 0.7 by default in poisson regression (used to safeguard optimization) - max_delta_step is set to 0.7 by default in poisson regression (used to safeguard optimization)
- "survival:cox" --Cox regression for right censored survival time data (negative values are considered right censored).
Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function h(t) = h0(t) * HR).
- "multi:softmax" --set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes) - "multi:softmax" --set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)
- "multi:softprob" --same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata, nclass matrix. The result contains predicted probability of each data point belonging to each class. - "multi:softprob" --same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata, nclass matrix. The result contains predicted probability of each data point belonging to each class.
- "rank:pairwise" --set XGBoost to do ranking task by minimizing the pairwise loss - "rank:pairwise" --set XGBoost to do ranking task by minimizing the pairwise loss
@@ -194,6 +202,7 @@ Specify the learning task and the corresponding learning objective. The objectiv
training repeatedly training repeatedly
- "poisson-nloglik": negative log-likelihood for Poisson regression - "poisson-nloglik": negative log-likelihood for Poisson regression
- "gamma-nloglik": negative log-likelihood for gamma regression - "gamma-nloglik": negative log-likelihood for gamma regression
- "cox-nloglik": negative partial log-likelihood for Cox proportional hazards regression
- "gamma-deviance": residual deviance for gamma regression - "gamma-deviance": residual deviance for gamma regression
- "tweedie-nloglik": negative log-likelihood for Tweedie regression (at a specified value of the tweedie_variance_power parameter) - "tweedie-nloglik": negative log-likelihood for Tweedie regression (at a specified value of the tweedie_variance_power parameter)
* seed [default=0] * seed [default=0]
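For `survival:cox`, the hazard-ratio scale mentioned above means a raw margin converts via the exponential (a minimal sketch; the margin value is made up):

```python
import math

margin = 0.25                    # hypothetical marginal prediction from the model
hazard_ratio = math.exp(margin)  # HR in h(t) = h0(t) * HR
print(round(hazard_ratio, 4))    # prints 1.284
```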


@@ -25,7 +25,9 @@ Data Interface
-------------- --------------
The XGBoost python module is able to load data from: The XGBoost python module is able to load data from:
- libsvm txt format file - libsvm txt format file
- Numpy 2D array, and - comma-separated values (CSV) file
- Numpy 2D array
- Scipy 2D sparse array, and
- xgboost binary buffer file. - xgboost binary buffer file.
The data is stored in a ```DMatrix``` object. The data is stored in a ```DMatrix``` object.
@@ -35,6 +37,16 @@ The data is stored in a ```DMatrix``` object.
dtrain = xgb.DMatrix('train.svm.txt') dtrain = xgb.DMatrix('train.svm.txt')
dtest = xgb.DMatrix('test.svm.buffer') dtest = xgb.DMatrix('test.svm.buffer')
``` ```
* To load a CSV file into ```DMatrix```:
```python
# label_column specifies the index of the column containing the true label
dtrain = xgb.DMatrix('train.csv?format=csv&label_column=0')
dtest = xgb.DMatrix('test.csv?format=csv&label_column=0')
```
(Note that XGBoost does not support categorical features; if your data contains
categorical features, load it as a numpy array first and then perform
[one-hot encoding](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html).)
* To load a numpy array into ```DMatrix```: * To load a numpy array into ```DMatrix```:
```python ```python
data = np.random.rand(5, 10) # 5 entities, each contains 10 features data = np.random.rand(5, 10) # 5 entities, each contains 10 features

doc/requirements.txt (new file, 3 lines)

@@ -0,0 +1,3 @@
sphinx==1.5.6
commonmark==0.5.4
mock


@@ -1,9 +1,9 @@
DART booster DART booster
============ ============
[XGBoost](https://github.com/dmlc/xgboost)) mostly combines a huge number of regression trees with a small learning rate. [XGBoost](https://github.com/dmlc/xgboost) mostly combines a huge number of regression trees with a small learning rate.
In this situation, trees added early are significant and trees added late are unimportant. In this situation, trees added early are significant and trees added late are unimportant.
Rasmi et al. proposed a new method to add dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations. Vinayak and Gilad-Bachrach proposed a new method to add dropout techniques from the deep neural net community to boosted trees, and reported better results in some situations.
This is an introduction to the new tree booster `dart`. This is an introduction to the new tree booster `dart`.


@@ -76,3 +76,15 @@ Some other examples:
- ```(1,0)```: An increasing constraint on the first predictor and no constraint on the second. - ```(1,0)```: An increasing constraint on the first predictor and no constraint on the second.
- ```(0,-1)```: No constraint on the first predictor and a decreasing constraint on the second. - ```(0,-1)```: No constraint on the first predictor and a decreasing constraint on the second.
**Choice of tree construction algorithm**. To use monotonic constraints, be
sure to set the `tree_method` parameter to one of `'exact'`, `'hist'`, or
`'gpu_hist'`.
**Note for the `'hist'` tree construction algorithm**.
If `tree_method` is set to either `'hist'` or `'gpu_hist'`, enabling monotonic
constraints may produce unnecessarily shallow trees. This is because the
`'hist'` method reduces the number of candidate splits to be considered at each
split. Monotonic constraints may wipe out all available split candidates, in
which case no split is made. To reduce the effect, you may want to increase
the `max_bin` parameter to consider more split candidates.
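Putting the note above into a parameter sketch (illustrative values; the feature count and constraint signs are made up):

```python
# Monotonic constraints under the 'hist' tree method.
params = {
    "tree_method": "hist",
    "monotone_constraints": "(1,-1)",  # increasing on the first feature, decreasing on the second
    "max_bin": 512,  # default is 256; more bins leave more split candidates under constraints
}
print(params["tree_method"])
```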


@@ -95,7 +95,7 @@ XGB_EXTERN_C typedef int XGBCallbackDataIterNext(
* this function is thread safe and can be called by different thread * this function is thread safe and can be called by different thread
* \return const char* error information * \return const char* error information
*/ */
XGB_DLL const char *XGBGetLastError(); XGB_DLL const char *XGBGetLastError(void);
/*! /*!
* \brief load a data matrix * \brief load a data matrix


@@ -12,6 +12,7 @@
#include <string> #include <string>
#include <memory> #include <memory>
#include <vector> #include <vector>
#include <numeric>
#include "./base.h" #include "./base.h"
namespace xgboost { namespace xgboost {
@@ -76,6 +77,19 @@ struct MetaInfo {
inline unsigned GetRoot(size_t i) const { inline unsigned GetRoot(size_t i) const {
return root_index.size() != 0 ? root_index[i] : 0U; return root_index.size() != 0 ? root_index[i] : 0U;
} }
/*! \brief get sorted indexes (argsort) of labels by absolute value (used by cox loss) */
inline const std::vector<size_t>& LabelAbsSort() const {
if (label_order_cache.size() == labels.size()) {
return label_order_cache;
}
label_order_cache.resize(labels.size());
std::iota(label_order_cache.begin(), label_order_cache.end(), 0);
const auto l = labels;
XGBOOST_PARALLEL_SORT(label_order_cache.begin(), label_order_cache.end(),
[&l](size_t i1, size_t i2) {return std::abs(l[i1]) < std::abs(l[i2]);});
return label_order_cache;
}
/*! \brief clear all the information */ /*! \brief clear all the information */
void Clear(); void Clear();
/*! /*!
@@ -96,6 +110,10 @@ struct MetaInfo {
* \param num Number of elements in the source array. * \param num Number of elements in the source array.
*/ */
void SetInfo(const char* key, const void* dptr, DataType dtype, size_t num); void SetInfo(const char* key, const void* dptr, DataType dtype, size_t num);
private:
/*! \brief argsort of labels */
mutable std::vector<size_t> label_order_cache;
}; };
/*! \brief read-only sparse instance batch in CSR format */ /*! \brief read-only sparse instance batch in CSR format */
@@ -256,14 +274,16 @@ class DMatrix {
* \param subsample subsample ratio when generating column access. * \param subsample subsample ratio when generating column access.
* \param max_row_perbatch auxiliary information, maximum row used in each column batch. * \param max_row_perbatch auxiliary information, maximum row used in each column batch.
* this is a hint information that can be ignored by the implementation. * this is a hint information that can be ignored by the implementation.
* \param sorted If column features should be in sorted order
* \return Number of column blocks in the column access. * \return Number of column blocks in the column access.
*/ */
virtual void InitColAccess(const std::vector<bool>& enabled, virtual void InitColAccess(const std::vector<bool>& enabled,
float subsample, float subsample,
size_t max_row_perbatch) = 0; size_t max_row_perbatch, bool sorted) = 0;
// the following are column meta data, should be able to answer them fast. // the following are column meta data, should be able to answer them fast.
/*! \return whether column access is enabled */ /*! \return whether column access is enabled */
virtual bool HaveColAccess() const = 0; virtual bool HaveColAccess(bool sorted) const = 0;
/*! \return Whether the data columns single column block. */ /*! \return Whether the data columns single column block. */
virtual bool SingleColBlock() const = 0; virtual bool SingleColBlock() const = 0;
/*! \brief get number of non-missing entries in column */ /*! \brief get number of non-missing entries in column */


@@ -18,6 +18,7 @@
#include "./data.h" #include "./data.h"
#include "./objective.h" #include "./objective.h"
#include "./feature_map.h" #include "./feature_map.h"
#include "../../src/common/host_device_vector.h"
namespace xgboost { namespace xgboost {
/*! /*!
@@ -68,8 +69,9 @@ class GradientBooster {
* the booster may change content of gpair * the booster may change content of gpair
*/ */
virtual void DoBoost(DMatrix* p_fmat, virtual void DoBoost(DMatrix* p_fmat,
std::vector<bst_gpair>* in_gpair, HostDeviceVector<bst_gpair>* in_gpair,
ObjFunction* obj = nullptr) = 0; ObjFunction* obj = nullptr) = 0;
/*! /*!
* \brief generate predictions for given feature matrix * \brief generate predictions for given feature matrix
* \param dmat feature matrix * \param dmat feature matrix
@@ -78,7 +80,7 @@ class GradientBooster {
* we do not limit number of trees, this parameter is only valid for gbtree, but not for gblinear * we do not limit number of trees, this parameter is only valid for gbtree, but not for gblinear
*/ */
virtual void PredictBatch(DMatrix* dmat, virtual void PredictBatch(DMatrix* dmat,
std::vector<bst_float>* out_preds, HostDeviceVector<bst_float>* out_preds,
unsigned ntree_limit = 0) = 0; unsigned ntree_limit = 0) = 0;
/*! /*!
* \brief online prediction function, predict score for one instance at a time * \brief online prediction function, predict score for one instance at a time
@@ -116,10 +118,17 @@ class GradientBooster {
* \param ntree_limit limit the number of trees used in prediction, when it equals 0, this means * \param ntree_limit limit the number of trees used in prediction, when it equals 0, this means
* we do not limit number of trees * we do not limit number of trees
* \param approximate use a faster (inconsistent) approximation of SHAP values * \param approximate use a faster (inconsistent) approximation of SHAP values
* \param condition condition on the condition_feature (0=no, -1=cond off, 1=cond on).
* \param condition_feature feature to condition on (i.e. fix) during calculations
*/ */
virtual void PredictContribution(DMatrix* dmat, virtual void PredictContribution(DMatrix* dmat,
std::vector<bst_float>* out_contribs, std::vector<bst_float>* out_contribs,
unsigned ntree_limit = 0, bool approximate = false) = 0; unsigned ntree_limit = 0, bool approximate = false,
int condition = 0, unsigned condition_feature = 0) = 0;
virtual void PredictInteractionContributions(DMatrix* dmat,
std::vector<bst_float>* out_contribs,
unsigned ntree_limit, bool approximate) = 0;
/*! /*!
* \brief dump the model in the requested format * \brief dump the model in the requested format


@@ -84,7 +84,7 @@ class Learner : public rabit::Serializable {
   */
  virtual void BoostOneIter(int iter,
                            DMatrix* train,
-                           std::vector<bst_gpair>* in_gpair) = 0;
+                           HostDeviceVector<bst_gpair>* in_gpair) = 0;
  /*!
   * \brief evaluate the model for specific iteration using the configured metrics.
   * \param iter iteration number
@@ -105,14 +105,17 @@ class Learner : public rabit::Serializable {
   * \param pred_leaf whether to only predict the leaf index of each tree in a boosted tree predictor
   * \param pred_contribs whether to only predict the feature contributions
   * \param approx_contribs whether to approximate the feature contributions for speed
+  * \param pred_interactions whether to compute the feature pair contributions
   */
  virtual void Predict(DMatrix* data,
                       bool output_margin,
-                      std::vector<bst_float> *out_preds,
+                      HostDeviceVector<bst_float> *out_preds,
                       unsigned ntree_limit = 0,
                       bool pred_leaf = false,
                       bool pred_contribs = false,
-                      bool approx_contribs = false) const = 0;
+                      bool approx_contribs = false,
+                      bool pred_interactions = false) const = 0;
  /*!
   * \brief Set additional attribute to the Booster.
   * The property will be saved along the booster.
@@ -166,7 +169,7 @@ class Learner : public rabit::Serializable {
   */
  inline void Predict(const SparseBatch::Inst &inst,
                      bool output_margin,
-                     std::vector<bst_float> *out_preds,
+                     HostDeviceVector<bst_float> *out_preds,
                      unsigned ntree_limit = 0) const;
  /*!
   * \brief Create a new instance of learner.
@@ -189,9 +192,9 @@ class Learner : public rabit::Serializable {
// implementation of inline functions.
inline void Learner::Predict(const SparseBatch::Inst& inst,
                             bool output_margin,
-                            std::vector<bst_float>* out_preds,
+                            HostDeviceVector<bst_float>* out_preds,
                             unsigned ntree_limit) const {
-  gbm_->PredictInstance(inst, out_preds, ntree_limit);
+  gbm_->PredictInstance(inst, &out_preds->data_h(), ntree_limit);
  if (!output_margin) {
    obj_->PredTransform(out_preds);
  }

View File

@@ -0,0 +1,66 @@
/*
* Copyright 2018 by Contributors
*/
#pragma once
#include <dmlc/registry.h>
#include <xgboost/base.h>
#include <xgboost/data.h>
#include <functional>
#include <string>
#include <utility>
#include <vector>
#include "../../src/gbm/gblinear_model.h"
namespace xgboost {
/*!
* \brief interface of linear updater
*/
class LinearUpdater {
public:
/*! \brief virtual destructor */
virtual ~LinearUpdater() {}
/*!
* \brief Initialize the updater with given arguments.
 * \param args arguments to the updater.
*/
virtual void Init(
const std::vector<std::pair<std::string, std::string> >& args) = 0;
/**
* \brief Updates linear model given gradients.
*
* \param in_gpair The gradient pair statistics of the data.
* \param data Input data matrix.
* \param model Model to be updated.
* \param sum_instance_weight The sum instance weights, used to normalise l1/l2 penalty.
*/
virtual void Update(std::vector<bst_gpair>* in_gpair, DMatrix* data,
gbm::GBLinearModel* model,
double sum_instance_weight) = 0;
/*!
* \brief Create a linear updater given name
* \param name Name of the linear updater.
*/
static LinearUpdater* Create(const std::string& name);
};
/*!
* \brief Registry entry for linear updater.
*/
struct LinearUpdaterReg
: public dmlc::FunctionRegEntryBase<LinearUpdaterReg,
std::function<LinearUpdater*()> > {};
/*!
* \brief Macro to register linear updater.
*/
#define XGBOOST_REGISTER_LINEAR_UPDATER(UniqueId, Name) \
static DMLC_ATTRIBUTE_UNUSED ::xgboost::LinearUpdaterReg& \
__make_##LinearUpdaterReg##_##UniqueId##__ = \
::dmlc::Registry< ::xgboost::LinearUpdaterReg>::Get()->__REGISTER__( \
Name)
} // namespace xgboost

View File

@@ -14,8 +14,11 @@
 #include <functional>
 #include "./data.h"
 #include "./base.h"
+#include "../../src/common/host_device_vector.h"
namespace xgboost {
/*! \brief interface of objective function */
class ObjFunction {
 public:
@@ -41,10 +44,11 @@ class ObjFunction {
   * \param iteration current iteration number.
   * \param out_gpair output of get gradient, saves gradient and second order gradient in
   */
-  virtual void GetGradient(const std::vector<bst_float>& preds,
+  virtual void GetGradient(HostDeviceVector<bst_float>* preds,
                           const MetaInfo& info,
                           int iteration,
-                          std::vector<bst_gpair>* out_gpair) = 0;
+                          HostDeviceVector<bst_gpair>* out_gpair) = 0;
  /*! \return the default evaluation metric for the objective */
  virtual const char* DefaultEvalMetric() const = 0;
  // the following functions are optional, most of time default implementation is good enough
@@ -52,13 +56,14 @@ class ObjFunction {
   * \brief transform prediction values, this is only called when Prediction is called
   * \param io_preds prediction values, saves to this vector as well
   */
-  virtual void PredTransform(std::vector<bst_float> *io_preds) {}
+  virtual void PredTransform(HostDeviceVector<bst_float> *io_preds) {}
  /*!
   * \brief transform prediction values, this is only called when Eval is called,
   * usually it redirect to PredTransform
   * \param io_preds prediction values, saves to this vector as well
   */
-  virtual void EvalTransform(std::vector<bst_float> *io_preds) {
+  virtual void EvalTransform(HostDeviceVector<bst_float> *io_preds) {
    this->PredTransform(io_preds);
  }
  /*!

View File

@@ -13,6 +13,7 @@
 #include <utility>
 #include <vector>
 #include "../../src/gbm/gbtree_model.h"
+#include "../../src/common/host_device_vector.h"
// Forward declarations
namespace xgboost {
@@ -51,10 +52,6 @@ class Predictor {
      const std::vector<std::shared_ptr<DMatrix>>& cache);
  /**
-   * \fn virtual void Predictor::PredictBatch( DMatrix* dmat,
-   * std::vector<bst_float>* out_preds, const gbm::GBTreeModel &model, int
-   * tree_begin, unsigned ntree_limit = 0) = 0;
-   *
   * \brief Generate batch predictions for a given feature matrix. May use
   * cached predictions if available instead of calculating from scratch.
   *
@@ -66,7 +63,7 @@ class Predictor {
   * limit trees.
   */
-  virtual void PredictBatch(DMatrix* dmat, std::vector<bst_float>* out_preds,
+  virtual void PredictBatch(DMatrix* dmat, HostDeviceVector<bst_float>* out_preds,
                            const gbm::GBTreeModel& model, int tree_begin,
                            unsigned ntree_limit = 0) = 0;
@@ -145,9 +142,19 @@ class Predictor {
   * \param model Model to make predictions from.
   * \param ntree_limit (Optional) The ntree limit.
   * \param approximate Use fast approximate algorithm.
+  * \param condition Condition on the condition_feature (0=no, -1=cond off, 1=cond on).
+  * \param condition_feature Feature to condition on (i.e. fix) during calculations.
   */
  virtual void PredictContribution(DMatrix* dmat,
+                                  std::vector<bst_float>* out_contribs,
+                                  const gbm::GBTreeModel& model,
+                                  unsigned ntree_limit = 0,
+                                  bool approximate = false,
+                                  int condition = 0,
+                                  unsigned condition_feature = 0) = 0;
+  virtual void PredictInteractionContributions(DMatrix* dmat,
                                   std::vector<bst_float>* out_contribs,
                                   const gbm::GBTreeModel& model,
                                   unsigned ntree_limit = 0,
@@ -163,41 +170,14 @@ class Predictor {
  static Predictor* Create(std::string name);
 protected:
-  /**
-   * \fn bool PredictFromCache(DMatrix* dmat, std::vector<bst_float>*
-   * out_preds, const gbm::GBTreeModel& model, unsigned ntree_limit = 0)
-   *
-   * \brief Attempt to predict from cache.
-   *
-   * \return True if it succeeds, false if it fails.
-   */
-  bool PredictFromCache(DMatrix* dmat, std::vector<bst_float>* out_preds,
-                        const gbm::GBTreeModel& model,
-                        unsigned ntree_limit = 0);
-  /**
-   * \fn void Predictor::InitOutPredictions(const MetaInfo& info,
-   * std::vector<bst_float>* out_preds, const gbm::GBTreeModel& model) const;
-   *
-   * \brief Init out predictions according to base margin.
-   *
-   * \param info Dmatrix info possibly containing base margin.
-   * \param [in,out] out_preds The out preds.
-   * \param model The model.
-   */
-  void InitOutPredictions(const MetaInfo& info,
-                          std::vector<bst_float>* out_preds,
-                          const gbm::GBTreeModel& model) const;
  /**
   * \struct PredictionCacheEntry
   *
   * \brief Contains pointer to input matrix and associated cached predictions.
   */
  struct PredictionCacheEntry {
    std::shared_ptr<DMatrix> data;
-    std::vector<bst_float> predictions;
+    HostDeviceVector<bst_float> predictions;
  };
  /**

View File

@@ -501,13 +501,33 @@ class RegTree: public TreeModel<bst_float, RTreeNodeStat> {
   * \param feat dense feature vector, if the feature is missing the field is set to NaN
   * \param root_id starting root index of the instance
   * \param out_contribs output vector to hold the contributions
+  * \param condition fix one feature to either off (-1) on (1) or not fixed (0 default)
+  * \param condition_feature the index of the feature to fix
   */
  inline void CalculateContributions(const RegTree::FVec& feat, unsigned root_id,
-                                    bst_float *out_contribs) const;
+                                    bst_float *out_contribs,
+                                    int condition = 0,
+                                    unsigned condition_feature = 0) const;
+  /*!
+   * \brief Recursive function that computes the feature attributions for a single tree.
+   * \param feat dense feature vector, if the feature is missing the field is set to NaN
+   * \param phi dense output vector of feature attributions
+   * \param node_index the index of the current node in the tree
+   * \param unique_depth how many unique features are above the current node in the tree
+   * \param parent_unique_path a vector of statistics about our current path through the tree
+   * \param parent_zero_fraction what fraction of the parent path weight is coming as 0 (integrated)
+   * \param parent_one_fraction what fraction of the parent path weight is coming as 1 (fixed)
+   * \param parent_feature_index what feature the parent node used to split
+   * \param condition fix one feature to either off (-1) on (1) or not fixed (0 default)
+   * \param condition_feature the index of the feature to fix
+   * \param condition_fraction what fraction of the current weight matches our conditioning feature
+   */
  inline void TreeShap(const RegTree::FVec& feat, bst_float *phi,
                       unsigned node_index, unsigned unique_depth,
                       PathElement *parent_unique_path, bst_float parent_zero_fraction,
-                      bst_float parent_one_fraction, int parent_feature_index) const;
+                      bst_float parent_one_fraction, int parent_feature_index,
+                      int condition, unsigned condition_feature,
+                      bst_float condition_fraction) const;
  /*!
   * \brief calculate the approximate feature contributions for the given root
@@ -700,7 +720,7 @@ inline bst_float UnwoundPathSum(const PathElement *unique_path, unsigned unique_
            / static_cast<bst_float>((i + 1) * one_fraction);
      total += tmp;
      next_one_portion = unique_path[i].pweight - tmp * zero_fraction * ((unique_depth - i)
-                        / static_cast<bst_float>(unique_depth+1));
+                        / static_cast<bst_float>(unique_depth + 1));
    } else {
      total += (unique_path[i].pweight / zero_fraction) / ((unique_depth - i)
               / static_cast<bst_float>(unique_depth + 1));
@@ -713,15 +733,22 @@ inline bst_float UnwoundPathSum(const PathElement *unique_path, unsigned unique_
inline void RegTree::TreeShap(const RegTree::FVec& feat, bst_float *phi,
                              unsigned node_index, unsigned unique_depth,
                              PathElement *parent_unique_path, bst_float parent_zero_fraction,
-                             bst_float parent_one_fraction, int parent_feature_index) const {
+                             bst_float parent_one_fraction, int parent_feature_index,
+                             int condition, unsigned condition_feature,
+                             bst_float condition_fraction) const {
  const auto node = (*this)[node_index];
+  // stop if we have no weight coming down to us
+  if (condition_fraction == 0) return;
  // extend the unique path
-  PathElement *unique_path = parent_unique_path + unique_depth;
-  if (unique_depth > 0) std::copy(parent_unique_path,
-                                  parent_unique_path + unique_depth, unique_path);
+  PathElement *unique_path = parent_unique_path + unique_depth + 1;
+  std::copy(parent_unique_path, parent_unique_path + unique_depth + 1, unique_path);
+  if (condition == 0 || condition_feature != static_cast<unsigned>(parent_feature_index)) {
    ExtendPath(unique_path, unique_depth, parent_zero_fraction,
               parent_one_fraction, parent_feature_index);
+  }
  const unsigned split_index = node.split_index();
  // leaf node
@@ -729,7 +756,8 @@ inline void RegTree::TreeShap(const RegTree::FVec& feat, bst_float *phi,
    for (unsigned i = 1; i <= unique_depth; ++i) {
      const bst_float w = UnwoundPathSum(unique_path, unique_depth, i);
      const PathElement &el = unique_path[i];
-      phi[el.feature_index] += w * (el.one_fraction - el.zero_fraction) * node.leaf_value();
+      phi[el.feature_index] += w * (el.one_fraction - el.zero_fraction)
+                               * node.leaf_value() * condition_fraction;
    }
  // internal node
@@ -764,34 +792,44 @@ inline void RegTree::TreeShap(const RegTree::FVec& feat, bst_float *phi,
      unique_depth -= 1;
    }
+    // divide up the condition_fraction among the recursive calls
+    bst_float hot_condition_fraction = condition_fraction;
+    bst_float cold_condition_fraction = condition_fraction;
+    if (condition > 0 && split_index == condition_feature) {
+      cold_condition_fraction = 0;
+      unique_depth -= 1;
+    } else if (condition < 0 && split_index == condition_feature) {
+      hot_condition_fraction *= hot_zero_fraction;
+      cold_condition_fraction *= cold_zero_fraction;
+      unique_depth -= 1;
+    }
-    TreeShap(feat, phi, hot_index, unique_depth + 1, unique_path,
-             hot_zero_fraction*incoming_zero_fraction, incoming_one_fraction, split_index);
+    TreeShap(feat, phi, hot_index, unique_depth + 1, unique_path,
+             hot_zero_fraction * incoming_zero_fraction, incoming_one_fraction,
+             split_index, condition, condition_feature, hot_condition_fraction);
-    TreeShap(feat, phi, cold_index, unique_depth + 1, unique_path,
-             cold_zero_fraction*incoming_zero_fraction, 0, split_index);
+    TreeShap(feat, phi, cold_index, unique_depth + 1, unique_path,
+             cold_zero_fraction * incoming_zero_fraction, 0,
+             split_index, condition, condition_feature, cold_condition_fraction);
  }
}
inline void RegTree::CalculateContributions(const RegTree::FVec& feat, unsigned root_id,
-                                            bst_float *out_contribs) const {
+                                            bst_float *out_contribs,
+                                            int condition,
+                                            unsigned condition_feature) const {
  // find the expected value of the tree's predictions
-  bst_float base_value = 0.0f;
-  bst_float total_cover = 0.0f;
-  for (int i = 0; i < (*this).param.num_nodes; ++i) {
-    const auto node = (*this)[i];
-    if (node.is_leaf()) {
-      const auto cover = this->stat(i).sum_hess;
-      base_value += cover * node.leaf_value();
-      total_cover += cover;
-    }
-  }
-  out_contribs[feat.size()] += base_value / total_cover;
+  if (condition == 0) {
+    bst_float node_value = this->node_mean_values[static_cast<int>(root_id)];
+    out_contribs[feat.size()] += node_value;
+  }
  // Preallocate space for the unique path data
-  const int maxd = this->MaxDepth(root_id) + 1;
+  const int maxd = this->MaxDepth(root_id) + 2;
  PathElement *unique_path_data = new PathElement[(maxd * (maxd + 1)) / 2];
-  TreeShap(feat, out_contribs, root_id, 0, unique_path_data, 1, 1, -1);
+  TreeShap(feat, out_contribs, root_id, 0, unique_path_data,
+           1, 1, -1, condition, condition_feature, 1);
  delete[] unique_path_data;
}

View File

@@ -16,6 +16,7 @@
 #include "./base.h"
 #include "./data.h"
 #include "./tree_model.h"
+#include "../../src/common/host_device_vector.h"
namespace xgboost {
/*!
@@ -39,7 +40,7 @@ class TreeUpdater {
   * but maybe different random seeds, usually one tree is passed in at a time,
   * there can be multiple trees when we train random forest style model
   */
-  virtual void Update(const std::vector<bst_gpair>& gpair,
+  virtual void Update(HostDeviceVector<bst_gpair>* gpair,
                      DMatrix* data,
                      const std::vector<RegTree*>& trees) = 0;
@@ -54,9 +55,10 @@ class TreeUpdater {
   * updated by the time this function returns.
   */
  virtual bool UpdatePredictionCache(const DMatrix* data,
-                                     std::vector<bst_float>* out_preds) {
+                                     HostDeviceVector<bst_float>* out_preds) {
    return false;
  }
  /*!
   * \brief Create a tree updater given name
   * \param name Name of the tree updater.

View File

@@ -20,3 +20,11 @@ You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.o
Full code examples for Scala, Java, Apache Spark, and Apache Flink can
be found in the [examples package](https://github.com/dmlc/xgboost/tree/master/jvm-packages/xgboost4j-example).
+
+**NOTE on LIBSVM format**:
+
+* Use *1-based*, ascending feature indexes in the LIBSVM format for distributed training;
+  Spark performs the conversion internally and does not accept 0-based input
+* Use *0-based* indexes when predicting in normal mode - for instance, when using the saved
+  model in the Python package

View File

@@ -0,0 +1,44 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# (Yizhi) This is mainly inspired by the script in apache/spark.
# I did some modification to adapt it to our project.
# (Nan) Modified from MxNet
set -e
if [[ ($# -ne 2) || ( $1 == "--help") || $1 == "-h" ]]; then
echo "Usage: $(basename $0) [-h|--help] <from_version> <to_version>" 1>&2
exit 1
fi
FROM_VERSION=$1
TO_VERSION=$2
sed_i() {
perl -p -000 -e "$1" "$2" > "$2.tmp" && mv "$2.tmp" "$2"
}
export -f sed_i
BASEDIR=$(dirname $0)/..
find "$BASEDIR" -name 'pom.xml' -not -path '*target*' -print \
-exec bash -c \
"sed_i 's/(<artifactId>(xgboost-jvm|xgboost4j.*)<\/artifactId>\s+<version)>'$FROM_VERSION'(<\/version>)/\1>'$TO_VERSION'\3/g' {}" \;

View File

@@ -6,16 +6,15 @@
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost-jvm</artifactId>
-   <version>0.7</version>
+   <version>0.8-SNAPSHOT</version>
    <packaging>pom</packaging>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <maven.compiler.source>1.7</maven.compiler.source>
        <maven.compiler.target>1.7</maven.compiler.target>
-       <maven.version>3.3.9</maven.version>
        <flink.version>0.10.2</flink.version>
-       <spark.version>2.1.0</spark.version>
+       <spark.version>2.2.1</spark.version>
        <scala.version>2.11.8</scala.version>
        <scala.binary.version>2.11</scala.binary.version>
    </properties>
@@ -32,20 +31,6 @@
        <module>xgboost4j-spark</module>
        <module>xgboost4j-flink</module>
    </modules>
-   <profiles>
-       <profile>
-           <id>spark-2.x</id>
-           <activation>
-               <activeByDefault>true</activeByDefault>
-           </activation>
-           <!--<properties>-->
-               <!--<flink.version>0.10.2</flink.version> -->
-               <!--<spark.version>2.0.1</spark.version>-->
-               <!--<scala.version>2.11.8</scala.version>-->
-               <!--<scala.binary.version>2.11</scala.binary.version>-->
-           <!--</properties>-->
-       </profile>
-   </profiles>
    <build>
        <plugins>
            <plugin>

View File

@@ -6,10 +6,10 @@
    <parent>
        <groupId>ml.dmlc</groupId>
        <artifactId>xgboost-jvm</artifactId>
-       <version>0.7</version>
+       <version>0.8-SNAPSHOT</version>
    </parent>
    <artifactId>xgboost4j-example</artifactId>
-   <version>0.7</version>
+   <version>0.8-SNAPSHOT</version>
    <packaging>jar</packaging>
    <build>
        <plugins>
@@ -26,7 +26,7 @@
        <dependency>
            <groupId>ml.dmlc</groupId>
            <artifactId>xgboost4j-spark</artifactId>
-           <version>0.7</version>
+           <version>0.8-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
@@ -37,7 +37,7 @@
        <dependency>
            <groupId>ml.dmlc</groupId>
            <artifactId>xgboost4j-flink</artifactId>
-           <version>0.7</version>
+           <version>0.8-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>

View File

@@ -6,10 +6,10 @@
    <parent>
        <groupId>ml.dmlc</groupId>
        <artifactId>xgboost-jvm</artifactId>
-       <version>0.7</version>
+       <version>0.8-SNAPSHOT</version>
    </parent>
    <artifactId>xgboost4j-flink</artifactId>
-   <version>0.7</version>
+   <version>0.8-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
@@ -26,7 +26,7 @@
        <dependency>
            <groupId>ml.dmlc</groupId>
            <artifactId>xgboost4j</artifactId>
-           <version>0.7</version>
+           <version>0.8-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>

View File

@@ -6,7 +6,7 @@
    <parent>
        <groupId>ml.dmlc</groupId>
        <artifactId>xgboost-jvm</artifactId>
-       <version>0.7</version>
+       <version>0.8-SNAPSHOT</version>
    </parent>
    <artifactId>xgboost4j-spark</artifactId>
    <build>
@@ -24,7 +24,19 @@
        <dependency>
            <groupId>ml.dmlc</groupId>
            <artifactId>xgboost4j</artifactId>
-           <version>0.7</version>
+           <version>0.8-SNAPSHOT</version>
+       </dependency>
+       <dependency>
+           <groupId>org.apache.spark</groupId>
+           <artifactId>spark-core_${scala.binary.version}</artifactId>
+           <version>${spark.version}</version>
+           <scope>provided</scope>
+       </dependency>
+       <dependency>
+           <groupId>org.apache.spark</groupId>
+           <artifactId>spark-sql_${scala.binary.version}</artifactId>
+           <version>${spark.version}</version>
+           <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>

View File

@@ -22,12 +22,16 @@ import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkContext import org.apache.spark.SparkContext
/** /**
* A class which allows user to save checkpoint boosters every a few rounds. If a previous job * A class which allows user to save checkpoints every a few rounds. If a previous job fails,
* fails, the job can restart training from a saved booster instead of from scratch. This class * the job can restart training from a saved checkpoints instead of from scratch. This class
* provides interface and helper methods for the checkpoint functionality. * provides interface and helper methods for the checkpoint functionality.
* *
* NOTE: This checkpoint is different from Rabit checkpoint. Rabit checkpoint is a native-level
* checkpoint stored in executor memory. This is a checkpoint which Spark driver store on HDFS
* for every a few iterations.
*
* @param sc the sparkContext object * @param sc the sparkContext object
* @param checkpointPath the hdfs path to store checkpoint boosters * @param checkpointPath the hdfs path to store checkpoints
*/ */
private[spark] class CheckpointManager(sc: SparkContext, checkpointPath: String) { private[spark] class CheckpointManager(sc: SparkContext, checkpointPath: String) {
private val logger = LogFactory.getLog("XGBoostSpark") private val logger = LogFactory.getLog("XGBoostSpark")
@@ -49,11 +53,11 @@ private[spark] class CheckpointManager(sc: SparkContext, checkpointPath: String)
} }
/** /**
* Load existing checkpoint with the highest version. * Load existing checkpoint with the highest version as a Booster object
* *
* @return the booster with the highest version, null if no checkpoints available. * @return the booster with the highest version, null if no checkpoints available.
*/ */
private[spark] def loadBooster: Booster = { private[spark] def loadCheckpointAsBooster: Booster = {
val versions = getExistingVersions val versions = getExistingVersions
if (versions.nonEmpty) { if (versions.nonEmpty) {
val version = versions.max val version = versions.max
@@ -68,16 +72,16 @@ private[spark] class CheckpointManager(sc: SparkContext, checkpointPath: String)
} }
/** /**
* Clean up all previous models and save a new model * Clean up all previous checkpoints and save a new checkpoint
* *
* @param model the xgboost model to save * @param checkpoint the checkpoint to save as an XGBoostModel
*/ */
private[spark] def updateModel(model: XGBoostModel): Unit = { private[spark] def updateCheckpoint(checkpoint: XGBoostModel): Unit = {
val fs = FileSystem.get(sc.hadoopConfiguration) val fs = FileSystem.get(sc.hadoopConfiguration)
val prevModelPaths = getExistingVersions.map(version => new Path(getPath(version))) val prevModelPaths = getExistingVersions.map(version => new Path(getPath(version)))
val fullPath = getPath(model.version) val fullPath = getPath(checkpoint.version)
logger.info(s"Saving checkpoint model with version ${model.version} to $fullPath") logger.info(s"Saving checkpoint model with version ${checkpoint.version} to $fullPath")
model.saveModelAsHadoopFile(fullPath)(sc) checkpoint.saveModelAsHadoopFile(fullPath)(sc)
prevModelPaths.foreach(path => fs.delete(path, true)) prevModelPaths.foreach(path => fs.delete(path, true))
} }
@@ -95,22 +99,22 @@ private[spark] class CheckpointManager(sc: SparkContext, checkpointPath: String)
} }
/** /**
* Calculate a list of checkpoint rounds to save checkpoints based on the savingFreq and * Calculate a list of checkpoint rounds to save checkpoints based on the checkpointInterval
* total number of rounds for the training. Concretely, the saving rounds start with * and total number of rounds for the training. Concretely, the checkpoint rounds start with
* prevRounds + savingFreq, and increase by savingFreq in each step until it reaches total * prevRounds + checkpointInterval, and increase by checkpointInterval in each step until it
* number of rounds. If savingFreq is 0, the checkpoint will be disabled and the method * reaches total number of rounds. If checkpointInterval is 0, the checkpoint will be disabled
* returns Seq(round) * and the method returns Seq(round)
* *
* @param savingFreq the increase on rounds during each step of training * @param checkpointInterval Period (in iterations) between checkpoints.
* @param round the total number of rounds for the training * @param round the total number of rounds for the training
* @return a seq of integers, each represent the index of round to save the checkpoints * @return a seq of integers, each represent the index of round to save the checkpoints
*/ */
private[spark] def getSavingRounds(savingFreq: Int, round: Int): Seq[Int] = { private[spark] def getCheckpointRounds(checkpointInterval: Int, round: Int): Seq[Int] = {
if (checkpointPath.nonEmpty && savingFreq > 0) { if (checkpointPath.nonEmpty && checkpointInterval > 0) {
val prevRounds = getExistingVersions.map(_ / 2) val prevRounds = getExistingVersions.map(_ / 2)
val firstSavingRound = (0 +: prevRounds).max + savingFreq val firstCheckpointRound = (0 +: prevRounds).max + checkpointInterval
(firstSavingRound until round by savingFreq) :+ round (firstCheckpointRound until round by checkpointInterval) :+ round
} else if (savingFreq <= 0) { } else if (checkpointInterval <= 0) {
Seq(round) Seq(round)
} else { } else {
throw new IllegalArgumentException("parameters \"checkpoint_path\" should also be set.") throw new IllegalArgumentException("parameters \"checkpoint_path\" should also be set.")
@@ -128,12 +132,12 @@ object CheckpointManager {
" an instance of String.") " an instance of String.")
} }
val savingFreq: Int = params.get("saving_frequency") match { val checkpointInterval: Int = params.get("checkpoint_interval") match {
case None => 0 case None => 0
case Some(freq: Int) => freq case Some(freq: Int) => freq
case _ => throw new IllegalArgumentException("parameter \"saving_frequency\" must be" + case _ => throw new IllegalArgumentException("parameter \"checkpoint_interval\" must be" +
" an instance of Int.") " an instance of Int.")
} }
(checkpointPath, savingFreq) (checkpointPath, checkpointInterval)
} }
} }
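The checkpoint-round computation above is small enough to sketch directly. A minimal Python mirror of `getCheckpointRounds` (the `checkpoint_path` non-empty precondition is omitted; `prev_rounds` stands in for the rounds recovered from existing checkpoint files, which the Scala code derives as `version / 2`):

```python
def get_checkpoint_rounds(checkpoint_interval, round_, prev_rounds=()):
    """Start from the highest previously saved round, step by
    checkpoint_interval, and always finish with the final round."""
    if checkpoint_interval <= 0:
        # checkpointing disabled: train all rounds in one pass
        return [round_]
    first = max((0, *prev_rounds)) + checkpoint_interval
    return list(range(first, round_, checkpoint_interval)) + [round_]
```

This matches the test suite's expectations, e.g. an interval of 2 over 7 rounds yields `[2, 4, 6, 7]`, and resuming after a saved round 2 yields `[4, 6, 7]`.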

View File

@@ -17,6 +17,7 @@
package ml.dmlc.xgboost4j.scala.spark package ml.dmlc.xgboost4j.scala.spark
import java.io.File import java.io.File
import java.nio.file.Files
import scala.collection.mutable import scala.collection.mutable
import scala.util.Random import scala.util.Random
@@ -24,6 +25,7 @@ import ml.dmlc.xgboost4j.java.{IRabitTracker, Rabit, XGBoostError, RabitTracker
import ml.dmlc.xgboost4j.scala.rabit.RabitTracker import ml.dmlc.xgboost4j.scala.rabit.RabitTracker
import ml.dmlc.xgboost4j.scala.{XGBoost => SXGBoost, _} import ml.dmlc.xgboost4j.scala.{XGBoost => SXGBoost, _}
import ml.dmlc.xgboost4j.{LabeledPoint => XGBLabeledPoint} import ml.dmlc.xgboost4j.{LabeledPoint => XGBLabeledPoint}
import org.apache.commons.io.FileUtils
import org.apache.commons.logging.LogFactory import org.apache.commons.logging.LogFactory
import org.apache.hadoop.fs.{FSDataInputStream, Path} import org.apache.hadoop.fs.{FSDataInputStream, Path}
import org.apache.spark.rdd.RDD import org.apache.spark.rdd.RDD
@@ -120,11 +122,8 @@ object XGBoost extends Serializable {
} }
val taskId = TaskContext.getPartitionId().toString val taskId = TaskContext.getPartitionId().toString
val cacheDirName = if (useExternalMemory) { val cacheDirName = if (useExternalMemory) {
val dir = new File(s"${TaskContext.get().stageId()}-cache-$taskId") val dir = Files.createTempDirectory(s"${TaskContext.get().stageId()}-cache-$taskId")
if (!(dir.exists() || dir.mkdirs())) { Some(dir.toAbsolutePath.toString)
throw new XGBoostError(s"failed to create cache directory: $dir")
}
Some(dir.toString)
} else { } else {
None None
} }
@@ -325,23 +324,24 @@ object XGBoost extends Serializable {
case _ => throw new IllegalArgumentException("parameter \"timeout_request_workers\" must be" + case _ => throw new IllegalArgumentException("parameter \"timeout_request_workers\" must be" +
" an instance of Long.") " an instance of Long.")
} }
val (checkpointPath, savingFeq) = CheckpointManager.extractParams(params) val (checkpointPath, checkpointInterval) = CheckpointManager.extractParams(params)
val partitionedData = repartitionForTraining(trainingData, nWorkers) val partitionedData = repartitionForTraining(trainingData, nWorkers)
val sc = trainingData.sparkContext val sc = trainingData.sparkContext
val checkpointManager = new CheckpointManager(sc, checkpointPath) val checkpointManager = new CheckpointManager(sc, checkpointPath)
checkpointManager.cleanUpHigherVersions(round) checkpointManager.cleanUpHigherVersions(round)
var prevBooster = checkpointManager.loadBooster var prevBooster = checkpointManager.loadCheckpointAsBooster
// Train for every ${savingRound} rounds and save the partially completed booster // Train for every ${savingRound} rounds and save the partially completed booster
checkpointManager.getSavingRounds(savingFeq, round).map { checkpointManager.getCheckpointRounds(checkpointInterval, round).map {
savingRound: Int => checkpointRound: Int =>
val tracker = startTracker(nWorkers, trackerConf) val tracker = startTracker(nWorkers, trackerConf)
try { try {
val parallelismTracker = new SparkParallelismTracker(sc, timeoutRequestWorkers, nWorkers) val parallelismTracker = new SparkParallelismTracker(sc, timeoutRequestWorkers, nWorkers)
val overriddenParams = overrideParamsAccordingToTaskCPUs(params, sc) val overriddenParams = overrideParamsAccordingToTaskCPUs(params, sc)
val boostersAndMetrics = buildDistributedBoosters(partitionedData, overriddenParams, val boostersAndMetrics = buildDistributedBoosters(partitionedData, overriddenParams,
tracker.getWorkerEnvs, savingRound, obj, eval, useExternalMemory, missing, prevBooster) tracker.getWorkerEnvs, checkpointRound, obj, eval, useExternalMemory, missing,
prevBooster)
val sparkJobThread = new Thread() { val sparkJobThread = new Thread() {
override def run() { override def run() {
// force the job // force the job
@@ -359,9 +359,9 @@ object XGBoost extends Serializable {
model.asInstanceOf[XGBoostClassificationModel].numOfClasses = model.asInstanceOf[XGBoostClassificationModel].numOfClasses =
params.getOrElse("num_class", "2").toString.toInt params.getOrElse("num_class", "2").toString.toInt
} }
if (savingRound < round) { if (checkpointRound < round) {
prevBooster = model.booster prevBooster = model.booster
checkpointManager.updateModel(model) checkpointManager.updateCheckpoint(model)
} }
model model
} finally { } finally {
@@ -480,11 +480,7 @@ private class Watches private(
def delete(): Unit = { def delete(): Unit = {
toMap.values.foreach(_.delete()) toMap.values.foreach(_.delete())
cacheDirName.foreach { name => cacheDirName.foreach { name =>
for (cacheFile <- new File(name).listFiles()) { FileUtils.deleteDirectory(new File(name))
if (!cacheFile.delete()) {
throw new IllegalStateException(s"failed to delete $cacheFile")
}
}
} }
} }
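The external-memory cache changes above replace a hand-rolled `mkdirs`/per-file-delete pair with `Files.createTempDirectory` and `FileUtils.deleteDirectory`. The same create-unique/clean-up-recursively pattern, sketched with the Python standard library (a generic sketch, not XGBoost API):

```python
import tempfile, shutil, os

def with_cache_dir(prefix, work):
    """Create a unique cache directory, run work(dir), then remove the
    directory recursively even if files are left inside it."""
    cache_dir = tempfile.mkdtemp(prefix=prefix)
    try:
        return work(cache_dir)
    finally:
        shutil.rmtree(cache_dir, ignore_errors=True)
```

Unlike the old per-file delete loop, a recursive remove does not fail when the directory still holds unexpected entries, and a unique temp name avoids collisions between tasks.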

View File

@@ -169,12 +169,12 @@ abstract class XGBoostModel(protected var _booster: Booster)
def predict(testSet: RDD[MLDenseVector], missingValue: Float): RDD[Array[Float]] = { def predict(testSet: RDD[MLDenseVector], missingValue: Float): RDD[Array[Float]] = {
val broadcastBooster = testSet.sparkContext.broadcast(_booster) val broadcastBooster = testSet.sparkContext.broadcast(_booster)
testSet.mapPartitions { testSamples => testSet.mapPartitions { testSamples =>
val sampleArray = testSamples.toList val sampleArray = testSamples.toArray
val numRows = sampleArray.size val numRows = sampleArray.length
val numColumns = sampleArray.head.size
if (numRows == 0) { if (numRows == 0) {
Iterator() Iterator()
} else { } else {
val numColumns = sampleArray.head.size
val rabitEnv = Map("DMLC_TASK_ID" -> TaskContext.getPartitionId().toString) val rabitEnv = Map("DMLC_TASK_ID" -> TaskContext.getPartitionId().toString)
Rabit.init(rabitEnv.asJava) Rabit.init(rabitEnv.asJava)
// translate to required format // translate to required format
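The predict fix above moves `sampleArray.head.size` behind the emptiness check, since calling `head` on an empty partition would throw. The guarded shape, sketched in Python (`predict_matrix` is a hypothetical stand-in for the broadcast booster's batch prediction):

```python
def predict_partition(samples, predict_matrix):
    """Score one partition; samples is the partition's iterator of
    dense vectors, predict_matrix a batch-prediction callable."""
    rows = list(samples)
    if not rows:                      # empty partition: nothing to score
        return iter([])
    num_columns = len(rows[0])        # safe only after the emptiness check
    return iter(predict_matrix(rows, num_columns))
```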

View File

@@ -71,7 +71,7 @@ trait GeneralParams extends Params {
val missing = new FloatParam(this, "missing", "the value treated as missing") val missing = new FloatParam(this, "missing", "the value treated as missing")
/** /**
* the interval to check whether total numCores is no smaller than nWorkers. default: 30 minutes * the maximum time to wait for the job requesting new workers. default: 30 minutes
*/ */
val timeoutRequestWorkers = new LongParam(this, "timeout_request_workers", "the maximum time to" + val timeoutRequestWorkers = new LongParam(this, "timeout_request_workers", "the maximum time to" +
" request new Workers if numCores are insufficient. The timeout will be disabled if this" + " request new Workers if numCores are insufficient. The timeout will be disabled if this" +
@@ -81,16 +81,19 @@ trait GeneralParams extends Params {
* The hdfs folder to load and save checkpoint boosters. default: `empty_string` * The hdfs folder to load and save checkpoint boosters. default: `empty_string`
*/ */
val checkpointPath = new Param[String](this, "checkpoint_path", "the hdfs folder to load and " + val checkpointPath = new Param[String](this, "checkpoint_path", "the hdfs folder to load and " +
"save checkpoints. The job will try to load the existing booster as the starting point for " + "save checkpoints. If there are existing checkpoints in checkpoint_path, the job will load " +
"training. If saving_frequency is also set, the job will save a checkpoint every few rounds.") "the checkpoint with the highest version as the starting point for training. If " +
"checkpoint_interval is also set, the job will save a checkpoint every few rounds.")
/** /**
* The frequency to save checkpoint boosters. default: 0 * Param for set checkpoint interval (&gt;= 1) or disable checkpoint (-1). E.g. 10 means that
* the trained model will get checkpointed every 10 iterations. Note: `checkpoint_path` must
* also be set if the checkpoint interval is greater than 0.
*/ */
val savingFrequency = new IntParam(this, "saving_frequency", "if checkpoint_path is also set," + val checkpointInterval: IntParam = new IntParam(this, "checkpointInterval", "set checkpoint " +
" the job will save checkpoints at this frequency. If the job fails and gets restarted with" + "interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the trained model will get " +
" same setting, it will load the existing booster instead of training from scratch." + "checkpointed every 10 iterations. Note: `checkpoint_path` must also be set if the checkpoint" +
" Checkpoint will be disabled if set to 0.") " interval is greater than 0.", (interval: Int) => interval == -1 || interval >= 1)
/** /**
* Rabit tracker configurations. The parameter must be provided as an instance of the * Rabit tracker configurations. The parameter must be provided as an instance of the
@@ -128,6 +131,6 @@ trait GeneralParams extends Params {
useExternalMemory -> false, silent -> 0, useExternalMemory -> false, silent -> 0,
customObj -> null, customEval -> null, missing -> Float.NaN, customObj -> null, customEval -> null, missing -> Float.NaN,
trackerConf -> TrackerConf(), seed -> 0, timeoutRequestWorkers -> 30 * 60 * 1000L, trackerConf -> TrackerConf(), seed -> 0, timeoutRequestWorkers -> 30 * 60 * 1000L,
checkpointPath -> "", savingFrequency -> 0 checkpointPath -> "", checkpointInterval -> -1
) )
} }
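The new `checkpointInterval` param carries an inline validator, `(interval: Int) => interval == -1 || interval >= 1`, so 0 and other non-positive values are rejected when the param is set rather than deep inside training. A sketch of the same check (a hypothetical helper, not xgboost4j API):

```python
def set_checkpoint_interval(params, interval):
    """Reject values the IntParam validator would reject: -1 disables
    checkpointing, any value >= 1 is a period in iterations."""
    if not (interval == -1 or interval >= 1):
        raise ValueError("checkpointInterval must be -1 (disabled) or >= 1")
    params["checkpointInterval"] = interval
    return params
```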

View File

@@ -45,7 +45,8 @@ trait LearningTaskParams extends Params {
/** /**
* evaluation metrics for validation data, a default metric will be assigned according to * evaluation metrics for validation data, a default metric will be assigned according to
* objective(rmse for regression, and error for classification, mean average precision for * objective(rmse for regression, and error for classification, mean average precision for
* ranking). options: rmse, mae, logloss, error, merror, mlogloss, auc, ndcg, map, gamma-deviance * ranking). options: rmse, mae, logloss, error, merror, mlogloss, auc, aucpr, ndcg, map,
* gamma-deviance
*/ */
val evalMetric = new Param[String](this, "eval_metric", "evaluation metrics for validation" + val evalMetric = new Param[String](this, "eval_metric", "evaluation metrics for validation" +
" data, a default metric will be assigned according to objective (rmse for regression, and" + " data, a default metric will be assigned according to objective (rmse for regression, and" +
@@ -97,5 +98,5 @@ private[spark] object LearningTaskParams {
"reg:gamma") "reg:gamma")
val supportedEvalMetrics = HashSet("rmse", "mae", "logloss", "error", "merror", "mlogloss", val supportedEvalMetrics = HashSet("rmse", "mae", "logloss", "error", "merror", "mlogloss",
"auc", "ndcg", "map", "gamma-deviance") "auc", "aucpr", "ndcg", "map", "gamma-deviance")
} }

View File

@@ -76,11 +76,12 @@ class SparkParallelismTracker(
} }
private[this] def safeExecute[T](body: => T): T = { private[this] def safeExecute[T](body: => T): T = {
sc.listenerBus.listeners.add(0, new TaskFailedListener) val listener = new TaskFailedListener;
sc.addSparkListener(listener)
try { try {
body body
} finally { } finally {
sc.listenerBus.listeners.remove(0) sc.listenerBus.removeListener(listener)
} }
} }
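The `safeExecute` fix registers the listener through `addSparkListener` and then removes that specific instance, instead of adding and removing by list index, which breaks as soon as another listener is registered concurrently. The register/try/finally shape, sketched generically (a plain list stands in for the listener bus):

```python
def safe_execute(bus, listener, body):
    """Attach listener for the duration of body(), then detach it by
    identity rather than by position in the listener list."""
    bus.append(listener)
    try:
        return body()
    finally:
        bus.remove(listener)   # removes this listener, wherever it sits
```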

View File

@@ -45,23 +45,23 @@ class CheckpointManagerSuite extends FunSuite with BeforeAndAfterAll {
test("test update/load models") { test("test update/load models") {
val tmpPath = Files.createTempDirectory("test").toAbsolutePath.toString val tmpPath = Files.createTempDirectory("test").toAbsolutePath.toString
val manager = new CheckpointManager(sc, tmpPath) val manager = new CheckpointManager(sc, tmpPath)
manager.updateModel(model4) manager.updateCheckpoint(model4)
var files = FileSystem.get(sc.hadoopConfiguration).listStatus(new Path(tmpPath)) var files = FileSystem.get(sc.hadoopConfiguration).listStatus(new Path(tmpPath))
assert(files.length == 1) assert(files.length == 1)
assert(files.head.getPath.getName == "4.model") assert(files.head.getPath.getName == "4.model")
assert(manager.loadBooster.booster.getVersion == 4) assert(manager.loadCheckpointAsBooster.booster.getVersion == 4)
manager.updateModel(model8) manager.updateCheckpoint(model8)
files = FileSystem.get(sc.hadoopConfiguration).listStatus(new Path(tmpPath)) files = FileSystem.get(sc.hadoopConfiguration).listStatus(new Path(tmpPath))
assert(files.length == 1) assert(files.length == 1)
assert(files.head.getPath.getName == "8.model") assert(files.head.getPath.getName == "8.model")
assert(manager.loadBooster.booster.getVersion == 8) assert(manager.loadCheckpointAsBooster.booster.getVersion == 8)
} }
test("test cleanUpHigherVersions") { test("test cleanUpHigherVersions") {
val tmpPath = Files.createTempDirectory("test").toAbsolutePath.toString val tmpPath = Files.createTempDirectory("test").toAbsolutePath.toString
val manager = new CheckpointManager(sc, tmpPath) val manager = new CheckpointManager(sc, tmpPath)
manager.updateModel(model8) manager.updateCheckpoint(model8)
manager.cleanUpHigherVersions(round = 8) manager.cleanUpHigherVersions(round = 8)
assert(new File(s"$tmpPath/8.model").exists()) assert(new File(s"$tmpPath/8.model").exists())
@@ -69,12 +69,12 @@ class CheckpointManagerSuite extends FunSuite with BeforeAndAfterAll {
assert(!new File(s"$tmpPath/8.model").exists()) assert(!new File(s"$tmpPath/8.model").exists())
} }
test("test saving rounds") { test("test checkpoint rounds") {
val tmpPath = Files.createTempDirectory("test").toAbsolutePath.toString val tmpPath = Files.createTempDirectory("test").toAbsolutePath.toString
val manager = new CheckpointManager(sc, tmpPath) val manager = new CheckpointManager(sc, tmpPath)
assertResult(Seq(7))(manager.getSavingRounds(savingFreq = 0, round = 7)) assertResult(Seq(7))(manager.getCheckpointRounds(checkpointInterval = 0, round = 7))
assertResult(Seq(2, 4, 6, 7))(manager.getSavingRounds(savingFreq = 2, round = 7)) assertResult(Seq(2, 4, 6, 7))(manager.getCheckpointRounds(checkpointInterval = 2, round = 7))
manager.updateModel(model4) manager.updateCheckpoint(model4)
assertResult(Seq(4, 6, 7))(manager.getSavingRounds(2, 7)) assertResult(Seq(4, 6, 7))(manager.getCheckpointRounds(2, 7))
} }
} }

View File

@@ -338,7 +338,7 @@ class XGBoostGeneralSuite extends FunSuite with PerTest {
} }
} }
test("training with saving checkpoint boosters") { test("training with checkpoint boosters") {
import DataUtils._ import DataUtils._
val eval = new EvalError() val eval = new EvalError()
val trainingRDD = sc.parallelize(Classification.train).map(_.asML) val trainingRDD = sc.parallelize(Classification.train).map(_.asML)
@@ -347,7 +347,7 @@ class XGBoostGeneralSuite extends FunSuite with PerTest {
val tmpPath = Files.createTempDirectory("model1").toAbsolutePath.toString val tmpPath = Files.createTempDirectory("model1").toAbsolutePath.toString
val paramMap = List("eta" -> "1", "max_depth" -> 2, "silent" -> "1", val paramMap = List("eta" -> "1", "max_depth" -> 2, "silent" -> "1",
"objective" -> "binary:logistic", "checkpoint_path" -> tmpPath, "objective" -> "binary:logistic", "checkpoint_path" -> tmpPath,
"saving_frequency" -> 2).toMap "checkpoint_interval" -> 2).toMap
val prevModel = XGBoost.trainWithRDD(trainingRDD, paramMap, round = 5, val prevModel = XGBoost.trainWithRDD(trainingRDD, paramMap, round = 5,
nWorkers = numWorkers) nWorkers = numWorkers)
def error(model: XGBoostModel): Float = eval.eval( def error(model: XGBoostModel): Float = eval.eval(

View File

@@ -6,10 +6,10 @@
<parent> <parent>
<groupId>ml.dmlc</groupId> <groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm</artifactId> <artifactId>xgboost-jvm</artifactId>
<version>0.7</version> <version>0.8-SNAPSHOT</version>
</parent> </parent>
<artifactId>xgboost4j</artifactId> <artifactId>xgboost4j</artifactId>
<version>0.7</version> <version>0.8-SNAPSHOT</version>
<packaging>jar</packaging> <packaging>jar</packaging>
<dependencies> <dependencies>

View File

@@ -16,8 +16,6 @@
package ml.dmlc.xgboost4j.scala package ml.dmlc.xgboost4j.scala
import java.io.IOException
import com.esotericsoftware.kryo.io.{Output, Input} import com.esotericsoftware.kryo.io.{Output, Input}
import com.esotericsoftware.kryo.{Kryo, KryoSerializable} import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
import ml.dmlc.xgboost4j.java.{Booster => JBooster} import ml.dmlc.xgboost4j.java.{Booster => JBooster}
@@ -25,6 +23,12 @@ import ml.dmlc.xgboost4j.java.XGBoostError
import scala.collection.JavaConverters._ import scala.collection.JavaConverters._
import scala.collection.mutable import scala.collection.mutable
/**
* Booster for xgboost, this is a model API that supports interactive building of an XGBoost model
*
* DEVELOPER WARNING: A Java Booster must not be shared by more than one Scala Booster
* @param booster the java booster object.
*/
class Booster private[xgboost4j](private[xgboost4j] var booster: JBooster) class Booster private[xgboost4j](private[xgboost4j] var booster: JBooster)
extends Serializable with KryoSerializable { extends Serializable with KryoSerializable {

View File

@@ -66,7 +66,12 @@ object XGBoost {
// we have to filter null value for customized obj and eval // we have to filter null value for customized obj and eval
params.filter(_._2 != null).mapValues(_.toString.asInstanceOf[AnyRef]).asJava, params.filter(_._2 != null).mapValues(_.toString.asInstanceOf[AnyRef]).asJava,
round, jWatches, metrics, obj, eval, earlyStoppingRound, jBooster) round, jWatches, metrics, obj, eval, earlyStoppingRound, jBooster)
if (booster == null) {
new Booster(xgboostInJava) new Booster(xgboostInJava)
} else {
// Avoid creating a new SBooster with the same JBooster
booster
}
} }
/** /**

View File

@@ -198,4 +198,16 @@ class ScalaBoosterImplSuite extends FunSuite {
trainBoosterWithFastHisto(trainMat, Map("training" -> trainMat), trainBoosterWithFastHisto(trainMat, Map("training" -> trainMat),
round = 10, paramMap, 0.85f) round = 10, paramMap, 0.85f)
} }
test("test training from existing model in scala") {
val trainMat = new DMatrix("../../demo/data/agaricus.txt.train")
val paramMap = List("max_depth" -> "0", "silent" -> "0",
"objective" -> "binary:logistic", "tree_method" -> "hist",
"grow_policy" -> "depthwise", "max_depth" -> "2", "max_bin" -> "2",
"eval_metric" -> "auc").toMap
val prevBooster = XGBoost.train(trainMat, paramMap, round = 2)
val nextBooster = XGBoost.train(trainMat, paramMap, round = 4, booster = prevBooster)
assert(prevBooster == nextBooster)
}
} }

View File

@@ -33,30 +33,32 @@ class MyLogistic : public ObjFunction {
void Configure(const std::vector<std::pair<std::string, std::string> >& args) override { void Configure(const std::vector<std::pair<std::string, std::string> >& args) override {
param_.InitAllowUnknown(args); param_.InitAllowUnknown(args);
} }
void GetGradient(const std::vector<bst_float> &preds, void GetGradient(HostDeviceVector<bst_float> *preds,
const MetaInfo &info, const MetaInfo &info,
int iter, int iter,
std::vector<bst_gpair> *out_gpair) override { HostDeviceVector<bst_gpair> *out_gpair) override {
out_gpair->resize(preds.size()); out_gpair->resize(preds->size());
for (size_t i = 0; i < preds.size(); ++i) { std::vector<bst_float>& preds_h = preds->data_h();
std::vector<bst_gpair>& out_gpair_h = out_gpair->data_h();
for (size_t i = 0; i < preds_h.size(); ++i) {
bst_float w = info.GetWeight(i); bst_float w = info.GetWeight(i);
// scale the negative examples! // scale the negative examples!
if (info.labels[i] == 0.0f) w *= param_.scale_neg_weight; if (info.labels[i] == 0.0f) w *= param_.scale_neg_weight;
// logistic transformation // logistic transformation
bst_float p = 1.0f / (1.0f + std::exp(-preds[i])); bst_float p = 1.0f / (1.0f + std::exp(-preds_h[i]));
// this is the gradient // this is the gradient
bst_float grad = (p - info.labels[i]) * w; bst_float grad = (p - info.labels[i]) * w;
// this is the second order gradient // this is the second order gradient
bst_float hess = p * (1.0f - p) * w; bst_float hess = p * (1.0f - p) * w;
out_gpair->at(i) = bst_gpair(grad, hess); out_gpair_h.at(i) = bst_gpair(grad, hess);
} }
} }
const char* DefaultEvalMetric() const override { const char* DefaultEvalMetric() const override {
return "error"; return "error";
} }
void PredTransform(std::vector<bst_float> *io_preds) override { void PredTransform(HostDeviceVector<bst_float> *io_preds) override {
// transform margin value to probability. // transform margin value to probability.
std::vector<bst_float> &preds = *io_preds; std::vector<bst_float> &preds = io_preds->data_h();
for (size_t i = 0; i < preds.size(); ++i) { for (size_t i = 0; i < preds.size(); ++i) {
preds[i] = 1.0f / (1.0f + std::exp(-preds[i])); preds[i] = 1.0f / (1.0f + std::exp(-preds[i]));
} }
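The `GetGradient` port above only changes where the buffers live (host views obtained from `HostDeviceVector` via `data_h()`); the per-row math is unchanged. That math, sketched in Python:

```python
import math

def logistic_gradient(preds, labels, weights, scale_neg_weight=1.0):
    """First- and second-order gradients of the weighted logistic loss,
    with negative examples rescaled as in MyLogistic."""
    pairs = []
    for margin, y, w in zip(preds, labels, weights):
        if y == 0.0:
            w *= scale_neg_weight              # scale the negative examples
        p = 1.0 / (1.0 + math.exp(-margin))    # logistic transformation
        pairs.append(((p - y) * w, p * (1.0 - p) * w))  # (grad, hess)
    return pairs
```

At margin 0 the probability is 0.5, so a positive example gets gradient -0.5 and hessian 0.25.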

View File

@@ -26,6 +26,11 @@ Please install ``gcc@5`` from `Homebrew <https://brew.sh/>`_::
brew install gcc@5 brew install gcc@5
After installing ``gcc@5``, set it as your compiler::
export CC=gcc-5
export CXX=g++-5
Linux Linux
----- -----

View File

@@ -10,7 +10,7 @@ Linux platform (also Mac OS X in general)
------------ ------------
**Trouble 0**: I see error messages like this when install from github using `python setup.py install`. **Trouble 0**: I see error messages like this when install from github using `python setup.py install`.
XGBoostLibraryNotFound: Cannot find XGBoost Libarary in the candidate path, did you install compilers and run build.sh in root path? XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
List of candidates: List of candidates:
/home/dmlc/anaconda/lib/python2.7/site-packages/xgboost-0.4-py2.7.egg/xgboost/libxgboostwrapper.so /home/dmlc/anaconda/lib/python2.7/site-packages/xgboost-0.4-py2.7.egg/xgboost/libxgboostwrapper.so
/home/dmlc/anaconda/lib/python2.7/site-packages/xgboost-0.4-py2.7.egg/xgboost/../../wrapper/libxgboostwrapper.so /home/dmlc/anaconda/lib/python2.7/site-packages/xgboost-0.4-py2.7.egg/xgboost/../../wrapper/libxgboostwrapper.so

View File

@@ -12,6 +12,9 @@ sys.path.insert(0, '.')
# please don't use this file for installing from github # please don't use this file for installing from github
if os.name != 'nt': # if not windows, compile and install if os.name != 'nt': # if not windows, compile and install
# if not windows, compile and install
if len(sys.argv) < 2 or sys.argv[1] != 'sdist':
# do not build for sdist
os.system('sh ./xgboost/build-python.sh') os.system('sh ./xgboost/build-python.sh')
else: else:
print('Windows users please use github installation.') print('Windows users please use github installation.')
@@ -30,16 +33,14 @@ class BinaryDistribution(Distribution):
# We can not import `xgboost.libpath` in setup.py directly since xgboost/__init__.py # We can not import `xgboost.libpath` in setup.py directly since xgboost/__init__.py
# import `xgboost.core` and finally will import `numpy` and `scipy` which are setup # import `xgboost.core` and finally will import `numpy` and `scipy` which are setup
# `install_requires`. That's why we're using `exec` here. # `install_requires`. That's why we're using `exec` here.
libpath_py = os.path.join(CURRENT_DIR, 'xgboost/libpath.py') # do not import libpath for sdist
libpath = {'__file__': libpath_py} if len(sys.argv) < 2 or sys.argv[1] != 'sdist':
exec(compile(open(libpath_py, "rb").read(), libpath_py, 'exec'), libpath, libpath) libpath_py = os.path.join(CURRENT_DIR, 'xgboost/libpath.py')
libpath = {'__file__': libpath_py}
exec(compile(open(libpath_py, "rb").read(), libpath_py, 'exec'), libpath, libpath)
LIB_PATH = libpath['find_lib_path']() LIB_PATH = libpath['find_lib_path']()
# to deploy to pip, please use
# make pythonpack
# python setup.py register sdist upload
# and be sure to test it firstly using "python setup.py register sdist upload -r pypitest"
setup(name='xgboost', setup(name='xgboost',
version=open(os.path.join(CURRENT_DIR, 'xgboost/VERSION')).read().strip(), version=open(os.path.join(CURRENT_DIR, 'xgboost/VERSION')).read().strip(),
description='XGBoost Python Package', description='XGBoost Python Package',

View File

@@ -1 +1 @@
0.7 0.71

View File

@@ -1,4 +1,4 @@
#!/bin/bash #!/bin/sh
# This is a simple script to make xgboost in MAC and Linux for python wrapper only # This is a simple script to make xgboost in MAC and Linux for python wrapper only
# Basically, it first tries to make with OpenMP; if that fails, it disables OpenMP and makes it again. # Basically, it first tries to make with OpenMP; if that fails, it disables OpenMP and makes it again.
# This will automatically make xgboost for MAC users who don't have OpenMP support. # This will automatically make xgboost for MAC users who don't have OpenMP support.
@@ -9,22 +9,44 @@
# note: this script is build for python package only, and it might have some filename # note: this script is build for python package only, and it might have some filename
# conflict with build.sh which is for everything. # conflict with build.sh which is for everything.
set -e
set -x
#pushd xgboost #pushd xgboost
oldpath=`pwd` oldpath=`pwd`
cd ./xgboost/ cd ./xgboost/
if echo "${OSTYPE}" | grep -q "darwin"; then
LIB_XGBOOST=libxgboost.dylib
# Use OpenMP-capable compiler if possible
if which g++-5; then
export CC=gcc-5
export CXX=g++-5
elif which g++-7; then
export CC=gcc-7
export CXX=g++-7
elif which clang++; then
export CC=clang
export CXX=clang++
fi
else
LIB_XGBOOST=libxgboost.so
fi
#remove the pre-compiled .so and trigger the system's on-the-fly compiling #remove the pre-compiled .so and trigger the system's on-the-fly compiling
make clean make clean
if make lib/libxgboost.so -j4; then if make lib/${LIB_XGBOOST} -j4; then
echo "Successfully build multi-thread xgboost" echo "Successfully build multi-thread xgboost"
else else
echo "-----------------------------" echo "-----------------------------"
echo "Building multi-thread xgboost failed" echo "Building multi-thread xgboost failed"
echo "Start to build single-thread xgboost" echo "Start to build single-thread xgboost"
make clean make clean
make lib/libxgboost.so -j4 USE_OPENMP=0 make lib/${LIB_XGBOOST} -j4 USE_OPENMP=0
echo "Successfully build single-thread xgboost" echo "Successfully build single-thread xgboost"
echo "If you want multi-threaded version" echo "If you want multi-threaded version"
echo "See additional instructions in doc/build.md" echo "See additional instructions in doc/build.md"
fi fi
cd $oldpath cd $oldpath
set +x
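On macOS, `build-python.sh` now probes for an OpenMP-capable compiler in preference order: g++-5, then g++-7, then clang++. The probe order, sketched in Python with `shutil.which` standing in for `which` (`available` is a test hook, not part of the script):

```python
import shutil

def pick_compiler(available=None):
    """Return (CC, CXX) following the script's preference order;
    available, if given, overrides the PATH lookup for testing."""
    have = (lambda c: c in available) if available is not None else shutil.which
    for cc, cxx in (("gcc-5", "g++-5"), ("gcc-7", "g++-7"), ("clang", "clang++")):
        if have(cxx):
            return cc, cxx
    return None, None   # fall back to the system default compiler
```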

View File

@@ -50,7 +50,7 @@ def print_evaluation(period=1, show_stdv=True):
""" """
def callback(env): def callback(env):
"""internal function""" """internal function"""
if env.rank != 0 or len(env.evaluation_result_list) == 0 or period is False: if env.rank != 0 or len(env.evaluation_result_list) == 0 or period is False or period == 0:
return return
i = env.iteration i = env.iteration
if (i % period == 0 or i + 1 == env.begin_iteration or i + 1 == env.end_iteration): if (i % period == 0 or i + 1 == env.begin_iteration or i + 1 == env.end_iteration):
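The `print_evaluation` fix adds `period == 0` to the early-return guard, because `i % period` below would otherwise raise `ZeroDivisionError`. The guard's decision logic in sketch form (the parameters are stand-ins for fields of xgboost's callback `env`):

```python
def should_print(rank, results, period, iteration, begin_iteration, end_iteration):
    """Return True when the print_evaluation callback would print."""
    if rank != 0 or len(results) == 0 or period is False or period == 0:
        return False          # guard runs before any modulo on period
    i = iteration
    return i % period == 0 or i + 1 == begin_iteration or i + 1 == end_iteration
```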

View File

@@ -235,8 +235,6 @@ class DMatrix(object):
feature_names=None, feature_types=None, feature_names=None, feature_types=None,
nthread=None): nthread=None):
""" """
Data matrix used in XGBoost.
Parameters Parameters
---------- ----------
data : string/numpy array/scipy.sparse/pd.DataFrame data : string/numpy array/scipy.sparse/pd.DataFrame
@@ -706,7 +704,7 @@ class DMatrix(object):
class Booster(object): class Booster(object):
""""A Booster of of XGBoost. """A Booster of XGBoost.
Booster is the model of xgboost, that contains low level routines for Booster is the model of xgboost, that contains low level routines for
training, prediction and evaluation. training, prediction and evaluation.
@@ -716,8 +714,7 @@ class Booster(object):
def __init__(self, params=None, cache=(), model_file=None): def __init__(self, params=None, cache=(), model_file=None):
# pylint: disable=invalid-name # pylint: disable=invalid-name
"""Initialize the Booster. """
Parameters Parameters
---------- ----------
params : dict params : dict
@@ -992,7 +989,7 @@ class Booster(object):
return self.eval_set([(data, name)], iteration) return self.eval_set([(data, name)], iteration)
def predict(self, data, output_margin=False, ntree_limit=0, pred_leaf=False, def predict(self, data, output_margin=False, ntree_limit=0, pred_leaf=False,
pred_contribs=False, approx_contribs=False): pred_contribs=False, approx_contribs=False, pred_interactions=False):
""" """
Predict with data. Predict with data.
@@ -1019,14 +1016,21 @@ class Booster(object):
in both tree 1 and tree 0. in both tree 1 and tree 0.
pred_contribs : bool pred_contribs : bool
When this option is on, the output will be a matrix of (nsample, nfeats+1) When this is True the output will be a matrix of size (nsample, nfeats + 1)
with each record indicating the feature contributions (SHAP values) for that with each record indicating the feature contributions (SHAP values) for that
prediction. The sum of all feature contributions is equal to the prediction. prediction. The sum of all feature contributions is equal to the raw untransformed
Note that the bias is added as the final column, on top of the regular features. margin value of the prediction. Note the final column is the bias term.
approx_contribs : bool approx_contribs : bool
Approximate the contributions of each feature Approximate the contributions of each feature
pred_interactions : bool
When this is True the output will be a matrix of size (nsample, nfeats + 1, nfeats + 1)
indicating the SHAP interaction values for each pair of features. The sum of each
row (or column) of the interaction values equals the corresponding SHAP value (from
pred_contribs), and the sum of the entire matrix equals the raw untransformed margin
value of the prediction. Note the last row and column correspond to the bias term.
Returns Returns
------- -------
prediction : numpy array prediction : numpy array
@@ -1040,6 +1044,8 @@ class Booster(object):
option_mask |= 0x04 option_mask |= 0x04
if approx_contribs: if approx_contribs:
option_mask |= 0x08 option_mask |= 0x08
if pred_interactions:
option_mask |= 0x10
self._validate_features(data) self._validate_features(data)
@@ -1055,8 +1061,22 @@ class Booster(object):
preds = preds.astype(np.int32) preds = preds.astype(np.int32)
nrow = data.num_row() nrow = data.num_row()
if preds.size != nrow and preds.size % nrow == 0: if preds.size != nrow and preds.size % nrow == 0:
ncol = int(preds.size / nrow) chunk_size = int(preds.size / nrow)
preds = preds.reshape(nrow, ncol)
if pred_interactions:
ngroup = int(chunk_size / ((data.num_col() + 1) * (data.num_col() + 1)))
if ngroup == 1:
preds = preds.reshape(nrow, data.num_col() + 1, data.num_col() + 1)
else:
preds = preds.reshape(nrow, ngroup, data.num_col() + 1, data.num_col() + 1)
elif pred_contribs:
ngroup = int(chunk_size / (data.num_col() + 1))
if ngroup == 1:
preds = preds.reshape(nrow, data.num_col() + 1)
else:
preds = preds.reshape(nrow, ngroup, data.num_col() + 1)
else:
preds = preds.reshape(nrow, chunk_size)
return preds return preds
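The new reshape branch sizes the output from the option flags: `pred_interactions` yields `(nrow, nfeats + 1, nfeats + 1)` per group, `pred_contribs` yields `(nrow, nfeats + 1)`, and the plain path keeps `(nrow, chunk_size)`, with an extra group axis when `ngroup > 1`. A numpy sketch of the same dispatch:

```python
import numpy as np

def reshape_preds(preds, nrow, ncol, pred_contribs=False, pred_interactions=False):
    """Reshape the flat prediction buffer the way Booster.predict does,
    including the multi-class (ngroup > 1) case."""
    chunk_size = preds.size // nrow
    if pred_interactions:
        ngroup = chunk_size // ((ncol + 1) * (ncol + 1))
        shape = (nrow, ncol + 1, ncol + 1) if ngroup == 1 \
            else (nrow, ngroup, ncol + 1, ncol + 1)
    elif pred_contribs:
        ngroup = chunk_size // (ncol + 1)
        shape = (nrow, ncol + 1) if ngroup == 1 else (nrow, ngroup, ncol + 1)
    else:
        shape = (nrow, chunk_size)
    return preds.reshape(shape)
```

For example, SHAP contributions for 2 rows over 2 features come back as a flat buffer of 6 values and reshape to `(2, 3)`, where the last column is the bias term.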
def save_model(self, fname): def save_model(self, fname):

View File

@@ -520,6 +520,24 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
return self return self
def predict(self, data, output_margin=False, ntree_limit=0): def predict(self, data, output_margin=False, ntree_limit=0):
"""
Predict with `data`.
NOTE: This function is not thread safe.
For each booster object, predict can only be called from one thread.
If you want to run prediction using multiple threads, call xgb.copy() to make copies
of the model object and then call predict.
Parameters
----------
data : DMatrix
The dmatrix storing the input.
output_margin : bool
Whether to output the raw untransformed margin value.
ntree_limit : int
Limit number of trees in the prediction; defaults to 0 (use all trees).
Returns
-------
prediction : numpy array
"""
test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs) test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs)
class_probs = self.get_booster().predict(test_dmatrix, class_probs = self.get_booster().predict(test_dmatrix,
output_margin=output_margin, output_margin=output_margin,
@@ -532,6 +550,25 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
return self._le.inverse_transform(column_indexes) return self._le.inverse_transform(column_indexes)
def predict_proba(self, data, output_margin=False, ntree_limit=0): def predict_proba(self, data, output_margin=False, ntree_limit=0):
"""
Predict the probability of each `data` example being of a given class.
NOTE: This function is not thread safe.
For each booster object, predict can only be called from one thread.
If you want to run prediction using multiple threads, call ``xgb.copy()`` to make
copies of the model object, and then call ``predict()`` on each copy.
Parameters
----------
data : DMatrix
The dmatrix storing the input.
output_margin : bool
Whether to output the raw untransformed margin value.
ntree_limit : int
Limit number of trees in the prediction; defaults to 0 (use all trees).
Returns
-------
prediction : numpy array
a numpy array with the probability of each data example being of a given class.
"""
test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs) test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs)
class_probs = self.get_booster().predict(test_dmatrix, class_probs = self.get_booster().predict(test_dmatrix,
output_margin=output_margin, output_margin=output_margin,
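For the binary case (handled outside this hunk), the booster returns a single positive-class probability per row; a sketch of expanding that into the two-column array an sklearn-style `predict_proba` returns (`to_two_class_proba` is a hypothetical helper, not part of the package):

```python
import numpy as np

def to_two_class_proba(class_probs):
    """Sketch: expand per-row positive-class probabilities into the
    (n_samples, 2) array that sklearn-style predict_proba returns.
    Assumes class_probs are already sigmoid-transformed."""
    classone = np.asarray(class_probs, dtype=float)
    classzero = 1.0 - classone
    return np.vstack((classzero, classone)).transpose()

probs = to_two_class_proba([0.2, 0.9])
```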


@@ -191,9 +191,9 @@ struct XGBAPIThreadLocalEntry {
/*! \brief result holder for returning string pointers */ /*! \brief result holder for returning string pointers */
std::vector<const char *> ret_vec_charp; std::vector<const char *> ret_vec_charp;
/*! \brief returning float vector. */ /*! \brief returning float vector. */
std::vector<bst_float> ret_vec_float; HostDeviceVector<bst_float> ret_vec_float;
/*! \brief temp variable of gradient pairs. */ /*! \brief temp variable of gradient pairs. */
std::vector<bst_gpair> tmp_gpair; HostDeviceVector<bst_gpair> tmp_gpair;
}; };
// define the threadlocal store. // define the threadlocal store.
@@ -406,6 +406,7 @@ void prefixsum_inplace(size_t *x, size_t N) {
suma[0] = 0; suma[0] = 0;
} }
size_t sum = 0; size_t sum = 0;
size_t offset = 0;
#pragma omp for schedule(static) #pragma omp for schedule(static)
for (omp_ulong i = 0; i < N; i++) { for (omp_ulong i = 0; i < N; i++) {
sum += x[i]; sum += x[i];
@@ -413,7 +414,6 @@ void prefixsum_inplace(size_t *x, size_t N) {
} }
suma[ithread+1] = sum; suma[ithread+1] = sum;
#pragma omp barrier #pragma omp barrier
size_t offset = 0;
for (omp_ulong i = 0; i < static_cast<omp_ulong>(ithread+1); i++) { for (omp_ulong i = 0; i < static_cast<omp_ulong>(ithread+1); i++) {
offset += suma[i]; offset += suma[i];
} }
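The two-pass parallel prefix sum above (each thread scans its own chunk, records the chunk total in `suma`, then adds the offset contributed by the preceding chunks) can be sketched serially, with thread chunks simulated by index ranges:

```python
def prefixsum_inplace(x, nthreads=4):
    """Inclusive prefix sum, mirroring the OpenMP two-pass scheme:
    pass 1 scans each chunk independently and records chunk totals
    in suma; pass 2 adds each chunk's offset (the sum of all
    preceding chunk totals)."""
    n = len(x)
    chunks = [range(t * n // nthreads, (t + 1) * n // nthreads)
              for t in range(nthreads)]
    suma = [0] * (nthreads + 1)
    # pass 1: local inclusive scan per chunk
    for t, chunk in enumerate(chunks):
        s = 0
        for i in chunk:
            s += x[i]
            x[i] = s
        suma[t + 1] = s
    # pass 2: add the offset from all earlier chunks
    for t, chunk in enumerate(chunks):
        offset = sum(suma[:t + 1])
        for i in chunk:
            x[i] += offset

data = [1, 2, 3, 4, 5, 6, 7, 8]
prefixsum_inplace(data)
```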
@@ -452,11 +452,8 @@ XGB_DLL int XGDMatrixCreateFromMat_omp(const bst_float* data,
// Check for errors in missing elements // Check for errors in missing elements
// Count elements per row (to avoid otherwise need to copy) // Count elements per row (to avoid otherwise need to copy)
bool nan_missing = common::CheckNAN(missing); bool nan_missing = common::CheckNAN(missing);
int *badnan; std::vector<int> badnan;
badnan = new int[nthread]; badnan.resize(nthread, 0);
for (int i = 0; i < nthread; i++) {
badnan[i] = 0;
}
#pragma omp parallel num_threads(nthread) #pragma omp parallel num_threads(nthread)
{ {
@@ -705,14 +702,15 @@ XGB_DLL int XGBoosterBoostOneIter(BoosterHandle handle,
bst_float *grad, bst_float *grad,
bst_float *hess, bst_float *hess,
xgboost::bst_ulong len) { xgboost::bst_ulong len) {
std::vector<bst_gpair>& tmp_gpair = XGBAPIThreadLocalStore::Get()->tmp_gpair; HostDeviceVector<bst_gpair>& tmp_gpair = XGBAPIThreadLocalStore::Get()->tmp_gpair;
API_BEGIN(); API_BEGIN();
Booster* bst = static_cast<Booster*>(handle); Booster* bst = static_cast<Booster*>(handle);
std::shared_ptr<DMatrix>* dtr = std::shared_ptr<DMatrix>* dtr =
static_cast<std::shared_ptr<DMatrix>*>(dtrain); static_cast<std::shared_ptr<DMatrix>*>(dtrain);
tmp_gpair.resize(len); tmp_gpair.resize(len);
std::vector<bst_gpair>& tmp_gpair_h = tmp_gpair.data_h();
for (xgboost::bst_ulong i = 0; i < len; ++i) { for (xgboost::bst_ulong i = 0; i < len; ++i) {
tmp_gpair[i] = bst_gpair(grad[i], hess[i]); tmp_gpair_h[i] = bst_gpair(grad[i], hess[i]);
} }
bst->LazyInit(); bst->LazyInit();
@@ -749,7 +747,8 @@ XGB_DLL int XGBoosterPredict(BoosterHandle handle,
unsigned ntree_limit, unsigned ntree_limit,
xgboost::bst_ulong *len, xgboost::bst_ulong *len,
const bst_float **out_result) { const bst_float **out_result) {
std::vector<bst_float>& preds = XGBAPIThreadLocalStore::Get()->ret_vec_float; HostDeviceVector<bst_float>& preds =
XGBAPIThreadLocalStore::Get()->ret_vec_float;
API_BEGIN(); API_BEGIN();
Booster *bst = static_cast<Booster*>(handle); Booster *bst = static_cast<Booster*>(handle);
bst->LazyInit(); bst->LazyInit();
@@ -759,8 +758,9 @@ XGB_DLL int XGBoosterPredict(BoosterHandle handle,
&preds, ntree_limit, &preds, ntree_limit,
(option_mask & 2) != 0, (option_mask & 2) != 0,
(option_mask & 4) != 0, (option_mask & 4) != 0,
(option_mask & 8) != 0); (option_mask & 8) != 0,
*out_result = dmlc::BeginPtr(preds); (option_mask & 16) != 0);
*out_result = dmlc::BeginPtr(preds.data_h());
*len = static_cast<xgboost::bst_ulong>(preds.size()); *len = static_cast<xgboost::bst_ulong>(preds.size());
API_END(); API_END();
} }
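Collecting the bit assignments used across the Python/C boundary (the `0x01`/`0x02` values for `output_margin` and `pred_leaf` are assumed from the surrounding code, not shown in these hunks), the option mask can be modeled as:

```python
# Sketch of the prediction option bitmask passed to XGBoosterPredict.
# The 0x01/0x02 assignments are assumptions, not shown in this diff.
OUTPUT_MARGIN     = 0x01
PRED_LEAF         = 0x02
PRED_CONTRIBS     = 0x04
APPROX_CONTRIBS   = 0x08
PRED_INTERACTIONS = 0x10

def decode_option_mask(mask):
    """Return the set of option names encoded in the mask."""
    names = {OUTPUT_MARGIN: 'output_margin', PRED_LEAF: 'pred_leaf',
             PRED_CONTRIBS: 'pred_contribs',
             APPROX_CONTRIBS: 'approx_contribs',
             PRED_INTERACTIONS: 'pred_interactions'}
    return {name for bit, name in names.items() if mask & bit}

opts = decode_option_mask(0x04 | 0x10)
```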


@@ -324,7 +324,7 @@ void CLIPredict(const CLIParam& param) {
if (param.silent == 0) { if (param.silent == 0) {
LOG(CONSOLE) << "start prediction..."; LOG(CONSOLE) << "start prediction...";
} }
std::vector<bst_float> preds; HostDeviceVector<bst_float> preds;
learner->Predict(dtest.get(), param.pred_margin, &preds, param.ntree_limit); learner->Predict(dtest.get(), param.pred_margin, &preds, param.ntree_limit);
if (param.silent == 0) { if (param.silent == 0) {
LOG(CONSOLE) << "writing prediction to " << param.name_pred; LOG(CONSOLE) << "writing prediction to " << param.name_pred;
@@ -332,7 +332,7 @@ void CLIPredict(const CLIParam& param) {
std::unique_ptr<dmlc::Stream> fo( std::unique_ptr<dmlc::Stream> fo(
dmlc::Stream::Create(param.name_pred.c_str(), "w")); dmlc::Stream::Create(param.name_pred.c_str(), "w"));
dmlc::ostream os(fo.get()); dmlc::ostream os(fo.get());
for (bst_float p : preds) { for (bst_float p : preds.data_h()) {
os << p << '\n'; os << p << '\n';
} }
// force flush before fo destruct. // force flush before fo destruct.


@@ -369,6 +369,16 @@ class dvec {
} }
thrust::copy(begin, end, this->tbegin()); thrust::copy(begin, end, this->tbegin());
} }
void copy(thrust::device_ptr<T> begin, thrust::device_ptr<T> end) {
safe_cuda(cudaSetDevice(this->device_idx()));
if (end - begin != size()) {
throw std::runtime_error(
"Cannot copy assign vector to dvec, sizes are different");
}
safe_cuda(cudaMemcpy(this->data(), begin.get(),
size() * sizeof(T), cudaMemcpyDefault));
}
}; };
/** /**
@@ -484,6 +494,13 @@ class bulk_allocator {
} }
public: public:
bulk_allocator() {}
// prevent accidental copying, moving or assignment of this object
bulk_allocator(const bulk_allocator<MemoryT>&) = delete;
bulk_allocator(bulk_allocator<MemoryT>&&) = delete;
void operator=(const bulk_allocator<MemoryT>&) = delete;
void operator=(bulk_allocator<MemoryT>&&) = delete;
~bulk_allocator() { ~bulk_allocator() {
for (size_t i = 0; i < d_ptr.size(); i++) { for (size_t i = 0; i < d_ptr.size(); i++) {
if (!(d_ptr[i] == nullptr)) { if (!(d_ptr[i] == nullptr)) {
@@ -780,6 +797,29 @@ void sumReduction(dh::CubMemory &tmp_mem, dh::dvec<T> &in, dh::dvec<T> &out,
in.data(), out.data(), nVals)); in.data(), out.data(), nVals));
} }
/**
* @brief Helper function to perform a device-wide sum-reduction and return
* the result to the host
* @param tmp_mem cub temporary memory info
* @param in the input array to be reduced
* @param nVals number of elements in the input array
*/
template <typename T>
T sumReduction(dh::CubMemory &tmp_mem, T *in, int nVals) {
size_t tmpSize;
dh::safe_cuda(cub::DeviceReduce::Sum(nullptr, tmpSize, in, in, nVals));
// Allocate small extra memory for the return value
tmp_mem.LazyAllocate(tmpSize + sizeof(T));
auto ptr = reinterpret_cast<T *>(tmp_mem.d_temp_storage) + 1;
dh::safe_cuda(cub::DeviceReduce::Sum(
reinterpret_cast<void *>(ptr), tmpSize, in,
reinterpret_cast<T *>(tmp_mem.d_temp_storage), nVals));
T sum;
dh::safe_cuda(cudaMemcpy(&sum, tmp_mem.d_temp_storage, sizeof(T),
cudaMemcpyDeviceToHost));
return sum;
}
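The cub call pattern here is the usual two-phase one: the first call with a null workspace only reports the temporary-storage size, and the extra `sizeof(T)` reserves room for the result so that a single small device-to-host copy suffices. A rough sketch of the pattern (pure Python stand-ins, not the cub API):

```python
def device_reduce_sum(workspace, values):
    """Stand-in for cub::DeviceReduce::Sum: when workspace is None,
    only report the temp-storage size needed; otherwise do the sum."""
    if workspace is None:
        return len(values) * 8, None   # hypothetical size query
    return None, sum(values)

def sum_reduction(values):
    tmp_size, _ = device_reduce_sum(None, values)      # phase 1: size query
    workspace = bytearray(tmp_size + 8)                # + room for the result
    _, total = device_reduce_sum(workspace, values)    # phase 2: reduce
    return total

result = sum_reduction([1.0, 2.0, 3.5])
```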
/** /**
* @brief Fill a given constant value across all elements in the buffer * @brief Fill a given constant value across all elements in the buffer
* @param out the buffer to be filled * @param out the buffer to be filled


@@ -75,7 +75,7 @@ void HistCutMatrix::Init(DMatrix* p_fmat, uint32_t max_num_bins) {
a.Reserve(max_num_bins); a.Reserve(max_num_bins);
a.SetPrune(summary_array[fid], max_num_bins); a.SetPrune(summary_array[fid], max_num_bins);
const bst_float mval = a.data[0].value; const bst_float mval = a.data[0].value;
this->min_val[fid] = mval - fabs(mval); this->min_val[fid] = mval - (fabs(mval) + 1e-5);
if (a.size > 1 && a.size <= 16) { if (a.size > 1 && a.size <= 16) {
/* specialized code categorial / ordinal data -- use midpoints */ /* specialized code categorial / ordinal data -- use midpoints */
for (size_t i = 1; i < a.size; ++i) { for (size_t i = 1; i < a.size; ++i) {
@@ -96,9 +96,10 @@ void HistCutMatrix::Init(DMatrix* p_fmat, uint32_t max_num_bins) {
if (a.size != 0) { if (a.size != 0) {
bst_float cpt = a.data[a.size - 1].value; bst_float cpt = a.data[a.size - 1].value;
// this must be bigger than last value in a scale // this must be bigger than last value in a scale
bst_float last = cpt + fabs(cpt); bst_float last = cpt + (fabs(cpt) + 1e-5);
cut.push_back(last); cut.push_back(last);
} }
row_ptr.push_back(static_cast<bst_uint>(cut.size())); row_ptr.push_back(static_cast<bst_uint>(cut.size()));
} }
} }


@@ -0,0 +1,68 @@
/*!
* Copyright 2017 XGBoost contributors
*/
#ifndef XGBOOST_USE_CUDA
// dummy implementation of HostDeviceVector in case CUDA is not used
#include <xgboost/base.h>
#include "./host_device_vector.h"
namespace xgboost {
template <typename T>
struct HostDeviceVectorImpl {
explicit HostDeviceVectorImpl(size_t size, T v) : data_h_(size, v) {}
explicit HostDeviceVectorImpl(std::initializer_list<T> init) : data_h_(init) {}
explicit HostDeviceVectorImpl(const std::vector<T>& init) : data_h_(init) {}
std::vector<T> data_h_;
};
template <typename T>
HostDeviceVector<T>::HostDeviceVector(size_t size, T v, int device) : impl_(nullptr) {
impl_ = new HostDeviceVectorImpl<T>(size, v);
}
template <typename T>
HostDeviceVector<T>::HostDeviceVector(std::initializer_list<T> init, int device)
: impl_(nullptr) {
impl_ = new HostDeviceVectorImpl<T>(init);
}
template <typename T>
HostDeviceVector<T>::HostDeviceVector(const std::vector<T>& init, int device)
: impl_(nullptr) {
impl_ = new HostDeviceVectorImpl<T>(init);
}
template <typename T>
HostDeviceVector<T>::~HostDeviceVector() {
HostDeviceVectorImpl<T>* tmp = impl_;
impl_ = nullptr;
delete tmp;
}
template <typename T>
size_t HostDeviceVector<T>::size() const { return impl_->data_h_.size(); }
template <typename T>
int HostDeviceVector<T>::device() const { return -1; }
template <typename T>
T* HostDeviceVector<T>::ptr_d(int device) { return nullptr; }
template <typename T>
std::vector<T>& HostDeviceVector<T>::data_h() { return impl_->data_h_; }
template <typename T>
void HostDeviceVector<T>::resize(size_t new_size, T v, int new_device) {
impl_->data_h_.resize(new_size, v);
}
// explicit instantiations are required, as HostDeviceVector isn't header-only
template class HostDeviceVector<bst_float>;
template class HostDeviceVector<bst_gpair>;
} // namespace xgboost
#endif


@@ -0,0 +1,161 @@
/*!
* Copyright 2017 XGBoost contributors
*/
#include "./host_device_vector.h"
#include "./device_helpers.cuh"
namespace xgboost {
template <typename T>
struct HostDeviceVectorImpl {
HostDeviceVectorImpl(size_t size, T v, int device)
: device_(device), on_d_(device >= 0) {
if (on_d_) {
dh::safe_cuda(cudaSetDevice(device_));
data_d_.resize(size, v);
} else {
data_h_.resize(size, v);
}
}
// Init can be std::vector<T> or std::initializer_list<T>
template <class Init>
HostDeviceVectorImpl(const Init& init, int device)
: device_(device), on_d_(device >= 0) {
if (on_d_) {
dh::safe_cuda(cudaSetDevice(device_));
data_d_.resize(init.size());
thrust::copy(init.begin(), init.end(), data_d_.begin());
} else {
data_h_ = init;
}
}
HostDeviceVectorImpl(const HostDeviceVectorImpl<T>&) = delete;
HostDeviceVectorImpl(HostDeviceVectorImpl<T>&&) = delete;
void operator=(const HostDeviceVectorImpl<T>&) = delete;
void operator=(HostDeviceVectorImpl<T>&&) = delete;
size_t size() const { return on_d_ ? data_d_.size() : data_h_.size(); }
int device() const { return device_; }
T* ptr_d(int device) {
lazy_sync_device(device);
return data_d_.data().get();
}
thrust::device_ptr<T> tbegin(int device) {
return thrust::device_ptr<T>(ptr_d(device));
}
thrust::device_ptr<T> tend(int device) {
auto begin = tbegin(device);
return begin + size();
}
std::vector<T>& data_h() {
lazy_sync_host();
return data_h_;
}
void resize(size_t new_size, T v, int new_device) {
if (new_size == this->size() && new_device == device_)
return;
if (new_device != -1)
device_ = new_device;
// if the data is on the host but the host copy is empty and a device
// has been set, resize on the device instead
if (!on_d_ && (data_h_.size() > 0 || device_ == -1)) {
data_h_.resize(new_size, v);
} else {
dh::safe_cuda(cudaSetDevice(device_));
data_d_.resize(new_size, v);
on_d_ = true;
}
}
void lazy_sync_host() {
if (!on_d_)
return;
if (data_h_.size() != this->size())
data_h_.resize(this->size());
dh::safe_cuda(cudaSetDevice(device_));
thrust::copy(data_d_.begin(), data_d_.end(), data_h_.begin());
on_d_ = false;
}
void lazy_sync_device(int device) {
if (on_d_)
return;
if (device != device_) {
CHECK_EQ(device_, -1);
device_ = device;
}
if (data_d_.size() != this->size()) {
dh::safe_cuda(cudaSetDevice(device_));
data_d_.resize(this->size());
}
dh::safe_cuda(cudaSetDevice(device_));
thrust::copy(data_h_.begin(), data_h_.end(), data_d_.begin());
on_d_ = true;
}
std::vector<T> data_h_;
thrust::device_vector<T> data_d_;
// true if there is an up-to-date copy of data on device, false otherwise
bool on_d_;
int device_;
};
template <typename T>
HostDeviceVector<T>::HostDeviceVector(size_t size, T v, int device) : impl_(nullptr) {
impl_ = new HostDeviceVectorImpl<T>(size, v, device);
}
template <typename T>
HostDeviceVector<T>::HostDeviceVector(std::initializer_list<T> init, int device)
: impl_(nullptr) {
impl_ = new HostDeviceVectorImpl<T>(init, device);
}
template <typename T>
HostDeviceVector<T>::HostDeviceVector(const std::vector<T>& init, int device)
: impl_(nullptr) {
impl_ = new HostDeviceVectorImpl<T>(init, device);
}
template <typename T>
HostDeviceVector<T>::~HostDeviceVector() {
HostDeviceVectorImpl<T>* tmp = impl_;
impl_ = nullptr;
delete tmp;
}
template <typename T>
size_t HostDeviceVector<T>::size() const { return impl_->size(); }
template <typename T>
int HostDeviceVector<T>::device() const { return impl_->device(); }
template <typename T>
T* HostDeviceVector<T>::ptr_d(int device) { return impl_->ptr_d(device); }
template <typename T>
thrust::device_ptr<T> HostDeviceVector<T>::tbegin(int device) {
return impl_->tbegin(device);
}
template <typename T>
thrust::device_ptr<T> HostDeviceVector<T>::tend(int device) {
return impl_->tend(device);
}
template <typename T>
std::vector<T>& HostDeviceVector<T>::data_h() { return impl_->data_h(); }
template <typename T>
void HostDeviceVector<T>::resize(size_t new_size, T v, int new_device) {
impl_->resize(new_size, v, new_device);
}
// explicit instantiations are required, as HostDeviceVector isn't header-only
template class HostDeviceVector<bst_float>;
template class HostDeviceVector<bst_gpair>;
} // namespace xgboost
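The lazy-sync protocol above can be sketched without CUDA: a single `on_device` flag records which copy is current, and each accessor copies only when the requested side is stale (as the header notes, this is not thread-safe). 'Device' memory is simulated with a second Python list:

```python
class LazyHostDeviceVector:
    """Toy model of HostDeviceVector's lazy host/device syncing."""
    def __init__(self, data):
        self.host = list(data)
        self.device = []
        self.on_device = False
        self.copies = 0          # count simulated cudaMemcpy calls

    def data_h(self):
        if self.on_device:       # lazy_sync_host
            self.host = list(self.device)
            self.on_device = False
            self.copies += 1
        return self.host

    def ptr_d(self):
        if not self.on_device:   # lazy_sync_device
            self.device = list(self.host)
            self.on_device = True
            self.copies += 1
        return self.device

v = LazyHostDeviceVector([1, 2, 3])
v.ptr_d()
v.ptr_d()    # second call is free: the data is already on the device
```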


@@ -0,0 +1,96 @@
/*!
* Copyright 2017 XGBoost contributors
*/
#ifndef XGBOOST_COMMON_HOST_DEVICE_VECTOR_H_
#define XGBOOST_COMMON_HOST_DEVICE_VECTOR_H_
#include <cstdlib>
#include <initializer_list>
#include <vector>
// only include thrust-related files if host_device_vector.h
// is included from a .cu file
#ifdef __CUDACC__
#include <thrust/device_ptr.h>
#endif
namespace xgboost {
template <typename T> struct HostDeviceVectorImpl;
/**
* @file host_device_vector.h
* @brief A device-and-host vector abstraction layer.
*
* Why HostDeviceVector?<br/>
* With CUDA, one has to explicitly manage memory through 'cudaMemcpy' calls.
* This wrapper class hides this management from the users, thereby making it
* easy to integrate GPU/CPU usage under a single interface.
*
* Initialization/Allocation:<br/>
* One can choose to initialize the vector on CPU or GPU in the constructor
* (using the 'device' argument), or use the 'resize' method to
* allocate/resize memory explicitly.
*
* Accessing underlying data:<br/>
* Use the 'data_h' method to explicitly query for the underlying std::vector.
* If you need the raw device pointer, use the 'ptr_d' method. For the perf
* implications of these calls, see below.
*
* Accessing underlying data and the perf implications:<br/>
* There are 4 scenarios to consider:
* data_h and data on CPU --> no problems, std::vector returned immediately
* data_h but data on GPU --> causes a cudaMemcpy to be issued internally;
* subsequent calls to data_h will NOT incur this penalty
* (assuming 'ptr_d' is not called in between)
* ptr_d but data on CPU --> causes a cudaMemcpy to be issued internally;
* subsequent calls to ptr_d will NOT incur this penalty
* (assuming 'data_h' is not called in between)
* ptr_d and data on GPU --> no problems, the device ptr is returned immediately
*
* What if xgboost is compiled without CUDA?<br/>
* In that case, there's a special implementation that always falls back to
* working with std::vector. This logic can be found in host_device_vector.cc
*
* Why not consider CUDA unified memory?<br/>
* We did consider it. However, it poses complications if we need to support both
* compiling with and without CUDA toolkit. It was easier to have
* 'HostDeviceVector' with a special-case implementation in host_device_vector.cc
*
* @note: This is not thread-safe!
*/
template <typename T>
class HostDeviceVector {
public:
explicit HostDeviceVector(size_t size = 0, T v = T(), int device = -1);
HostDeviceVector(std::initializer_list<T> init, int device = -1);
explicit HostDeviceVector(const std::vector<T>& init, int device = -1);
~HostDeviceVector();
HostDeviceVector(const HostDeviceVector<T>&) = delete;
HostDeviceVector(HostDeviceVector<T>&&) = delete;
void operator=(const HostDeviceVector<T>&) = delete;
void operator=(HostDeviceVector<T>&&) = delete;
size_t size() const;
int device() const;
T* ptr_d(int device);
T* ptr_h() { return data_h().data(); }
// only define functions returning device_ptr
// if HostDeviceVector.h is included from a .cu file
#ifdef __CUDACC__
thrust::device_ptr<T> tbegin(int device);
thrust::device_ptr<T> tend(int device);
#endif
std::vector<T>& data_h();
// passing in new_device == -1 keeps the device as is
void resize(size_t new_size, T v = T(), int new_device = -1);
private:
HostDeviceVectorImpl<T>* impl_;
};
} // namespace xgboost
#endif // XGBOOST_COMMON_HOST_DEVICE_VECTOR_H_


@@ -20,8 +20,8 @@ namespace common {
* \param x input parameter * \param x input parameter
* \return the transformed value. * \return the transformed value.
*/ */
inline float Sigmoid(float x) { XGBOOST_DEVICE inline float Sigmoid(float x) {
return 1.0f / (1.0f + std::exp(-x)); return 1.0f / (1.0f + expf(-x));
} }
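For reference, the transform itself in a plain Python sketch (the device version above uses `expf` so it can run in CUDA kernels):

```python
import math

def sigmoid(x):
    """Logistic sigmoid, as computed by the device function above."""
    return 1.0 / (1.0 + math.exp(-x))

mid = sigmoid(0.0)
```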
inline avx::Float8 Sigmoid(avx::Float8 x) { inline avx::Float8 Sigmoid(avx::Float8 x) {


@@ -281,7 +281,7 @@ struct WQSummary {
// helper function to print the current content of sketch // helper function to print the current content of sketch
inline void Print() const { inline void Print() const {
for (size_t i = 0; i < this->size; ++i) { for (size_t i = 0; i < this->size; ++i) {
LOG(INFO) << "[" << i << "] rmin=" << data[i].rmin LOG(CONSOLE) << "[" << i << "] rmin=" << data[i].rmin
<< ", rmax=" << data[i].rmax << ", rmax=" << data[i].rmax
<< ", wmin=" << data[i].wmin << ", wmin=" << data[i].wmin
<< ", v=" << data[i].value; << ", v=" << data[i].value;
@@ -321,7 +321,7 @@ struct WQSummary {
for (size_t i = 0; i < this->size; ++i) { for (size_t i = 0; i < this->size; ++i) {
if (data[i].rmin + data[i].wmin > data[i].rmax + tol || if (data[i].rmin + data[i].wmin > data[i].rmax + tol ||
data[i].rmin < -1e-6f || data[i].rmax < -1e-6f) { data[i].rmin < -1e-6f || data[i].rmax < -1e-6f) {
LOG(INFO) << "----------check not pass----------"; LOG(INFO) << "---------- WQSummary::Check did not pass ----------";
this->Print(); this->Print();
return false; return false;
} }
@@ -503,9 +503,8 @@ struct GKSummary {
/*! \brief used for debug purpose, print the summary */ /*! \brief used for debug purpose, print the summary */
inline void Print() const { inline void Print() const {
for (size_t i = 0; i < size; ++i) { for (size_t i = 0; i < size; ++i) {
std::cout << "x=" << data[i].value << "\t" LOG(CONSOLE) << "x=" << data[i].value << "\t"
<< "[" << data[i].rmin << "," << data[i].rmax << "]" << "[" << data[i].rmin << "," << data[i].rmax << "]";
<< std::endl;
} }
} }
/*! /*!


@@ -2,6 +2,7 @@
* Copyright by Contributors 2017 * Copyright by Contributors 2017
*/ */
#pragma once #pragma once
#include <xgboost/logging.h>
#include <chrono> #include <chrono>
#include <iostream> #include <iostream>
#include <map> #include <map>
@@ -27,7 +28,10 @@ struct Timer {
void Stop() { elapsed += ClockT::now() - start; } void Stop() { elapsed += ClockT::now() - start; }
double ElapsedSeconds() const { return SecondsT(elapsed).count(); } double ElapsedSeconds() const { return SecondsT(elapsed).count(); }
void PrintElapsed(std::string label) { void PrintElapsed(std::string label) {
printf("%s:\t %fs\n", label.c_str(), SecondsT(elapsed).count()); char buffer[255];
snprintf(buffer, sizeof(buffer), "%s:\t %fs", label.c_str(),
SecondsT(elapsed).count());
LOG(CONSOLE) << buffer;
Reset(); Reset();
} }
}; };
@@ -50,9 +54,7 @@ struct Monitor {
~Monitor() { ~Monitor() {
if (!debug_verbose) return; if (!debug_verbose) return;
std::cout << "========\n"; LOG(CONSOLE) << "======== Monitor: " << label << " ========";
std::cout << "Monitor: " << label << "\n";
std::cout << "========\n";
for (auto &kv : timer_map) { for (auto &kv : timer_map) {
kv.second.PrintElapsed(kv.first); kv.second.PrintElapsed(kv.first);
} }


@@ -54,16 +54,16 @@ dmlc::DataIter<ColBatch>* SimpleDMatrix::ColIterator(const std::vector<bst_uint>
void SimpleDMatrix::InitColAccess(const std::vector<bool> &enabled, void SimpleDMatrix::InitColAccess(const std::vector<bool> &enabled,
float pkeep, float pkeep,
size_t max_row_perbatch) { size_t max_row_perbatch, bool sorted) {
if (this->HaveColAccess()) return; if (this->HaveColAccess(sorted)) return;
col_iter_.sorted = sorted;
col_iter_.cpages_.clear(); col_iter_.cpages_.clear();
if (info().num_row < max_row_perbatch) { if (info().num_row < max_row_perbatch) {
std::unique_ptr<SparsePage> page(new SparsePage()); std::unique_ptr<SparsePage> page(new SparsePage());
this->MakeOneBatch(enabled, pkeep, page.get()); this->MakeOneBatch(enabled, pkeep, page.get(), sorted);
col_iter_.cpages_.push_back(std::move(page)); col_iter_.cpages_.push_back(std::move(page));
} else { } else {
this->MakeManyBatch(enabled, pkeep, max_row_perbatch); this->MakeManyBatch(enabled, pkeep, max_row_perbatch, sorted);
} }
// setup col-size // setup col-size
col_size_.resize(info().num_col); col_size_.resize(info().num_col);
@@ -77,9 +77,8 @@ void SimpleDMatrix::InitColAccess(const std::vector<bool> &enabled,
} }
// internal function to make one batch from row iter. // internal function to make one batch from row iter.
void SimpleDMatrix::MakeOneBatch(const std::vector<bool>& enabled, void SimpleDMatrix::MakeOneBatch(const std::vector<bool>& enabled, float pkeep,
float pkeep, SparsePage* pcol, bool sorted) {
SparsePage *pcol) {
// clear rowset // clear rowset
buffered_rowset_.clear(); buffered_rowset_.clear();
// bit map // bit map
@@ -144,9 +143,11 @@ void SimpleDMatrix::MakeOneBatch(const std::vector<bool>& enabled,
} }
CHECK_EQ(pcol->Size(), info().num_col); CHECK_EQ(pcol->Size(), info().num_col);
if (sorted) {
// sort columns // sort columns
bst_omp_uint ncol = static_cast<bst_omp_uint>(pcol->Size()); bst_omp_uint ncol = static_cast<bst_omp_uint>(pcol->Size());
#pragma omp parallel for schedule(dynamic, 1) num_threads(nthread) #pragma omp parallel for schedule(dynamic, 1) num_threads(nthread)
for (bst_omp_uint i = 0; i < ncol; ++i) { for (bst_omp_uint i = 0; i < ncol; ++i) {
if (pcol->offset[i] < pcol->offset[i + 1]) { if (pcol->offset[i] < pcol->offset[i + 1]) {
std::sort(dmlc::BeginPtr(pcol->data) + pcol->offset[i], std::sort(dmlc::BeginPtr(pcol->data) + pcol->offset[i],
@@ -154,11 +155,12 @@ void SimpleDMatrix::MakeOneBatch(const std::vector<bool>& enabled,
SparseBatch::Entry::CmpValue); SparseBatch::Entry::CmpValue);
} }
} }
}
} }
void SimpleDMatrix::MakeManyBatch(const std::vector<bool>& enabled, void SimpleDMatrix::MakeManyBatch(const std::vector<bool>& enabled,
float pkeep, float pkeep,
size_t max_row_perbatch) { size_t max_row_perbatch, bool sorted) {
size_t btop = 0; size_t btop = 0;
std::bernoulli_distribution coin_flip(pkeep); std::bernoulli_distribution coin_flip(pkeep);
auto& rnd = common::GlobalRandom(); auto& rnd = common::GlobalRandom();
@@ -179,7 +181,7 @@ void SimpleDMatrix::MakeManyBatch(const std::vector<bool>& enabled,
} }
if (tmp.Size() >= max_row_perbatch) { if (tmp.Size() >= max_row_perbatch) {
std::unique_ptr<SparsePage> page(new SparsePage()); std::unique_ptr<SparsePage> page(new SparsePage());
this->MakeColPage(tmp.GetRowBatch(0), btop, enabled, page.get()); this->MakeColPage(tmp.GetRowBatch(0), btop, enabled, page.get(), sorted);
col_iter_.cpages_.push_back(std::move(page)); col_iter_.cpages_.push_back(std::move(page));
btop = buffered_rowset_.size(); btop = buffered_rowset_.size();
tmp.Clear(); tmp.Clear();
@@ -189,7 +191,7 @@ void SimpleDMatrix::MakeManyBatch(const std::vector<bool>& enabled,
if (tmp.Size() != 0) { if (tmp.Size() != 0) {
std::unique_ptr<SparsePage> page(new SparsePage()); std::unique_ptr<SparsePage> page(new SparsePage());
this->MakeColPage(tmp.GetRowBatch(0), btop, enabled, page.get()); this->MakeColPage(tmp.GetRowBatch(0), btop, enabled, page.get(), sorted);
col_iter_.cpages_.push_back(std::move(page)); col_iter_.cpages_.push_back(std::move(page));
} }
} }
@@ -198,7 +200,7 @@ void SimpleDMatrix::MakeManyBatch(const std::vector<bool>& enabled,
void SimpleDMatrix::MakeColPage(const RowBatch& batch, void SimpleDMatrix::MakeColPage(const RowBatch& batch,
size_t buffer_begin, size_t buffer_begin,
const std::vector<bool>& enabled, const std::vector<bool>& enabled,
SparsePage* pcol) { SparsePage* pcol, bool sorted) {
const int nthread = std::min(omp_get_max_threads(), std::max(omp_get_num_procs() / 2 - 2, 1)); const int nthread = std::min(omp_get_max_threads(), std::max(omp_get_num_procs() / 2 - 2, 1));
pcol->Clear(); pcol->Clear();
common::ParallelGroupBuilder<SparseBatch::Entry> common::ParallelGroupBuilder<SparseBatch::Entry>
@@ -231,8 +233,9 @@ void SimpleDMatrix::MakeColPage(const RowBatch& batch,
} }
CHECK_EQ(pcol->Size(), info().num_col); CHECK_EQ(pcol->Size(), info().num_col);
// sort columns // sort columns
if (sorted) {
bst_omp_uint ncol = static_cast<bst_omp_uint>(pcol->Size()); bst_omp_uint ncol = static_cast<bst_omp_uint>(pcol->Size());
#pragma omp parallel for schedule(dynamic, 1) num_threads(nthread) #pragma omp parallel for schedule(dynamic, 1) num_threads(nthread)
for (bst_omp_uint i = 0; i < ncol; ++i) { for (bst_omp_uint i = 0; i < ncol; ++i) {
if (pcol->offset[i] < pcol->offset[i + 1]) { if (pcol->offset[i] < pcol->offset[i + 1]) {
std::sort(dmlc::BeginPtr(pcol->data) + pcol->offset[i], std::sort(dmlc::BeginPtr(pcol->data) + pcol->offset[i],
@@ -240,6 +243,7 @@ void SimpleDMatrix::MakeColPage(const RowBatch& batch,
SparseBatch::Entry::CmpValue); SparseBatch::Entry::CmpValue);
} }
} }
}
} }
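The column sort being gated by `sorted` here operates on CSC-style storage: `offset[i]..offset[i+1]` delimits column i's entries, and each column is sorted by feature value. A sketch:

```python
def sort_columns(data, offset):
    """Sort each CSC column's (row_index, value) entries by value,
    mirroring the per-column std::sort above.
    offset has ncol + 1 entries; column i spans offset[i]:offset[i+1]."""
    for i in range(len(offset) - 1):
        lo, hi = offset[i], offset[i + 1]
        if lo < hi:
            data[lo:hi] = sorted(data[lo:hi], key=lambda e: e[1])

# two columns: column 0 has 3 entries, column 1 has 2
entries = [(0, 3.0), (2, 1.0), (1, 2.0), (0, 5.0), (1, 4.0)]
sort_columns(entries, [0, 3, 5])
```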
bool SimpleDMatrix::SingleColBlock() const { bool SimpleDMatrix::SingleColBlock() const {


@@ -36,8 +36,8 @@ class SimpleDMatrix : public DMatrix {
return iter; return iter;
} }
bool HaveColAccess() const override { bool HaveColAccess(bool sorted) const override {
return col_size_.size() != 0; return col_size_.size() != 0 && col_iter_.sorted == sorted;
} }
const RowSet& buffered_rowset() const override { const RowSet& buffered_rowset() const override {
@@ -59,7 +59,7 @@ class SimpleDMatrix : public DMatrix {
void InitColAccess(const std::vector<bool>& enabled, void InitColAccess(const std::vector<bool>& enabled,
float subsample, float subsample,
size_t max_row_perbatch) override; size_t max_row_perbatch, bool sorted) override;
bool SingleColBlock() const override; bool SingleColBlock() const override;
@@ -67,7 +67,7 @@ class SimpleDMatrix : public DMatrix {
// in-memory column batch iterator. // in-memory column batch iterator.
struct ColBatchIter: dmlc::DataIter<ColBatch> { struct ColBatchIter: dmlc::DataIter<ColBatch> {
public: public:
ColBatchIter() : data_ptr_(0) {} ColBatchIter() : data_ptr_(0), sorted(false) {}
void BeforeFirst() override { void BeforeFirst() override {
data_ptr_ = 0; data_ptr_ = 0;
} }
@@ -89,6 +89,8 @@ class SimpleDMatrix : public DMatrix {
size_t data_ptr_; size_t data_ptr_;
// temporal space for batch // temporal space for batch
ColBatch batch_; ColBatch batch_;
// Is column sorted?
bool sorted;
}; };
// source data pointer. // source data pointer.
@@ -103,16 +105,16 @@ class SimpleDMatrix : public DMatrix {
// internal function to make one batch from row iter. // internal function to make one batch from row iter.
void MakeOneBatch(const std::vector<bool>& enabled, void MakeOneBatch(const std::vector<bool>& enabled,
float pkeep, float pkeep,
SparsePage *pcol); SparsePage *pcol, bool sorted);
void MakeManyBatch(const std::vector<bool>& enabled, void MakeManyBatch(const std::vector<bool>& enabled,
float pkeep, float pkeep,
size_t max_row_perbatch); size_t max_row_perbatch, bool sorted);
void MakeColPage(const RowBatch& batch, void MakeColPage(const RowBatch& batch,
size_t buffer_begin, size_t buffer_begin,
const std::vector<bool>& enabled, const std::vector<bool>& enabled,
SparsePage* pcol); SparsePage* pcol, bool sorted);
}; };
} // namespace data } // namespace data
} // namespace xgboost } // namespace xgboost


@@ -119,7 +119,7 @@ ColIterator(const std::vector<bst_uint>& fset) {
} }
bool SparsePageDMatrix::TryInitColData() { bool SparsePageDMatrix::TryInitColData(bool sorted) {
// load meta data. // load meta data.
std::vector<std::string> cache_shards = common::Split(cache_info_, ':'); std::vector<std::string> cache_shards = common::Split(cache_info_, ':');
{ {
@@ -140,15 +140,16 @@ bool SparsePageDMatrix::TryInitColData() {
files.push_back(std::move(fdata)); files.push_back(std::move(fdata));
} }
col_iter_.reset(new ColPageIter(std::move(files))); col_iter_.reset(new ColPageIter(std::move(files)));
// warning: no attempt to check here whether the cached data was sorted
col_iter_->sorted = sorted;
return true; return true;
} }
void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled, void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled,
float pkeep, float pkeep,
size_t max_row_perbatch) { size_t max_row_perbatch, bool sorted) {
if (HaveColAccess()) return; if (HaveColAccess(sorted)) return;
if (TryInitColData()) return; if (TryInitColData(sorted)) return;
const MetaInfo& info = this->info(); const MetaInfo& info = this->info();
if (max_row_perbatch == std::numeric_limits<size_t>::max()) { if (max_row_perbatch == std::numeric_limits<size_t>::max()) {
max_row_perbatch = kMaxRowPerBatch; max_row_perbatch = kMaxRowPerBatch;
@@ -197,8 +198,9 @@ void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled,
} }
CHECK_EQ(pcol->Size(), info.num_col); CHECK_EQ(pcol->Size(), info.num_col);
// sort columns // sort columns
if (sorted) {
bst_omp_uint ncol = static_cast<bst_omp_uint>(pcol->Size()); bst_omp_uint ncol = static_cast<bst_omp_uint>(pcol->Size());
#pragma omp parallel for schedule(dynamic, 1) num_threads(nthread) #pragma omp parallel for schedule(dynamic, 1) num_threads(nthread)
for (bst_omp_uint i = 0; i < ncol; ++i) { for (bst_omp_uint i = 0; i < ncol; ++i) {
if (pcol->offset[i] < pcol->offset[i + 1]) { if (pcol->offset[i] < pcol->offset[i + 1]) {
std::sort(dmlc::BeginPtr(pcol->data) + pcol->offset[i], std::sort(dmlc::BeginPtr(pcol->data) + pcol->offset[i],
@@ -206,6 +208,7 @@ void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled,
SparseBatch::Entry::CmpValue); SparseBatch::Entry::CmpValue);
} }
} }
}
}; };
auto make_next_col = [&] (SparsePage* dptr) { auto make_next_col = [&] (SparsePage* dptr) {
@@ -290,7 +293,7 @@ void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled,
fo.reset(nullptr); fo.reset(nullptr);
} }
// initialize column data // initialize column data
CHECK(TryInitColData()); CHECK(TryInitColData(sorted));
} }
} // namespace data } // namespace data


@@ -40,8 +40,8 @@ class SparsePageDMatrix : public DMatrix {
return iter; return iter;
} }
bool HaveColAccess() const override { bool HaveColAccess(bool sorted) const override {
return col_iter_.get() != nullptr; return col_iter_.get() != nullptr && col_iter_->sorted == sorted;
} }
const RowSet& buffered_rowset() const override { const RowSet& buffered_rowset() const override {
@@ -67,7 +67,7 @@ class SparsePageDMatrix : public DMatrix {
void InitColAccess(const std::vector<bool>& enabled, void InitColAccess(const std::vector<bool>& enabled,
float subsample, float subsample,
size_t max_row_perbatch) override; size_t max_row_perbatch, bool sorted) override;
/*! \brief page size 256 MB */ /*! \brief page size 256 MB */
static const size_t kPageSize = 256UL << 20UL; static const size_t kPageSize = 256UL << 20UL;
@@ -87,6 +87,8 @@ class SparsePageDMatrix : public DMatrix {
bool Next() override; bool Next() override;
// initialize the column iterator with the specified index set. // initialize the column iterator with the specified index set.
void Init(const std::vector<bst_uint>& index_set, bool load_all); void Init(const std::vector<bst_uint>& index_set, bool load_all);
// If the column features are sorted
bool sorted;
private: private:
// the temp page. // the temp page.
@@ -114,7 +116,7 @@ class SparsePageDMatrix : public DMatrix {
* \brief Try to initialize column data. * \brief Try to initialize column data.
* \return true if data already exists, false if they do not. * \return true if data already exists, false if they do not.
*/ */
bool TryInitColData(); bool TryInitColData(bool sorted);
// source data pointer. // source data pointer.
std::unique_ptr<DataSource> source_; std::unique_ptr<DataSource> source_;
// the cache prefix // the cache prefix
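The header change above makes cached column data valid only when its sort state matches what the caller asks for. A minimal standalone sketch of the new check, using a simplified stand-in for the column iterator (`ColIter` here is an assumption for illustration, not the real class):

```cpp
// Cached column access is only reusable when the cache's sort state
// matches the requested one; a missing iterator always fails the check.
struct ColIter {
  bool sorted;  // mirrors the new `sorted` flag on ColPageIter
};

bool HaveColAccess(const ColIter *it, bool want_sorted) {
  return it != nullptr && it->sorted == want_sorted;
}
```

This is why gblinear can now request unsorted columns (cheaper to build) without tree methods accidentally reusing them.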


@@ -9,92 +9,67 @@
#include <dmlc/parameter.h> #include <dmlc/parameter.h>
#include <xgboost/gbm.h> #include <xgboost/gbm.h>
#include <xgboost/logging.h> #include <xgboost/logging.h>
#include <xgboost/linear_updater.h>
#include <vector> #include <vector>
#include <string> #include <string>
#include <sstream> #include <sstream>
#include <cstring>
#include <algorithm> #include <algorithm>
#include "../common/timer.h"
namespace xgboost { namespace xgboost {
namespace gbm { namespace gbm {
DMLC_REGISTRY_FILE_TAG(gblinear); DMLC_REGISTRY_FILE_TAG(gblinear);
// model parameter // training parameters
struct GBLinearModelParam :public dmlc::Parameter<GBLinearModelParam> {
// number of feature dimension
unsigned num_feature;
// number of output group
int num_output_group;
// reserved field
int reserved[32];
// constructor
GBLinearModelParam() {
std::memset(this, 0, sizeof(GBLinearModelParam));
}
DMLC_DECLARE_PARAMETER(GBLinearModelParam) {
DMLC_DECLARE_FIELD(num_feature).set_lower_bound(0)
.describe("Number of features used in classification.");
DMLC_DECLARE_FIELD(num_output_group).set_lower_bound(1).set_default(1)
.describe("Number of output groups in the setting.");
}
};
// training parameter
struct GBLinearTrainParam : public dmlc::Parameter<GBLinearTrainParam> { struct GBLinearTrainParam : public dmlc::Parameter<GBLinearTrainParam> {
/*! \brief learning_rate */ std::string updater;
float learning_rate; float tolerance;
/*! \brief regularization weight for L2 norm */ size_t max_row_perbatch;
float reg_lambda; int debug_verbose;
/*! \brief regularization weight for L1 norm */
float reg_alpha;
/*! \brief regularization weight for L2 norm in bias */
float reg_lambda_bias;
// declare parameters
DMLC_DECLARE_PARAMETER(GBLinearTrainParam) { DMLC_DECLARE_PARAMETER(GBLinearTrainParam) {
DMLC_DECLARE_FIELD(learning_rate).set_lower_bound(0.0f).set_default(1.0f) DMLC_DECLARE_FIELD(updater)
.describe("Learning rate of each update."); .set_default("shotgun")
DMLC_DECLARE_FIELD(reg_lambda).set_lower_bound(0.0f).set_default(0.0f) .describe("Update algorithm for linear model. One of shotgun/coord_descent");
.describe("L2 regularization on weights."); DMLC_DECLARE_FIELD(tolerance)
DMLC_DECLARE_FIELD(reg_alpha).set_lower_bound(0.0f).set_default(0.0f) .set_lower_bound(0.0f)
.describe("L1 regularization on weights."); .set_default(0.0f)
DMLC_DECLARE_FIELD(reg_lambda_bias).set_lower_bound(0.0f).set_default(0.0f) .describe("Stop if largest weight update is smaller than this number.");
.describe("L2 regularization on bias."); DMLC_DECLARE_FIELD(max_row_perbatch)
// alias of parameters .set_default(std::numeric_limits<size_t>::max())
DMLC_DECLARE_ALIAS(learning_rate, eta); .describe("Maximum rows per batch.");
DMLC_DECLARE_ALIAS(reg_lambda, lambda); DMLC_DECLARE_FIELD(debug_verbose)
DMLC_DECLARE_ALIAS(reg_alpha, alpha); .set_lower_bound(0)
DMLC_DECLARE_ALIAS(reg_lambda_bias, lambda_bias); .set_default(0)
} .describe("flag to print out detailed breakdown of runtime");
// given original weight calculate delta
inline double CalcDelta(double sum_grad, double sum_hess, double w) const {
if (sum_hess < 1e-5f) return 0.0f;
double tmp = w - (sum_grad + reg_lambda * w) / (sum_hess + reg_lambda);
if (tmp >=0) {
return std::max(-(sum_grad + reg_lambda * w + reg_alpha) / (sum_hess + reg_lambda), -w);
} else {
return std::min(-(sum_grad + reg_lambda * w - reg_alpha) / (sum_hess + reg_lambda), -w);
}
}
// given original weight calculate delta bias
inline double CalcDeltaBias(double sum_grad, double sum_hess, double w) const {
return - (sum_grad + reg_lambda_bias * w) / (sum_hess + reg_lambda_bias);
} }
}; };
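The hunk above removes the per-feature update rule from GBLinearTrainParam (it now lives behind the pluggable LinearUpdater). A minimal sketch of that removed elastic-net coordinate-descent step, with `reg_alpha` and `reg_lambda` passed explicitly instead of read from the parameter struct:

```cpp
#include <algorithm>

// Weight change for one feature given accumulated gradient/hessian and the
// current weight w: the L2 term (reg_lambda) shrinks the Newton step, the
// L1 term (reg_alpha) soft-thresholds it, and a tiny hessian yields no update.
double CalcDelta(double sum_grad, double sum_hess, double w,
                 double reg_alpha, double reg_lambda) {
  if (sum_hess < 1e-5) return 0.0;
  double tmp = w - (sum_grad + reg_lambda * w) / (sum_hess + reg_lambda);
  if (tmp >= 0.0) {
    return std::max(
        -(sum_grad + reg_lambda * w + reg_alpha) / (sum_hess + reg_lambda), -w);
  } else {
    return std::min(
        -(sum_grad + reg_lambda * w - reg_alpha) / (sum_hess + reg_lambda), -w);
  }
}
```

With no regularization this reduces to the plain Newton step `-sum_grad / sum_hess`; with `reg_alpha > 0`, small gradients produce a zero update, which is what drives weights to exact zero.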
/*! /*!
* \brief gradient boosted linear model * \brief gradient boosted linear model
*/ */
class GBLinear : public GradientBooster { class GBLinear : public GradientBooster {
public: public:
explicit GBLinear(bst_float base_margin) explicit GBLinear(const std::vector<std::shared_ptr<DMatrix> > &cache,
: base_margin_(base_margin) { bst_float base_margin)
: base_margin_(base_margin),
sum_instance_weight(0),
sum_weight_complete(false),
is_converged(false) {
// Add matrices to the prediction cache
for (auto &d : cache) {
PredictionCacheEntry e;
e.data = d;
cache_[d.get()] = std::move(e);
}
} }
void Configure(const std::vector<std::pair<std::string, std::string> >& cfg) override { void Configure(const std::vector<std::pair<std::string, std::string> >& cfg) override {
if (model.weight.size() == 0) { if (model.weight.size() == 0) {
model.param.InitAllowUnknown(cfg); model.param.InitAllowUnknown(cfg);
} }
param.InitAllowUnknown(cfg); param.InitAllowUnknown(cfg);
updater.reset(LinearUpdater::Create(param.updater));
updater->Init(cfg);
monitor.Init("GBLinear ", param.debug_verbose);
} }
void Load(dmlc::Stream* fi) override { void Load(dmlc::Stream* fi) override {
model.Load(fi); model.Load(fi);
@@ -102,108 +77,45 @@ class GBLinear : public GradientBooster {
void Save(dmlc::Stream* fo) const override { void Save(dmlc::Stream* fo) const override {
model.Save(fo); model.Save(fo);
} }
void DoBoost(DMatrix *p_fmat, void DoBoost(DMatrix *p_fmat,
std::vector<bst_gpair> *in_gpair, HostDeviceVector<bst_gpair> *in_gpair,
ObjFunction* obj) override { ObjFunction* obj) override {
// lazily initialize the model when not ready. monitor.Start("DoBoost");
if (model.weight.size() == 0) {
model.InitModel(); if (!p_fmat->HaveColAccess(false)) {
std::vector<bool> enabled(p_fmat->info().num_col, true);
p_fmat->InitColAccess(enabled, 1.0f, param.max_row_perbatch, false);
} }
std::vector<bst_gpair> &gpair = *in_gpair; model.LazyInitModel();
const int ngroup = model.param.num_output_group; this->LazySumWeights(p_fmat);
const RowSet &rowset = p_fmat->buffered_rowset();
// for all the output group if (!this->CheckConvergence()) {
for (int gid = 0; gid < ngroup; ++gid) { updater->Update(&in_gpair->data_h(), p_fmat, &model, sum_instance_weight);
double sum_grad = 0.0, sum_hess = 0.0;
const bst_omp_uint ndata = static_cast<bst_omp_uint>(rowset.size());
#pragma omp parallel for schedule(static) reduction(+: sum_grad, sum_hess)
for (bst_omp_uint i = 0; i < ndata; ++i) {
bst_gpair &p = gpair[rowset[i] * ngroup + gid];
if (p.GetHess() >= 0.0f) {
sum_grad += p.GetGrad();
sum_hess += p.GetHess();
}
}
// remove bias effect
bst_float dw = static_cast<bst_float>(
param.learning_rate * param.CalcDeltaBias(sum_grad, sum_hess, model.bias()[gid]));
model.bias()[gid] += dw;
// update grad value
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < ndata; ++i) {
bst_gpair &p = gpair[rowset[i] * ngroup + gid];
if (p.GetHess() >= 0.0f) {
p += bst_gpair(p.GetHess() * dw, 0);
}
}
}
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator();
while (iter->Next()) {
// number of features
const ColBatch &batch = iter->Value();
const bst_omp_uint nfeat = static_cast<bst_omp_uint>(batch.size);
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < nfeat; ++i) {
const bst_uint fid = batch.col_index[i];
ColBatch::Inst col = batch[i];
for (int gid = 0; gid < ngroup; ++gid) {
double sum_grad = 0.0, sum_hess = 0.0;
for (bst_uint j = 0; j < col.length; ++j) {
const bst_float v = col[j].fvalue;
bst_gpair &p = gpair[col[j].index * ngroup + gid];
if (p.GetHess() < 0.0f) continue;
sum_grad += p.GetGrad() * v;
sum_hess += p.GetHess() * v * v;
}
bst_float &w = model[fid][gid];
bst_float dw = static_cast<bst_float>(param.learning_rate *
param.CalcDelta(sum_grad, sum_hess, w));
w += dw;
// update grad value
for (bst_uint j = 0; j < col.length; ++j) {
bst_gpair &p = gpair[col[j].index * ngroup + gid];
if (p.GetHess() < 0.0f) continue;
p += bst_gpair(p.GetHess() * col[j].fvalue * dw, 0);
}
}
}
} }
this->UpdatePredictionCache();
monitor.Stop("DoBoost");
} }
void PredictBatch(DMatrix *p_fmat, void PredictBatch(DMatrix *p_fmat,
std::vector<bst_float> *out_preds, HostDeviceVector<bst_float> *out_preds,
unsigned ntree_limit) override { unsigned ntree_limit) override {
if (model.weight.size() == 0) { monitor.Start("PredictBatch");
model.InitModel();
}
CHECK_EQ(ntree_limit, 0U) CHECK_EQ(ntree_limit, 0U)
<< "GBLinear::Predict ntrees is only valid for gbtree predictor"; << "GBLinear::Predict ntrees is only valid for gbtree predictor";
std::vector<bst_float> &preds = *out_preds;
const std::vector<bst_float>& base_margin = p_fmat->info().base_margin; // Try to predict from cache
preds.resize(0); auto it = cache_.find(p_fmat);
// start collecting the prediction if (it != cache_.end() && it->second.predictions.size() != 0) {
dmlc::DataIter<RowBatch> *iter = p_fmat->RowIterator(); std::vector<bst_float> &y = it->second.predictions;
const int ngroup = model.param.num_output_group; out_preds->resize(y.size());
while (iter->Next()) { std::copy(y.begin(), y.end(), out_preds->data_h().begin());
const RowBatch &batch = iter->Value(); } else {
CHECK_EQ(batch.base_rowid * ngroup, preds.size()); this->PredictBatchInternal(p_fmat, &out_preds->data_h());
// output convention: nrow * k, where nrow is number of rows
// k is number of group
preds.resize(preds.size() + batch.size * ngroup);
// parallel over local batch
const omp_ulong nsize = static_cast<omp_ulong>(batch.size);
#pragma omp parallel for schedule(static)
for (omp_ulong i = 0; i < nsize; ++i) {
const size_t ridx = batch.base_rowid + i;
// loop over output groups
for (int gid = 0; gid < ngroup; ++gid) {
bst_float margin = (base_margin.size() != 0) ?
base_margin[ridx * ngroup + gid] : base_margin_;
this->Pred(batch[i], &preds[ridx * ngroup], gid, margin);
}
}
} }
monitor.Stop("PredictBatch");
} }
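The rewritten PredictBatch first consults a per-matrix cache keyed by raw DMatrix pointer before falling back to PredictBatchInternal. A simplified sketch of that lookup, with hypothetical stand-in types (`FakeMatrix`, `CacheEntry`) replacing DMatrix and PredictionCacheEntry:

```cpp
#include <memory>
#include <unordered_map>
#include <vector>

struct FakeMatrix {};  // stand-in for DMatrix

struct CacheEntry {
  std::shared_ptr<FakeMatrix> data;   // keeps the matrix alive
  std::vector<float> predictions;     // empty until first boost round
};

// Serve cached predictions when the matrix was registered at construction
// and its cache entry has been filled; otherwise fall through to what a
// real recompute (PredictBatchInternal) would return.
std::vector<float> Predict(
    const std::unordered_map<FakeMatrix *, CacheEntry> &cache, FakeMatrix *m,
    const std::vector<float> &fresh) {
  auto it = cache.find(m);
  if (it != cache.end() && !it->second.predictions.empty()) {
    return it->second.predictions;  // cache hit
  }
  return fresh;  // cache miss
}
```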
// add base margin // add base margin
void PredictInstance(const SparseBatch::Inst &inst, void PredictInstance(const SparseBatch::Inst &inst,
@@ -224,10 +136,9 @@ class GBLinear : public GradientBooster {
void PredictContribution(DMatrix* p_fmat, void PredictContribution(DMatrix* p_fmat,
std::vector<bst_float>* out_contribs, std::vector<bst_float>* out_contribs,
unsigned ntree_limit, bool approximate) override { unsigned ntree_limit, bool approximate, int condition = 0,
if (model.weight.size() == 0) { unsigned condition_feature = 0) override {
model.InitModel(); model.LazyInitModel();
}
CHECK_EQ(ntree_limit, 0U) CHECK_EQ(ntree_limit, 0U)
<< "GBLinear::PredictContribution: ntrees is only valid for gbtree predictor"; << "GBLinear::PredictContribution: ntrees is only valid for gbtree predictor";
const std::vector<bst_float>& base_margin = p_fmat->info().base_margin; const std::vector<bst_float>& base_margin = p_fmat->info().base_margin;
@@ -265,47 +176,95 @@ class GBLinear : public GradientBooster {
} }
} }
void PredictInteractionContributions(DMatrix* p_fmat,
std::vector<bst_float>* out_contribs,
unsigned ntree_limit, bool approximate) override {
std::vector<bst_float>& contribs = *out_contribs;
// linear models have no interaction effects
const size_t nelements = model.param.num_feature*model.param.num_feature;
contribs.resize(p_fmat->info().num_row * nelements * model.param.num_output_group);
std::fill(contribs.begin(), contribs.end(), 0);
}
std::vector<std::string> DumpModel(const FeatureMap& fmap, std::vector<std::string> DumpModel(const FeatureMap& fmap,
bool with_stats, bool with_stats,
std::string format) const override { std::string format) const override {
const int ngroup = model.param.num_output_group; return model.DumpModel(fmap, with_stats, format);
const unsigned nfeature = model.param.num_feature;
std::stringstream fo("");
if (format == "json") {
fo << " { \"bias\": [" << std::endl;
for (int gid = 0; gid < ngroup; ++gid) {
if (gid != 0) fo << "," << std::endl;
fo << " " << model.bias()[gid];
}
fo << std::endl << " ]," << std::endl
<< " \"weight\": [" << std::endl;
for (unsigned i = 0; i < nfeature; ++i) {
for (int gid = 0; gid < ngroup; ++gid) {
if (i != 0 || gid != 0) fo << "," << std::endl;
fo << " " << model[i][gid];
}
}
fo << std::endl << " ]" << std::endl << " }";
} else {
fo << "bias:\n";
for (int gid = 0; gid < ngroup; ++gid) {
fo << model.bias()[gid] << std::endl;
}
fo << "weight:\n";
for (unsigned i = 0; i < nfeature; ++i) {
for (int gid = 0; gid < ngroup; ++gid) {
fo << model[i][gid] << std::endl;
}
}
}
std::vector<std::string> v;
v.push_back(fo.str());
return v;
} }
protected: protected:
inline void Pred(const RowBatch::Inst &inst, bst_float *preds, int gid, bst_float base) { void PredictBatchInternal(DMatrix *p_fmat,
std::vector<bst_float> *out_preds) {
monitor.Start("PredictBatchInternal");
model.LazyInitModel();
std::vector<bst_float> &preds = *out_preds;
const std::vector<bst_float>& base_margin = p_fmat->info().base_margin;
// start collecting the prediction
dmlc::DataIter<RowBatch> *iter = p_fmat->RowIterator();
const int ngroup = model.param.num_output_group;
preds.resize(p_fmat->info().num_row * ngroup);
while (iter->Next()) {
const RowBatch &batch = iter->Value();
// output convention: nrow * k, where nrow is number of rows
// k is number of group
// parallel over local batch
const omp_ulong nsize = static_cast<omp_ulong>(batch.size);
#pragma omp parallel for schedule(static)
for (omp_ulong i = 0; i < nsize; ++i) {
const size_t ridx = batch.base_rowid + i;
// loop over output groups
for (int gid = 0; gid < ngroup; ++gid) {
bst_float margin = (base_margin.size() != 0) ?
base_margin[ridx * ngroup + gid] : base_margin_;
this->Pred(batch[i], &preds[ridx * ngroup], gid, margin);
}
}
}
monitor.Stop("PredictBatchInternal");
}
void UpdatePredictionCache() {
// update cache entry
for (auto &kv : cache_) {
PredictionCacheEntry &e = kv.second;
if (e.predictions.size() == 0) {
size_t n = model.param.num_output_group * e.data->info().num_row;
e.predictions.resize(n);
}
this->PredictBatchInternal(e.data.get(), &e.predictions);
}
}
bool CheckConvergence() {
if (param.tolerance == 0.0f) return false;
if (is_converged) return true;
if (previous_model.weight.size() != model.weight.size()) {
previous_model = model;
return false;
}
float largest_dw = 0.0;
for (size_t i = 0; i < model.weight.size(); i++) {
largest_dw = std::max(
largest_dw, std::abs(model.weight[i] - previous_model.weight[i]));
}
previous_model = model;
is_converged = largest_dw <= param.tolerance;
return is_converged;
}
void LazySumWeights(DMatrix *p_fmat) {
if (!sum_weight_complete) {
auto &info = p_fmat->info();
for (size_t i = 0; i < info.num_row; i++) {
sum_instance_weight += info.GetWeight(i);
}
sum_weight_complete = true;
}
}
inline void Pred(const RowBatch::Inst &inst, bst_float *preds, int gid,
bst_float base) {
bst_float psum = model.bias()[gid] + base; bst_float psum = model.bias()[gid] + base;
for (bst_uint i = 0; i < inst.length; ++i) { for (bst_uint i = 0; i < inst.length; ++i) {
if (inst[i].index >= model.param.num_feature) continue; if (inst[i].index >= model.param.num_feature) continue;
@@ -313,52 +272,33 @@ class GBLinear : public GradientBooster {
} }
preds[gid] = psum; preds[gid] = psum;
} }
// model for linear booster
class Model {
public:
// parameter
GBLinearModelParam param;
// weight for each of feature, bias is the last one
std::vector<bst_float> weight;
// initialize the model parameter
inline void InitModel(void) {
// bias is the last weight
weight.resize((param.num_feature + 1) * param.num_output_group);
std::fill(weight.begin(), weight.end(), 0.0f);
}
// save the model to file
inline void Save(dmlc::Stream* fo) const {
fo->Write(&param, sizeof(param));
fo->Write(weight);
}
// load model from file
inline void Load(dmlc::Stream* fi) {
CHECK_EQ(fi->Read(&param, sizeof(param)), sizeof(param));
fi->Read(&weight);
}
// model bias
inline bst_float* bias() {
return &weight[param.num_feature * param.num_output_group];
}
inline const bst_float* bias() const {
return &weight[param.num_feature * param.num_output_group];
}
// get i-th weight
inline bst_float* operator[](size_t i) {
return &weight[i * param.num_output_group];
}
inline const bst_float* operator[](size_t i) const {
return &weight[i * param.num_output_group];
}
};
// base margin score // base margin score
bst_float base_margin_; bst_float base_margin_;
// model field // model field
Model model; GBLinearModel model;
// training parameter GBLinearModel previous_model;
GBLinearTrainParam param; GBLinearTrainParam param;
// Per feature: shuffle index of each feature index std::unique_ptr<LinearUpdater> updater;
std::vector<bst_uint> feat_index; double sum_instance_weight;
bool sum_weight_complete;
common::Monitor monitor;
bool is_converged;
/**
* \struct PredictionCacheEntry
*
* \brief Contains pointer to input matrix and associated cached predictions.
*/
struct PredictionCacheEntry {
std::shared_ptr<DMatrix> data;
std::vector<bst_float> predictions;
};
/**
* \brief Map of matrices and associated cached predictions to facilitate
* storing and looking up predictions.
*/
std::unordered_map<DMatrix*, PredictionCacheEntry> cache_;
}; };
// register the objective functions // register the objective functions
@@ -366,9 +306,10 @@ DMLC_REGISTER_PARAMETER(GBLinearModelParam);
DMLC_REGISTER_PARAMETER(GBLinearTrainParam); DMLC_REGISTER_PARAMETER(GBLinearTrainParam);
XGBOOST_REGISTER_GBM(GBLinear, "gblinear") XGBOOST_REGISTER_GBM(GBLinear, "gblinear")
.describe("Linear booster, implement generalized linear model.") .describe("Linear booster, implement generalized linear model.")
.set_body([](const std::vector<std::shared_ptr<DMatrix> >&cache, bst_float base_margin) { .set_body([](const std::vector<std::shared_ptr<DMatrix> > &cache,
return new GBLinear(base_margin); bst_float base_margin) {
return new GBLinear(cache, base_margin);
}); });
} // namespace gbm } // namespace gbm
} // namespace xgboost } // namespace xgboost
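CheckConvergence above lets gblinear stop early once the largest absolute weight update between rounds falls below `tolerance`. A minimal sketch of the same test on plain weight vectors (the free function is an assumption for illustration):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Converged when max_i |cur[i] - prev[i]| <= tolerance.
// tolerance == 0 disables the check, and a size mismatch (first round,
// before the previous model is populated) never counts as converged.
bool Converged(const std::vector<float> &prev, const std::vector<float> &cur,
               float tolerance) {
  if (tolerance == 0.0f || prev.size() != cur.size()) return false;
  float largest_dw = 0.0f;
  for (size_t i = 0; i < cur.size(); ++i) {
    largest_dw = std::max(largest_dw, std::abs(cur[i] - prev[i]));
  }
  return largest_dw <= tolerance;
}
```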

src/gbm/gblinear_model.h (new file, 113 lines)

@@ -0,0 +1,113 @@
/*!
* Copyright by Contributors 2018
*/
#pragma once
#include <dmlc/io.h>
#include <dmlc/parameter.h>
#include <xgboost/feature_map.h>
#include <vector>
#include <string>
#include <cstring>
namespace xgboost {
namespace gbm {
// model parameter
struct GBLinearModelParam : public dmlc::Parameter<GBLinearModelParam> {
// number of feature dimension
unsigned num_feature;
// number of output group
int num_output_group;
// reserved field
int reserved[32];
// constructor
GBLinearModelParam() { std::memset(this, 0, sizeof(GBLinearModelParam)); }
DMLC_DECLARE_PARAMETER(GBLinearModelParam) {
DMLC_DECLARE_FIELD(num_feature)
.set_lower_bound(0)
.describe("Number of features used in classification.");
DMLC_DECLARE_FIELD(num_output_group)
.set_lower_bound(1)
.set_default(1)
.describe("Number of output groups in the setting.");
}
};
// model for linear booster
class GBLinearModel {
public:
// parameter
GBLinearModelParam param;
// weight for each of feature, bias is the last one
std::vector<bst_float> weight;
// initialize the model parameter
inline void LazyInitModel(void) {
if (!weight.empty()) return;
// bias is the last weight
weight.resize((param.num_feature + 1) * param.num_output_group);
std::fill(weight.begin(), weight.end(), 0.0f);
}
// save the model to file
inline void Save(dmlc::Stream* fo) const {
fo->Write(&param, sizeof(param));
fo->Write(weight);
}
// load model from file
inline void Load(dmlc::Stream* fi) {
CHECK_EQ(fi->Read(&param, sizeof(param)), sizeof(param));
fi->Read(&weight);
}
// model bias
inline bst_float* bias() {
return &weight[param.num_feature * param.num_output_group];
}
inline const bst_float* bias() const {
return &weight[param.num_feature * param.num_output_group];
}
// get i-th weight
inline bst_float* operator[](size_t i) {
return &weight[i * param.num_output_group];
}
inline const bst_float* operator[](size_t i) const {
return &weight[i * param.num_output_group];
}
std::vector<std::string> DumpModel(const FeatureMap& fmap, bool with_stats,
std::string format) const {
const int ngroup = param.num_output_group;
const unsigned nfeature = param.num_feature;
std::stringstream fo("");
if (format == "json") {
fo << " { \"bias\": [" << std::endl;
for (int gid = 0; gid < ngroup; ++gid) {
if (gid != 0) fo << "," << std::endl;
fo << " " << this->bias()[gid];
}
fo << std::endl << " ]," << std::endl
<< " \"weight\": [" << std::endl;
for (unsigned i = 0; i < nfeature; ++i) {
for (int gid = 0; gid < ngroup; ++gid) {
if (i != 0 || gid != 0) fo << "," << std::endl;
fo << " " << (*this)[i][gid];
}
}
fo << std::endl << " ]" << std::endl << " }";
} else {
fo << "bias:\n";
for (int gid = 0; gid < ngroup; ++gid) {
fo << this->bias()[gid] << std::endl;
}
fo << "weight:\n";
for (unsigned i = 0; i < nfeature; ++i) {
for (int gid = 0; gid < ngroup; ++gid) {
fo << (*this)[i][gid] << std::endl;
}
}
}
std::vector<std::string> v;
v.push_back(fo.str());
return v;
}
};
} // namespace gbm
} // namespace xgboost


@@ -21,6 +21,7 @@ GradientBooster* GradientBooster::Create(
} }
return (e->body)(cache_mats, base_margin); return (e->body)(cache_mats, base_margin);
} }
} // namespace xgboost } // namespace xgboost
namespace xgboost { namespace xgboost {


@@ -18,6 +18,7 @@
#include <limits> #include <limits>
#include <algorithm> #include <algorithm>
#include "../common/common.h" #include "../common/common.h"
#include "../common/host_device_vector.h"
#include "../common/random.h" #include "../common/random.h"
#include "gbtree_model.h" #include "gbtree_model.h"
#include "../common/timer.h" #include "../common/timer.h"
@@ -180,41 +181,42 @@ class GBTree : public GradientBooster {
} }
void DoBoost(DMatrix* p_fmat, void DoBoost(DMatrix* p_fmat,
std::vector<bst_gpair>* in_gpair, HostDeviceVector<bst_gpair>* in_gpair,
ObjFunction* obj) override { ObjFunction* obj) override {
const std::vector<bst_gpair>& gpair = *in_gpair;
std::vector<std::vector<std::unique_ptr<RegTree> > > new_trees; std::vector<std::vector<std::unique_ptr<RegTree> > > new_trees;
const int ngroup = model_.param.num_output_group; const int ngroup = model_.param.num_output_group;
monitor.Start("BoostNewTrees"); monitor.Start("BoostNewTrees");
if (ngroup == 1) { if (ngroup == 1) {
std::vector<std::unique_ptr<RegTree> > ret; std::vector<std::unique_ptr<RegTree> > ret;
BoostNewTrees(gpair, p_fmat, 0, &ret); BoostNewTrees(in_gpair, p_fmat, 0, &ret);
new_trees.push_back(std::move(ret)); new_trees.push_back(std::move(ret));
} else { } else {
CHECK_EQ(gpair.size() % ngroup, 0U) CHECK_EQ(in_gpair->size() % ngroup, 0U)
<< "must have exactly ngroup*nrow gpairs"; << "must have exactly ngroup*nrow gpairs";
std::vector<bst_gpair> tmp(gpair.size() / ngroup); // TODO(canonizer): perform this on GPU if HostDeviceVector has device set.
for (int gid = 0; gid < ngroup; ++gid) { HostDeviceVector<bst_gpair> tmp(in_gpair->size() / ngroup,
bst_gpair(), in_gpair->device());
std::vector<bst_gpair>& gpair_h = in_gpair->data_h();
bst_omp_uint nsize = static_cast<bst_omp_uint>(tmp.size()); bst_omp_uint nsize = static_cast<bst_omp_uint>(tmp.size());
for (int gid = 0; gid < ngroup; ++gid) {
std::vector<bst_gpair>& tmp_h = tmp.data_h();
#pragma omp parallel for schedule(static) #pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < nsize; ++i) { for (bst_omp_uint i = 0; i < nsize; ++i) {
tmp[i] = gpair[i * ngroup + gid]; tmp_h[i] = gpair_h[i * ngroup + gid];
} }
std::vector<std::unique_ptr<RegTree> > ret; std::vector<std::unique_ptr<RegTree> > ret;
BoostNewTrees(tmp, p_fmat, gid, &ret); BoostNewTrees(&tmp, p_fmat, gid, &ret);
new_trees.push_back(std::move(ret)); new_trees.push_back(std::move(ret));
} }
} }
monitor.Stop("BoostNewTrees"); monitor.Stop("BoostNewTrees");
monitor.Start("CommitModel"); monitor.Start("CommitModel");
for (int gid = 0; gid < ngroup; ++gid) { this->CommitModel(std::move(new_trees));
this->CommitModel(std::move(new_trees[gid]), gid);
}
monitor.Stop("CommitModel"); monitor.Stop("CommitModel");
} }
void PredictBatch(DMatrix* p_fmat, void PredictBatch(DMatrix* p_fmat,
std::vector<bst_float>* out_preds, HostDeviceVector<bst_float>* out_preds,
unsigned ntree_limit) override { unsigned ntree_limit) override {
predictor->PredictBatch(p_fmat, out_preds, model_, 0, ntree_limit); predictor->PredictBatch(p_fmat, out_preds, model_, 0, ntree_limit);
} }
@@ -235,10 +237,18 @@ class GBTree : public GradientBooster {
void PredictContribution(DMatrix* p_fmat, void PredictContribution(DMatrix* p_fmat,
std::vector<bst_float>* out_contribs, std::vector<bst_float>* out_contribs,
unsigned ntree_limit, bool approximate) override { unsigned ntree_limit, bool approximate, int condition,
unsigned condition_feature) override {
predictor->PredictContribution(p_fmat, out_contribs, model_, ntree_limit, approximate); predictor->PredictContribution(p_fmat, out_contribs, model_, ntree_limit, approximate);
} }
void PredictInteractionContributions(DMatrix* p_fmat,
std::vector<bst_float>* out_contribs,
unsigned ntree_limit, bool approximate) override {
predictor->PredictInteractionContributions(p_fmat, out_contribs, model_,
ntree_limit, approximate);
}
std::vector<std::string> DumpModel(const FeatureMap& fmap, std::vector<std::string> DumpModel(const FeatureMap& fmap,
bool with_stats, bool with_stats,
std::string format) const override { std::string format) const override {
@@ -257,9 +267,9 @@ class GBTree : public GradientBooster {
updaters.push_back(std::move(up)); updaters.push_back(std::move(up));
} }
} }
// do group-specific boosting // do group-specific boosting
inline void inline void BoostNewTrees(HostDeviceVector<bst_gpair>* gpair,
BoostNewTrees(const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, DMatrix *p_fmat,
int bst_group, int bst_group,
std::vector<std::unique_ptr<RegTree> >* ret) { std::vector<std::unique_ptr<RegTree> >* ret) {
@@ -285,17 +295,19 @@ class GBTree : public GradientBooster {
} }
} }
// update the trees // update the trees
for (auto& up : updaters) { for (auto& up : updaters)
up->Update(gpair, p_fmat, new_trees); up->Update(gpair, p_fmat, new_trees);
} }
}
// commit new trees all at once // commit new trees all at once
virtual void virtual void
CommitModel(std::vector<std::unique_ptr<RegTree> >&& new_trees, CommitModel(std::vector<std::vector<std::unique_ptr<RegTree>>>&& new_trees) {
int bst_group) { int num_new_trees = 0;
model_.CommitModel(std::move(new_trees), bst_group); for (int gid = 0; gid < model_.param.num_output_group; ++gid) {
num_new_trees += new_trees[gid].size();
predictor->UpdatePredictionCache(model_, &updaters, new_trees.size()); model_.CommitModel(std::move(new_trees[gid]), gid);
}
predictor->UpdatePredictionCache(model_, &updaters, num_new_trees);
} }
// --- data structure --- // --- data structure ---
@@ -342,10 +354,10 @@ class Dart : public GBTree {
  // predict the leaf scores with dropout if ntree_limit = 0
  void PredictBatch(DMatrix* p_fmat,
-                   std::vector<bst_float>* out_preds,
+                   HostDeviceVector<bst_float>* out_preds,
                    unsigned ntree_limit) override {
    DropTrees(ntree_limit);
-   PredLoopInternal<Dart>(p_fmat, out_preds, 0, ntree_limit, true);
+   PredLoopInternal<Dart>(p_fmat, &out_preds->data_h(), 0, ntree_limit, true);
  }
  void PredictInstance(const SparseBatch::Inst& inst,
@@ -467,20 +479,22 @@ class Dart : public GBTree {
      }
    }
  }
  // commit new trees all at once
- void CommitModel(std::vector<std::unique_ptr<RegTree> >&& new_trees,
-                  int bst_group) override {
-   for (size_t i = 0; i < new_trees.size(); ++i) {
-     model_.trees.push_back(std::move(new_trees[i]));
-     model_.tree_info.push_back(bst_group);
-   }
-   model_.param.num_trees += static_cast<int>(new_trees.size());
-   size_t num_drop = NormalizeTrees(new_trees.size());
+ void
+ CommitModel(std::vector<std::vector<std::unique_ptr<RegTree>>>&& new_trees) override {
+   int num_new_trees = 0;
+   for (int gid = 0; gid < model_.param.num_output_group; ++gid) {
+     num_new_trees += new_trees[gid].size();
+     model_.CommitModel(std::move(new_trees[gid]), gid);
+   }
+   size_t num_drop = NormalizeTrees(num_new_trees);
    if (dparam.silent != 1) {
      LOG(INFO) << "drop " << num_drop << " trees, "
                << "weight = " << weight_drop.back();
    }
  }
  // predict the leaf scores without dropped trees
  inline bst_float PredValue(const RowBatch::Inst &inst,
                             int bst_group,
@@ -503,16 +517,17 @@ class Dart : public GBTree {
    return psum;
  }
- // select dropped trees
+ // select which trees to drop
  inline void DropTrees(unsigned ntree_limit_drop) {
+   idx_drop.clear();
+   if (ntree_limit_drop > 0) return;
    std::uniform_real_distribution<> runif(0.0, 1.0);
    auto& rnd = common::GlobalRandom();
-   // reset
-   idx_drop.clear();
-   // sample dropped trees
    bool skip = false;
    if (dparam.skip_drop > 0.0) skip = (runif(rnd) < dparam.skip_drop);
-   if (ntree_limit_drop == 0 && !skip) {
+   // sample some trees to drop
+   if (!skip) {
      if (dparam.sample_type == 1) {
        bst_float sum_weight = 0.0;
        for (size_t i = 0; i < weight_drop.size(); ++i) {
@@ -547,6 +562,7 @@ class Dart : public GBTree {
      }
    }
  }
  // set normalization factors
  inline size_t NormalizeTrees(size_t size_new_trees) {
    float lr = 1.0 * dparam.learning_rate / size_new_trees;


@@ -16,10 +16,12 @@
  #include <utility>
  #include <vector>
  #include "./common/common.h"
+ #include "./common/host_device_vector.h"
  #include "./common/io.h"
  #include "./common/random.h"
  #include "common/timer.h"
  namespace xgboost {
  // implementation of base learner.
  bool Learner::AllowLazyCheckPoint() const {
@@ -363,14 +365,14 @@ class LearnerImpl : public Learner {
    this->PredictRaw(train, &preds_);
    monitor.Stop("PredictRaw");
    monitor.Start("GetGradient");
-   obj_->GetGradient(preds_, train->info(), iter, &gpair_);
+   obj_->GetGradient(&preds_, train->info(), iter, &gpair_);
    monitor.Stop("GetGradient");
    gbm_->DoBoost(train, &gpair_, obj_.get());
    monitor.Stop("UpdateOneIter");
  }
  void BoostOneIter(int iter, DMatrix* train,
-                   std::vector<bst_gpair>* in_gpair) override {
+                   HostDeviceVector<bst_gpair>* in_gpair) override {
    monitor.Start("BoostOneIter");
    if (tparam.seed_per_iteration || rabit::IsDistributed()) {
      common::GlobalRandom().seed(tparam.seed * kRandSeedMagic + iter);
@@ -393,7 +395,7 @@ class LearnerImpl : public Learner {
      obj_->EvalTransform(&preds_);
      for (auto& ev : metrics_) {
        os << '\t' << data_names[i] << '-' << ev->Name() << ':'
-          << ev->Eval(preds_, data_sets[i]->info(), tparam.dsplit == 2);
+          << ev->Eval(preds_.data_h(), data_sets[i]->info(), tparam.dsplit == 2);
      }
    }
@@ -436,16 +438,20 @@ class LearnerImpl : public Learner {
    this->PredictRaw(data, &preds_);
    obj_->EvalTransform(&preds_);
    return std::make_pair(metric,
-                         ev->Eval(preds_, data->info(), tparam.dsplit == 2));
+                         ev->Eval(preds_.data_h(), data->info(), tparam.dsplit == 2));
  }
  void Predict(DMatrix* data, bool output_margin,
-              std::vector<bst_float>* out_preds, unsigned ntree_limit,
-              bool pred_leaf, bool pred_contribs, bool approx_contribs) const override {
+              HostDeviceVector<bst_float>* out_preds, unsigned ntree_limit,
+              bool pred_leaf, bool pred_contribs, bool approx_contribs,
+              bool pred_interactions) const override {
    if (pred_contribs) {
-     gbm_->PredictContribution(data, out_preds, ntree_limit, approx_contribs);
+     gbm_->PredictContribution(data, &out_preds->data_h(), ntree_limit, approx_contribs);
+   } else if (pred_interactions) {
+     gbm_->PredictInteractionContributions(data, &out_preds->data_h(), ntree_limit,
+                                           approx_contribs);
    } else if (pred_leaf) {
-     gbm_->PredictLeaf(data, out_preds, ntree_limit);
+     gbm_->PredictLeaf(data, &out_preds->data_h(), ntree_limit);
    } else {
      this->PredictRaw(data, out_preds, ntree_limit);
      if (!output_margin) {
@@ -459,18 +465,18 @@ class LearnerImpl : public Learner {
  // if not, initialize the column access.
  inline void LazyInitDMatrix(DMatrix* p_train) {
-   if (tparam.tree_method == 3 || tparam.tree_method == 4 ||
-       tparam.tree_method == 5) {
+   if (tparam.tree_method == 3 || tparam.tree_method == 4 ||
+       tparam.tree_method == 5 || name_gbm_ == "gblinear") {
      return;
    }
    monitor.Start("LazyInitDMatrix");
-   if (!p_train->HaveColAccess()) {
+   if (!p_train->HaveColAccess(true)) {
      int ncol = static_cast<int>(p_train->info().num_col);
      std::vector<bool> enabled(ncol, true);
      // set max row per batch to limited value
      // in distributed mode, use safe choice otherwise
      size_t max_row_perbatch = tparam.max_row_perbatch;
-     const size_t safe_max_row = static_cast<size_t>(32UL << 10UL);
+     const size_t safe_max_row = static_cast<size_t>(32ul << 10ul);
      if (tparam.tree_method == 0 && p_train->info().num_row >= (4UL << 20UL)) {
        LOG(CONSOLE)
@@ -490,7 +496,7 @@ class LearnerImpl : public Learner {
      max_row_perbatch = std::min(max_row_perbatch, safe_max_row);
    }
    // initialize column access
-   p_train->InitColAccess(enabled, tparam.prob_buffer_row, max_row_perbatch);
+   p_train->InitColAccess(enabled, tparam.prob_buffer_row, max_row_perbatch, true);
  }
  if (!p_train->SingleColBlock() && cfg_.count("updater") == 0) {
@@ -541,12 +547,13 @@ class LearnerImpl : public Learner {
   * \param ntree_limit limit number of trees used for boosted tree
   *   predictor, when it equals 0, this means we are using all the trees
   */
- inline void PredictRaw(DMatrix* data, std::vector<bst_float>* out_preds,
+ inline void PredictRaw(DMatrix* data, HostDeviceVector<bst_float>* out_preds,
                         unsigned ntree_limit = 0) const {
    CHECK(gbm_.get() != nullptr)
        << "Predict must happen after Load or InitModel";
    gbm_->PredictBatch(data, out_preds, ntree_limit);
  }
  // model parameter
  LearnerModelParam mparam;
  // training parameter
@@ -560,9 +567,9 @@ class LearnerImpl : public Learner {
  // name of objective function
  std::string name_obj_;
  // temporal storages for prediction
- std::vector<bst_float> preds_;
+ HostDeviceVector<bst_float> preds_;
  // gradient pairs
- std::vector<bst_gpair> gpair_;
+ HostDeviceVector<bst_gpair> gpair_;
  private:
  /*! \brief random number transformation seed. */


@@ -0,0 +1,487 @@
/*!
* Copyright 2018 by Contributors
* \author Rory Mitchell
*/
#pragma once
#include <algorithm>
#include <string>
#include <utility>
#include <vector>
#include <limits>
#include "../common/random.h"
namespace xgboost {
namespace linear {
/**
* \brief Calculate change in weight for a given feature. Applies l1/l2 penalty normalised by the
* number of training instances.
*
* \param sum_grad The sum gradient.
* \param sum_hess The sum hess.
* \param w The weight.
* \param reg_alpha Unnormalised L1 penalty.
* \param reg_lambda Unnormalised L2 penalty.
*
* \return The weight update.
*/
inline double CoordinateDelta(double sum_grad, double sum_hess, double w,
double reg_alpha, double reg_lambda) {
if (sum_hess < 1e-5f) return 0.0f;
const double sum_grad_l2 = sum_grad + reg_lambda * w;
const double sum_hess_l2 = sum_hess + reg_lambda;
const double tmp = w - sum_grad_l2 / sum_hess_l2;
if (tmp >= 0) {
return std::max(-(sum_grad_l2 + reg_alpha) / sum_hess_l2, -w);
} else {
return std::min(-(sum_grad_l2 - reg_alpha) / sum_hess_l2, -w);
}
}
/**
* \brief Calculate update to bias.
*
* \param sum_grad The sum gradient.
* \param sum_hess The sum hess.
*
* \return The weight update.
*/
inline double CoordinateDeltaBias(double sum_grad, double sum_hess) {
return -sum_grad / sum_hess;
}
/**
* \brief Get the gradient with respect to a single feature.
*
* \param group_idx Zero-based index of the group.
* \param num_group Number of groups.
* \param fidx The target feature.
* \param gpair Gradients.
* \param p_fmat The feature matrix.
*
* \return The gradient and diagonal Hessian entry for a given feature.
*/
inline std::pair<double, double> GetGradient(int group_idx, int num_group, int fidx,
const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat) {
double sum_grad = 0.0, sum_hess = 0.0;
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator({static_cast<bst_uint>(fidx)});
while (iter->Next()) {
const ColBatch &batch = iter->Value();
ColBatch::Inst col = batch[0];
const bst_omp_uint ndata = static_cast<bst_omp_uint>(col.length);
for (bst_omp_uint j = 0; j < ndata; ++j) {
const bst_float v = col[j].fvalue;
auto &p = gpair[col[j].index * num_group + group_idx];
if (p.GetHess() < 0.0f) continue;
sum_grad += p.GetGrad() * v;
sum_hess += p.GetHess() * v * v;
}
}
return std::make_pair(sum_grad, sum_hess);
}
/**
* \brief Get the gradient with respect to a single feature. Row-wise multithreaded.
*
* \param group_idx Zero-based index of the group.
* \param num_group Number of groups.
* \param fidx The target feature.
* \param gpair Gradients.
* \param p_fmat The feature matrix.
*
* \return The gradient and diagonal Hessian entry for a given feature.
*/
inline std::pair<double, double> GetGradientParallel(int group_idx, int num_group, int fidx,
const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat) {
double sum_grad = 0.0, sum_hess = 0.0;
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator({static_cast<bst_uint>(fidx)});
while (iter->Next()) {
const ColBatch &batch = iter->Value();
ColBatch::Inst col = batch[0];
const bst_omp_uint ndata = static_cast<bst_omp_uint>(col.length);
#pragma omp parallel for schedule(static) reduction(+ : sum_grad, sum_hess)
for (bst_omp_uint j = 0; j < ndata; ++j) {
const bst_float v = col[j].fvalue;
auto &p = gpair[col[j].index * num_group + group_idx];
if (p.GetHess() < 0.0f) continue;
sum_grad += p.GetGrad() * v;
sum_hess += p.GetHess() * v * v;
}
}
return std::make_pair(sum_grad, sum_hess);
}
/**
* \brief Get the gradient with respect to the bias. Row-wise multithreaded.
*
* \param group_idx Zero-based index of the group.
* \param num_group Number of groups.
* \param gpair Gradients.
* \param p_fmat The feature matrix.
*
* \return The gradient and diagonal Hessian entry for the bias.
*/
inline std::pair<double, double> GetBiasGradientParallel(int group_idx, int num_group,
const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat) {
const RowSet &rowset = p_fmat->buffered_rowset();
double sum_grad = 0.0, sum_hess = 0.0;
const bst_omp_uint ndata = static_cast<bst_omp_uint>(rowset.size());
#pragma omp parallel for schedule(static) reduction(+ : sum_grad, sum_hess)
for (bst_omp_uint i = 0; i < ndata; ++i) {
auto &p = gpair[rowset[i] * num_group + group_idx];
if (p.GetHess() >= 0.0f) {
sum_grad += p.GetGrad();
sum_hess += p.GetHess();
}
}
return std::make_pair(sum_grad, sum_hess);
}
/**
* \brief Updates the gradient vector with respect to a change in weight.
*
* \param fidx The feature index.
* \param group_idx Zero-based index of the group.
* \param num_group Number of groups.
* \param dw The change in weight.
* \param in_gpair The gradient vector to be updated.
* \param p_fmat The input feature matrix.
*/
inline void UpdateResidualParallel(int fidx, int group_idx, int num_group,
float dw, std::vector<bst_gpair> *in_gpair,
DMatrix *p_fmat) {
if (dw == 0.0f) return;
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator({static_cast<bst_uint>(fidx)});
while (iter->Next()) {
const ColBatch &batch = iter->Value();
ColBatch::Inst col = batch[0];
// update grad value
const bst_omp_uint num_row = static_cast<bst_omp_uint>(col.length);
#pragma omp parallel for schedule(static)
for (bst_omp_uint j = 0; j < num_row; ++j) {
bst_gpair &p = (*in_gpair)[col[j].index * num_group + group_idx];
if (p.GetHess() < 0.0f) continue;
p += bst_gpair(p.GetHess() * col[j].fvalue * dw, 0);
}
}
}
/**
* \brief Updates the gradient vector based on a change in the bias.
*
* \param group_idx Zero-based index of the group.
* \param num_group Number of groups.
* \param dbias The change in bias.
* \param in_gpair The gradient vector to be updated.
* \param p_fmat The input feature matrix.
*/
inline void UpdateBiasResidualParallel(int group_idx, int num_group, float dbias,
std::vector<bst_gpair> *in_gpair,
DMatrix *p_fmat) {
if (dbias == 0.0f) return;
const RowSet &rowset = p_fmat->buffered_rowset();
const bst_omp_uint ndata = static_cast<bst_omp_uint>(p_fmat->info().num_row);
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < ndata; ++i) {
bst_gpair &g = (*in_gpair)[rowset[i] * num_group + group_idx];
if (g.GetHess() < 0.0f) continue;
g += bst_gpair(g.GetHess() * dbias, 0);
}
}
/**
* \brief Abstract class for stateful feature selection or ordering
* in coordinate descent algorithms.
*/
class FeatureSelector {
public:
/*! \brief factory method */
static FeatureSelector *Create(int choice);
/*! \brief virtual destructor */
virtual ~FeatureSelector() {}
/**
* \brief Setting up the selector state prior to looping through features.
*
* \param model The model.
* \param gpair The gpair.
* \param p_fmat The feature matrix.
* \param alpha Regularisation alpha.
* \param lambda Regularisation lambda.
* \param param A parameter with algorithm-dependent use.
*/
virtual void Setup(const gbm::GBLinearModel &model,
const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat,
float alpha, float lambda, int param) {}
/**
* \brief Select next coordinate to update.
*
* \param iteration The iteration in a loop through features
* \param model The model.
* \param group_idx Zero-based index of the group.
* \param gpair The gpair.
* \param p_fmat The feature matrix.
* \param alpha Regularisation alpha.
* \param lambda Regularisation lambda.
*
* \return The index of the selected feature. -1 indicates none selected.
*/
virtual int NextFeature(int iteration,
const gbm::GBLinearModel &model,
int group_idx,
const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) = 0;
};
/**
* \brief Deterministic selection by cycling through features one at a time.
*/
class CyclicFeatureSelector : public FeatureSelector {
public:
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
return iteration % model.param.num_feature;
}
};
/**
* \brief Similar to Cyclic but with random feature shuffling prior to each update.
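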
* \note Its randomness is controllable by setting a random seed.
*/
class ShuffleFeatureSelector : public FeatureSelector {
public:
void Setup(const gbm::GBLinearModel &model,
const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, float alpha, float lambda, int param) override {
if (feat_index.size() == 0) {
feat_index.resize(model.param.num_feature);
std::iota(feat_index.begin(), feat_index.end(), 0);
}
std::shuffle(feat_index.begin(), feat_index.end(), common::GlobalRandom());
}
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
return feat_index[iteration % model.param.num_feature];
}
protected:
std::vector<bst_uint> feat_index;
};
/**
* \brief A random (with replacement) coordinate selector.
* \note Its randomness is controllable by setting a random seed.
*/
class RandomFeatureSelector : public FeatureSelector {
public:
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
return common::GlobalRandom()() % model.param.num_feature;
}
};
/**
* \brief Select coordinate with the greatest gradient magnitude.
* \note It has O(num_feature^2) complexity. It is fully deterministic.
*
* \note It allows restricting the selection to top_k features per group with
* the largest magnitude of univariate weight change, by passing the top_k value
* through the `param` argument of Setup(). That would reduce the complexity to
* O(num_feature*top_k).
*/
class GreedyFeatureSelector : public FeatureSelector {
public:
void Setup(const gbm::GBLinearModel &model,
const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, float alpha, float lambda, int param) override {
top_k = static_cast<bst_uint>(param);
const bst_uint ngroup = model.param.num_output_group;
if (param <= 0) top_k = std::numeric_limits<bst_uint>::max();
if (counter.size() == 0) {
counter.resize(ngroup);
gpair_sums.resize(model.param.num_feature * ngroup);
}
for (bst_uint gid = 0u; gid < ngroup; ++gid) {
counter[gid] = 0u;
}
}
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
// k-th selected feature for a group
auto k = counter[group_idx]++;
// stop after either reaching top-K or going through all the features in a group
if (k >= top_k || counter[group_idx] == model.param.num_feature) return -1;
const int ngroup = model.param.num_output_group;
const bst_omp_uint nfeat = model.param.num_feature;
// Calculate univariate gradient sums
std::fill(gpair_sums.begin(), gpair_sums.end(), std::make_pair(0., 0.));
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator();
while (iter->Next()) {
const ColBatch &batch = iter->Value();
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < nfeat; ++i) {
const ColBatch::Inst col = batch[i];
const bst_uint ndata = col.length;
auto &sums = gpair_sums[group_idx * nfeat + i];
for (bst_uint j = 0u; j < ndata; ++j) {
const bst_float v = col[j].fvalue;
auto &p = gpair[col[j].index * ngroup + group_idx];
if (p.GetHess() < 0.f) continue;
sums.first += p.GetGrad() * v;
sums.second += p.GetHess() * v * v;
}
}
}
// Find a feature with the largest magnitude of weight change
int best_fidx = 0;
double best_weight_update = 0.0f;
for (bst_omp_uint fidx = 0; fidx < nfeat; ++fidx) {
auto &s = gpair_sums[group_idx * nfeat + fidx];
float dw = std::abs(static_cast<bst_float>(
CoordinateDelta(s.first, s.second, model[fidx][group_idx], alpha, lambda)));
if (dw > best_weight_update) {
best_weight_update = dw;
best_fidx = fidx;
}
}
return best_fidx;
}
protected:
bst_uint top_k;
std::vector<bst_uint> counter;
std::vector<std::pair<double, double>> gpair_sums;
};
/**
* \brief Thrifty, approximately-greedy feature selector.
*
* \note Prior to cyclic updates, reorders features in descending magnitude of
* their univariate weight changes. This operation is multithreaded and is a
* linear complexity approximation of the quadratic greedy selection.
*
* \note It allows restricting the selection to top_k features per group with
* the largest magnitude of univariate weight change, by passing the top_k value
* through the `param` argument of Setup().
*/
class ThriftyFeatureSelector : public FeatureSelector {
public:
void Setup(const gbm::GBLinearModel &model,
const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, float alpha, float lambda, int param) override {
top_k = static_cast<bst_uint>(param);
if (param <= 0) top_k = std::numeric_limits<bst_uint>::max();
const bst_uint ngroup = model.param.num_output_group;
const bst_omp_uint nfeat = model.param.num_feature;
if (deltaw.size() == 0) {
deltaw.resize(nfeat * ngroup);
sorted_idx.resize(nfeat * ngroup);
counter.resize(ngroup);
gpair_sums.resize(nfeat * ngroup);
}
// Calculate univariate gradient sums
std::fill(gpair_sums.begin(), gpair_sums.end(), std::make_pair(0., 0.));
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator();
while (iter->Next()) {
const ColBatch &batch = iter->Value();
// column-parallel is usually faster than row-parallel
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < nfeat; ++i) {
const ColBatch::Inst col = batch[i];
const bst_uint ndata = col.length;
for (bst_uint gid = 0u; gid < ngroup; ++gid) {
auto &sums = gpair_sums[gid * nfeat + i];
for (bst_uint j = 0u; j < ndata; ++j) {
const bst_float v = col[j].fvalue;
auto &p = gpair[col[j].index * ngroup + gid];
if (p.GetHess() < 0.f) continue;
sums.first += p.GetGrad() * v;
sums.second += p.GetHess() * v * v;
}
}
}
}
// rank by descending weight magnitude within the groups
std::fill(deltaw.begin(), deltaw.end(), 0.f);
std::iota(sorted_idx.begin(), sorted_idx.end(), 0);
bst_float *pdeltaw = &deltaw[0];
for (bst_uint gid = 0u; gid < ngroup; ++gid) {
// Calculate univariate weight changes
for (bst_omp_uint i = 0; i < nfeat; ++i) {
auto ii = gid * nfeat + i;
auto &s = gpair_sums[ii];
deltaw[ii] = static_cast<bst_float>(CoordinateDelta(
s.first, s.second, model[i][gid], alpha, lambda));
}
// sort in descending order of deltaw abs values
auto start = sorted_idx.begin() + gid * nfeat;
std::sort(start, start + nfeat,
[pdeltaw](size_t i, size_t j) {
return std::abs(*(pdeltaw + i)) > std::abs(*(pdeltaw + j));
});
counter[gid] = 0u;
}
}
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
// k-th selected feature for a group
auto k = counter[group_idx]++;
// stop after either reaching top-K or going through all the features in a group
if (k >= top_k || counter[group_idx] == model.param.num_feature) return -1;
// note that sorted_idx stores the "long" indices
const size_t grp_offset = group_idx * model.param.num_feature;
return static_cast<int>(sorted_idx[grp_offset + k] - grp_offset);
}
protected:
bst_uint top_k;
std::vector<bst_float> deltaw;
std::vector<size_t> sorted_idx;
std::vector<bst_uint> counter;
std::vector<std::pair<double, double>> gpair_sums;
};
/**
* \brief A set of available FeatureSelector's
*/
enum FeatureSelectorEnum {
kCyclic = 0,
kShuffle,
kThrifty,
kGreedy,
kRandom
};
inline FeatureSelector *FeatureSelector::Create(int choice) {
switch (choice) {
case kCyclic:
return new CyclicFeatureSelector();
case kShuffle:
return new ShuffleFeatureSelector();
case kThrifty:
return new ThriftyFeatureSelector();
case kGreedy:
return new GreedyFeatureSelector();
case kRandom:
return new RandomFeatureSelector();
default:
LOG(FATAL) << "unknown coordinate selector: " << choice;
}
return nullptr;
}
} // namespace linear
} // namespace xgboost


@@ -0,0 +1,29 @@
/*!
* Copyright 2018
*/
#include <xgboost/linear_updater.h>
#include <dmlc/registry.h>
namespace dmlc {
DMLC_REGISTRY_ENABLE(::xgboost::LinearUpdaterReg);
} // namespace dmlc
namespace xgboost {
LinearUpdater* LinearUpdater::Create(const std::string& name) {
auto *e = ::dmlc::Registry< ::xgboost::LinearUpdaterReg>::Get()->Find(name);
if (e == nullptr) {
LOG(FATAL) << "Unknown linear updater " << name;
}
return (e->body)();
}
} // namespace xgboost
namespace xgboost {
namespace linear {
// List of files that will be force linked in static links.
DMLC_REGISTRY_LINK_TAG(updater_shotgun);
DMLC_REGISTRY_LINK_TAG(updater_coordinate);
} // namespace linear
} // namespace xgboost


@@ -0,0 +1,142 @@
/*!
* Copyright 2018 by Contributors
* \author Rory Mitchell
*/
#include <xgboost/linear_updater.h>
#include "../common/timer.h"
#include "coordinate_common.h"
namespace xgboost {
namespace linear {
DMLC_REGISTRY_FILE_TAG(updater_coordinate);
// training parameter
struct CoordinateTrainParam : public dmlc::Parameter<CoordinateTrainParam> {
/*! \brief learning_rate */
float learning_rate;
/*! \brief regularization weight for L2 norm */
float reg_lambda;
/*! \brief regularization weight for L1 norm */
float reg_alpha;
int feature_selector;
int top_k;
int debug_verbose;
// declare parameters
DMLC_DECLARE_PARAMETER(CoordinateTrainParam) {
DMLC_DECLARE_FIELD(learning_rate)
.set_lower_bound(0.0f)
.set_default(1.0f)
.describe("Learning rate of each update.");
DMLC_DECLARE_FIELD(reg_lambda)
.set_lower_bound(0.0f)
.set_default(0.0f)
.describe("L2 regularization on weights.");
DMLC_DECLARE_FIELD(reg_alpha)
.set_lower_bound(0.0f)
.set_default(0.0f)
.describe("L1 regularization on weights.");
DMLC_DECLARE_FIELD(feature_selector)
.set_default(kCyclic)
.add_enum("cyclic", kCyclic)
.add_enum("shuffle", kShuffle)
.add_enum("thrifty", kThrifty)
.add_enum("greedy", kGreedy)
.add_enum("random", kRandom)
.describe("Feature selection or ordering method.");
DMLC_DECLARE_FIELD(top_k)
.set_lower_bound(0)
.set_default(0)
.describe("The number of top features to select in 'thrifty' feature_selector. "
"The value of zero means using all the features.");
DMLC_DECLARE_FIELD(debug_verbose)
.set_lower_bound(0)
.set_default(0)
.describe("flag to print out detailed breakdown of runtime");
// alias of parameters
DMLC_DECLARE_ALIAS(learning_rate, eta);
DMLC_DECLARE_ALIAS(reg_lambda, lambda);
DMLC_DECLARE_ALIAS(reg_alpha, alpha);
}
/*! \brief Denormalizes the regularization penalties - to be called at each update */
void DenormalizePenalties(double sum_instance_weight) {
reg_lambda_denorm = reg_lambda * sum_instance_weight;
reg_alpha_denorm = reg_alpha * sum_instance_weight;
}
// denormalized regularization penalties
float reg_lambda_denorm;
float reg_alpha_denorm;
};
/**
* \class CoordinateUpdater
*
* \brief Coordinate descent algorithm that updates one feature per iteration
*/
class CoordinateUpdater : public LinearUpdater {
public:
// set training parameter
void Init(
const std::vector<std::pair<std::string, std::string> > &args) override {
param.InitAllowUnknown(args);
selector.reset(FeatureSelector::Create(param.feature_selector));
monitor.Init("CoordinateUpdater", param.debug_verbose);
}
void Update(std::vector<bst_gpair> *in_gpair, DMatrix *p_fmat,
gbm::GBLinearModel *model, double sum_instance_weight) override {
param.DenormalizePenalties(sum_instance_weight);
const int ngroup = model->param.num_output_group;
// update bias
for (int group_idx = 0; group_idx < ngroup; ++group_idx) {
auto grad = GetBiasGradientParallel(group_idx, ngroup, *in_gpair, p_fmat);
auto dbias = static_cast<float>(param.learning_rate *
CoordinateDeltaBias(grad.first, grad.second));
model->bias()[group_idx] += dbias;
UpdateBiasResidualParallel(group_idx, ngroup, dbias, in_gpair, p_fmat);
}
// prepare for updating the weights
selector->Setup(*model, *in_gpair, p_fmat, param.reg_alpha_denorm,
param.reg_lambda_denorm, param.top_k);
// update weights
for (int group_idx = 0; group_idx < ngroup; ++group_idx) {
for (unsigned i = 0U; i < model->param.num_feature; i++) {
int fidx = selector->NextFeature(i, *model, group_idx, *in_gpair, p_fmat,
param.reg_alpha_denorm, param.reg_lambda_denorm);
if (fidx < 0) break;
this->UpdateFeature(fidx, group_idx, in_gpair, p_fmat, model);
}
}
}
inline void UpdateFeature(int fidx, int group_idx, std::vector<bst_gpair> *in_gpair,
DMatrix *p_fmat, gbm::GBLinearModel *model) {
const int ngroup = model->param.num_output_group;
bst_float &w = (*model)[fidx][group_idx];
monitor.Start("GetGradientParallel");
auto gradient = GetGradientParallel(group_idx, ngroup, fidx, *in_gpair, p_fmat);
monitor.Stop("GetGradientParallel");
auto dw = static_cast<float>(
param.learning_rate *
CoordinateDelta(gradient.first, gradient.second, w, param.reg_alpha_denorm,
param.reg_lambda_denorm));
w += dw;
monitor.Start("UpdateResidualParallel");
UpdateResidualParallel(fidx, group_idx, ngroup, dw, in_gpair, p_fmat);
monitor.Stop("UpdateResidualParallel");
}
// training parameter
CoordinateTrainParam param;
std::unique_ptr<FeatureSelector> selector;
common::Monitor monitor;
};
DMLC_REGISTER_PARAMETER(CoordinateTrainParam);
XGBOOST_REGISTER_LINEAR_UPDATER(CoordinateUpdater, "coord_descent")
.describe("Update linear model according to coordinate descent algorithm.")
.set_body([]() { return new CoordinateUpdater(); });
} // namespace linear
} // namespace xgboost


@@ -0,0 +1,135 @@
/*!
* Copyright 2018 by Contributors
* \author Tianqi Chen, Rory Mitchell
*/
#include <xgboost/linear_updater.h>
#include "coordinate_common.h"
namespace xgboost {
namespace linear {
DMLC_REGISTRY_FILE_TAG(updater_shotgun);
// training parameter
struct ShotgunTrainParam : public dmlc::Parameter<ShotgunTrainParam> {
/*! \brief learning_rate */
float learning_rate;
/*! \brief regularization weight for L2 norm */
float reg_lambda;
/*! \brief regularization weight for L1 norm */
float reg_alpha;
int feature_selector;
// declare parameters
DMLC_DECLARE_PARAMETER(ShotgunTrainParam) {
DMLC_DECLARE_FIELD(learning_rate)
.set_lower_bound(0.0f)
.set_default(0.5f)
.describe("Learning rate of each update.");
DMLC_DECLARE_FIELD(reg_lambda)
.set_lower_bound(0.0f)
.set_default(0.0f)
.describe("L2 regularization on weights.");
DMLC_DECLARE_FIELD(reg_alpha)
.set_lower_bound(0.0f)
.set_default(0.0f)
.describe("L1 regularization on weights.");
DMLC_DECLARE_FIELD(feature_selector)
.set_default(kCyclic)
.add_enum("cyclic", kCyclic)
.add_enum("shuffle", kShuffle)
.describe("Feature selection or ordering method.");
// alias of parameters
DMLC_DECLARE_ALIAS(learning_rate, eta);
DMLC_DECLARE_ALIAS(reg_lambda, lambda);
DMLC_DECLARE_ALIAS(reg_alpha, alpha);
}
/*! \brief Denormalizes the regularization penalties - to be called at each update */
void DenormalizePenalties(double sum_instance_weight) {
reg_lambda_denorm = reg_lambda * sum_instance_weight;
reg_alpha_denorm = reg_alpha * sum_instance_weight;
}
// denormalized regularization penalties
float reg_lambda_denorm;
float reg_alpha_denorm;
};
class ShotgunUpdater : public LinearUpdater {
 public:
  // set training parameter
  void Init(const std::vector<std::pair<std::string, std::string> > &args) override {
    param.InitAllowUnknown(args);
    selector.reset(FeatureSelector::Create(param.feature_selector));
  }
  void Update(std::vector<bst_gpair> *in_gpair, DMatrix *p_fmat,
              gbm::GBLinearModel *model, double sum_instance_weight) override {
    param.DenormalizePenalties(sum_instance_weight);
    std::vector<bst_gpair> &gpair = *in_gpair;
    const int ngroup = model->param.num_output_group;
    // update bias
    for (int gid = 0; gid < ngroup; ++gid) {
      auto grad = GetBiasGradientParallel(gid, ngroup, *in_gpair, p_fmat);
      auto dbias = static_cast<bst_float>(param.learning_rate *
                                          CoordinateDeltaBias(grad.first, grad.second));
      model->bias()[gid] += dbias;
      UpdateBiasResidualParallel(gid, ngroup, dbias, in_gpair, p_fmat);
    }
    // lock-free parallel updates of weights
    selector->Setup(*model, *in_gpair, p_fmat, param.reg_alpha_denorm, param.reg_lambda_denorm, 0);
    dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator();
    while (iter->Next()) {
      const ColBatch &batch = iter->Value();
      const bst_omp_uint nfeat = static_cast<bst_omp_uint>(batch.size);
#pragma omp parallel for schedule(static)
      for (bst_omp_uint i = 0; i < nfeat; ++i) {
        int ii = selector->NextFeature(i, *model, 0, *in_gpair, p_fmat,
                                       param.reg_alpha_denorm, param.reg_lambda_denorm);
        if (ii < 0) continue;
        const bst_uint fid = batch.col_index[ii];
        ColBatch::Inst col = batch[ii];
        for (int gid = 0; gid < ngroup; ++gid) {
          double sum_grad = 0.0, sum_hess = 0.0;
          for (bst_uint j = 0; j < col.length; ++j) {
            bst_gpair &p = gpair[col[j].index * ngroup + gid];
            if (p.GetHess() < 0.0f) continue;
            const bst_float v = col[j].fvalue;
            sum_grad += p.GetGrad() * v;
            sum_hess += p.GetHess() * v * v;
          }
          bst_float &w = (*model)[fid][gid];
          bst_float dw = static_cast<bst_float>(
              param.learning_rate *
              CoordinateDelta(sum_grad, sum_hess, w, param.reg_alpha_denorm,
                              param.reg_lambda_denorm));
          if (dw == 0.f) continue;
          w += dw;
          // update grad values
          for (bst_uint j = 0; j < col.length; ++j) {
            bst_gpair &p = gpair[col[j].index * ngroup + gid];
            if (p.GetHess() < 0.0f) continue;
            p += bst_gpair(p.GetHess() * col[j].fvalue * dw, 0);
          }
        }
      }
    }
  }

 protected:
  // training parameters
  ShotgunTrainParam param;
  std::unique_ptr<FeatureSelector> selector;
};
DMLC_REGISTER_PARAMETER(ShotgunTrainParam);

XGBOOST_REGISTER_LINEAR_UPDATER(ShotgunUpdater, "shotgun")
    .describe(
        "Update linear model according to shotgun coordinate descent "
        "algorithm.")
    .set_body([]() { return new ShotgunUpdater(); });
}  // namespace linear
}  // namespace xgboost
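The per-weight step `CoordinateDelta` used in the update loop above is a Newton step on the squared-loss expansion, soft-thresholded for the L1 penalty and clamped so a weight never overshoots past zero. A hedged Python sketch of that update (the exact guards and thresholds in the xgboost implementation may differ slightly):

```python
def coordinate_delta(sum_grad, sum_hess, w, reg_alpha, reg_lambda):
    """Soft-thresholded Newton step for one weight under elastic-net
    regularization -- a sketch of what CoordinateDelta computes."""
    if sum_hess < 1e-5:
        return 0.0
    sum_grad_l2 = sum_grad + reg_lambda * w   # gradient incl. L2 term
    sum_hess_l2 = sum_hess + reg_lambda       # curvature incl. L2 term
    tmp = w - sum_grad_l2 / sum_hess_l2       # target ignoring the L1 term
    if tmp >= 0.0:
        # L1 pulls the step back toward zero; clamp at w = 0
        return max(-(sum_grad_l2 + reg_alpha) / sum_hess_l2, -w)
    else:
        return min(-(sum_grad_l2 - reg_alpha) / sum_hess_l2, -w)
```

With `reg_alpha` large relative to the gradient, a weight sitting at zero stays at zero, which is what makes the L1 penalty produce sparse linear models.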


@@ -304,6 +304,140 @@ struct EvalMAP : public EvalRankList {
  }
};
/*! \brief Cox: Partial likelihood of the Cox proportional hazards model */
struct EvalCox : public Metric {
 public:
  EvalCox() {}
  bst_float Eval(const std::vector<bst_float> &preds,
                 const MetaInfo &info,
                 bool distributed) const override {
    CHECK(!distributed) << "Cox metric does not support distributed evaluation";
    using namespace std;  // NOLINT(*)
    const bst_omp_uint ndata = static_cast<bst_omp_uint>(info.labels.size());
    const std::vector<size_t> &label_order = info.LabelAbsSort();
    // pre-compute a sum for the denominator
    double exp_p_sum = 0;  // we use double because we might need the precision with large datasets
    for (omp_ulong i = 0; i < ndata; ++i) {
      exp_p_sum += preds[i];
    }
    double out = 0;
    double accumulated_sum = 0;
    bst_omp_uint num_events = 0;
    for (bst_omp_uint i = 0; i < ndata; ++i) {
      const size_t ind = label_order[i];
      const auto label = info.labels[ind];
      if (label > 0) {
        out -= log(preds[ind]) - log(exp_p_sum);
        ++num_events;
      }
      // only update the denominator after we move forward in time (labels are sorted)
      accumulated_sum += preds[ind];
      if (i == ndata - 1 || std::abs(label) < std::abs(info.labels[label_order[i + 1]])) {
        exp_p_sum -= accumulated_sum;
        accumulated_sum = 0;
      }
    }
    return out / num_events;  // normalize by the number of events
  }
  const char* Name() const override {
    return "cox-nloglik";
  }
};
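The metric above is the negative log partial likelihood averaged over events: for each event, the log-ratio of that instance's risk score to the sum of scores over everyone still at risk, with the risk-set sum shrunk lazily as time advances. A Python transliteration of that loop (a sketch; the function name is ours, and we assume, as the `log(preds[ind])` call suggests, that predictions arrive already exponentiated, with a negative label sign marking right-censoring):

```python
import math

def cox_nloglik(exp_preds, labels):
    """Negative log partial likelihood of the Cox model, mirroring EvalCox.
    exp_preds: already-exponentiated risk scores.
    labels: event time; negative values mean censored at |t|."""
    order = sorted(range(len(labels)), key=lambda i: abs(labels[i]))
    exp_p_sum = sum(exp_preds)            # denominator over the risk set
    out, num_events, accumulated = 0.0, 0, 0.0
    for pos, i in enumerate(order):
        if labels[i] > 0:                 # an observed event
            out -= math.log(exp_preds[i]) - math.log(exp_p_sum)
            num_events += 1
        accumulated += exp_preds[i]
        # shrink the risk set only once we move forward in time
        if pos == len(order) - 1 or abs(labels[i]) < abs(labels[order[pos + 1]]):
            exp_p_sum -= accumulated
            accumulated = 0.0
    return out / num_events               # normalize by number of events
```

With three equal scores and all events observed, the partial likelihood is 1/3 · 1/2 · 1/1, giving a loss of log(6)/3.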
/*! \brief Area Under PR Curve, for both classification and rank */
struct EvalAucPR : public Metric {
  // implementation of AUC-PR for weighted data
  // translated from PRROC R Package
  // see https://doi.org/10.1371/journal.pone.0092209
  bst_float Eval(const std::vector<bst_float> &preds, const MetaInfo &info,
                 bool distributed) const override {
    CHECK_NE(info.labels.size(), 0U) << "label set cannot be empty";
    CHECK_EQ(preds.size(), info.labels.size())
        << "label and prediction size do not match";
    std::vector<unsigned> tgptr(2, 0);
    tgptr[1] = static_cast<unsigned>(info.labels.size());
    const std::vector<unsigned> &gptr =
        info.group_ptr.size() == 0 ? tgptr : info.group_ptr;
    CHECK_EQ(gptr.back(), info.labels.size())
        << "EvalAucPR: group structure must match number of predictions";
    const bst_omp_uint ngroup = static_cast<bst_omp_uint>(gptr.size() - 1);
    // sum statistics
    double auc = 0.0;
    int auc_error = 0, auc_gt_one = 0;
    // each thread takes a local rec
    std::vector<std::pair<bst_float, unsigned>> rec;
    for (bst_omp_uint k = 0; k < ngroup; ++k) {
      double total_pos = 0.0;
      double total_neg = 0.0;
      rec.clear();
      for (unsigned j = gptr[k]; j < gptr[k + 1]; ++j) {
        total_pos += info.GetWeight(j) * info.labels[j];
        total_neg += info.GetWeight(j) * (1.0f - info.labels[j]);
        rec.push_back(std::make_pair(preds[j], j));
      }
      XGBOOST_PARALLEL_SORT(rec.begin(), rec.end(), common::CmpFirst);
      // we need pos > 0 && neg > 0
      if (0.0 == total_pos || 0.0 == total_neg) {
        auc_error = 1;
      }
      // calculate AUC
      double tp = 0.0, prevtp = 0.0, fp = 0.0, prevfp = 0.0, h = 0.0, a = 0.0, b = 0.0;
      for (size_t j = 0; j < rec.size(); ++j) {
        tp += info.GetWeight(rec[j].second) * info.labels[rec[j].second];
        fp += info.GetWeight(rec[j].second) * (1.0f - info.labels[rec[j].second]);
        if ((j < rec.size() - 1 && rec[j].first != rec[j + 1].first) || j == rec.size() - 1) {
          if (tp == prevtp) {
            h = 1.0;
            a = 1.0;
            b = 0.0;
          } else {
            h = (fp - prevfp) / (tp - prevtp);
            a = 1.0 + h;
            b = (prevfp - h * prevtp) / total_pos;
          }
          if (0.0 != b) {
            auc += (tp / total_pos - prevtp / total_pos -
                    b / a * (std::log(a * tp / total_pos + b) -
                             std::log(a * prevtp / total_pos + b))) / a;
          } else {
            auc += (tp / total_pos - prevtp / total_pos) / a;
          }
          if (auc > 1.0) {
            auc_gt_one = 1;
          }
          prevtp = tp;
          prevfp = fp;
        }
      }
      // sanity check
      if (tp < 0 || prevtp < 0 || fp < 0 || prevfp < 0) {
        CHECK(!auc_error) << "AUC-PR: error in calculation";
      }
    }
    CHECK(!auc_error) << "AUC-PR: the dataset only contains pos or neg samples";
    CHECK(!auc_gt_one) << "AUC-PR: AUC > 1.0";
    if (distributed) {
      bst_float dat[2];
      dat[0] = static_cast<bst_float>(auc);
      dat[1] = static_cast<bst_float>(ngroup);
      // approximately estimate auc using mean
      rabit::Allreduce<rabit::op::Sum>(dat, 2);
      return dat[0] / dat[1];
    } else {
      return static_cast<bst_float>(auc) / ngroup;
    }
  }
  const char *Name() const override { return "aucpr"; }
};
XGBOOST_REGISTER_METRIC(AMS, "ams")
.describe("AMS metric for higgs.")
.set_body([](const char* param) { return new EvalAMS(param); });
@@ -312,6 +446,10 @@ XGBOOST_REGISTER_METRIC(Auc, "auc")
.describe("Area under curve for both classification and rank.")
.set_body([](const char* param) { return new EvalAuc(); });
XGBOOST_REGISTER_METRIC(AucPR, "aucpr")
.describe("Area under PR curve for both classification and rank.")
.set_body([](const char* param) { return new EvalAucPR(); });
XGBOOST_REGISTER_METRIC(Precision, "pre")
.describe("precision@k for rank.")
.set_body([](const char* param) { return new EvalPrecision(param); });
@@ -323,5 +461,10 @@ XGBOOST_REGISTER_METRIC(NDCG, "ndcg")
XGBOOST_REGISTER_METRIC(MAP, "map")
.describe("map@k for rank.")
.set_body([](const char* param) { return new EvalMAP(param); });
XGBOOST_REGISTER_METRIC(Cox, "cox-nloglik")
.describe("Negative log partial likelihood of Cox proportional hazards model.")
.set_body([](const char* param) { return new EvalCox(); });
}  // namespace metric
}  // namespace xgboost
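The AUC-PR loop above integrates the precision-recall curve in closed form between distinct score values, following Keilwagen, Grosse and Grau's continuous interpolation (as in the PRROC R package). A direct Python transliteration of the single-group path (a sketch; the function name is ours):

```python
import math

def auc_pr(preds, labels, weights=None):
    """Weighted AUC-PR via PRROC-style closed-form interpolation,
    mirroring the single-group path of EvalAucPR above."""
    n = len(labels)
    if weights is None:
        weights = [1.0] * n
    total_pos = sum(w * y for w, y in zip(weights, labels))
    rec = sorted(range(n), key=lambda i: preds[i], reverse=True)
    auc = tp = prevtp = fp = prevfp = 0.0
    for j, i in enumerate(rec):
        tp += weights[i] * labels[i]
        fp += weights[i] * (1.0 - labels[i])
        # integrate only where the score changes (ties share one segment)
        if j == n - 1 or preds[i] != preds[rec[j + 1]]:
            if tp == prevtp:
                h, a, b = 1.0, 1.0, 0.0
            else:
                h = (fp - prevfp) / (tp - prevtp)
                a = 1.0 + h
                b = (prevfp - h * prevtp) / total_pos
            if b != 0.0:
                auc += (tp / total_pos - prevtp / total_pos -
                        b / a * (math.log(a * tp / total_pos + b) -
                                 math.log(a * prevtp / total_pos + b))) / a
            else:
                auc += (tp / total_pos - prevtp / total_pos) / a
            prevtp, prevfp = tp, fp
    return auc
```

A ranking that places every positive above every negative yields an AUC-PR of exactly 1.0; reversing it drops the score well below the positive-class prevalence.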


@@ -35,16 +35,18 @@ class SoftmaxMultiClassObj : public ObjFunction {
   void Configure(const std::vector<std::pair<std::string, std::string> >& args) override {
     param_.InitAllowUnknown(args);
   }
-  void GetGradient(const std::vector<bst_float>& preds,
+  void GetGradient(HostDeviceVector<bst_float>* preds,
                    const MetaInfo& info,
                    int iter,
-                   std::vector<bst_gpair>* out_gpair) override {
+                   HostDeviceVector<bst_gpair>* out_gpair) override {
     CHECK_NE(info.labels.size(), 0U) << "label set cannot be empty";
-    CHECK(preds.size() == (static_cast<size_t>(param_.num_class) * info.labels.size()))
+    CHECK(preds->size() == (static_cast<size_t>(param_.num_class) * info.labels.size()))
         << "SoftmaxMultiClassObj: label size and pred size does not match";
-    out_gpair->resize(preds.size());
+    std::vector<bst_float>& preds_h = preds->data_h();
+    out_gpair->resize(preds_h.size());
+    std::vector<bst_gpair>& gpair = out_gpair->data_h();
     const int nclass = param_.num_class;
-    const omp_ulong ndata = static_cast<omp_ulong>(preds.size() / nclass);
+    const omp_ulong ndata = static_cast<omp_ulong>(preds_h.size() / nclass);
     int label_error = 0;
 #pragma omp parallel
@@ -53,7 +55,7 @@ class SoftmaxMultiClassObj : public ObjFunction {
 #pragma omp for schedule(static)
     for (omp_ulong i = 0; i < ndata; ++i) {
       for (int k = 0; k < nclass; ++k) {
-        rec[k] = preds[i * nclass + k];
+        rec[k] = preds_h[i * nclass + k];
       }
       common::Softmax(&rec);
       int label = static_cast<int>(info.labels[i]);
@@ -65,9 +67,9 @@ class SoftmaxMultiClassObj : public ObjFunction {
         bst_float p = rec[k];
         const bst_float h = 2.0f * p * (1.0f - p) * wt;
         if (label == k) {
-          (*out_gpair)[i * nclass + k] = bst_gpair((p - 1.0f) * wt, h);
+          gpair[i * nclass + k] = bst_gpair((p - 1.0f) * wt, h);
         } else {
-          (*out_gpair)[i * nclass + k] = bst_gpair(p* wt, h);
+          gpair[i * nclass + k] = bst_gpair(p* wt, h);
         }
       }
     }
@@ -77,10 +79,10 @@ class SoftmaxMultiClassObj : public ObjFunction {
         << " num_class=" << nclass
         << " but found " << label_error << " in label.";
   }
-  void PredTransform(std::vector<bst_float>* io_preds) override {
+  void PredTransform(HostDeviceVector<bst_float>* io_preds) override {
     this->Transform(io_preds, output_prob_);
   }
-  void EvalTransform(std::vector<bst_float>* io_preds) override {
+  void EvalTransform(HostDeviceVector<bst_float>* io_preds) override {
     this->Transform(io_preds, true);
   }
   const char* DefaultEvalMetric() const override {
@@ -88,8 +90,8 @@ class SoftmaxMultiClassObj : public ObjFunction {
   }
  private:
-  inline void Transform(std::vector<bst_float> *io_preds, bool prob) {
+  inline void Transform(HostDeviceVector<bst_float> *io_preds, bool prob) {
     std::vector<bst_float> &preds = io_preds->data_h();
     std::vector<bst_float> tmp;
     const int nclass = param_.num_class;
     const omp_ulong ndata = static_cast<omp_ulong>(preds.size() / nclass);
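This diff only moves the buffers into `HostDeviceVector`; the per-row math is unchanged: the softmax gradient is `p - 1` for the true class and `p` otherwise, with the diagonal Hessian approximation `2p(1 - p)`, both scaled by the instance weight. A self-contained Python sketch of those formulas (function names are ours):

```python
import math

def softmax(x):
    # numerically stable softmax over one row of raw margins
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def softmax_grad_hess(margins, label, weight=1.0):
    """Gradient/Hessian pair for one row of the softmax objective:
    grad_k = (p_k - 1[k == label]) * weight, hess_k = 2 p_k (1 - p_k) * weight."""
    p = softmax(margins)
    grad = [(pk - (1.0 if k == label else 0.0)) * weight
            for k, pk in enumerate(p)]
    hess = [2.0 * pk * (1.0 - pk) * weight for pk in p]
    return grad, hess
```

Since the probabilities sum to one, the gradients of one row always sum to zero, and the Hessian entries are strictly positive, which keeps the Newton-style boosting step well defined.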

Some files were not shown because too many files have changed in this diff.