93 Commits

Author SHA1 Message Date
Rory Mitchell
84c99f86f4
Combine TreeModel and RegTree (#3995) 2018-12-19 12:16:40 +13:00
Jiaming Yuan
c8c7b9649c
Fix and optimize logger (#4002)
* Fix logging switch statement.

* Remove debug_verbose_ in AllReducer.

* Don't construct the stream when not needed.

* Make default constructor deleted.

* Remove redundant IsVerbose.
2018-12-17 19:23:05 +08:00
Jiaming Yuan
e0a279114e
Unify logging facilities. (#3982)
* Unify logging facilities.

* Enhance `ConsoleLogger` to handle different verbosity.
* Override macros from `dmlc`.
* Don't use specialized gamma when building with GPU.
* Remove verbosity cache in monitor.
* Test monitor.
* Deprecate `silent`.
* Fix doc and messages.
* Fix python test.
* Fix silent tests.
2018-12-14 19:29:58 +08:00
Rory Mitchell
3d81c48d3f
Remove leaf vector, add tree serialisation test, fix Windows tests (#3989) 2018-12-13 10:28:38 +13:00
Jiaming Yuan
48dddfd635
Porting elementwise metrics to GPU. (#3952)
* Port elementwise metrics to GPU.

* All elementwise metrics are converted to static polymorphic.
* Create a reducer for metrics reduction.
* Remove const of Metric::Eval to accommodate CubMemory.
2018-12-01 18:46:45 +13:00
Philip Hyunsu Cho
91537e7353
Fix #3342 and h2oai/h2o4gpu#625: Save predictor parameters in model file (#3856)
* Fix #3342 and h2oai/h2o4gpu#625: Save predictor parameters in model file

This allows pickled models to retain predictor attributes, such as
'predictor' (whether to use CPU or GPU) and 'n_gpu' (number of GPUs
to use). Related: h2oai/h2o4gpu#625

Closes #3342.

TODO. Write a test.

* Fix lint

* Do not load GPU predictor into CPU-only XGBoost

* Add a test for pickling GPU predictors

* Make sample data big enough to pass multi GPU test

* Update test_gpu_predictor.cu
2018-11-03 21:45:38 -07:00
trivialfis
516457fadc Add basic unittests for gpu-hist method. (#3785)
* Split building histogram into separated class.
* Extract `InitCompressedRow` definition.
* Basic tests for gpu-hist.
* Document the code more verbosely.
* Removed `HistCutUnit`.
* Removed some duplicated copies in `GPUHistMaker`.
* Implement LCG and use it in tests.
2018-10-15 15:47:00 +13:00
Rory Mitchell
70d208d68c
Dmatrix refactor stage 2 (#3395)
* DMatrix refactor 2

* Remove buffered rowset usage where possible

* Transition to c++11 style iterators for row access

* Transition column iterators to C++ 11
2018-10-01 01:29:03 +13:00
Philip Hyunsu Cho
c87153ed32
Fix CRAN check by removing reference to std::cerr (#3660)
* Fix CRAN check by removing reference to std::cerr

* Mask tests that fail on 32-bit Windows R
2018-09-05 11:44:00 -07:00
Andy Adinets
72cd1517d6 Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446)
* Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage.

- added distributions to HostDeviceVector
- using HostDeviceVector for labels, weights and base margings in MetaInfo
- using HostDeviceVector for offset and data in SparsePage
- other necessary refactoring

* Added const version of HostDeviceVector API calls.

- const versions added to calls that can trigger data transfers, e.g. DevicePointer()
- updated the code that uses HostDeviceVector
- objective functions now accept const HostDeviceVector<bst_float>& for predictions

* Updated src/linear/updater_gpu_coordinate.cu.

* Added read-only state for HostDeviceVector sync.

- this means no copies are performed if both host and devices access
  the HostDeviceVector read-only

* Fixed linter and test errors.

- updated the lz4 plugin
- added ConstDeviceSpan to HostDeviceVector
- using device % dh::NVisibleDevices() for the physical device number,
  e.g. in calls to cudaSetDevice()

* Fixed explicit template instantiation errors for HostDeviceVector.

- replaced HostDeviceVector<unsigned int> with HostDeviceVector<int>

* Fixed HostDeviceVector tests that require multiple GPUs.

- added a mock set device handler; when set, it is called instead of cudaSetDevice()
2018-08-30 14:28:47 +12:00
trivialfis
2c502784ff Span class. (#3548)
* Add basic Span class based on ISO++20.

* Use Span<Entry const> instead of Inst in SparsePage.

* Add DeviceSpan in HostDeviceVector, use it in regression obj.
2018-08-14 17:58:11 +12:00
Philip Hyunsu Cho
3c72654e3b
Revert "Fix #3485, #3540: Don't use dropout for predicting test sets" (#3563)
* Revert "Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)"

This reverts commit 44811f233071c5805d70c287abd22b155b732727.

* Document behavior of predict() for DART booster

* Add notice to parameter.rst
2018-08-08 09:48:55 -07:00
Philip Hyunsu Cho
44811f2330
Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)
* Fix #3485, #3540: Don't use dropout for predicting test sets

Dropout (for DART) should only be used at training time.

* Add regression test
2018-08-05 10:17:21 -07:00
trivialfis
34dc9155ab Use __CUDA__ macro with __NVCC__. (#3539)
* __CUDA__ is defined in clang. Making the change won't make clang
compile xgboost, but syntax checking from clang is at least partially
working.
2018-08-02 22:04:23 +12:00
Philip Hyunsu Cho
48d6e68690
Add callback interface to re-direct console output (#3438)
* Add callback interface to re-direct console output

* Exempt TrackerLogger from custom logging

* Fix lint
2018-07-05 11:32:30 -07:00
liuliang01
0cf88d036f Add qid like ranklib format (#2749)
* add qid for https://github.com/dmlc/xgboost/issues/2748

* change names

* change spaces

* change qid to bst_uint type

* change qid type to size_t

* change qid first to SIZE_MAX

* change qid type from size_t to uint64_t

* update dmlc-core

* fix qids name error

* fix group_ptr_ error

* Style fix

* Add qid handling logic to SparsePage

* New MetaInfo format + backward compatibility fix

Old MetaInfo format (1.0) doesn't contain qid field. We still want to be able
to read from MetaInfo files saved in old format. Also, define a new format
(2.0) that contains the qid field. This way, we can distinguish files that
contain qid and those that do not.

* Update MetaInfo test

* Simply group assignment logic

* Explicitly set qid=nullptr in NativeDataIter

NativeDataIter's callback does not support qid field. Users of NativeDataIter
will need to call setGroup() function separately to set group information.

* Save qids_ in SaveBinary()

* Upgrade dmlc-core submodule

* Add a test for reading qid

* Add contributor

* Check the size of qids_

* Document qid format
2018-06-30 20:24:03 +00:00
PSEUDOTENSOR / Jonathan McKinney
9ac163d0bb Allow import via python datatable. (#3272)
* Allow import via python datatable.

* Write unit tests

* Refactor dt API functions

* Refactor python code

* Lint fixes

* Address review comments
2018-06-20 13:16:18 -07:00
Rory Mitchell
a96039141a
Dmatrix refactor stage 1 (#3301)
* Use sparse page as singular CSR matrix representation

* Simplify dmatrix methods

* Reduce statefullness of batch iterators

* BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.
2018-06-07 10:25:58 +12:00
Andy Adinets
286dccb8e8 GPU binning and compression. (#3319)
* GPU binning and compression.

- binning and index compression are done inside the DeviceShard constructor
- in case of a DMatrix with multiple row batches, it is first converted into a single row batch
2018-06-05 17:15:13 +12:00
Rory Mitchell
a185ddfe03
Implement GPU accelerated coordinate descent algorithm (#3178)
* Implement GPU accelerated coordinate descent algorithm. 

* Exclude external memory tests for GPU
2018-04-20 14:56:35 +12:00
Rory Mitchell
ccf80703ef
Clang-tidy static analysis (#3222)
* Clang-tidy static analysis

* Modernise checks

* Google coding standard checks

* Identifier renaming according to Google style
2018-04-19 18:57:13 +12:00
Will Storey
c85995952f Allow compilation with -Werror=strict-prototypes (#3183) 2018-03-18 12:25:42 +13:00
Andrew V. Adinetz
d5992dd881 Replaced std::vector-based interfaces with HostDeviceVector-based interfaces. (#3116)
* Replaced std::vector-based interfaces with HostDeviceVector-based interfaces.

- replacement was performed in the learner, boosters, predictors,
  updaters, and objective functions
- only interfaces used in training were replaced;
  interfaces like PredictInstance() still use std::vector
- refactoring necessary for replacement of interfaces was also performed,
  such as using HostDeviceVector in prediction cache

* HostDeviceVector-based interfaces for custom objective function example plugin.
2018-02-28 13:00:04 +13:00
Rory Mitchell
10eb05a63a
Refactor linear modelling and add new coordinate descent updater (#3103)
* Refactor linear modelling and add new coordinate descent updater

* Allow unsorted column iterator

* Add prediction cacheing to gblinear
2018-02-17 09:17:01 +13:00
Scott Lundberg
d878c36c84 Add SHAP interaction effects, fix minor bug, and add cox loss (#3043)
* Add interaction effects and cox loss

* Minimize whitespace changes

* Cox loss now no longer needs a pre-sorted dataset.

* Address code review comments

* Remove mem check, rename to pred_interactions, include bias

* Make lint happy

* More lint fixes

* Fix cox loss indexing

* Fix main effects and tests

* Fix lint

* Use half interaction values on the off-diagonals

* Fix lint again
2018-02-07 20:38:01 -06:00
Thejaswi
84ab74f3a5 Objective function evaluation on GPU with minimal PCIe transfers (#2935)
* Added GPU objective function and no-copy interface.

- xgboost::HostDeviceVector<T> syncs automatically between host and device
- no-copy interfaces have been added
- default implementations just sync the data to host
  and call the implementations with std::vector
- GPU objective function, predictor, histogram updater process data
  directly on GPU
2018-01-12 21:33:39 +13:00
Rory Mitchell
7759ab99ee
Fix Google test warnings and error (#2957) 2017-12-20 00:13:56 +13:00
Vadim Khotilovich
e8a6597957 [R] maintenance Nov 2017; SHAP plots (#2888)
* [R] fix predict contributions for data with no colnames

* [R] add a render parameter for xgb.plot.multi.trees; fixes #2628

* [R] update Rd's

* [R] remove unnecessary dep-package from R cmake install

* silence type warnings; readability

* [R] silence complaint about incomplete line at the end

* [R] initial version of xgb.plot.shap()

* [R] more work on xgb.plot.shap

* [R] enforce black font in xgb.plot.tree; fixes #2640

* [R] if feature names are available, check in predict that they are the same; fixes #2857

* [R] cran check and lint fixes

* remove tabs

* [R] add references; a test for plot.shap
2017-12-05 09:45:34 -08:00
yskn67
3dcf966bc3 Fix XGDMatrixFree argument type (#2898) 2017-11-23 10:49:05 -08:00
Rory Mitchell
d9d5293cdb Add warnings for large labels when using GPU histogram algorithms (#2834) 2017-10-26 17:31:10 +13:00
Rory Mitchell
13e7a2cff0 Various bug fixes (#2825)
* Fatal error if GPU algorithm selected without GPU support compiled

* Resolve type conversion warnings

* Fix gpu unit test failure

* Fix compressed iterator edge case

* Fix python unit test failures due to flake8 update on pip
2017-10-25 14:45:01 +13:00
Scott Lundberg
78c4188cec SHAP values for feature contributions (#2438)
* SHAP values for feature contributions

* Fix commenting error

* New polynomial time SHAP value estimation algorithm

* Update API to support SHAP values

* Fix merge conflicts with updates in master

* Correct submodule hashes

* Fix variable sized stack allocation

* Make lint happy

* Add docs

* Fix typo

* Adjust tolerances

* Remove unneeded def

* Fixed cpp test setup

* Updated R API and cleaned up

* Fixed test typo
2017-10-12 12:35:51 -07:00
Rory Mitchell
4cb2f7598b -Add experimental GPU algorithm for lossguided mode (#2755)
-Improved GPU algorithm unit tests
-Removed some thrust code to improve compile times
2017-10-01 00:18:35 +13:00
Rory Mitchell
e6a9063344 Integer gradient summation for GPU histogram algorithm. (#2681) 2017-09-08 15:07:29 +12:00
Rory Mitchell
5661a67d20 Add parallel sort for MSVC (#2609) 2017-08-17 17:14:39 +12:00
Rory Mitchell
ef23e424f1 [GPU-Plugin] Add GPU accelerated prediction (#2593)
* [GPU-Plugin] Add GPU accelerated prediction

* Improve allocation message

* Update documentation

* Resolve linker error for predictor

* Add unit tests
2017-08-16 12:31:59 +12:00
Rory Mitchell
eda9e180f0 [GPU-Plugin] Various fixes (#2579)
* Fix test large

* Add check for max_depth 0

* Update readme

* Add LBS specialisation for dense data

* Add bst_gpair_precise

* Temporarily disable accuracy tests on test_large.py

* Solve unused variable compiler warning

* Fix max_bin > 1024 error
2017-08-05 22:16:23 +12:00
Rory Mitchell
0e06d1805d [WIP] Extract prediction into separate interface (#2531)
* [WIP] Extract prediction into separate interface

* Add copyright, fix linter errors

* Add predictor to amalgamation

* Fix documentation

* Move prediction cache into predictor, add GBTreeModel

* Updated predictor doc comments
2017-07-28 17:01:03 -07:00
Vadim Khotilovich
00eda28b3c MinGW: shared library prefix and appveyor CI (#2539)
* for MinGW, drop the 'lib' prefix from shared library name

* fix defines for 'g++ 4.8 or higher' to include g++ >= 5

* fix compile warnings

* [Appveyor] add MinGW with python; remove redundant jobs

* [Appveyor] also do python build for one of msvc jobs
2017-07-25 01:06:47 -05:00
PSEUDOTENSOR / Jonathan McKinney
6b375f6ad8 Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation (#2530)
* Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation from numpy arrays for python interface.
2017-07-21 14:43:17 +12:00
Vadim Khotilovich
7350085955 Fix broken make on windows (#2499)
* fix Makefile for make on windows

* clean up compilation warnings

* fix for `no file name for include` make warning
2017-07-08 09:17:31 -07:00
Rory Mitchell
5f1b0bb386 [GPU-Plugin] Unify gpu_gpair/bst_gpair. Refactor. (#2477) 2017-07-01 17:31:13 +12:00
Thejaswi
85b2fb3eee [GPU-Plugin] Integration of a faster version of grow_gpu plugin into mainstream (#2360)
* Integrating a faster version of grow_gpu plugin
1. Removed the older files to reduce duplication
2. Moved all of the grow_gpu files under 'exact' folder
3. All of them are inside 'exact' namespace to avoid any conflicts
4. Fixed a bug in benchmark.py while running only 'grow_gpu' plugin
5. Added cub and googletest submodules to ease integration and unit-testing
6. Updates to CMakeLists.txt to directly build cuda objects into libxgboost

* Added support for building gpu plugins through make flow
1. updated makefile and config.mk to add right targets
2. added unit-tests for gpu exact plugin code

* 1. Added support for building gpu plugin using 'make' flow as well
2. Updated instructions for building and testing gpu plugin

* Fix travis-ci errors for PR#2360
1. lint errors on unit-tests
2. removed googletest, instead depended upon dmlc-core provide gtest cache

* Some more fixes to travis-ci lint failures PR#2360

* Added Rory's copyrights to the files containing code from both.

* updated copyright statement as per Rory's request

* moved the static datasets into a script to generate them at runtime

* 1. memory usage print when silent=0
2. tests/ and test/ folder organization
3. removal of the dependency of googletest for just building xgboost
4. coding style updates for .cuh as well

* Fixes for compilation warnings

* add cuda object files as well when JVM_BINDINGS=ON
2017-06-06 09:39:53 +12:00
Artem Krylysov
ed8da45f9d Fix C API header compatibility with C compilers (#2369) 2017-06-02 10:14:30 -07:00
Vadim Khotilovich
b52db87d5c adding feature contributions to R and gblinear (#2295)
* [gblinear] add features contribution prediction; fix DumpModel bug

* [gbtree] minor changes to PredContrib

* [R] add feature contribution prediction to R

* [R] bump up version; update NEWS

* [gblinear] fix the base_margin issue; fixes #1969

* [R] list of matrices as output of multiclass feature contributions

* [gblinear] make order of DumpModel coefficients consistent: group index changes the fastest
2017-05-21 07:41:51 -04:00
Sergei Lebedev
e5e721722e Fix compilation on OS X with GCC 7 (#2256)
* Fix compilation on OS X with GCC 7

Compilation failed with

In file included from src/tree/tree_updater.cc:6:0:
include/xgboost/tree_updater.h:75:46: error: 'function' is not a member of 'std'
                                         std::function<TreeUpdater* ()> > {

caused by a missing <functional> include.

* Fixed another occurence of that issue spotted by @ClimberPG
2017-05-19 22:04:07 -07:00
Vadim Khotilovich
c66ca79221 [R] native routines registration (#2290)
* [R] add native routines registration

* c_api.h needs to include <cstdint> since it uses fixed width integer types

* [R] use registered native routines from R code

* [R] bump version; add info on native routine registration to the contributors guide

* make lint happy
2017-05-14 11:00:46 -07:00
Maurus Cuelenaere
6bd1869026 Add prediction of feature contributions (#2003)
* Add prediction of feature contributions

This implements the idea described at http://blog.datadive.net/interpreting-random-forests/
which tries to give insight in how a prediction is composed of its feature contributions
and a bias.

* Support multi-class models

* Calculate learning_rate per-tree instead of using the one from the first tree

* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly

* Add simple test for contributions feature

* Check against param.num_nodes instead of checking for non-zero length

* Loop over all roots instead of only the first
2017-05-14 00:58:10 -05:00
Rory Mitchell
6bf968efe6 [GPU Plugin] Fast histogram speed improvements. Updated benchmarks. (#2258) 2017-05-08 09:21:38 -07:00
Rory Mitchell
8ab5d4611c [GPU-Plugin] (#2227)
* Add fast histogram algorithm
* Fix Linux build
* Add 'gpu_id' parameter
2017-04-25 16:37:10 -07:00