56 Commits

Author SHA1 Message Date
Rory Mitchell
78bea0d204
Add google test for a column sampling, restore metainfo tests (#3637)
* Add google test for a column sampling, restore metainfo tests

* Update metainfo test for visual studio

* Fix multi-GPU bug introduced in #3635
2018-08-28 16:10:26 +12:00
trivialfis
60787ecebc Merge generic device helper functions into gpu set. (#3626)
* Remove the use of old NDevices* functions.
* Use GPUSet in timer.h.
2018-08-26 18:14:23 +12:00
trivialfis
cf2d86a4f6 Add travis sanitizers tests. (#3557)
* Add travis sanitizers tests.

* Add gcc-7 in Travis.
* Add SANITIZER_PATH for CMake.
* Enable sanitizer tests in Travis.

* Fix memory leaks in tests.

* Fix all memory leaks reported by Address Sanitizer.
* tests/cpp/helpers.h/CreateDMatrix now returns raw pointer.
2018-08-19 16:40:30 +12:00
trivialfis
2c502784ff Span class. (#3548)
* Add basic Span class based on ISO++20.

* Use Span<Entry const> instead of Inst in SparsePage.

* Add DeviceSpan in HostDeviceVector, use it in regression obj.
2018-08-14 17:58:11 +12:00
Rory Mitchell
bbb771f32e
Refactor parts of fast histogram utilities (#3564)
* Refactor parts of fast histogram utilities

* Removed byte packing from column matrix
2018-08-09 17:59:57 +12:00
Henry Gouk
69454d9487 Implementation of hinge loss for binary classification (#3477) 2018-08-07 10:06:42 +12:00
Andy Adinets
cc6a5a3666 Added finding quantiles on GPU. (#3393)
* Added finding quantiles on GPU.

- this includes datasets where weights are assigned to data rows
- as the quantiles found by the new algorithm are not the same
  as those found by the old one, test thresholds in
    tests/python-gpu/test_gpu_updaters.py have been adjusted.

* Adjustments and improved testing for finding quantiles on the GPU.

- added C++ tests for the DeviceSketch() function
- reduced one of the thresholds in test_gpu_updaters.py
- adjusted the cuts found by the find_cuts_k kernel
2018-07-27 14:03:16 +12:00
liuliang01
0cf88d036f Add qid like ranklib format (#2749)
* add qid for https://github.com/dmlc/xgboost/issues/2748

* change names

* change spaces

* change qid to bst_uint type

* change qid type to size_t

* change qid first to SIZE_MAX

* change qid type from size_t to uint64_t

* update dmlc-core

* fix qids name error

* fix group_ptr_ error

* Style fix

* Add qid handling logic to SparsePage

* New MetaInfo format + backward compatibility fix

Old MetaInfo format (1.0) doesn't contain qid field. We still want to be able
to read from MetaInfo files saved in old format. Also, define a new format
(2.0) that contains the qid field. This way, we can distinguish files that
contain qid and those that do not.

* Update MetaInfo test

* Simply group assignment logic

* Explicitly set qid=nullptr in NativeDataIter

NativeDataIter's callback does not support qid field. Users of NativeDataIter
will need to call setGroup() function separately to set group information.

* Save qids_ in SaveBinary()

* Upgrade dmlc-core submodule

* Add a test for reading qid

* Add contributor

* Check the size of qids_

* Document qid format
2018-06-30 20:24:03 +00:00
pdesahb
12e34f32e2 Fix tweedie handling of base_score (#3295)
* fix tweedie margin calculations

* add entry to contributors
2018-06-28 15:43:05 +00:00
ngoyal2707
5cd851ccef added code for instance based weighing for rank objectives (#3379)
* added code for instance based weighing for rank objectives

* Fix lint
2018-06-22 15:10:59 -07:00
PSEUDOTENSOR / Jonathan McKinney
9ac163d0bb Allow import via python datatable. (#3272)
* Allow import via python datatable.

* Write unit tests

* Refactor dt API functions

* Refactor python code

* Lint fixes

* Address review comments
2018-06-20 13:16:18 -07:00
Rory Mitchell
a96039141a
Dmatrix refactor stage 1 (#3301)
* Use sparse page as singular CSR matrix representation

* Simplify dmatrix methods

* Reduce statefullness of batch iterators

* BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.
2018-06-07 10:25:58 +12:00
Andy Adinets
286dccb8e8 GPU binning and compression. (#3319)
* GPU binning and compression.

- binning and index compression are done inside the DeviceShard constructor
- in case of a DMatrix with multiple row batches, it is first converted into a single row batch
2018-06-05 17:15:13 +12:00
trivialfis
34aeee2961 Fix test_param.cc header path (#3317) 2018-05-28 10:26:29 -07:00
Rory Mitchell
3ee725e3bb
Add cuda forwards compatibility (#3316) 2018-05-17 10:59:22 +12:00
Rory Mitchell
088bb4b27c
Prevent multiclass Hessian approaching 0 (#3304)
* Prevent Hessian in multiclass objective becoming zero

* Set default learning rate to 0.5 for "coord_descent" linear updater
2018-05-09 20:25:51 +12:00
Rory Mitchell
a185ddfe03
Implement GPU accelerated coordinate descent algorithm (#3178)
* Implement GPU accelerated coordinate descent algorithm. 

* Exclude external memory tests for GPU
2018-04-20 14:56:35 +12:00
Rory Mitchell
ccf80703ef
Clang-tidy static analysis (#3222)
* Clang-tidy static analysis

* Modernise checks

* Google coding standard checks

* Identifier renaming according to Google style
2018-04-19 18:57:13 +12:00
Rory Mitchell
a1ec7b1716
Change reduce operation from thrust to cub. Fix for cuda 9.1 error (#3218)
* Change reduce operation from thrust to cub. Fix for cuda 9.1 runtime error

* Unit test sum reduce
2018-04-04 14:21:48 +12:00
Arjan van der Velde
04221a7469 rank_metric: add AUC-PR (#3172)
* rank_metric: add AUC-PR

Implementation of the AUC-PR calculation for weighted data, proposed by Keilwagen, Grosse and Grau (https://doi.org/10.1371/journal.pone.0092209)

* rank_metric: fix lint warnings

* Implement tests for AUC-PR and fix implementation

* add aucpr to documentation for other languages
2018-03-23 10:43:47 -04:00
Vadim Khotilovich
706be4e5d4
Additional improvements for gblinear (#3134)
* fix rebase conflict

* [core] additional gblinear improvements

* [R] callback for gblinear coefficients history

* force eta=1 for gblinear python tests

* add top_k to GreedyFeatureSelector

* set eta=1 in shotgun test

* [core] fix SparsePage processing in gblinear; col-wise multithreading in greedy updater

* set sorted flag within TryInitColData

* gblinear tests: use scale, add external memory test

* fix multiclass for greedy updater

* fix whitespace

* fix typo
2018-03-13 01:27:13 -05:00
Andrew V. Adinetz
d5992dd881 Replaced std::vector-based interfaces with HostDeviceVector-based interfaces. (#3116)
* Replaced std::vector-based interfaces with HostDeviceVector-based interfaces.

- replacement was performed in the learner, boosters, predictors,
  updaters, and objective functions
- only interfaces used in training were replaced;
  interfaces like PredictInstance() still use std::vector
- refactoring necessary for replacement of interfaces was also performed,
  such as using HostDeviceVector in prediction cache

* HostDeviceVector-based interfaces for custom objective function example plugin.
2018-02-28 13:00:04 +13:00
Rory Mitchell
10eb05a63a
Refactor linear modelling and add new coordinate descent updater (#3103)
* Refactor linear modelling and add new coordinate descent updater

* Allow unsorted column iterator

* Add prediction cacheing to gblinear
2018-02-17 09:17:01 +13:00
Scott Lundberg
d878c36c84 Add SHAP interaction effects, fix minor bug, and add cox loss (#3043)
* Add interaction effects and cox loss

* Minimize whitespace changes

* Cox loss now no longer needs a pre-sorted dataset.

* Address code review comments

* Remove mem check, rename to pred_interactions, include bias

* Make lint happy

* More lint fixes

* Fix cox loss indexing

* Fix main effects and tests

* Fix lint

* Use half interaction values on the off-diagonals

* Fix lint again
2018-02-07 20:38:01 -06:00
Thejaswi
84ab74f3a5 Objective function evaluation on GPU with minimal PCIe transfers (#2935)
* Added GPU objective function and no-copy interface.

- xgboost::HostDeviceVector<T> syncs automatically between host and device
- no-copy interfaces have been added
- default implementations just sync the data to host
  and call the implementations with std::vector
- GPU objective function, predictor, histogram updater process data
  directly on GPU
2018-01-12 21:33:39 +13:00
Rory Mitchell
7759ab99ee
Fix Google test warnings and error (#2957) 2017-12-20 00:13:56 +13:00
Rory Mitchell
c55f14668e
Update gpu_hist algorithm (#2901) 2017-11-27 13:44:24 +13:00
Rory Mitchell
40c6e2f0c8
Improved gpu_hist_experimental algorithm (#2866)
- Implement colsampling, subsampling for gpu_hist_experimental

 - Optimised multi-GPU implementation for gpu_hist_experimental

 - Make nccl optional

 - Add Volta architecture flag

 - Optimise RegLossObj

 - Add timing utilities for debug verbose mode

 - Bump required cuda version to 8.0
2017-11-11 13:58:40 +13:00
Rory Mitchell
13e7a2cff0 Various bug fixes (#2825)
* Fatal error if GPU algorithm selected without GPU support compiled

* Resolve type conversion warnings

* Fix gpu unit test failure

* Fix compressed iterator edge case

* Fix python unit test failures due to flake8 update on pip
2017-10-25 14:45:01 +13:00
Scott Lundberg
78c4188cec SHAP values for feature contributions (#2438)
* SHAP values for feature contributions

* Fix commenting error

* New polynomial time SHAP value estimation algorithm

* Update API to support SHAP values

* Fix merge conflicts with updates in master

* Correct submodule hashes

* Fix variable sized stack allocation

* Make lint happy

* Add docs

* Fix typo

* Adjust tolerances

* Remove unneeded def

* Fixed cpp test setup

* Updated R API and cleaned up

* Fixed test typo
2017-10-12 12:35:51 -07:00
Rory Mitchell
4cb2f7598b -Add experimental GPU algorithm for lossguided mode (#2755)
-Improved GPU algorithm unit tests
-Removed some thrust code to improve compile times
2017-10-01 00:18:35 +13:00
Rory Mitchell
e6a9063344 Integer gradient summation for GPU histogram algorithm. (#2681) 2017-09-08 15:07:29 +12:00
Rory Mitchell
15267eedf2 [GPU-Plugin] Major refactor 2 (#2664)
* Change cmake option

* Move source files

* Move google tests

* Move python tests

* Move benchmarks

* Move documentation

* Remove makefile support

* Fix test run

* Move GPU tests
2017-09-08 09:57:16 +12:00
Rory Mitchell
ef23e424f1 [GPU-Plugin] Add GPU accelerated prediction (#2593)
* [GPU-Plugin] Add GPU accelerated prediction

* Improve allocation message

* Update documentation

* Resolve linker error for predictor

* Add unit tests
2017-08-16 12:31:59 +12:00
PSEUDOTENSOR / Jonathan McKinney
6b375f6ad8 Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation (#2530)
* Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation from numpy arrays for python interface.
2017-07-21 14:43:17 +12:00
PSEUDOTENSOR / Jonathan McKinney
ca7fc9fda3 [GPU-Plugin] Fix gpu_hist to allow matrices with more than just 2^{32} elements. Also fixed CPU hist algorithm. (#2518) 2017-07-18 11:19:27 +12:00
Rory Mitchell
530f01e21c [GPU-Plugin] Add load balancing search to gpu_hist. Add compressed iterator. (#2504) 2017-07-11 22:36:39 +12:00
Rory Mitchell
e939192978 Cmake improvements (#2487)
* Cmake improvements
* Add google test to cmake
2017-07-06 18:05:11 +12:00
Thejaswi
85b2fb3eee [GPU-Plugin] Integration of a faster version of grow_gpu plugin into mainstream (#2360)
* Integrating a faster version of grow_gpu plugin
1. Removed the older files to reduce duplication
2. Moved all of the grow_gpu files under 'exact' folder
3. All of them are inside 'exact' namespace to avoid any conflicts
4. Fixed a bug in benchmark.py while running only 'grow_gpu' plugin
5. Added cub and googletest submodules to ease integration and unit-testing
6. Updates to CMakeLists.txt to directly build cuda objects into libxgboost

* Added support for building gpu plugins through make flow
1. updated makefile and config.mk to add right targets
2. added unit-tests for gpu exact plugin code

* 1. Added support for building gpu plugin using 'make' flow as well
2. Updated instructions for building and testing gpu plugin

* Fix travis-ci errors for PR#2360
1. lint errors on unit-tests
2. removed googletest, instead depended upon dmlc-core provide gtest cache

* Some more fixes to travis-ci lint failures PR#2360

* Added Rory's copyrights to the files containing code from both.

* updated copyright statement as per Rory's request

* moved the static datasets into a script to generate them at runtime

* 1. memory usage print when silent=0
2. tests/ and test/ folder organization
3. removal of the dependency of googletest for just building xgboost
4. coding style updates for .cuh as well

* Fixes for compilation warnings

* add cuda object files as well when JVM_BINDINGS=ON
2017-06-06 09:39:53 +12:00
AbdealiJK
47ba2de7d4 tests/cpp: Add tests for multiclass_metric.cc 2016-12-04 11:25:57 -08:00
AbdealiJK
a7e20555a3 tests/cpp: Add tests for rank_metrics.cc 2016-12-04 11:25:57 -08:00
AbdealiJK
4a2ef130a7 tests/cpp: Add test for elementwise_metric.cc 2016-12-04 11:25:57 -08:00
AbdealiJK
03abd47f49 tests/cpp: Add tests for Metric RMSE 2016-12-04 11:25:57 -08:00
AbdealiJK
582c373274 tests/cpp: Add tests for metric.cc 2016-12-04 11:25:57 -08:00
AbdealiJK
cc859420ba tests/cpp: Add tests for TweedieRegression 2016-12-04 11:25:57 -08:00
AbdealiJK
fa865564f6 tests/cpp: Add tests for GammaRegression 2016-12-04 11:25:57 -08:00
AbdealiJK
401e4b5220 tests/cpp: Add tests for PoissonRegression 2016-12-04 11:25:57 -08:00
AbdealiJK
d41aab4f61 tests/cpp: Add tests for regression_obj.cc
Test the objective functions in regression_obj.cc

tests/cpp: Add tests for objective.cc and RegLossObj
2016-12-04 11:25:57 -08:00
AbdealiJK
fd99d39372 tests/cpp: Add tests for SplitEntry 2016-12-04 11:25:57 -08:00
AbdealiJK
62e3468603 tests/cpp: Add tests for param.h 2016-12-04 11:25:57 -08:00