972 Commits

Author SHA1 Message Date
fuhaoda
dd60fc23e6 Simplify INI-style config reader using C++11 STL (#4478)
* simplify the config.h file

* revise config.h

* revised config.h

* revise format

* revise format issues

* revise whitespace issues

* revise whitespace namespace format issues

* revise namespace format issues

* format issues

* format issues

* format issues

* format issues

* Revert submodule changes

* minor change

* Update src/common/config.h

Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>

* address format issue from trivialfis

* Use correct cub submodule
2019-05-30 11:57:56 -07:00
Jiaming Yuan
b48f895027
Fix prediction from loaded pickle. (#4516) 2019-05-30 15:05:09 +08:00
sriramch
fed665ae8a - training with external memory part 1 of 2 (#4486)
* - training with external memory part 1 of 2
   - this pr focuses on computing the quantiles using multiple gpus on a
     dataset that uses the external cache capabilities
   - there will a follow-up pr soon after this that will support creation
     of histogram indices on large dataset as well
   - both of these changes are required to support training with external memory
   - the sparse pages in dmatrix are taken in batches and the the cut matrices
     are incrementally built
   - also snuck in some (perf) changes related to sketches aggregation amongst multiple
     features across multiple sparse page batches. instead of aggregating the summary
     inside each device and merged later, it is aggregated in-place when the device
     is working on different rows but the same feature
2019-05-30 08:18:34 +12:00
sriramch
6e16900711 Fix crash with approx tree method on cpu (#4510) 2019-05-30 01:11:29 +08:00
Jiaming Yuan
c589eff941
De-duplicate GPU parameters. (#4454)
* Only define `gpu_id` and `n_gpus` in `LearnerTrainParam`
* Pass LearnerTrainParam through XGBoost vid factory method.
* Disable all GPU usage when GPU related parameters are not specified (fixes XGBoost choosing GPU over aggressively).
* Test learner train param io.
* Fix gpu pickling.
2019-05-29 11:55:57 +08:00
sriramch
a3fedbeaa8 - fix issues with training with external memory on cpu (#4487)
* - fix issues with training with external memory on cpu
   - use the batch size to determine the correct number of rows in a batch
   - use the right number of threads in omp parallalization if the batch size
     is less than the default omp max threads (applicable for the last batch)

* - handle scenarios where last batch size is < available number of threads
- augment tests such that we can test all scenarios (batch size <, >, = number of threads)
2019-05-29 12:31:30 +12:00
Jiaming Yuan
55e645c5f5
Revert hist init optimization. (#4502) 2019-05-26 08:57:41 +08:00
Bryan Woods
278562db13 Add support for cross-validation using query ID (#4474)
* adding support for matrix slicing with query ID for cross-validation

* hail mary test of unrar installation for windows tests

* trying to modify tests to run in Github CI

* Remove dependency on wget and unrar

* Save error log from R test

* Relax assertion in test_training

* Use int instead of bool in C function interface

* Revise R interface

* Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility
2019-05-23 10:45:02 -07:00
Rong Ou
a9ec2dd295 only copy the model once when predicting multiple batches (#4457) 2019-05-15 11:04:22 +12:00
Rong Ou
df2cdaca50 add cuda 10.1 support (#4468) 2019-05-14 18:30:58 +00:00
Philip Hyunsu Cho
c6f2a7e186
[CI] Add Windows GPU to Jenkins CI pipeline (#4463)
* Fix #4462: Use /MT flag consistently for MSVC target

* First attempt at Windows CI

* Distinguish stages in Linux and Windows pipelines

* Try running CMake in Windows pipeline

* Add build step
2019-05-14 04:45:06 +00:00
Rong Ou
be0f346ec9 mgpu predictor using explicit offsets (#4438)
* mgpu prediction using explicit sharding
2019-05-11 09:35:06 +12:00
Jiaming Yuan
5de7e12704
Change obj name to reg:squarederror in learner. (#4427)
* Change memory dump size in R test.
2019-05-06 21:35:35 +08:00
Xin Yin
8d1098a983 In AUC and AUCPR metrics, detect whether weights are per-instance or per-group (#4216)
* In AUC and AUCPR metrics, detect whether weights are per-instance or per-group

* Fix C++ style check

* Add a test for weighted AUC
2019-05-04 00:53:04 -07:00
Philip Hyunsu Cho
9252b686ae
Make AUCPR work with multiple query groups (#4436)
* Make AUCPR work with multiple query groups

* Check AUCPR <= 1.0 in distributed setting
2019-05-03 10:34:44 -07:00
ras44
2be85fc62a max_digits10 guarantees float decimal roundtrip (#4435)
2 additional digits are not needed to guarantee that casting the decimal representation will result in the same float, see https://github.com/dmlc/xgboost/issues/3980#issuecomment-458702440
2019-05-02 20:01:26 -07:00
Rong Ou
feb6ae3e18 Initial support for external memory in gpu_predictor (#4284) 2019-05-03 13:01:27 +12:00
Philip Hyunsu Cho
bfddc2c42c Make CMakeLists.txt compatible with CMake 3.3 (#4420)
* Make CMakeLists.txt compatible with CMake 3.3; require CMake 3.11 for MSVC

* Use CMake 3.12 when sanitizer is enabled

* Disable funroll-loops for MSVC

* Use cmake version in container name

* Add missing arg

* Fix egrep use in ci_build.sh

* Display CMake version

* Do not set OpenMP_CXX_LIBRARIES for MSVC

* Use cmake_minimum_required()
2019-05-02 11:49:32 +08:00
Philip Hyunsu Cho
17df5fd296
Simplify bound checking in feature interaction constraints (#4428) 2019-05-01 16:59:53 -07:00
Xu Xiao
4c74336384 Use feature interaction constraints to narrow search space for split candidates (#4341)
* Use feature interaction constraints to narrow search space for split candidates.

* fix clang-tidy broken at updater_quantile_hist.cc:535:3

* make const

* fix

* try to fix exception thrown in java_test

* fix suspected mistake which cause EvaluateSplit error

* try fix

* Fix bug: feature ID and node ID swapped in argument

* Rename CheckValidation() to CheckFeatureConstraint() for clarity

* Do not create temporary vector validFeatures, to enable parallelism
2019-04-30 20:59:58 -07:00
Philip Hyunsu Cho
ba98e0cdf2
Add additional Python tests to test training under constraints (#4426) 2019-04-30 18:23:39 -07:00
Rong Ou
eaab364a63 More explict sharding methods for device memory (#4396)
* Rename the Reshard method to Shard

* Add a new Reshard method for sharding a vector that's already sharded
2019-05-01 11:47:22 +12:00
Rory Mitchell
5e582b0fa7
Combine thread launches into single launch per tree for gpu_hist (#4343)
* Combine thread launches into single launch per tree for gpu_hist
algorithm.

* Address deprecation warning

* Add manual column sampler constructor

* Turn off omp dynamic to get a guaranteed number of threads

* Enable openmp in cuda code
2019-04-29 09:58:34 +12:00
Egor Smirnov
711397d645 Optimizations of pre-processing for 'hist' tree method (#4310)
* oprimizations for pre-processing

* code cleaning

* code cleaning

* code cleaning after review

* Apply suggestions from code review

Co-Authored-By: SmirnovEgorRu <egor.smirnov@intel.com>
2019-04-16 17:36:19 -07:00
Jiaming Yuan
207f058711 Refactor CMake scripts. (#4323)
* Refactor CMake scripts.

* Remove CMake CUDA wrapper.
* Bump CMake version for CUDA.
* Use CMake to handle Doxygen.
* Split up CMakeList.
* Export install target.
* Use modern CMake.
* Remove build.sh
* Workaround for gpu_hist test.
* Use cmake 3.12.

* Revert machine.conf.

* Move CLI test to gpu.

* Small cleanup.

* Support using XGBoost as submodule.

* Fix windows

* Fix cpp tests on Windows

* Remove duplicated find_package.
2019-04-15 10:08:12 -07:00
Jiaming Yuan
84d992babc
GPU multiclass metrics (#4368)
* Port multi classes metrics to CUDA.
2019-04-15 17:47:47 +08:00
Jiaming Yuan
5c2575535f
Fix Histogram allocation. (#4347)
* Fix Histogram allocation.

nidx_map is cleared after `Reset`, but histogram data size isn't changed hence
histogram recycling is used in later iterations.  After a reset(building new
tree), newly allocated node will start from 0, while recycling always choose
the node with smallest index, which happens to be our newly allocated node 0.
2019-04-10 19:21:26 +08:00
Rong Ou
81c1cd40ca add a test for cpu predictor using external memory (#4308)
* add a test for cpu predictor using external memory

* allow different page size for testing
2019-04-10 13:25:10 +12:00
sriramch
2f7087eba1 Improve HostDeviceVector exception safety (#4301)
* make the assignments of HostDeviceVector exception safe.
* storing a dummy GPUDistribution instance in HDV for CPU based code.
* change testxgboost binary location to build directory.
2019-03-31 22:48:58 +08:00
Hajime Morrita
680a1b36f3 Get rid of a few trivial compiler warnings. (#4312) 2019-03-31 00:02:29 +08:00
Rory Mitchell
3f312e30db
Retire DVec class in favour of c++20 style span for device memory. (#4293) 2019-03-28 13:59:58 +13:00
Rory Mitchell
6d5b34d824
Further optimisations for gpu_hist. (#4283)
- Fuse final update position functions into a single more efficient kernel

- Refactor gpu_hist with a more explicit ellpack  matrix representation
2019-03-24 17:17:22 +13:00
Rong Ou
5aa42b5f11 jenkins build for cuda 10.0 (#4281)
* jenkins build for cuda 10.0

* yum install nccl2 for cuda 10.0
2019-03-22 22:35:18 -07:00
Rory Mitchell
8eab966998
Allow unique prediction vector for each input matrix (#4275) 2019-03-21 11:38:16 +13:00
Jiaming Yuan
09bd9e68cf
Use Monitor in quantile hist. (#4273) 2019-03-20 09:26:22 +08:00
Rory Mitchell
00465d243d
Optimisations for gpu_hist. (#4248)
* Optimisations for gpu_hist.

* Use streams to overlap operations.

* ColumnSampler now uses HostDeviceVector to prevent repeatedly copying feature vectors to the device.
2019-03-20 13:30:06 +13:00
Jiaming Yuan
29a1356669
Deprecate reg:linear' in favor of reg:squarederror'. (#4267)
* Deprecate `reg:linear' in favor of `reg:squarederror'.
* Replace the use of `reg:linear'.
* Replace the use of `silent`.
2019-03-17 17:55:04 +08:00
Jiaming Yuan
cf8d5b9b76
Mark CUDA 10.1 as unsupported. (#4265) 2019-03-17 16:59:15 +08:00
Jiaming Yuan
fdcae024e7
Remove deprecated C APIs. (#4266) 2019-03-17 16:42:44 +08:00
Rory Mitchell
5465b73e7c
Fix multi-GPU test failures (#4259) 2019-03-15 14:40:43 +13:00
Andy Adinets
b833b642ec Improved multi-node multi-GPU random forests. (#4238)
* Improved multi-node multi-GPU random forests.

- removed rabit::Broadcast() from each invocation of column sampling
- instead, syncing the PRNG seed when a ColumnSampler() object is constructed
- this makes non-trivial column sampling significantly faster in the distributed case
- refactored distributed GPU tests
- added distributed random forests tests
2019-03-13 12:36:28 +13:00
Jiaming Yuan
7b9043cf71
Fix clang-tidy warnings. (#4149)
* Upgrade gtest for clang-tidy.
* Use CMake to install GTest instead of mv.
* Don't enforce clang-tidy to return 0 due to errors in thrust.
* Add a small test for tidy itself.

* Reformat.
2019-03-13 02:25:51 +08:00
Tong He
259fb809e9
fix R-devel errors (#4251) 2019-03-12 10:06:54 -07:00
Rory Mitchell
4eeeded7d1
Remove various synchronisations from cuda API calls, instrument monitor (#4205)
* Remove various synchronisations from cuda API calls, instrument monitor
with nvtx profiler ranges.
2019-03-10 15:01:23 +13:00
Philip Hyunsu Cho
f83e62dca5
Address #4042: Prevent out-of-range access in column matrix (#4231) 2019-03-08 17:11:42 -08:00
Rong Ou
9837b09b20 support cuda 10.1 (#4223)
* support cuda 10.1

* add cuda 10.1 to jenkins build matrix
2019-03-08 12:22:12 +13:00
Rong Ou
0944360416 minor fix: log InitDataOnce() only when it is actually called (#4206) 2019-03-08 10:53:09 +13:00
Matthew Jones
92b7577c62 [REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)
* Initial commit to support multi-node multi-gpu xgboost using dask

* Fixed NCCL initialization by not ignoring the opg parameter.

- it now crashes on NCCL initialization, but at least we're attempting it properly

* At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers

* Synchronizing in a couple of more places.

- now the workers don't go down, but just hang
- no more "wild" values of gradients
- probably needs syncing in more places

* Added another missing max-allreduce operation inside BuildHistLeftRight

* Removed unnecessary collective operations.

* Simplified rabit::Allreduce() sync of gradient sums.

* Removed unnecessary rabit syncs around ncclAllReduce.

- this improves performance _significantly_ (7x faster for overall training,
  20x faster for xgboost proper)

* pulling in latest xgboost

* removing changes to updater_quantile_hist.cc

* changing use_nccl_opg initialization, removing unnecessary if statements

* added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId

* placing struct defintion in guard to avoid duplicate code errors

* addressing linting errors

* removing

* removing additional arguments to AllReduer initialization

* removing distributed flag

* making comm init symmetric

* removing distributed flag

* changing ncclCommInit to support multiple modalities

* fix indenting

* updating ncclCommInitRank block with necessary group calls

* fix indenting

* adding print statement, and updating accessor in vector

* improving print statement to end-line

* generalizing nccl_rank construction using rabit

* assume device_ordinals is the same for every node

* test, assume device_ordinals is identical for all nodes

* test, assume device_ordinals is unique for all nodes

* changing names of offset variable to be more descriptive, editing indenting

* wrapping ncclUniqueId GetUniqueId() and aesthetic changes

* adding synchronization, and tests for distributed

* adding  to tests

* fixing broken #endif

* fixing initialization of gpu histograms, correcting errors in tests

* adding to contributors list

* adding distributed tests to jenkins

* fixing bad path in distributed test

* debugging

* adding kubernetes for distributed tests

* adding proper import for OrderedDict

* adding urllib3==1.22 to address ordered_dict import error

* added sleep to allow workers to save their models for comparison

* adding name to GPU contributors under docs
2019-03-02 10:03:22 +13:00
Jiaming Yuan
7ea5675679
Add PushCSC for SparsePage. (#4193)
* Add PushCSC for SparsePage.

* Move Push* definitions into cc file.
* Add std:: prefix to `size_t` make clang++ happy.
* Address monitor count == 0.
2019-03-02 01:58:08 +08:00
Philip Hyunsu Cho
2aaae2e7bb
Fix #4163: always copy sliced data (#4165)
* Revert "Accept numpy array view. (#4147)"

This reverts commit a985a99cf0dacb26a5d734835473d492d3c2a0df.

* Fix #4163: always copy sliced data

* Remove print() from the test; check shape equality

* Check if 'base' attribute exists

* Fix lint

* Address reviewer comment

* Fix lint
2019-02-20 14:46:34 -08:00