15 Commits

Author SHA1 Message Date
Jiaming Yuan
4d81c741e9
External memory support for hist (#7531)
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
2022-03-22 00:13:20 +08:00
Jiaming Yuan
e6088366df
Export Python Interface for external memory. (#7070)
* Add Python iterator interface.
* Add tests.
* Add demo.
* Add documents.
* Handle empty dataset.
2021-07-22 15:15:53 +08:00
Jiaming Yuan
bd1f3a38f0
Rewrite sparse dmatrix using callbacks. (#7092)
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache can not be violated.
- Provide an interface for internal algorithms to process data asynchronously.
2021-07-16 12:33:31 +08:00
Jiaming Yuan
14afdb4d92
Support categorical data in ellpack. (#6140) 2020-09-24 19:28:57 +08:00
Jiaming Yuan
e533908922
Expose device sketching in header. (#5747) 2020-06-02 13:02:53 +08:00
Jiaming Yuan
eaf2a00b5c
Enhance nvtx support. (#5636) 2020-05-06 22:54:24 +08:00
Jiaming Yuan
67d267f9da
Move device dmatrix construction code into ellpack. (#5623) 2020-05-06 19:43:59 +08:00
Rory Mitchell
b745b7acce
Fix memory usage of device sketching (#5407) 2020-03-14 13:43:24 +13:00
Rory Mitchell
3ad4333b0e
Partial rewrite EllpackPage (#5352) 2020-03-11 10:15:53 +13:00
Rory Mitchell
a38e7bd19c
Sketching from adapters (#5365)
* Sketching from adapters

* Add weights test
2020-03-07 21:07:58 +13:00
Rory Mitchell
bc96ceb8b2
Refactor SparsePageSource, delete cache files after use (#5321)
* Refactor sparse page source

* Delete temporary cache files

* Log fatal if cache exists

* Log fatal if multiple threads used with prefetcher
2020-02-19 16:43:41 +13:00
Kodi Arfer
f2277e7106 Use DART tree weights when computing SHAPs (#5050)
This PR fixes tree weights in dart being ignored when computing contributions.

* Fix ellpack page source link.
* Add tree weights to compute contribution.
2019-12-03 19:55:53 +08:00
Rong Ou
0afcc55d98 Support multiple batches in gpu_hist (#5014)
* Initial external memory training support for GPU Hist tree method.
2019-11-16 14:50:20 +08:00
Jiaming Yuan
7663de956c
Run training with empty DMatrix. (#4990)
This makes GPU Hist robust in distributed environment as some workers might not
be associated with any data in either training or evaluation.

* Disable rabit mock test for now: See #5012 .

* Disable dask-cudf test at prediction for now: See #5003

* Launch dask job for all workers despite they might not have any data.
* Check 0 rows in elementwise evaluation metrics.

   Using AUC and AUC-PR still throws an error.  See #4663 for a robust fix.

* Add tests for edge cases.
* Add `LaunchKernel` wrapper handling zero sized grid.
* Move some parts of allreducer into a cu file.
* Don't validate feature names when the booster is empty.

* Sync number of columns in DMatrix.

  As num_feature is required to be the same across all workers in data split
  mode.

* Filtering in dask interface now by default syncs all booster that's not
empty, instead of using rank 0.

* Fix Jenkins' GPU tests.

* Install dask-cuda from source in Jenkins' test.

  Now all tests are actually running.

* Restore GPU Hist tree synchronization test.

* Check UUID of running devices.

  The check is only performed on CUDA version >= 10.x, as 9.x doesn't have UUID field.

* Fix CMake policy and project variables.

  Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13.

* Fix copying data to CPU

* Fix race condition in cpu predictor.

* Fix duplicated DMatrix construction.

* Don't download extra nccl in CI script.
2019-11-06 16:13:13 +08:00
Rong Ou
5b1715d97c Write ELLPACK pages to disk (#4879)
* add ellpack source
* add batch param
* extract function to parse cache info
* construct ellpack info separately
* push batch to ellpack page
* write ellpack page.
* make sparse page source reusable
2019-10-22 23:44:32 -04:00