1021 Commits

Author SHA1 Message Date
Jiaming Yuan
0fd6391a77
[backport] Fix loading DMatrix binary in distributed env. (#8149) (#8185)
* Fix loading DMatrix binary in distributed env. (#8149)

- Try to load DMatrix binary before trying to parse text input.
- Remove some unmaintained code.

* Fix.
2022-08-19 04:11:12 +08:00
Philip Hyunsu Cho
922d2137dd
[CI] Fix R build on Jenkins. (#8154) (#8180)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2022-08-17 22:06:07 -07:00
Jiaming Yuan
7036d4f22b
Disable modin test on 1.6.0 branch. (#8176) 2022-08-18 04:13:10 +08:00
Jiaming Yuan
51c330159a
[backport] Fix LTR with weighted Quantile DMatrix. (#7975) (#8170)
* Fix LTR with weighted Quantile DMatrix.

* Better tests.
2022-08-15 17:50:16 +08:00
Jiaming Yuan
b18c984035
[dask] Deterministic rank assignment. (#8018) (#8165) 2022-08-15 15:18:26 +08:00
Jiaming Yuan
97d89c3ca1
[dask] Use an invalid port for test. (#8064) (#8167) 2022-08-15 12:23:12 +08:00
Jiaming Yuan
9d816d9988
[CI] Test with latest RAPIDS. (#7816) (#8164) 2022-08-13 01:06:52 +08:00
Jiaming Yuan
9c653378e2
Fix monotone constraint with tuple input. (#7891) (#8159) 2022-08-12 22:05:53 +08:00
Jiaming Yuan
39c1488a42
[backport] Update CUDA docker image and NCCL. (#8139) (#8162)
* Update CUDA docker image and NCCL. (#8139)

* Rest of the CI.

* CPU test dependencies.
2022-08-12 18:57:42 +08:00
Jiaming Yuan
5973c6e74e
Fix rmm build (#7973) (#7977)
- Optionally switch to c++17
- Use rmm CMake target.
- Workaround compiler errors.
- Fix GPUMetric inheritance.
- Run death tests even if it's built with RMM support.

Co-authored-by: jakirkham <jakirkham@gmail.com>
2022-06-07 14:20:50 +08:00
Jiaming Yuan
b7c3fc9182
Fix overflow in prediction size. (#7885) (#7980) 2022-06-07 12:30:41 +08:00
Jiaming Yuan
645855e8b1
[backport] Fix arrow compatibility, hypothesis tests. (#7979) 2022-06-07 01:47:45 +08:00
Jiaming Yuan
eefa1ddd8a
[CI] Rotate package repository keys (#7943) (#7978)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-06-07 00:00:54 +08:00
Jiaming Yuan
c2508814ff
[backport] Use maximum category in sketch. (#7853) (#7866) 2022-05-06 21:11:33 +08:00
Jiaming Yuan
b1b6246e35
[backport] Always use partition based categorical splits. (#7857) (#7865) 2022-05-06 19:14:19 +08:00
Philip Hyunsu Cho
78d231264a
[CI] Enable faulthandler to show details when 0xC0000005 error occurs (#7771) 2022-03-30 19:16:54 -07:00
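Note: 0xC0000005 is the Windows access-violation status code. Python's standard `faulthandler` module can dump the Python traceback when such a fatal error occurs. The commit title does not say how the CI enables it, so the following is only a minimal sketch; the temporary log file is a hypothetical choice made for illustration.

```python
import faulthandler
import tempfile

# Hypothetical log destination; faulthandler needs a real file descriptor
# that stays open for as long as the handler is installed.
crash_log = tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False)

# Install handlers for fatal errors and dump tracebacks of all threads.
faulthandler.enable(file=crash_log, all_threads=True)

enabled = faulthandler.is_enabled()
```

The same effect can be had without code changes by running `python -X faulthandler` or setting the `PYTHONFAULTHANDLER` environment variable.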
Jiaming Yuan
4615fa51ef
Drop support for deprecated CUDA architecture. (#7767)
* Drop support for deprecated CUDA architecture.

* Check file size at release branch.

* Use 200 MB limit

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-03-30 15:16:35 -07:00
Jiaming Yuan
9150fdbd4d
Support pandas nullable types. (#7760) 2022-03-30 08:51:52 +08:00
Jiaming Yuan
a50b84244e
Cleanup configuration for constraints. (#7758) 2022-03-29 04:22:46 +08:00
Jiaming Yuan
3c9b04460a
Move num_parallel_tree to model parameter. (#7751)
The size of the forest should be a property of the model itself rather than a training
hyper-parameter.
2022-03-29 02:32:42 +08:00
Jiaming Yuan
8b3ecfca25
Mitigate flaky tests. (#7749)
* Skip non-increasing test with external memory when subsample is used.
* Increase bin numbers for boost from prediction test. This mitigates the effect of
  non-deterministic partitioning.
2022-03-28 21:20:50 +08:00
Jiaming Yuan
64575591d8
Use context in SetInfo. (#7687)
* Use the name `Context`.
* Pass a context object into `SetInfo`.
* Add context to proxy matrix.
* Add context to iterative DMatrix.

This removes the use of the default number of threads during `SetInfo`, as a follow-up to
removing the global omp variable and in preparation for CUDA stream semantics.  Currently, XGBoost
uses the legacy CUDA stream; we will gradually replace it with non-blocking streams.
2022-03-24 22:16:26 +08:00
Jiaming Yuan
4d81c741e9
External memory support for hist (#7531)
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
2022-03-22 00:13:20 +08:00
Jiaming Yuan
996cc705af
Small cleanup to hist tree method. (#7735)
* Remove special optimization using number of bins.
* Remove 1-based index for column sampling.
* Remove data layout.
* Unify update prediction cache.
2022-03-20 03:44:55 +08:00
Jiaming Yuan
e78a38b837
Sort sparse page index when constructing DMatrix. (#7731) 2022-03-16 18:01:05 +08:00
Xiaochang Wu
613ec36c5a
Support building SimpleDMatrix from Arrow data format (#7512)
* Integrate with Arrow C data API.
* Support Arrow dataset.
* Support Arrow table.

Co-authored-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
Co-authored-by: Zhang Zhang <zhang.zhang@intel.com>
2022-03-15 13:25:19 +08:00
Jiaming Yuan
98d6faefd6
Implement slope for Pseudo-Huber. (#7727)
* Add objective and metric.
* Some refactoring for CPU/GPU dispatching using linalg module.
2022-03-14 21:42:38 +08:00
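Note: for reference, the textbook Pseudo-Huber loss with slope δ is δ²(√(1 + (r/δ)²) − 1), where r is the residual. The sketch below computes the loss and its first two derivatives with respect to the prediction. It is an illustration of the formula only, not XGBoost's implementation; the function and argument names are hypothetical.

```python
import math


def pseudo_huber(pred: float, label: float, slope: float = 1.0):
    """Return (loss, gradient, hessian) of the Pseudo-Huber loss.

    `slope` is the delta term; names here are illustrative only.
    """
    r = pred - label
    scale = math.sqrt(1.0 + (r / slope) ** 2)
    loss = slope ** 2 * (scale - 1.0)
    grad = r / scale            # d loss / d pred
    hess = 1.0 / scale ** 3     # d^2 loss / d pred^2
    return loss, grad, hess
```

For small residuals the loss behaves like squared error (hessian close to 1); for large residuals the gradient magnitude saturates at the slope, which is what makes the loss robust to outliers.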
Haoming Chen
04fc575c0e
Run tests in a temporary directory (#7723)
Fix some tests to run in a temporary directory in case the root
directory is not writable. Note that most of the tests already
run in a temporary directory, so this PR just makes them
consistent.
2022-03-12 21:24:36 +08:00
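Note: a minimal sketch of the pattern described above, using Python's standard `tempfile` module. The helper name and file name are hypothetical and do not come from the XGBoost test suite.

```python
import os
import tempfile


def write_in_tmpdir(payload: bytes) -> int:
    """Write `payload` under a fresh temporary directory and return its size.

    The directory and everything in it are removed when the block exits,
    so nothing is written to the (possibly read-only) source tree.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "model.bin")  # hypothetical file name
        with open(path, "wb") as fd:
            fd.write(payload)
        return os.path.getsize(path)
```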
Jiaming Yuan
a62a3d991d
[dask] prediction with categorical data. (#7708) 2022-03-10 00:21:48 +08:00
Cheng Li
a92e0f6240
multi groups in the constraints (#7711) 2022-03-01 18:10:15 +08:00
Jiaming Yuan
18a4af63aa
Update documents and tests. (#7659)
* Revise documents after recent refactoring and cat support.
* Add tests for behavior of max_depth and max_leaves.
2022-02-26 03:57:47 +08:00
Philip Hyunsu Cho
1b25dd59f9
Use CUDA 11 in clang-tidy (#7701)
* Show command args when clang-tidy fails

* Add option to specify CUDA args

* Use clang-tidy 11

* [CI] Use CUDA 11
2022-02-24 15:15:07 -08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
6762c45494
Small cleanup to gradient index and hist. (#7668)
* Code comments.
* Const accessor to index.
* Remove some weird variables in the `Index` class.
* Simplify the `MemStackAllocator`.
2022-02-23 11:37:21 +08:00
Jiaming Yuan
f08c5dcb06
Cleanup some pylint errors. (#7667)
* Cleanup some pylint errors.

* Cleanup pylint errors in rabit modules.
* Make data iter an abstract class and cleanup private access.
* Cleanup no-self-use for booster.
2022-02-19 18:53:12 +08:00
Jiaming Yuan
7366d3b20c
Ensure models with categorical splits don't use old binary format. (#7666) 2022-02-19 08:05:28 +08:00
Philip Hyunsu Cho
0149f81a5a
[CI] Fix S3 upload (#7662) 2022-02-16 01:35:27 -08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
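Note: the general idea behind a partition-based categorical split is to sort the categories by a gradient statistic and then scan the sorted order for the best contiguous cut, as one would for a numerical feature. The sketch below shows that idea on per-category gradient sums. It is a simplified illustration under assumptions (the sort key, the gain formula without a complexity penalty, and all names are hypothetical), not the GPU evaluator added in this commit.

```python
def best_partition_split(grad, hess, reg_lambda=1.0):
    """Find the best split of categories into (left, right) groups.

    grad[c] and hess[c] are the gradient and hessian sums of category c.
    Returns (gain, sorted list of categories assigned to the left child).
    """
    n = len(grad)
    # Sort categories by an (assumed) per-category weight-like statistic.
    order = sorted(range(n), key=lambda c: grad[c] / (hess[c] + reg_lambda))

    def score(g, h):
        return g * g / (h + reg_lambda)

    total_g, total_h = sum(grad), sum(hess)
    parent = score(total_g, total_h)

    best_gain, best_left = 0.0, []
    left_g = left_h = 0.0
    # Scan contiguous prefixes of the sorted order; the last category is
    # excluded so the right child is never empty.
    for i in range(n - 1):
        c = order[i]
        left_g += grad[c]
        left_h += hess[c]
        gain = (score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - parent)
        if gain > best_gain:
            best_gain, best_left = gain, sorted(order[: i + 1])
    return best_gain, best_left
```

Sorting first reduces the search from all 2^(n-1) subsets to n-1 candidate cuts.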
Jiaming Yuan
2369d55e9a
Add tests for prediction cache. (#7650)
* Extract the test from approx for other tree methods.
* Add note on how it works.
2022-02-15 00:28:00 +08:00
Jiaming Yuan
5cd1f71b51
[dask] Improve configuration for port. (#7645)
- Try port 0 to let the OS return the available port.
- Add port configuration.
2022-02-14 21:34:34 +08:00
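Note: binding to port 0 asks the operating system to pick any currently free port. A minimal sketch with the standard `socket` module follows; the helper name is hypothetical and this is not the code from the dask interface.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Bind to port 0 and return the port number the OS assigned."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
```

The socket is closed before the port is returned, so another process could claim the port in between; keeping the socket open and handing it over avoids that race where it matters.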
Jiaming Yuan
b52c4e13b0
[dask] Fix empty partition with pandas input. (#7644)
An empty partition is different from an empty dataset.  In the former case, each worker has
non-empty dask collections, but each collection might contain empty partitions.
2022-02-14 19:35:51 +08:00
Jiaming Yuan
2775c2a1ab
Prepare external memory support for hist. (#7638)
This PR prepares the GHistIndexMatrix to host the column matrix, which is used by the hist tree method, by accepting the sparse_threshold parameter.

Some cleanups ensure that the correct batch param is passed into DMatrix, along with additional tests for the correctness of SimpleDMatrix.
2022-02-10 16:58:02 +08:00
Jiaming Yuan
3e693e4f97
[dask] Fix nthread config with dask sklearn wrapper. (#7633) 2022-02-08 06:38:32 +08:00
Philip Hyunsu Cho
34a238ca98
[CI] Clean up Python wheel build pipeline (#7626)
* [CI] Always upload artifacts to [branch_name]/

* [CI] Move detailed setup inside build_python_wheels.sh

* Fix typo
2022-02-03 00:55:44 -08:00
Philip Hyunsu Cho
f6e6d0b2c0
[CI] Build Python wheels for MacOS (x86_64 and arm64) (#7621)
* Build Python wheels for OSX (x86_64 and arm64)

* Use Conda's libomp when running Python tests

* fix

* Add comment to explain CIBW_TARGET_OSX_ARM64

* Update release script

* Add comments in build_python_wheels.sh

* Document wheel pipeline
2022-02-02 17:35:48 -08:00
Philip Hyunsu Cho
c621775f34
Replace all uses of deprecated function sklearn.datasets.load_boston (#7373)
* Replace all uses of deprecated function sklearn.datasets.load_boston

* More renaming

* Fix bad name

* Update assertion

* Fix n boosted rounds.

* Avoid over regularization.

* Rebase.

* Avoid over regularization.

* Whac-a-mole

Co-authored-by: fis <jm.yuan@outlook.com>
2022-01-30 04:27:57 -08:00
Philip Hyunsu Cho
b4340abf56
Add special handling for multi:softmax in sklearn predict (#7607)
* Add special handling for multi:softmax in sklearn predict

* Add test coverage
2022-01-29 15:54:49 -08:00
Jiaming Yuan
81210420c6
Remove omp_get_max_threads (#7608)
This is the last PR for removing the omp global variable.

* Add context object to the `DMatrix`.  This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 .
* Require context to be available at the construction time of booster.
* Add `n_threads` support for R csc DMatrix constructor.
* Remove `omp_get_max_threads` in R glue code.
* Remove threading utilities that rely on omp global variable.
2022-01-28 16:09:22 +08:00
Jiaming Yuan
5d7818e75d
Remove omp_get_max_threads in tree updaters. (#7590) 2022-01-26 19:55:47 +08:00
Jiaming Yuan
24789429fd
Support latest pandas Index type. (#7595) 2022-01-26 18:20:10 +08:00