5763 Commits

Author SHA1 Message Date
Jiaming Yuan
711f7f3851
Avoid std::terminate for R package. (#7661)
This is part of CRAN policies.
2022-02-17 01:27:20 +08:00
Jiaming Yuan
12949c6b31
[R] Implement feature weights. (#7660) 2022-02-16 22:20:52 +08:00
Philip Hyunsu Cho
0149f81a5a
[CI] Fix S3 upload (#7662) 2022-02-16 01:35:27 -08:00
Jiaming Yuan
93eebe8664
[doc] Fix broken link. [skip ci] (#7655) 2022-02-15 14:07:34 +08:00
Jiaming Yuan
0da7d872ef
[doc] Update for prediction. (#7648) 2022-02-15 05:01:55 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
2369d55e9a
Add tests for prediction cache. (#7650)
* Extract the test from approx for other tree methods.
* Add note on how it works.
2022-02-15 00:28:00 +08:00
Jiaming Yuan
5cd1f71b51
[dask] Improve configuration for port. (#7645)
- Try port 0 to let the OS return the available port.
- Add port configuration.
2022-02-14 21:34:34 +08:00
Jiaming Yuan
b52c4e13b0
[dask] Fix empty partition with pandas input. (#7644)
Empty partition is different from empty dataset.  For the former case, each worker has
non-empty dask collections, but each collection might contain empty partition.
2022-02-14 19:35:51 +08:00
Jiaming Yuan
1f020a6097
Add maintainer for R package. (#7649) 2022-02-12 23:45:30 +08:00
Jiaming Yuan
1441a6cd27
[CI] Update R cache. (#7646) 2022-02-11 19:50:11 +08:00
Jiaming Yuan
2775c2a1ab
Prepare external memory support for hist. (#7638)
This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter.

Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
2022-02-10 16:58:02 +08:00
dependabot[bot]
87c01f49d8
Bump hadoop-common from 2.7.3 to 2.10.1 in /jvm-packages/xgboost4j-flink (#7641)
Bumps hadoop-common from 2.7.3 to 2.10.1.

---
updated-dependencies:
- dependency-name: org.apache.hadoop:hadoop-common
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-09 17:07:35 -08:00
Jiaming Yuan
fe4ce920b2
[dask] Cleanup dask module. (#7634)
* Add a new utility for mapping function onto workers.
* Unify the type for feature names.
* Clean up the iterator.
* Fix prediction with DaskDMatrix worker specification.
* Fix base margin with DeviceQuantileDMatrix.
* Support vs 2022 in setup.py.
2022-02-08 20:41:46 +08:00
Jiaming Yuan
926af9951e
Add missing train parameter for sklearn interface. (#7629)
Some other parameters are still missing and rely on **kwargs, for instance parameters from
dart.
2022-02-08 13:20:19 +08:00
Jiaming Yuan
3e693e4f97
[dask] Fix nthread config with dask sklearn wrapper. (#7633) 2022-02-08 06:38:32 +08:00
Ed Shee
d152c59a9c
fixed broken link to Seldon XGBoost server (#7628) 2022-02-05 01:03:29 +08:00
Philip Hyunsu Cho
34a238ca98
[CI] Clean up Python wheel build pipeline (#7626)
* [CI] Always upload artifacts to [branch_name]/

* [CI] Move detailed setup inside build_python_wheels.sh

* Fix typo
2022-02-03 00:55:44 -08:00
Philip Hyunsu Cho
f6e6d0b2c0
[CI] Build Python wheels for MacOS (x86_64 and arm64) (#7621)
* Build Python wheels for OSX (x86_64 and arm64)

* Use Conda's libomp when running Python tests

* fix

* Add comment to explain CIBW_TARGET_OSX_ARM64

* Update release script

* Add comments in build_python_wheels.sh

* Document wheel pipeline
2022-02-02 17:35:48 -08:00
Philip Hyunsu Cho
271a7c5d43
[Doc] fix typo in install doc (#7623) 2022-01-31 13:35:56 -08:00
Philip Hyunsu Cho
c621775f34
Replace all uses of deprecated function sklearn.datasets.load_boston (#7373)
* Replace all uses of deprecated function sklearn.datasets.load_boston

* More renaming

* Fix bad name

* Update assertion

* Fix n boosted rounds.

* Avoid over regularization.

* Rebase.

* Avoid over regularization.

* Whac-a-mole

Co-authored-by: fis <jm.yuan@outlook.com>
2022-01-30 04:27:57 -08:00
Philip Hyunsu Cho
b4340abf56
Add special handling for multi:softmax in sklearn predict (#7607)
* Add special handling for multi:softmax in sklearn predict

* Add test coverage
2022-01-29 15:54:49 -08:00
david-cortes
7f738e7f6f
[R] Accept CSR data for predictions (#7615) 2022-01-30 00:54:57 +08:00
Michael Chirico
549bd419bb
use exit hook to remove temp file (#7611)
This guarantees the removal will trigger for unexpected early exits
2022-01-29 16:06:52 +08:00
Philip Hyunsu Cho
f21301c749
[Doc] Add instruction to install XGBoost for Apple Silicon using Conda (#7612) 2022-01-28 01:06:39 -08:00
Jiaming Yuan
81210420c6
Remove omp_get_max_threads (#7608)
This is the one last PR for removing omp global variable.

* Add context object to the `DMatrix`.  This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 .
* Require context to be available at the construction time of booster.
* Add `n_threads` support for R csc DMatrix constructor.
* Remove `omp_get_max_threads` in R glue code.
* Remove threading utilities that rely on omp global variable.
2022-01-28 16:09:22 +08:00
Philip Hyunsu Cho
028bdc1740
[R] Fix typo in docstring (#7606) 2022-01-26 23:33:25 +08:00
Jiaming Yuan
e060519d4f
Avoid regenerating the gradient index for approx. (#7591) 2022-01-26 21:41:30 +08:00
Jiaming Yuan
5d7818e75d
Remove omp_get_max_threads in tree updaters. (#7590) 2022-01-26 19:55:47 +08:00
Jiaming Yuan
24789429fd
Support latest pandas Index type. (#7595) 2022-01-26 18:20:10 +08:00
AJ Schmidt
511805c981
Compress fatbins (#7601)
* compress CUDA device code

Co-authored-by: ptaylor <paul.e.taylor@me.com>
2022-01-25 18:30:59 +08:00
Jiaming Yuan
6967ef7267
Remove omp_get_max_threads in objective. (#7589) 2022-01-24 04:35:49 +08:00
Jiaming Yuan
5817840858
Remove omp_get_max_threads in data. (#7588) 2022-01-24 02:44:07 +08:00
Jiaming Yuan
f84291c1e1
Fix max_cat_to_onehot doc annotation [skip ci] (#7592) 2022-01-23 16:33:23 +08:00
Jiaming Yuan
d262503781
[R] Implement new save raw in R. (#7571) 2022-01-22 20:55:47 +08:00
Jiaming Yuan
ef4dae4c0e
[dask] Add scheduler address to dask config. (#7581)
- Add user configuration.
- Bring back to the logic of using scheduler address from dask.  This was removed when we were trying to support GKE, now we bring it back and let xgboost try it if direct guess or host IP from user config failed.
2022-01-22 01:56:32 +08:00
Jiaming Yuan
5ddd4a9d06
Small cleanup to tests. (#7585)
* Use random port in dask tests to avoid warnings for occupied port.
* Increase the difficulty of AUC tests.
2022-01-21 06:26:57 +00:00
Philip Hyunsu Cho
9fd510faa5
[CI] Clarify steps for publishing artifacts to Maven Central (#7582) 2022-01-20 14:23:07 -08:00
Jiaming Yuan
529cf8a54a
Configure cub version automatically. (#7579)
Note that when cub inside CUDA is being used, XGBoost performs checks on input size
instead of using internal cub function to accept inputs larger than maximum integer.
2022-01-20 19:49:26 +08:00
Jiaming Yuan
ac7a36367c
[jvm-packages] Implement new save_raw in jvm-packages. (#7570)
* New `toByteArray` that accepts a parameter for format.
2022-01-19 16:00:14 +08:00
Jiaming Yuan
b4ec1682c6
Update document for multi output and categorical. (#7574)
* Group together categorical related parameters.
* Update documents about multioutput and categorical.
2022-01-19 04:35:17 +08:00
Jiaming Yuan
dac9eb13bd
Implement new save_raw in Python. (#7572)
* Expose the new C API function to Python.
* Remove old document and helper script.
* Small optimization to the `save_raw` and Json ctors.
2022-01-19 02:27:51 +08:00
Jiaming Yuan
9f20a3315e
Test with latest numpy. (#7573) 2022-01-19 00:46:23 +08:00
Jiaming Yuan
bb56bb9a13
Fix merge conflict. (#7577) 2022-01-18 23:01:34 +08:00
Jiaming Yuan
cc06fab9a7
Support distributed CPU env for categorical data. (#7575)
* Add support for cat data in sketch allreduce.
* Share tests between CPU and GPU.
2022-01-18 21:56:07 +08:00
Jiaming Yuan
deab0e32ba
Validate out of range categorical value. (#7576)
* Use float in CPU categorical set to preserve the input value.
* Check out of range values.
2022-01-18 20:16:19 +08:00
Jiaming Yuan
d6ea5cc1ed
Cover approx tree method for categorical data tests. (#7569)
* Add tree to df tests.
* Add plotting tests.
* Add histogram tests.
2022-01-16 11:31:40 +08:00
Jiaming Yuan
465dc63833
Fix tree param feature type. (#7565) 2022-01-16 04:46:29 +08:00
Jiaming Yuan
a1bcd33a3b
[breaking] Change internal model serialization to UBJSON. (#7556)
* Use typed array for models.
* Change the memory snapshot format.
* Add new C API for saving to raw format.
2022-01-16 02:11:53 +08:00
Jiaming Yuan
13b0fa4b97
Implement get_group. (#7564) 2022-01-16 02:07:42 +08:00