688 Commits

Author SHA1 Message Date
Jiaming Yuan
4d81c741e9
External memory support for hist (#7531)
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
2022-03-22 00:13:20 +08:00
Jiaming Yuan
98d6faefd6
Implement slope for Pseduo-Huber. (#7727)
* Add objective and metric.
* Some refactoring for CPU/GPU dispatching using linalg module.
2022-03-14 21:42:38 +08:00
Jiaming Yuan
18a4af63aa
Update documents and tests. (#7659)
* Revise documents after recent refactoring and cat support.
* Add tests for behavior of max_depth and max_leaves.
2022-02-26 03:57:47 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
49c74a5369
Update R package description. (#7691)
* Change role.
* Remove cmake file when building the package.
2022-02-23 08:36:37 +08:00
Jiaming Yuan
584bae1fc6
Fix document build with scikit-learn (#7684)
* Require sphinx >= 4.4 for RTD.

* Install sklearn.
2022-02-22 08:58:54 +08:00
Jiaming Yuan
14d61b0141
[doc] Update document for building from source. (#7664)
- Mention standard install command for R package.
- Remove repeated "get source" step.
- Remove troubleshooting on Windows.  It's outdated considering VS 2022 is already out.
2022-02-19 04:57:03 +08:00
Jiaming Yuan
12949c6b31
[R] Implement feature weights. (#7660) 2022-02-16 22:20:52 +08:00
Jiaming Yuan
93eebe8664
[doc] Fix broken link. [skip ci] (#7655) 2022-02-15 14:07:34 +08:00
Jiaming Yuan
0da7d872ef
[doc] Update for prediction. (#7648) 2022-02-15 05:01:55 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
5cd1f71b51
[dask] Improve configuration for port. (#7645)
- Try port 0 to let the OS return the available port.
- Add port configuration.
2022-02-14 21:34:34 +08:00
Philip Hyunsu Cho
f6e6d0b2c0
[CI] Build Python wheels for MacOS (x86_64 and arm64) (#7621)
* Build Python wheels for OSX (x86_64 and arm64)

* Use Conda's libomp when running Python tests

* fix

* Add comment to explain CIBW_TARGET_OSX_ARM64

* Update release script

* Add comments in build_python_wheels.sh

* Document wheel pipeline
2022-02-02 17:35:48 -08:00
Philip Hyunsu Cho
271a7c5d43
[Doc] fix typo in install doc (#7623) 2022-01-31 13:35:56 -08:00
Philip Hyunsu Cho
f21301c749
[Doc] Add instruction to install XGBoost for Apple Silicon using Conda (#7612) 2022-01-28 01:06:39 -08:00
Jiaming Yuan
ef4dae4c0e
[dask] Add scheduler address to dask config. (#7581)
- Add user configuration.
- Bring back to the logic of using scheduler address from dask.  This was removed when we were trying to support GKE, now we bring it back and let xgboost try it if direct guess or host IP from user config failed.
2022-01-22 01:56:32 +08:00
Jiaming Yuan
b4ec1682c6
Update document for multi output and categorical. (#7574)
* Group together categorical related parameters.
* Update documents about multioutput and categorical.
2022-01-19 04:35:17 +08:00
Jiaming Yuan
dac9eb13bd
Implement new save_raw in Python. (#7572)
* Expose the new C API function to Python.
* Remove old document and helper script.
* Small optimization to the `save_raw` and Json ctors.
2022-01-19 02:27:51 +08:00
Jiaming Yuan
deab0e32ba
Validate out of range categorical value. (#7576)
* Use float in CPU categorical set to preserve the input value.
* Check out of range values.
2022-01-18 20:16:19 +08:00
Jiaming Yuan
a1bcd33a3b
[breaking] Change internal model serialization to UBJSON. (#7556)
* Use typed array for models.
* Change the memory snapshot format.
* Add new C API for saving to raw format.
2022-01-16 02:11:53 +08:00
Jiaming Yuan
e5e47c3c99
Clarify the behavior of invalid categorical value handling. (#7529) 2022-01-13 16:11:52 +08:00
Jiaming Yuan
001503186c
Rewrite approx (#7214)
This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing.

The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for mono constraint.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance for most of the datasets. (many times faster)
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
2022-01-10 21:15:05 +08:00
Jiaming Yuan
ec56d5869b
[doc] Include dask examples into doc. (#7530) 2022-01-05 03:27:22 +08:00
Jiaming Yuan
54582f641a
[doc] Use cross references in sphinx doc. (#7522)
* Use cross references instead of URL.
* Fix auto doc for callback.
2022-01-05 03:21:25 +08:00
Jiaming Yuan
8f0a42a266
Initial support for multi-label classification. (#7521)
* Add support in sklearn classifier.
2022-01-04 23:58:21 +08:00
Randall Britten
a4a0ebb85d
[doc] Lowercase omega for per tree complexity (#7532)
As suggested on issue #7480
2021-12-29 23:05:54 +08:00
Jiaming Yuan
a512b4b394
[doc] Promote dask from experimental. [skip ci] (#7509) 2021-12-16 14:17:06 +08:00
Harvey
1864fab592
Minor edits to Parameters doc page. (#7500)
* bost -> both

* doc improvement

* use original filename

* syntax highlight false

* missed a few highlights
2021-12-07 15:46:44 +08:00
danmarinescu
6f38f5affa
Updated CMake version requirement in build.rst (#7487)
The documentation states that to build from source you need CMake 3.13 or higher. However, according to https://github.com/dmlc/xgboost/blob/master/CMakeLists.txt#L1 CMake 3.14 or higher is required.
2021-11-27 09:58:01 +08:00
Jiaming Yuan
c024c42dce
Modernize XGBoost Python document. (#7468)
* Use sphinx gallery to integrate examples.
* Remove mock objects.
* Add dask doc inventory.
2021-11-23 23:24:52 +08:00
Jiaming Yuan
d33854af1b
[Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Philip Hyunsu Cho
2adf222fb2
[CI] CI cost saving (#7407)
* [CI] Drop CUDA 10.1; Require 11.0

* Change NCCL version

* Use CUDA 10.1 for clang-tidy, for now

* Remove JDK 11 and 12

* Fix NCCL version

* Don't require 11.0 just yet, until clang-tidy is fixed

* Skip MultiClassesSerializationTest.GpuHist
2021-11-17 21:02:20 -08:00
Jiaming Yuan
97d7582457
Delay breaking changes to 1.6. (#7420)
The patch is too big to be backported.
2021-11-12 16:46:03 +08:00
Jiaming Yuan
8df0a252b7
[doc] Update document for GPU. [skip ci] (#7403)
* Remove outdated workaround and description.
2021-11-09 02:05:55 +08:00
Jiaming Yuan
c968217ca8
[R] Fix global feature importance and predict with 1 sample. (#7394)
* [R] Fix global feature importance.

* Add implementation for tree index.  The parameter is not documented in C API since we
should work on porting the model slicing to R instead of supporting more use of tree
index.

* Fix the difference between "gain" and "total_gain".

* debug.

* Fix prediction.
2021-11-05 10:07:00 +08:00
Jiaming Yuan
48aff0eabd
[doc][jvm-packages] Update information about Python tracker. [skip ci] (#7396) 2021-11-05 05:55:13 +08:00
Jiaming Yuan
232144ca09
Add note about CRAN release [skip ci] (#7395) 2021-11-05 00:34:14 +08:00
Jiaming Yuan
32e673d8c4
Support building with CTK11.5. (#7379)
* Support building with CTK11.5.

* Require system cub installation for CTK11.4+.
* Check thrust version for segmented sort.
2021-11-02 16:22:26 +08:00
Jiaming Yuan
45aef75cca
Move skl eval_metric and early_stopping rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`.  And `feval` is deprecated.  The new `custom_metric` receives transformed prediction when the built-in objective is used.  This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and is now saved as part of the scikit-learn model.  The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documents for the behaviour of custom objective and metric.
2021-10-28 17:20:20 +08:00
Jiaming Yuan
b9414b6477
Update GPU doc for PR-AUC. [skip ci] (#7368) 2021-10-27 16:31:07 +08:00
Jiaming Yuan
d4349426d8
Re-implement PR-AUC. (#7297)
* Support binary/multi-class classification, ranking.
* Add documents.
* Handle missing data.
2021-10-26 13:07:50 +08:00
Jiaming Yuan
e36b066344
[doc] Document the status of RTD hosting. [skip ci] (#7353) 2021-10-22 14:12:55 +08:00
Jiaming Yuan
864d236a82
[doc] Remove num_pbuffer. [skip ci] (#7356) 2021-10-22 14:12:32 +08:00
Jiaming Yuan
15685996fc
[doc] Small improvements for categorical data document. (#7330) 2021-10-20 18:04:32 +08:00
Philip Hyunsu Cho
b8e8f0fcd9
[doc] Use latest Sphinx RTD theme (#7347) 2021-10-20 00:04:43 -07:00
Jiaming Yuan
3b0b74fa94
[doc] Use RTD theme. (#7346) 2021-10-19 23:49:19 -07:00
Jiaming Yuan
376b448015
[doc] Fix broken links. (#7341)
* Fix most of the link checks from sphinx.
* Remove duplicate explicit target name.
2021-10-20 14:45:30 +08:00
Jiaming Yuan
5ff210ed75
Small fix for the release doc and script. [skip ci] (#7332)
Add Philip as co-maintainer of maven packages.
2021-10-20 12:49:12 +08:00
Jiaming Yuan
fbb0dc4275
Remove auto configuration of seed_per_iteration. (#7009)
* Remove auto configuration of seed_per_iteration.

This should be related to model recovery from rabit, which is removed.

* Document.
2021-10-17 15:58:57 +08:00
Jiaming Yuan
e6a142fe70
Fix document about best_iteration (#7324) 2021-10-14 15:30:46 -07:00