673 Commits

Author SHA1 Message Date
Jiaming Yuan
9c653378e2
Fix monotone constraint with tuple input. (#7891) (#8159) 2022-08-12 22:05:53 +08:00
Jiaming Yuan
140c377a96
[backport] Fix compatibility with latest cupy. (#8129) (#8160)
* Fix compatibility with latest cupy.

* Freeze mypy.
2022-08-12 22:02:05 +08:00
Jiaming Yuan
a55d3bdde2
[backport] Fix pylint errors. (#7967) (#7981)
* Fix pylint errors. (#7967)

* Rebase error.
2022-06-07 23:09:53 +08:00
Jiaming Yuan
645855e8b1
[backport] Fix arrow compatibility, hypothesis tests. (#7979) 2022-06-07 01:47:45 +08:00
Jiaming Yuan
5d92a7d936
Bump release version to 1.6.1. (#7872) 2022-05-08 14:20:50 +08:00
Jiaming Yuan
f75c007f27
Make 1.6.0 release. (#7813) 2022-04-16 08:43:21 +08:00
Jiaming Yuan
4bd5a33b10
Make rc1 release. (#7764) 2022-03-30 21:32:40 +08:00
Jiaming Yuan
9150fdbd4d
Support pandas nullable types. (#7760) 2022-03-30 08:51:52 +08:00
Jiaming Yuan
a50b84244e
Cleanup configuration for constraints. (#7758) 2022-03-29 04:22:46 +08:00
Jiaming Yuan
3c9b04460a
Move num_parallel_tree to model parameter. (#7751)
The size of forest should be a property of model itself instead of a training
hyper-parameter.
2022-03-29 02:32:42 +08:00
Jiaming Yuan
b3ba0e8708
Check cupy lazily. (#7752) 2022-03-26 06:09:58 +08:00
Chengyang
c92ab2ce49
Add type hints to core.py (#7707)
Co-authored-by: Chengyang Gu <bridgream@gmail.com>
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-03-23 21:12:14 +08:00
Xiaochang Wu
613ec36c5a
Support building SimpleDMatrix from Arrow data format (#7512)
* Integrate with Arrow C data API.
* Support Arrow dataset.
* Support Arrow table.

Co-authored-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
Co-authored-by: Zhang Zhang <zhang.zhang@intel.com>
2022-03-15 13:25:19 +08:00
Jiaming Yuan
a62a3d991d
[dask] prediction with categorical data. (#7708) 2022-03-10 00:21:48 +08:00
Pradipta Ghosh
68b6d6bbe2
Fix for Feature shape mismatch error (#7715) 2022-03-03 21:36:29 +08:00
Cheng Li
a92e0f6240
multi groups in the constraints (#7711) 2022-03-01 18:10:15 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
c859764d29
[doc] Clarify that states in callbacks are mutated. (#7685)
* Fix copy for cv.  This prevents inserting default callbacks into the input list.
* Clarify the behavior of callbacks in training/cv.
* Fix typos in doc.
2022-02-22 11:45:00 +08:00
Jiaming Yuan
e56d1779e1
Require Python 3.7. (#7682)
* Update setup.py.
2022-02-21 05:46:48 +08:00
Jiaming Yuan
f08c5dcb06
Cleanup some pylint errors. (#7667)
* Cleanup some pylint errors.

* Cleanup pylint errors in rabit modules.
* Make data iter an abstract class and cleanup private access.
* Cleanup no-self-use for booster.
2022-02-19 18:53:12 +08:00
Jiaming Yuan
b76c5d54bf
Define export symbols in callback module. (#7665) 2022-02-19 18:52:41 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
5cd1f71b51
[dask] Improve configuration for port. (#7645)
- Try port 0 to let the OS return the available port.
- Add port configuration.
2022-02-14 21:34:34 +08:00
Jiaming Yuan
b52c4e13b0
[dask] Fix empty partition with pandas input. (#7644)
Empty partition is different from empty dataset.  For the former case, each worker has
non-empty dask collections, but each collection might contain empty partition.
2022-02-14 19:35:51 +08:00
Jiaming Yuan
fe4ce920b2
[dask] Cleanup dask module. (#7634)
* Add a new utility for mapping function onto workers.
* Unify the type for feature names.
* Clean up the iterator.
* Fix prediction with DaskDMatrix worker specification.
* Fix base margin with DeviceQuantileDMatrix.
* Support vs 2022 in setup.py.
2022-02-08 20:41:46 +08:00
Jiaming Yuan
926af9951e
Add missing train parameter for sklearn interface. (#7629)
Some other parameters are still missing and rely on **kwargs, for instance parameters from
dart.
2022-02-08 13:20:19 +08:00
Jiaming Yuan
3e693e4f97
[dask] Fix nthread config with dask sklearn wrapper. (#7633) 2022-02-08 06:38:32 +08:00
Philip Hyunsu Cho
f6e6d0b2c0
[CI] Build Python wheels for MacOS (x86_64 and arm64) (#7621)
* Build Python wheels for OSX (x86_64 and arm64)

* Use Conda's libomp when running Python tests

* fix

* Add comment to explain CIBW_TARGET_OSX_ARM64

* Update release script

* Add comments in build_python_wheels.sh

* Document wheel pipeline
2022-02-02 17:35:48 -08:00
Philip Hyunsu Cho
b4340abf56
Add special handling for multi:softmax in sklearn predict (#7607)
* Add special handling for multi:softmax in sklearn predict

* Add test coverage
2022-01-29 15:54:49 -08:00
Jiaming Yuan
24789429fd
Support latest pandas Index type. (#7595) 2022-01-26 18:20:10 +08:00
Jiaming Yuan
f84291c1e1
Fix max_cat_to_onehot doc annotation [skip ci] (#7592) 2022-01-23 16:33:23 +08:00
Jiaming Yuan
ef4dae4c0e
[dask] Add scheduler address to dask config. (#7581)
- Add user configuration.
- Bring back to the logic of using scheduler address from dask.  This was removed when we were trying to support GKE, now we bring it back and let xgboost try it if direct guess or host IP from user config failed.
2022-01-22 01:56:32 +08:00
Jiaming Yuan
b4ec1682c6
Update document for multi output and categorical. (#7574)
* Group together categorical related parameters.
* Update documents about multioutput and categorical.
2022-01-19 04:35:17 +08:00
Jiaming Yuan
dac9eb13bd
Implement new save_raw in Python. (#7572)
* Expose the new C API function to Python.
* Remove old document and helper script.
* Small optimization to the `save_raw` and Json ctors.
2022-01-19 02:27:51 +08:00
Jiaming Yuan
13b0fa4b97
Implement get_group. (#7564) 2022-01-16 02:07:42 +08:00
Jiaming Yuan
52277cc3da
Rename build info function to be consistent with rest of the API. (#7553) 2022-01-14 00:39:28 +08:00
Jiaming Yuan
001503186c
Rewrite approx (#7214)
This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing.

The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for mono constraint.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance for most of the datasets. (many times faster)
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
2022-01-10 21:15:05 +08:00
Jiaming Yuan
54582f641a
[doc] Use cross references in sphinx doc. (#7522)
* Use cross references instead of URL.
* Fix auto doc for callback.
2022-01-05 03:21:25 +08:00
Jiaming Yuan
eb1efb54b5
Define feature_names_in_. (#7526)
* Define `feature_names_in_`.
* Raise attribute error if it's not defined.
2022-01-05 01:35:34 +08:00
Jiaming Yuan
8f0a42a266
Initial support for multi-label classification. (#7521)
* Add support in sklearn classifier.
2022-01-04 23:58:21 +08:00
Jiaming Yuan
58a6723eb1
Initial support for multioutput regression. (#7514)
* Add num target model parameter, which is configured from input labels.
* Change elementwise metric and indexing for weights.
* Add demo.
* Add tests.
2021-12-18 09:28:38 +08:00
Jiaming Yuan
6f8a4633b7
Fix Python typehint with upgraded mypy. (#7513) 2021-12-16 23:08:08 +08:00
Jiaming Yuan
70b12d898a
[dask] Fix ddqdm with empty partition. (#7510)
* Fix empty partition.

* war.
2021-12-16 20:37:29 +08:00
Jiaming Yuan
05497a9141
[dask] Fix asyncio. (#7508) 2021-12-13 01:48:25 +08:00
Jiaming Yuan
021f8bf28b
Fix pylint. (#7498) 2021-12-07 13:23:30 +08:00
Jiaming Yuan
b124a27f57
Support scipy sparse in dask. (#7457) 2021-11-23 16:45:36 +08:00
Harvey
0552ca8021
Fix typo (#7469) 2021-11-23 08:58:45 +08:00
Jiaming Yuan
d33854af1b
[Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Jiaming Yuan
e27f543deb
Set use_logger in tracker to false. (#7438) 2021-11-16 05:12:42 +08:00
Kian Meng Ang
d27a11ff87
Fix typos in python package (#7432) 2021-11-14 17:20:19 +08:00