847 Commits

Author SHA1 Message Date
Jiaming Yuan
3e693e4f97
[dask] Fix nthread config with dask sklearn wrapper. (#7633) 2022-02-08 06:38:32 +08:00
Philip Hyunsu Cho
f6e6d0b2c0
[CI] Build Python wheels for MacOS (x86_64 and arm64) (#7621)
* Build Python wheels for OSX (x86_64 and arm64)

* Use Conda's libomp when running Python tests

* fix

* Add comment to explain CIBW_TARGET_OSX_ARM64

* Update release script

* Add comments in build_python_wheels.sh

* Document wheel pipeline
2022-02-02 17:35:48 -08:00
Philip Hyunsu Cho
b4340abf56
Add special handling for multi:softmax in sklearn predict (#7607)
* Add special handling for multi:softmax in sklearn predict

* Add test coverage
2022-01-29 15:54:49 -08:00
Jiaming Yuan
24789429fd
Support latest pandas Index type. (#7595) 2022-01-26 18:20:10 +08:00
Jiaming Yuan
f84291c1e1
Fix max_cat_to_onehot doc annotation [skip ci] (#7592) 2022-01-23 16:33:23 +08:00
Jiaming Yuan
ef4dae4c0e
[dask] Add scheduler address to dask config. (#7581)
- Add user configuration.
- Bring back to the logic of using scheduler address from dask.  This was removed when we were trying to support GKE, now we bring it back and let xgboost try it if direct guess or host IP from user config failed.
2022-01-22 01:56:32 +08:00
Jiaming Yuan
b4ec1682c6
Update document for multi output and categorical. (#7574)
* Group together categorical related parameters.
* Update documents about multioutput and categorical.
2022-01-19 04:35:17 +08:00
Jiaming Yuan
dac9eb13bd
Implement new save_raw in Python. (#7572)
* Expose the new C API function to Python.
* Remove old document and helper script.
* Small optimization to the `save_raw` and Json ctors.
2022-01-19 02:27:51 +08:00
Jiaming Yuan
13b0fa4b97
Implement get_group. (#7564) 2022-01-16 02:07:42 +08:00
Jiaming Yuan
52277cc3da
Rename build info function to be consistent with rest of the API. (#7553) 2022-01-14 00:39:28 +08:00
Jiaming Yuan
001503186c
Rewrite approx (#7214)
This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing.

The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for mono constraint.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance for most of the datasets. (many times faster)
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
2022-01-10 21:15:05 +08:00
Jiaming Yuan
54582f641a
[doc] Use cross references in sphinx doc. (#7522)
* Use cross references instead of URL.
* Fix auto doc for callback.
2022-01-05 03:21:25 +08:00
Jiaming Yuan
eb1efb54b5
Define feature_names_in_. (#7526)
* Define `feature_names_in_`.
* Raise attribute error if it's not defined.
2022-01-05 01:35:34 +08:00
Jiaming Yuan
8f0a42a266
Initial support for multi-label classification. (#7521)
* Add support in sklearn classifier.
2022-01-04 23:58:21 +08:00
Jiaming Yuan
58a6723eb1
Initial support for multioutput regression. (#7514)
* Add num target model parameter, which is configured from input labels.
* Change elementwise metric and indexing for weights.
* Add demo.
* Add tests.
2021-12-18 09:28:38 +08:00
Jiaming Yuan
6f8a4633b7
Fix Python typehint with upgraded mypy. (#7513) 2021-12-16 23:08:08 +08:00
Jiaming Yuan
70b12d898a
[dask] Fix ddqdm with empty partition. (#7510)
* Fix empty partition.

* war.
2021-12-16 20:37:29 +08:00
Jiaming Yuan
05497a9141
[dask] Fix asyncio. (#7508) 2021-12-13 01:48:25 +08:00
Jiaming Yuan
021f8bf28b
Fix pylint. (#7498) 2021-12-07 13:23:30 +08:00
Jiaming Yuan
b124a27f57
Support scipy sparse in dask. (#7457) 2021-11-23 16:45:36 +08:00
Harvey
0552ca8021
Fix typo (#7469) 2021-11-23 08:58:45 +08:00
Jiaming Yuan
d33854af1b
[Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Jiaming Yuan
e27f543deb
Set use_logger in tracker to false. (#7438) 2021-11-16 05:12:42 +08:00
Kian Meng Ang
d27a11ff87
Fix typos in python package (#7432) 2021-11-14 17:20:19 +08:00
Jiaming Yuan
46726ec176
Expose build info (#7399) 2021-11-12 18:22:46 +08:00
Jiaming Yuan
97d7582457
Delay breaking changes to 1.6. (#7420)
The patch is too big to be backported.
2021-11-12 16:46:03 +08:00
Jiaming Yuan
c74df31bf9
Cleanup the train function. (#7377)
* Move attribute setter to callback.
* Remove the internal train function.
* Remove unnecessary initialization.
2021-11-02 18:00:26 +08:00
Jiaming Yuan
154b15060e
Move callbacks from fit to __init__. (#7375) 2021-11-02 17:51:42 +08:00
Jiaming Yuan
a13321148a
Support multi-class with base margin. (#7381)
This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support
for most of the data types along with tests.
2021-11-02 13:38:00 +08:00
Jiaming Yuan
c6769488b3
Typehint for subset of core API. (#7348) 2021-10-28 20:47:04 +08:00
Jiaming Yuan
45aef75cca
Move skl eval_metric and early_stopping rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`.  And `feval` is deprecated.  The new `custom_metric` receives transformed prediction when the built-in objective is used.  This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and is now saved as part of the scikit-learn model.  The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documents for the behaviour of custom objective and metric.
2021-10-28 17:20:20 +08:00
Jiaming Yuan
6b074add66
Update setup.py. (#7360)
* Add new classifiers.
* Typehint.
2021-10-28 14:58:31 +08:00
Jiaming Yuan
3c4aa9b2ea
[breaking] Remove label encoder deprecated in 1.3. (#7357) 2021-10-28 13:24:29 +08:00
Jiaming Yuan
ac9bfaa4f2
Handle missing values in dataframe with category dtype. (#7331)
* Replace -1 in pandas initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
* Fix DDM sketching.
2021-10-28 03:33:54 +08:00
Jiaming Yuan
f999897615
[dask] Use nthread in DMatrix construction. (#7337)
This is consistent with the thread overriding behavior.
2021-10-20 15:16:40 +08:00
Jiaming Yuan
376b448015
[doc] Fix broken links. (#7341)
* Fix most of the link checks from sphinx.
* Remove duplicate explicit target name.
2021-10-20 14:45:30 +08:00
Jiaming Yuan
f53da412aa
Add typehint to tracker. (#7338) 2021-10-20 12:49:36 +08:00
Jiaming Yuan
c42e3fbcf3
[doc] Fix early stopping document. (#7334) 2021-10-18 11:21:16 -07:00
Jiaming Yuan
f56e2e9a66
Support categorical data with pandas Dataframe in inplace prediction (#7322) 2021-10-17 14:32:06 +08:00
Jiaming Yuan
5b17bb0031
Fix prediction with cat data in sklearn interface. (#7306)
* Specify DMatrix parameter for pre-processing dataframe.
* Add document about the behaviour of prediction.
2021-10-12 14:31:12 +08:00
Jiaming Yuan
69d3b1b8b4
Remove old callback deprecated in 1.3. (#7280) 2021-10-08 17:24:59 +08:00
Jiaming Yuan
578de9f762
Fix cv verbose_eval (#7291) 2021-10-08 12:28:38 +08:00
Jiaming Yuan
f7caac2563
Bump version to 1.6.0 in master. (#7259) 2021-10-07 16:09:26 +08:00
Jiaming Yuan
ca17f8a5fc
Dispatch thrust versions and upgrade rmm. (#7254)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-09-25 03:43:23 +08:00
Jiaming Yuan
9472be7d77
Fix initialization from pandas series. (#7243) 2021-09-23 04:43:25 +08:00
Jiaming Yuan
e48e05e6e2
Add typehint to rabit module. (#7240) 2021-09-17 18:31:02 +08:00
Jiaming Yuan
c735c17f33
Disable callback and ES on random forest. (#7236) 2021-09-17 18:21:17 +08:00
Jiaming Yuan
b18f5f61b0
Fix pylint (#7241) 2021-09-17 11:50:36 +08:00
Jiaming Yuan
22d56cebf1
Encode pandas categorical data automatically. (#7231) 2021-09-17 11:09:55 +08:00
Jiaming Yuan
0ed979b096
Support more input types for categorical data. (#7220)
* Support more input types for categorical data.

* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info.
2021-09-16 20:39:30 +08:00