290 Commits

Author SHA1 Message Date
Jiaming Yuan
d87f69215e
Quantile DMatrix for CPU. (#8130)
- Add a new `QuantileDMatrix` that works for both CPU and GPU.
- Deprecate `DeviceQuantileDMatrix`.
2022-08-02 15:51:23 +08:00
Jiaming Yuan
ff1c559084
Remove unused variable. (#8046) 2022-07-05 01:59:22 +08:00
Jiaming Yuan
6b55150e80
Fix pylint errors. (#7967) 2022-06-02 18:04:46 +08:00
Jiaming Yuan
d314680a15
Verify shared object version at load. (#7928) 2022-05-23 20:53:30 +08:00
Jiaming Yuan
f93a727869
Address remaining mypy errors in python package. (#7914) 2022-05-18 22:46:15 +08:00
Chengyang
806c92c80b
Add Type Hints for Python Package (#7742)
Co-authored-by: Chengyang Gu <bridgream@gmail.com>
Co-authored-by: Jiamingy <jm.yuan@outlook.com>
2022-05-17 22:14:09 +08:00
Jiaming Yuan
c8f9d4b6e6
Show libxgboost.so path in build info. (#7893) 2022-05-13 18:08:56 +08:00
Jiaming Yuan
db80671d6b
Fix monotone constraint with tuple input. (#7891) 2022-05-13 04:00:03 +08:00
Jiaming Yuan
f0f76259c9
Remove STRING_TYPES. (#7827) 2022-04-22 19:07:51 +08:00
Jiaming Yuan
c70fa502a5
Expose feature_types to sklearn interface. (#7821) 2022-04-21 20:23:35 +08:00
Jiaming Yuan
bcce17e688
Remove text loading in basic walk through demo. (#7753) 2022-04-01 00:59:42 +08:00
Jiaming Yuan
a50b84244e
Cleanup configuration for constraints. (#7758) 2022-03-29 04:22:46 +08:00
Jiaming Yuan
3c9b04460a
Move num_parallel_tree to model parameter. (#7751)
The size of the forest should be a property of the model itself rather than a
training hyper-parameter.
2022-03-29 02:32:42 +08:00
Jiaming Yuan
b3ba0e8708
Check cupy lazily. (#7752) 2022-03-26 06:09:58 +08:00
Chengyang
c92ab2ce49
Add type hints to core.py (#7707)
Co-authored-by: Chengyang Gu <bridgream@gmail.com>
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-03-23 21:12:14 +08:00
Jiaming Yuan
a62a3d991d
[dask] prediction with categorical data. (#7708) 2022-03-10 00:21:48 +08:00
Pradipta Ghosh
68b6d6bbe2
Fix for Feature shape mismatch error (#7715) 2022-03-03 21:36:29 +08:00
Cheng Li
a92e0f6240
multi groups in the constraints (#7711) 2022-03-01 18:10:15 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
f08c5dcb06
Cleanup some pylint errors. (#7667)
* Cleanup pylint errors in rabit modules.
* Make data iter an abstract class and cleanup private access.
* Cleanup no-self-use for booster.
2022-02-19 18:53:12 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
fe4ce920b2
[dask] Cleanup dask module. (#7634)
* Add a new utility for mapping function onto workers.
* Unify the type for feature names.
* Clean up the iterator.
* Fix prediction with DaskDMatrix worker specification.
* Fix base margin with DeviceQuantileDMatrix.
* Support Visual Studio 2022 in setup.py.
2022-02-08 20:41:46 +08:00
Jiaming Yuan
dac9eb13bd
Implement new save_raw in Python. (#7572)
* Expose the new C API function to Python.
* Remove old document and helper script.
* Small optimization to the `save_raw` and Json ctors.
2022-01-19 02:27:51 +08:00
Jiaming Yuan
13b0fa4b97
Implement get_group. (#7564) 2022-01-16 02:07:42 +08:00
Jiaming Yuan
52277cc3da
Rename build info function to be consistent with rest of the API. (#7553) 2022-01-14 00:39:28 +08:00
Jiaming Yuan
54582f641a
[doc] Use cross references in sphinx doc. (#7522)
* Use cross references instead of URL.
* Fix auto doc for callback.
2022-01-05 03:21:25 +08:00
Jiaming Yuan
70b12d898a
[dask] Fix ddqdm with empty partition. (#7510)
* Fix empty partition.

* war.
2021-12-16 20:37:29 +08:00
Jiaming Yuan
021f8bf28b
Fix pylint. (#7498) 2021-12-07 13:23:30 +08:00
Kian Meng Ang
d27a11ff87
Fix typos in python package (#7432) 2021-11-14 17:20:19 +08:00
Jiaming Yuan
46726ec176
Expose build info (#7399) 2021-11-12 18:22:46 +08:00
Jiaming Yuan
a13321148a
Support multi-class with base margin. (#7381)
This was already partially supported but never properly tested, so the only way to use it was to call `numpy.ndarray.flatten` on `base_margin` before passing it to XGBoost. This PR adds proper support
for most data types, along with tests.
2021-11-02 13:38:00 +08:00
Jiaming Yuan
c6769488b3
Typehint for subset of core API. (#7348) 2021-10-28 20:47:04 +08:00
Jiaming Yuan
45aef75cca
Move skl eval_metric and early_stopping_rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish its behaviour from the old `feval`, which is now deprecated.  The new `custom_metric` receives the transformed prediction when a built-in objective is used.  This lets XGBoost use cost functions from other libraries like scikit-learn directly, without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in the sklearn interface are moved from `fit` to `__init__` and are now saved as part of the scikit-learn model.  The old ones in the `fit` function are now deprecated.  The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documentation for the behaviour of custom objectives and metrics.
2021-10-28 17:20:20 +08:00
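
The distinction drawn in #6751 between a metric that sees the transformed prediction and one that has to apply the link function itself can be sketched as follows. This is only an illustration: it assumes a logistic (sigmoid) link such as the one used by `binary:logistic`, works on plain Python lists, and calls no XGBoost API. The function names and the sample data are hypothetical.

```python
import math


def sigmoid(margin):
    """Logistic link: map a raw margin to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def log_loss(probabilities, labels):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    eps = 1e-15
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def metric_on_margin(raw_margins, labels):
    """Hypothetical metric that is handed raw margins.

    It has to know the link function and apply it before scoring.
    """
    probabilities = [sigmoid(m) for m in raw_margins]
    return "logloss", log_loss(probabilities, labels)


def metric_on_prediction(probabilities, labels):
    """Hypothetical metric that is handed transformed predictions.

    No link function is needed, so a generic cost function from
    another library could be dropped in here unchanged.
    """
    return "logloss", log_loss(probabilities, labels)


# Made-up margins and labels, for illustration only.
raw_margins = [2.0, -1.0, 0.5]
labels = [1, 0, 1]

from_margin = metric_on_margin(raw_margins, labels)
from_prediction = metric_on_prediction([sigmoid(m) for m in raw_margins], labels)
```

Both calls return the same value. The only difference is where the sigmoid is applied, which is the part a metric receiving transformed predictions no longer has to handle.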
Jiaming Yuan
ac9bfaa4f2
Handle missing values in dataframe with category dtype. (#7331)
* Replace -1 in pandas initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
* Fix DDM sketching.
2021-10-28 03:33:54 +08:00
Jiaming Yuan
376b448015
[doc] Fix broken links. (#7341)
* Fix most of the link checks from sphinx.
* Remove duplicate explicit target name.
2021-10-20 14:45:30 +08:00
Jiaming Yuan
f56e2e9a66
Support categorical data with pandas Dataframe in inplace prediction (#7322) 2021-10-17 14:32:06 +08:00
Jiaming Yuan
69d3b1b8b4
Remove old callback deprecated in 1.3. (#7280) 2021-10-08 17:24:59 +08:00
Jiaming Yuan
e48e05e6e2
Add typehint to rabit module. (#7240) 2021-09-17 18:31:02 +08:00
Jiaming Yuan
b18f5f61b0
Fix pylint (#7241) 2021-09-17 11:50:36 +08:00
Jiaming Yuan
0ed979b096
Support more input types for categorical data. (#7220)
* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info.
2021-09-16 20:39:30 +08:00
Jiaming Yuan
ee8d1f5ed8
Fix histogram truncation. (#7181)
* Fix truncation.

* Lint.
2021-08-24 18:34:32 -07:00
Jiaming Yuan
8a84be37b8
Pass scikit learn estimator checks for regressor. (#7130)
* Check data shape.
* Check labels.
2021-08-03 18:58:20 +08:00
Jiaming Yuan
778135f657
Fix parameter loading with training continuation. (#7121)
* Add a demo for training continuation.
2021-07-23 10:51:47 +08:00
Jiaming Yuan
e6088366df
Export Python Interface for external memory. (#7070)
* Add Python iterator interface.
* Add tests.
* Add demo.
* Add documents.
* Handle empty dataset.
2021-07-22 15:15:53 +08:00
Jiaming Yuan
5d7cdf2e36
[Breaking] Rename Quantile DMatrix C API. (#7082)
The role of ProxyDMatrix has grown beyond what it was designed for.  It's now used by both
DeviceQuantileDMatrix and inplace prediction.  After the refactoring of sparse DMatrix it
will also be used for external memory.  This renames the C API to extract it from
DeviceQuantileDMatrix.
2021-07-08 11:34:14 +08:00
Jiaming Yuan
a5d222fcdb
Handle categorical split in model histogram and dataframe. (#7065)
* Error on get_split_value_histogram when feature is categorical
* Add a category column to output dataframe
2021-07-02 13:10:36 +08:00
Jiaming Yuan
663136aa08
Implement feature score for linear model. (#7048)
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-06-25 14:34:02 +08:00
Jiaming Yuan
da1ad798ca
Convert numpy float to Python float in feat score. (#7047) 2021-06-21 20:58:43 +08:00
Jiaming Yuan
86715e4cd4
Support categorical data for dask functional interface and DQM. (#7043)
* Implement categorical data support for GPU GK-merge.
* Add support for dask functional interface.
* Add support for DQM.

* Get newer cupy.
2021-06-18 13:06:52 +08:00
Jiaming Yuan
7dd29ffd47
Implement feature score in GBTree. (#7041)
* Categorical data support.
* Eliminate text parsing during feature score computation.
2021-06-18 11:53:16 +08:00