290 Commits

Author SHA1 Message Date
Jiaming Yuan
d9799b09d0
Categorical data support for cuDF. (#7042)
* Add support in DMatrix.
* Add support in DQM, except for iterator.
2021-06-17 13:54:33 +08:00
Jiaming Yuan
b56614e9b8
[R] Use new predict function. (#6819)
* Call new C prediction API.
* Add `strict_shape`.
* Add `iterationrange`.
* Update document.
2021-06-11 13:03:29 +08:00
Jiaming Yuan
ee4f51a631
Support for all primitive types from array. (#7003)
* Change C API name.
* Test for all primitive types from array.
* Add native support for CPU 128 float.
* Convert boolean and float16 in Python.

* Fix dask version for now.
2021-06-01 08:34:48 +08:00
vslaykovsky
2a9979e256
Fixed incorrect feature mismatch error message (#6949)
data.shape[0] denotes the number of samples, data.shape[1] is the number of features
2021-05-11 13:52:11 +08:00
Jiaming Yuan
37ad60fe25
Enforce input data is not object. (#6927)
* Check for object data type.
* Allow strided arrays with greater underlying buffer size.
2021-05-02 00:09:01 +08:00
Jiaming Yuan
ef473b1f09
Disable pylint error. (#6911) 2021-04-29 01:01:37 +08:00
Jiaming Yuan
a2ecbdaa31
Add an API guard to prevent global variables being changed. (#6891) 2021-04-23 10:27:57 +08:00
Jiaming Yuan
a5d7094a45
Update documents. (#6856)
* Add early stopping section to prediction doc.
* Remove best_ntree_limit.
* Better doxygen output.
2021-04-16 12:41:03 +08:00
Jiaming Yuan
dee5ef2dfd
Typehint for Sklearn. (#6799) 2021-04-14 06:55:21 +08:00
giladmaya
aa0d8f20c1
Support configuring constraints by feature names (#6783)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-04-04 06:53:33 +08:00
Jiaming Yuan
7e06c81894
Fix approximated predict contribution. (#6811) 2021-04-03 02:15:03 +08:00
Jiaming Yuan
0cced530ea
[doc] Clarify prediction function. (#6813) 2021-04-03 02:12:04 +08:00
Jiaming Yuan
a5c852660b
Update document for sklearn model IO. (#6809)
* Update the use of JSON.
* Remove unnecessary type cast.
2021-04-01 15:52:36 +08:00
Jiaming Yuan
9da2287ab8
[breaking] Save booster feature info in JSON, remove feature name generation. (#6605)
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.

This PR is to enable reliable feature validation in Python package.
2021-02-25 18:54:16 +08:00
Jiaming Yuan
c375173dca
Support pylint 2.7.0 (#6726) 2021-02-25 12:49:58 +08:00
Jiaming Yuan
bdedaab8d1
Fix pylint. (#6714) 2021-02-19 11:53:27 +08:00
Roffild
4c5d2608e0
[python-package] Fix class Booster: feature_types = None (#6705) 2021-02-13 17:50:23 +08:00
Jiaming Yuan
4656b09d5d
[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
411592a347
Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00
Jiaming Yuan
4bf23c2391
Specify shape in prediction contrib and interaction. (#6614) 2021-01-26 02:08:22 +08:00
Jiaming Yuan
8942c98054
Define metainfo and other parameters for all DMatrix interfaces. (#6601)
This PR ensures all DMatrix types have a common interface.

* Fix logic in avoiding duplicated DMatrix in sklearn.
* Check for consistency between DMatrix types.
* Add doc for bounds.
2021-01-25 16:06:06 +08:00
Jiaming Yuan
d356b7a071
Restore unknown data support. (#6595) 2021-01-14 04:51:16 +08:00
Jiaming Yuan
0027220aa0
[breaking] Remove duplicated predict functions, Fix attributes IO. (#6593)
* Fix attributes not being restored.
* Rename all `data` to `X`. [breaking]
2021-01-13 16:56:49 +08:00
Jiaming Yuan
80065d571e
[dask] Add DaskXGBRanker (#6576)
* Initial support for distributed LTR using dask.

* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
  to avoid duplicated code.
* Define `DaskXGBRanker`.

The dask ranker doesn't support group structure, instead it uses query id and
convert to group ptr internally.
2021-01-08 18:35:09 +08:00
Jiaming Yuan
de8fd852a5
[dask] Add type hints. (#6519)
* Add validate_features.
* Show type hints in doc.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-29 19:41:02 +08:00
Philip Hyunsu Cho
fbb980d9d3
Expand ~ into the home directory on Linux and MacOS (#6531) 2020-12-19 23:35:13 -08:00
Jiaming Yuan
ca3da55de4
Support early stopping with training continuation, correct num boosted rounds. (#6506)
* Implement early stopping with training continuation.

* Add new C API for obtaining boosted rounds.

* Fix off by 1 in `save_best`.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-17 19:59:19 +08:00
Jiaming Yuan
0e97d97d50
Fix merge conflict. (#6512) 2020-12-16 18:02:25 +08:00
Jiaming Yuan
347f593169
Accept numpy array for DMatrix slice index. (#6368) 2020-12-16 14:42:52 +08:00
Jiaming Yuan
ef4a0e0aac
Fix DMatrix feature names/types IO. (#6507)
* Fix feature names/types IO

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-16 14:24:27 +08:00
Jiaming Yuan
3c3f026ec1
Move metric configuration into booster. (#6504) 2020-12-16 05:35:04 +08:00
Jiaming Yuan
47b86180f6
Don't validate feature when number of rows is 0. (#6472) 2020-12-07 18:08:51 +08:00
Nikhil Choudhary
ae1662028a
Fixed few grammatical mistakes in doc (#6393) 2020-11-15 13:48:08 +08:00
Jiaming Yuan
fcfeb4959c
Deprecate positional arguments. (#6365)
Deprecate positional arguments in following functions:

- `__init__` for all classes in sklearn module.
- `fit` method for all classes in sklearn module.
- dask interface.
- `set_info` for `DMatrix` class.

Refactor the evaluation matrices handling.
2020-11-13 11:10:30 +08:00
Jiaming Yuan
2cc9662005
Support slicing tree model (#6302)
This PR is meant the end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests, asking users to understand how to set ntree_limit is difficult and error prone.

* Implement the save_best option in early stopping.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-11-02 23:27:39 -08:00
Rory Mitchell
29745c6df2
Fix inclusive scan for large sizes (#6234) 2020-11-03 17:01:43 +13:00
Jiaming Yuan
bed7ae4083
Loop over thrust::reduce. (#6229)
* Check input chunk size of dqdm.
* Add doc for current limitation.
2020-10-14 10:40:56 +13:00
Jiaming Yuan
2443275891
Cleanup Python code. (#6223)
* Remove pathlike as XGBoost 1.2 requires Python 3.6.
* Move conditional import of dask/distributed into dask module.
2020-10-12 15:44:41 +08:00
Jiaming Yuan
7622b8cdb8
Enable categorical data support on Python DMatrix. (#6166)
* Only pandas is recognized.
2020-09-29 11:22:56 +08:00
Jiaming Yuan
33d80ffad0
[dask] Support more meta data on functional interface. (#6132)
* Add base_margin, label_(lower|upper)_bound.
* Test survival training with dask.
2020-09-21 16:56:37 +08:00
Jiaming Yuan
a144daf034
Limit tree depth for GPU hist. (#6045) 2020-08-22 19:34:52 +08:00
ShvetsKS
24f2e6c97e
Optimize DMatrix build time. (#5877)
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-08-20 01:37:03 +08:00
Qi Zhang
989ddd036f
Swap byte-order in binary serializer to support big-endian arch (#5813)
* fixed some endian issues

* Use dmlc::ByteSwap() to simplify code

* Fix lint check

* [CI] Add test for s390x

* Download latest CMake on s390x

* Fix a bug in my code

* Save magic number in dmatrix with byteswap on big-endian machine

* Save version in binary with byteswap on big-endian machine

* Load scalar with byteswap in MetaInfo

* Add a debugging message

* Handle arrays correctly when byteswapping

* EOF can also be 255

* Handle magic number in MetaInfo carefully

* Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model

* Handle missing packages in Python tests

* Don't use boto3 in model compatibility tests

* Add s390 Docker file for local testing

* Add model compatibility tests

* Add R compatibility test

* Revert "Add R compatibility test"

This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8.

Co-authored-by: Qi Zhang <q.zhang@ibm.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-18 14:47:17 -07:00
Jiaming Yuan
4d99c58a5f
Feature weights (#5962) 2020-08-18 19:55:41 +08:00
Jiaming Yuan
18349a7ccf
[Breaking] Fix custom metric for multi output. (#5954)
* Set output margin to true for custom metric.  This fixes only R and Python.
2020-07-29 19:25:27 +08:00
Jiaming Yuan
029a8b533f
Simplify the data backends. (#5893) 2020-07-16 15:17:31 +08:00
Jiaming Yuan
d0a29c3135
Remove print. (#5867) 2020-07-08 04:12:14 +08:00
Jiaming Yuan
a3ec964346
Accept iterator in device dmatrix. (#5783)
* Remove Device DMatrix.
2020-07-07 21:44:48 +08:00
Jiaming Yuan
93c44a9a64
Move feature names and types of DMatrix from Python to C++. (#5858)
* Add thread local return entry for DMatrix.
* Save feature name and feature type in binary file.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-07 09:40:13 +08:00
Jiaming Yuan
8104f10328
Update document for model dump. (#5818)
* Clarify the relationship between dump and save.
* Mention the schema.
2020-06-22 14:33:54 +08:00