384 Commits

Author SHA1 Message Date
Jiaming Yuan
77f6cf2d13
Support hessian in host sketch container. (#7081)
Prepare for migrating approx onto hist's codebase.
2021-07-08 16:33:58 +08:00
Jiaming Yuan
5d7cdf2e36
[Breaking] Rename Quantile DMatrix C API. (#7082)
The role of ProxyDMatrix is going beyond what it was designed.  Now it's used by both
QuantileDeviceDMatrix and inplace prediction.  After the refactoring of sparse DMatrix it
will also be used for external memory.  Renaming the C API to extract it from
QuantileDeviceDMatrix.
2021-07-08 11:34:14 +08:00
Jiaming Yuan
1cd20efe68
Move GHistIndex into DMatrix. (#7064) 2021-07-01 00:44:49 +08:00
Jiaming Yuan
8fa32fdda2
Implement categorical data support for SHAP. (#7053)
* Add CPU implementation.
* Update GPUTreeSHAP.
* Add GPU implementation by defining custom split condition.
2021-06-25 19:02:46 +08:00
Jiaming Yuan
663136aa08
Implement feature score for linear model. (#7048)
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-06-25 14:34:02 +08:00
Jiaming Yuan
bbfffb444d
Fix race condition in CPU shap. (#7050) 2021-06-21 10:03:15 +08:00
Jiaming Yuan
29f8fd6fee
Support categorical split in tree model dump. (#7036) 2021-06-18 16:46:20 +08:00
Jiaming Yuan
7dd29ffd47
Implement feature score in GBTree. (#7041)
* Categorical data support.
* Eliminate text parsing during feature score computation.
2021-06-18 11:53:16 +08:00
Jiaming Yuan
f79cc4a7a4
Implement categorical prediction for CPU and GPU predict leaf. (#7001)
* Categorical prediction with CPU predictor and GPU predict leaf.

* Implement categorical prediction for CPU prediction.
* Implement categorical prediction for GPU predict leaf.
* Refactor the prediction functions to have a unified get next node function.

Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-06-11 10:11:45 +08:00
Jiaming Yuan
ee4f51a631
Support for all primitive types from array. (#7003)
* Change C API name.
* Test for all primitive types from array.
* Add native support for CPU 128 float.
* Convert boolean and float16 in Python.

* Fix dask version for now.
2021-06-01 08:34:48 +08:00
Jiaming Yuan
4cf95a6041
Support numpy array interface (#6998) 2021-05-27 16:08:22 +08:00
Andrew Ziem
3e7e426b36
Fix spelling in documents (#6948)
* Update roxygen2 doc.

Co-authored-by: fis <jm.yuan@outlook.com>
2021-05-11 20:44:36 +08:00
Jiaming Yuan
146549260a
Bump version to 1.5.0 snapshot in master. (#6875) 2021-04-22 01:53:44 +08:00
Jiaming Yuan
556a83022d
Implement unified update prediction cache for (gpu_)hist. (#6860)
* Implement utilites for linalg.
* Unify the update prediction cache functions.
* Implement update prediction cache for multi-class gpu hist.
2021-04-17 00:29:34 +08:00
Jiaming Yuan
a5d7094a45
Update documents. (#6856)
* Add early stopping section to prediction doc.
* Remove best_ntree_limit.
* Better doxygen output.
2021-04-16 12:41:03 +08:00
Jiaming Yuan
7e06c81894
Fix approximated predict contribution. (#6811) 2021-04-03 02:15:03 +08:00
Jiaming Yuan
b1fdb220f4
Remove deprecated n_gpus parameter. (#6821) 2021-04-02 03:02:32 +08:00
Jiaming Yuan
1a73a28511
Add device argsort. (#6749)
This is part of https://github.com/dmlc/xgboost/pull/6747 .
2021-03-16 16:05:22 +08:00
Philip Hyunsu Cho
366f3cb9d8
Add use_rmm flag to global configuration (#6656)
* Ensure RMM is 0.18 or later

* Add use_rmm flag to global configuration

* Modify XGBCachingDeviceAllocatorImpl to skip CUB when use_rmm=True

* Update the demo

* [CI] Pin NumPy to 1.19.4, since NumPy 1.19.5 doesn't work with latest Shap
2021-03-09 14:53:05 -08:00
Jiaming Yuan
9da2287ab8
[breaking] Save booster feature info in JSON, remove feature name generation. (#6605)
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.

This PR is to enable reliable feature validation in Python package.
2021-02-25 18:54:16 +08:00
Louis Desreumaux
9b530e5697
Improve OpenMP exception handling (#6680) 2021-02-25 13:56:16 +08:00
Jiaming Yuan
e8c5c53e2f
Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
218a5fb6dd
Simplify Span checks. (#6685)
* Stop printing out message.
* Remove R specialization.

The printed message is not really useful anyway, without a reproducible example
there's no way to fix it.  But if there's a reproducible example, we can always
obtain these information by a debugger.  Removing the `printf` function avoids
creating the context in kernel.
2021-02-09 08:12:58 +08:00
Jiaming Yuan
4656b09d5d
[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
dbb5208a0a
Use __array_interface__ for creating DMatrix from CSR. (#6675)
* Use __array_interface__ for creating DMatrix from CSR.
* Add configuration.
2021-02-05 21:09:47 +08:00
Jiaming Yuan
411592a347
Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00
Jiaming Yuan
c3c8e66fc9
Make prediction functions thread safe. (#6648) 2021-01-28 23:29:43 +08:00
Jiaming Yuan
1b70a323a7
Improve string view to reduce string allocation. (#6644) 2021-01-27 19:08:52 +08:00
ShvetsKS
7f4d3a91b9
Multiclass prediction caching for CPU Hist (#6550)
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-01-13 04:42:07 +08:00
Jiaming Yuan
f2f7dd87b8
Use view for SparsePage exclusively. (#6590) 2021-01-11 18:04:55 +08:00
Jiaming Yuan
80065d571e
[dask] Add DaskXGBRanker (#6576)
* Initial support for distributed LTR using dask.

* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
  to avoid duplicated code.
* Define `DaskXGBRanker`.

The dask ranker doesn't support group structure, instead it uses query id and
convert to group ptr internally.
2021-01-08 18:35:09 +08:00
Jiaming Yuan
8747885a8b
Support Solaris. (#6578)
* Add system header.

* Remove use of TR1 on Solaris

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2021-01-07 09:05:05 +08:00
TP Boudreau
b2246ae7ef
Update dmlc-core submodule and conform to new API (#6431)
* Update dmlc-core submodule and conform to new API

* Remove unsupported parameter from method signature

* Update dmlc-core submodule and conform to new API

* Update dmlc-core

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2021-01-05 16:12:22 -08:00
Jiaming Yuan
ca3da55de4
Support early stopping with training continuation, correct num boosted rounds. (#6506)
* Implement early stopping with training continuation.

* Add new C API for obtaining boosted rounds.

* Fix off by 1 in `save_best`.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-17 19:59:19 +08:00
Jiaming Yuan
c5876277a8
Drop saving binary format for memory snapshot. (#6513) 2020-12-17 00:14:57 +08:00
hzy001
749364f25d
Update the C API comments (#6457)
Signed-off-by: Hao Ziyu <haoziyu@qiyi.com>

Co-authored-by: Hao Ziyu <haoziyu@qiyi.com>
2020-12-16 14:56:13 +08:00
Philip Hyunsu Cho
0d483cb7c1
Bump version to 1.4.0 snapshot in master (#6486) 2020-12-10 07:38:08 -08:00
Jiaming Yuan
703c2d06aa
Fix global config default value. (#6470) 2020-12-06 06:15:33 +08:00
Philip Hyunsu Cho
fb56da5e8b
Add global configuration (#6414)
* Add management functions for global configuration: XGBSetGlobalConfig(), XGBGetGlobalConfig().
* Add Python interface: set_config(), get_config(), and config_context().
* Add unit tests for Python
* Add R interface: xgb.set.config(), xgb.get.config()
* Add unit tests for R

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-12-03 00:05:18 -08:00
Honza Sterba
b0036b339b
Optionaly fail when gpu_id is set to invalid value (#6342) 2020-11-28 15:14:12 +08:00
Jiaming Yuan
8a17610666
Implement GPU predict leaf. (#6187) 2020-11-11 17:33:47 +08:00
Jiaming Yuan
43efadea2e
Deterministic data partitioning for external memory (#6317)
* Make external memory data partitioning deterministic.

* Change the meaning of `page_size` from bytes to number of rows.

* Design a data pool.

* Note for external memory.

* Enable unity build on Windows CI.

* Force garbage collect on test.
2020-11-11 06:11:06 +08:00
Jiaming Yuan
519cee115a
Avoid resetting seed for every configuration. (#6349) 2020-11-06 10:28:35 +08:00
Jiaming Yuan
2cc9662005
Support slicing tree model (#6302)
This PR is meant the end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests, asking users to understand how to set ntree_limit is difficult and error prone.

* Implement the save_best option in early stopping.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-11-02 23:27:39 -08:00
Igor Moura
5e1e972aea
Clean up warnings (#6325) 2020-10-30 23:50:29 +08:00
Sergio Gavilán
b181a88f9f
Reduced some C++ compiler warnings (#6197)
* Removed some warnings

* Rebase with master

* Solved C++ Google Tests errors made by refactoring in order to remove warnings

* Undo renaming path -> path_

* Fix style check

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-29 12:36:00 -07:00
vcarpani
671971e12e
Compiler warnings (#6286)
* Fix warnings for json.h

* Fix warnings for metric.h

* Fix warnings for updater_quantile_hist.cc.

* Fix warnings for updater_histmaker.cc.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-28 13:46:15 -07:00
Jiaming Yuan
3310e208fd
Fix inplace prediction interval. (#6259)
* Add back the interval in call.
* Make the interval non-optional.
2020-10-28 13:13:59 +08:00
Jiaming Yuan
bcfab4d726
Revert "Disable JSON full serialization for now. (#6248)" (#6266)
This reverts commit 6d293020fbfa2c67b532d550fe5d55689662caac.
2020-10-27 03:30:47 +08:00
Igor Moura
d1254808d5
Clean up C++ warnings (#6213) 2020-10-19 23:02:33 +08:00