1402 Commits

Author SHA1 Message Date
Jiaming Yuan
9da2287ab8
[breaking] Save booster feature info in JSON, remove feature name generation. (#6605)
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.

This PR is to enable reliable feature validation in Python package.
2021-02-25 18:54:16 +08:00
Louis Desreumaux
9b530e5697
Improve OpenMP exception handling (#6680) 2021-02-25 13:56:16 +08:00
ShvetsKS
9f15b9e322
Optimize CPU prediction (#6696)
Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-02-16 14:41:22 +08:00
ShvetsKS
9a0399e898
Removed unnecessary PredictBatch calls (#6700)
Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-02-10 20:15:14 +08:00
Jiaming Yuan
e8c5c53e2f
Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
5d48d40d9a
Fix DMatrix slice with feature types. (#6689) 2021-02-09 08:13:51 +08:00
Jiaming Yuan
4656b09d5d
[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
dbb5208a0a
Use __array_interface__ for creating DMatrix from CSR. (#6675)
* Use __array_interface__ for creating DMatrix from CSR.
* Add configuration.
2021-02-05 21:09:47 +08:00
Jiaming Yuan
1e949110da
Use generic dispatching routine for array interface. (#6672) 2021-02-05 09:23:38 +08:00
Jiaming Yuan
411592a347
Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00
Jiaming Yuan
a9ec0ea6da
Align device id in predict transform with predictor. (#6662) 2021-02-02 08:33:29 +08:00
Jiaming Yuan
c3c8e66fc9
Make prediction functions thread safe. (#6648) 2021-01-28 23:29:43 +08:00
Philip Hyunsu Cho
0f2ed21a9d
[Breaking] Change default evaluation metric for binary:logitraw objective to logloss (#6647) 2021-01-29 00:12:12 +09:00
Jiaming Yuan
1b70a323a7
Improve string view to reduce string allocation. (#6644) 2021-01-27 19:08:52 +08:00
Jiaming Yuan
bc08e0c9d1
Remove experimental_json_serialization from tests. (#6640) 2021-01-27 17:44:49 +08:00
Jiaming Yuan
d132933550
Remove type check for solaris. (#6610) 2021-01-16 02:58:19 +08:00
ShvetsKS
7f4d3a91b9
Multiclass prediction caching for CPU Hist (#6550)
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-01-13 04:42:07 +08:00
Jiaming Yuan
f2f7dd87b8
Use view for SparsePage exclusively. (#6590) 2021-01-11 18:04:55 +08:00
Jiaming Yuan
80065d571e
[dask] Add DaskXGBRanker (#6576)
* Initial support for distributed LTR using dask.

* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
  to avoid duplicated code.
* Define `DaskXGBRanker`.

The dask ranker doesn't support group structure, instead it uses query id and
convert to group ptr internally.
2021-01-08 18:35:09 +08:00
Gorkem Ozkaya
2231940d1d
Clip small positive values in gamma-nloglik (#6537)
For the `gamma-nloglik` eval metric, small positive values in the labels are causing `NaN`'s in the outputs, as reported here: https://github.com/dmlc/xgboost/issues/5349. This will add clipping on them, similar to what is done in other metrics like `poisson-nloglik` and `logloss`.
2020-12-22 03:11:40 +08:00
Jiaming Yuan
ca3da55de4
Support early stopping with training continuation, correct num boosted rounds. (#6506)
* Implement early stopping with training continuation.

* Add new C API for obtaining boosted rounds.

* Fix off by 1 in `save_best`.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-17 19:59:19 +08:00
Philip Hyunsu Cho
ad1a527709
Enable loading model from <1.0.0 trained with objective='binary:logitraw' (#6517)
* Enable loading model from <1.0.0 trained with objective='binary:logitraw'

* Add binary:logitraw in model compatibility testing suite

* Feedback from @trivialfis: Override ProbToMargin() for LogisticRaw

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-12-16 16:53:46 -08:00
Philip Hyunsu Cho
bf6cfe3b99
[Breaking] Upgrade cuDF and RMM to 0.18 nightlies; require RMM 0.18+ for RMM plugin (#6510)
* [CI] Upgrade cuDF and RMM to 0.18 nightlies

* Modify RMM plugin to be compatible with RMM 0.18

* Update src/common/device_helpers.cuh

Co-authored-by: Mark Harris <mharris@nvidia.com>

Co-authored-by: Mark Harris <mharris@nvidia.com>
2020-12-16 10:07:52 -08:00
Jiaming Yuan
c5876277a8
Drop saving binary format for memory snapshot. (#6513) 2020-12-17 00:14:57 +08:00
Jiaming Yuan
347f593169
Accept numpy array for DMatrix slice index. (#6368) 2020-12-16 14:42:52 +08:00
Jiaming Yuan
886486a519
Support categorical data in GPU weighted sketching. (#6508) 2020-12-16 14:23:28 +08:00
Igor Rukhovich
5c8ccf4455
Improved InitSampling function speed by 2.12 times (#6410)
* Improved InitSampling function speed by 2.12 times

* Added explicit conversion
2020-12-15 20:59:24 -08:00
Philip Hyunsu Cho
c31e3efa7c
Pass correct split_type to GPU predictor (#6491)
* Pass correct split_type to GPU predictor

* Add a test
2020-12-11 19:30:00 -08:00
Philip Hyunsu Cho
fb56da5e8b
Add global configuration (#6414)
* Add management functions for global configuration: XGBSetGlobalConfig(), XGBGetGlobalConfig().
* Add Python interface: set_config(), get_config(), and config_context().
* Add unit tests for Python
* Add R interface: xgb.set.config(), xgb.get.config()
* Add unit tests for R

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-12-03 00:05:18 -08:00
Jiaming Yuan
f4ff1c53fd
Fix CLI ranking demo. (#6439)
Save model at final round.
2020-11-29 03:12:06 +08:00
Honza Sterba
b0036b339b
Optionaly fail when gpu_id is set to invalid value (#6342) 2020-11-28 15:14:12 +08:00
ShvetsKS
956beead70
Thread local memory allocation for BuildHist (#6358)
* thread mem locality

* fix apply

* cleanup

* fix lint

* fix tests

* simple try

* fix

* fix

* apply comments

* fix comments

* fix

* apply simple comment

Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-11-25 17:50:12 +03:00
Jiaming Yuan
42d31d9dcb
Fix MPI build. (#6403) 2020-11-21 13:38:21 +08:00
Jiaming Yuan
44a9d69efb
Small cleanup to evaluator. (#6400) 2020-11-20 09:33:51 +08:00
ShvetsKS
512b464cfa
Disable HT for DMatrix creation (#6386)
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-11-14 22:18:33 +08:00
Philip Hyunsu Cho
e5193c21a1
[dask] Allow empty data matrix in AFT survival (#6379)
* [dask] Allow empty data matrix in AFT survival

* Add unit test
2020-11-12 17:49:58 -08:00
Jiaming Yuan
d711d648cb
Fix label errors in graph visualization (#6369) 2020-11-11 17:44:59 -08:00
Jiaming Yuan
8a17610666
Implement GPU predict leaf. (#6187) 2020-11-11 17:33:47 +08:00
Jiaming Yuan
43efadea2e
Deterministic data partitioning for external memory (#6317)
* Make external memory data partitioning deterministic.

* Change the meaning of `page_size` from bytes to number of rows.

* Design a data pool.

* Note for external memory.

* Enable unity build on Windows CI.

* Force garbage collect on test.
2020-11-11 06:11:06 +08:00
ShvetsKS
d411f98d26
simple fix for static shedule in predict (#6357)
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-11-09 17:01:30 +08:00
Jiaming Yuan
519cee115a
Avoid resetting seed for every configuration. (#6349) 2020-11-06 10:28:35 +08:00
Jack Dunn
51e6531315
Fix missing space in warning message (#6340) 2020-11-04 06:03:16 -05:00
Jiaming Yuan
2cc9662005
Support slicing tree model (#6302)
This PR is meant the end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests, asking users to understand how to set ntree_limit is difficult and error prone.

* Implement the save_best option in early stopping.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-11-02 23:27:39 -08:00
Rory Mitchell
29745c6df2
Fix inclusive scan for large sizes (#6234) 2020-11-03 17:01:43 +13:00
Igor Moura
5e1e972aea
Clean up warnings (#6325) 2020-10-30 23:50:29 +08:00
Sergio Gavilán
b181a88f9f
Reduced some C++ compiler warnings (#6197)
* Removed some warnings

* Rebase with master

* Solved C++ Google Tests errors made by refactoring in order to remove warnings

* Undo renaming path -> path_

* Fix style check

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-29 12:36:00 -07:00
vcarpani
671971e12e
Compiler warnings (#6286)
* Fix warnings for json.h

* Fix warnings for metric.h

* Fix warnings for updater_quantile_hist.cc.

* Fix warnings for updater_histmaker.cc.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-28 13:46:15 -07:00
Jiaming Yuan
c4da967b5c
Support unity build. (#6295)
* Support unity build.

* Setup on Windows Jenkins.

* Revert "Setup on Windows Jenkins."

This reverts commit 8345cb8d2b009eec8ae9fa6f16412a7c9b6ec12c.
2020-10-28 11:49:28 -07:00
Jiaming Yuan
3310e208fd
Fix inplace prediction interval. (#6259)
* Add back the interval in call.
* Make the interval non-optional.
2020-10-28 13:13:59 +08:00
Jiaming Yuan
cc76724762
Reduce warning. (#6273) 2020-10-27 12:24:19 -07:00