Commit Graph

1192 Commits

Author SHA1 Message Date
TP Boudreau
bd2ca543c4 Fix BinarySearchBin() argument types (#7026) 2021-06-08 19:05:46 +08:00
ShvetsKS
5cdaac00c1 Remove feature grouping (#7018)
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 04:35:26 +08:00
ShvetsKS
57c732655e Merge lossgude and depthwise strategies for CPU hist (#7007)
* fix java/scala test: max depth is also valid parameter for lossguide

Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 01:49:43 +08:00
Jiaming Yuan
ee4f51a631 Support for all primitive types from array. (#7003)
* Change C API name.
* Test for all primitive types from array.
* Add native support for CPU 128 float.
* Convert boolean and float16 in Python.

* Fix dask version for now.
2021-06-01 08:34:48 +08:00
Jiaming Yuan
816b789bf0 Add predictor to skl constructor. (#7000) 2021-05-29 04:52:56 +08:00
ShvetsKS
55b823b27d Reduce 'InitSampling' complexity and set gradients to zero (#6922)
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-05-29 04:52:23 +08:00
Jiaming Yuan
4cf95a6041 Support numpy array interface (#6998) 2021-05-27 16:08:22 +08:00
Jiaming Yuan
86e60e3ba8 Guard against index error in prediction. (#6982)
* Remove `best_ntree_limit` from documents.
2021-05-25 23:24:59 +08:00
Jiaming Yuan
6e52aefb37 Revert OMP guard. (#6987)
The guard protects the global variable from being changed by XGBoost.  But this leads to a
bug that the `n_threads` parameter is no longer used after the first iteration.  This is
due to the fact that `omp_set_num_threads` is only called once in `Learner::Configure` at
the beginning of the training process.

The guard is still useful for `gpu_id`, since this is called all the times in our codebase
doesn't matter which iteration we are currently running.
2021-05-25 08:56:28 +08:00
Livius
a4886c404a Fix compilation error on x86 (#6964)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2021-05-14 13:31:49 +08:00
Jiaming Yuan
44cc9c04ea Fix multiclass auc with empty dataset. (#6947) 2021-05-12 15:01:14 +08:00
Andrew Ziem
3e7e426b36 Fix spelling in documents (#6948)
* Update roxygen2 doc.

Co-authored-by: fis <jm.yuan@outlook.com>
2021-05-11 20:44:36 +08:00
Jiaming Yuan
37ad60fe25 Enforce input data is not object. (#6927)
* Check for object data type.
* Allow strided arrays with greater underlying buffer size.
2021-05-02 00:09:01 +08:00
Jiaming Yuan
8760ec4827 Ensure predict leaf output 1-dim vector where there's only 1 tree. (#6889) 2021-04-23 15:07:48 +08:00
Jiaming Yuan
a2ecbdaa31 Add an API guard to prevent global variables being changed. (#6891) 2021-04-23 10:27:57 +08:00
Jiaming Yuan
556a83022d Implement unified update prediction cache for (gpu_)hist. (#6860)
* Implement utilites for linalg.
* Unify the update prediction cache functions.
* Implement update prediction cache for multi-class gpu hist.
2021-04-17 00:29:34 +08:00
Jiaming Yuan
1b26a2a561 Copy output data for argsort. (#6866)
Fix GPU AUC.
2021-04-16 21:05:01 +08:00
Jiaming Yuan
f294c4e023 Use constexpr in dh::CopyIf. (#6828) 2021-04-08 07:37:47 +08:00
Jiaming Yuan
7bcc8b3e5c Use batched copy if. (#6826) 2021-04-06 10:34:04 +08:00
Jiaming Yuan
7e06c81894 Fix approximated predict contribution. (#6811) 2021-04-03 02:15:03 +08:00
Jiaming Yuan
b1fdb220f4 Remove deprecated n_gpus parameter. (#6821) 2021-04-02 03:02:32 +08:00
Jiaming Yuan
905fdd3e08 Fix typos in AUC. (#6795) 2021-03-31 16:35:42 +08:00
Jiaming Yuan
3039dd194b Don't estimate sketch batch size when rmm is used. (#6807) 2021-03-31 15:29:56 +08:00
Jiaming Yuan
138fe8516a Remove unnecessary calls to iota. (#6797) 2021-03-31 15:27:23 +08:00
Jiaming Yuan
79b8b560d2 Optimize dart inplace predict perf. (#6804) 2021-03-31 15:20:54 +08:00
Jiaming Yuan
a59c7323b4 Fix inplace predict missing value. (#6787) 2021-03-27 05:36:10 +08:00
ShvetsKS
8825670c9c Memory consumption fix for row-major adapters (#6779)
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
Co-authored-by: fis <jm.yuan@outlook.com>
2021-03-26 08:44:30 +08:00
Jiaming Yuan
a7083d3c13 Fix dart inplace prediction with GPU input. (#6777)
* Fix dart inplace predict with data on GPU, which might trigger a fatal check
for device access right.
* Avoid copying data whenever possible.
2021-03-25 12:00:32 +08:00
Jiaming Yuan
1d90577800 Verify strictly positive labels for gamma regression. (#6778)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-03-25 11:46:52 +08:00
Jiaming Yuan
794fd6a46b Support v3 cuda array interface. (#6776) 2021-03-25 09:58:09 +08:00
Jiaming Yuan
bcc0277338 Re-implement ROC-AUC. (#6747)
* Re-implement ROC-AUC.

* Binary
* MultiClass
* LTR
* Add documents.

This PR resolves a few issues:
  - Define a value when the dataset is invalid, which can happen if there's an
  empty dataset, or when the dataset contains only positive or negative values.
  - Define ROC-AUC for multi-class classification.
  - Define weighted average value for distributed setting.
  - A correct implementation for learning to rank task.  Previous
  implementation is just binary classification with averaging across groups,
  which doesn't measure ordered learning to rank.
2021-03-20 16:52:40 +08:00
Jiaming Yuan
4ee8340e79 Support column major array. (#6765) 2021-03-20 05:19:46 +08:00
Jiaming Yuan
f6fe15d11f Improve parameter validation (#6769)
* Add quotes to unused parameters.
* Check for whitespace.
2021-03-20 01:56:55 +08:00
Jiaming Yuan
23b4165a6b Fix gamma deviance (#6761) 2021-03-20 01:56:17 +08:00
Philip Hyunsu Cho
4230dcb614 Re-introduce double buffer in UpdatePosition, to fix perf regression in gpu_hist (#6757)
* Revert "gpu_hist performance tweaks (#5707)"

This reverts commit f779980f7e.

* Address reviewer's comment

* Fix build error
2021-03-18 13:56:10 -07:00
Jiaming Yuan
4f75f514ce Fix GPU RF (#6755)
* Fix sampling.
2021-03-17 06:23:35 +08:00
Jiaming Yuan
1a73a28511 Add device argsort. (#6749)
This is part of https://github.com/dmlc/xgboost/pull/6747 .
2021-03-16 16:05:22 +08:00
Igor Rukhovich
19a2c54265 Prediction by indices (subsample < 1) (#6683)
* Another implementation of predicting by indices

* Fixed omp parallel_for variable type

* Removed SparsePageView from Updater
2021-03-16 15:08:20 +13:00
Philip Hyunsu Cho
366f3cb9d8 Add use_rmm flag to global configuration (#6656)
* Ensure RMM is 0.18 or later

* Add use_rmm flag to global configuration

* Modify XGBCachingDeviceAllocatorImpl to skip CUB when use_rmm=True

* Update the demo

* [CI] Pin NumPy to 1.19.4, since NumPy 1.19.5 doesn't work with latest Shap
2021-03-09 14:53:05 -08:00
Jiaming Yuan
f20074e826 Check for invalid data. (#6742) 2021-03-04 14:37:20 +08:00
Jiaming Yuan
9da2287ab8 [breaking] Save booster feature info in JSON, remove feature name generation. (#6605)
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.

This PR is to enable reliable feature validation in Python package.
2021-02-25 18:54:16 +08:00
Louis Desreumaux
9b530e5697 Improve OpenMP exception handling (#6680) 2021-02-25 13:56:16 +08:00
ShvetsKS
9f15b9e322 Optimize CPU prediction (#6696)
Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-02-16 14:41:22 +08:00
ShvetsKS
9a0399e898 Removed unnecessary PredictBatch calls (#6700)
Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-02-10 20:15:14 +08:00
Jiaming Yuan
e8c5c53e2f Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
5d48d40d9a Fix DMatrix slice with feature types. (#6689) 2021-02-09 08:13:51 +08:00
Jiaming Yuan
4656b09d5d [breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
dbb5208a0a Use __array_interface__ for creating DMatrix from CSR. (#6675)
* Use __array_interface__ for creating DMatrix from CSR.
* Add configuration.
2021-02-05 21:09:47 +08:00
Jiaming Yuan
1e949110da Use generic dispatching routine for array interface. (#6672) 2021-02-05 09:23:38 +08:00
Jiaming Yuan
411592a347 Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00