xgboost

Author	SHA1	Message	Date
Jiaming Yuan	556a83022d	Implement unified update prediction cache for (gpu_)hist. (#6860 ) * Implement utilities for linalg. * Unify the update prediction cache functions. * Implement update prediction cache for multi-class gpu hist.	2021-04-17 00:29:34 +08:00
Jiaming Yuan	1b26a2a561	Copy output data for argsort. (#6866 ) Fix GPU AUC.	2021-04-16 21:05:01 +08:00
Jiaming Yuan	f294c4e023	Use constexpr in `dh::CopyIf`. (#6828 )	2021-04-08 07:37:47 +08:00
Jiaming Yuan	7bcc8b3e5c	Use batched copy if. (#6826 )	2021-04-06 10:34:04 +08:00
Jiaming Yuan	7e06c81894	Fix approximated predict contribution. (#6811 )	2021-04-03 02:15:03 +08:00
Jiaming Yuan	b1fdb220f4	Remove deprecated `n_gpus` parameter. (#6821 )	2021-04-02 03:02:32 +08:00
Jiaming Yuan	905fdd3e08	Fix typos in AUC. (#6795 )	2021-03-31 16:35:42 +08:00
Jiaming Yuan	3039dd194b	Don't estimate sketch batch size when rmm is used. (#6807 )	2021-03-31 15:29:56 +08:00
Jiaming Yuan	138fe8516a	Remove unnecessary calls to iota. (#6797 )	2021-03-31 15:27:23 +08:00
Jiaming Yuan	79b8b560d2	Optimize dart inplace predict perf. (#6804 )	2021-03-31 15:20:54 +08:00
Jiaming Yuan	a59c7323b4	Fix inplace predict missing value. (#6787 )	2021-03-27 05:36:10 +08:00
ShvetsKS	8825670c9c	Memory consumption fix for row-major adapters (#6779 ) Co-authored-by: Kirill Shvets <kirill.shvets@intel.com> Co-authored-by: fis <jm.yuan@outlook.com>	2021-03-26 08:44:30 +08:00
Jiaming Yuan	a7083d3c13	Fix dart inplace prediction with GPU input. (#6777 ) * Fix dart inplace predict with data on GPU, which might trigger a fatal check for device access right. * Avoid copying data whenever possible.	2021-03-25 12:00:32 +08:00
Jiaming Yuan	1d90577800	Verify strictly positive labels for gamma regression. (#6778 ) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-03-25 11:46:52 +08:00
Jiaming Yuan	794fd6a46b	Support v3 cuda array interface. (#6776 )	2021-03-25 09:58:09 +08:00
Jiaming Yuan	bcc0277338	Re-implement ROC-AUC. (#6747 ) * Re-implement ROC-AUC. * Binary * MultiClass * LTR * Add documents. This PR resolves a few issues: - Define a value when the dataset is invalid, which can happen if there's an empty dataset, or when the dataset contains only positive or negative values. - Define ROC-AUC for multi-class classification. - Define weighted average value for distributed setting. - A correct implementation for learning to rank task. Previous implementation is just binary classification with averaging across groups, which doesn't measure ordered learning to rank.	2021-03-20 16:52:40 +08:00
Jiaming Yuan	4ee8340e79	Support column major array. (#6765 )	2021-03-20 05:19:46 +08:00
Jiaming Yuan	f6fe15d11f	Improve parameter validation (#6769 ) * Add quotes to unused parameters. * Check for whitespace.	2021-03-20 01:56:55 +08:00
Jiaming Yuan	23b4165a6b	Fix gamma deviance (#6761 )	2021-03-20 01:56:17 +08:00
Philip Hyunsu Cho	4230dcb614	Re-introduce double buffer in UpdatePosition, to fix perf regression in gpu_hist (#6757 ) * Revert "gpu_hist performance tweaks (#5707)" This reverts commit f779980f7ea7f6f07e86229b8e78144e8a74e6b3. * Address reviewer's comment * Fix build error	2021-03-18 13:56:10 -07:00
Jiaming Yuan	4f75f514ce	Fix GPU RF (#6755 ) * Fix sampling.	2021-03-17 06:23:35 +08:00
Jiaming Yuan	1a73a28511	Add device argsort. (#6749 ) This is part of https://github.com/dmlc/xgboost/pull/6747 .	2021-03-16 16:05:22 +08:00
Igor Rukhovich	19a2c54265	Prediction by indices (subsample < 1) (#6683 ) * Another implementation of predicting by indices * Fixed omp parallel_for variable type * Removed SparsePageView from Updater	2021-03-16 15:08:20 +13:00
Philip Hyunsu Cho	366f3cb9d8	Add use_rmm flag to global configuration (#6656 ) * Ensure RMM is 0.18 or later * Add use_rmm flag to global configuration * Modify XGBCachingDeviceAllocatorImpl to skip CUB when use_rmm=True * Update the demo * [CI] Pin NumPy to 1.19.4, since NumPy 1.19.5 doesn't work with latest Shap	2021-03-09 14:53:05 -08:00
Jiaming Yuan	f20074e826	Check for invalid data. (#6742 )	2021-03-04 14:37:20 +08:00
Jiaming Yuan	9da2287ab8	[breaking] Save booster feature info in JSON, remove feature name generation. (#6605 ) * Save feature info in booster in JSON model. * [breaking] Remove automatic feature name generation in `DMatrix`. This PR is to enable reliable feature validation in Python package.	2021-02-25 18:54:16 +08:00
Louis Desreumaux	9b530e5697	Improve OpenMP exception handling (#6680 )	2021-02-25 13:56:16 +08:00
ShvetsKS	9f15b9e322	Optimize CPU prediction (#6696 ) Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>	2021-02-16 14:41:22 +08:00
ShvetsKS	9a0399e898	Removed unnecessary PredictBatch calls (#6700 ) Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>	2021-02-10 20:15:14 +08:00
Jiaming Yuan	e8c5c53e2f	Use `Predictor` for `dart`. (#6693 ) * Use normal predictor for dart booster. * Implement `inplace_predict` for dart. * Enable `dart` for dask interface now that it's thread-safe. * categorical data should be working out of box for dart now. The implementation is not very efficient as it has to pull back the data and apply weight for each tree, but still a significant improvement over previous implementation as now we no longer binary search for each sample. * Fix output prediction shape on dataframe.	2021-02-09 23:30:19 +08:00
Jiaming Yuan	5d48d40d9a	Fix DMatrix slice with feature types. (#6689 )	2021-02-09 08:13:51 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with rest of the `XGBoosterPredictFrom` functions on semantic of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, I made it to be consistent with other prediction functions where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	dbb5208a0a	Use __array_interface__ for creating DMatrix from CSR. (#6675 ) * Use __array_interface__ for creating DMatrix from CSR. * Add configuration.	2021-02-05 21:09:47 +08:00
Jiaming Yuan	1e949110da	Use generic dispatching routine for array interface. (#6672 )	2021-02-05 09:23:38 +08:00
Jiaming Yuan	411592a347	Enhance inplace prediction. (#6653 ) * Accept array interface for csr and array. * Accept an optional proxy dmatrix for metainfo. This constructs an explicit `_ProxyDMatrix` type in Python. * Remove unused doc. * Add strict output.	2021-02-02 11:41:46 +08:00
Jiaming Yuan	a9ec0ea6da	Align device id in predict transform with predictor. (#6662 )	2021-02-02 08:33:29 +08:00
Jiaming Yuan	c3c8e66fc9	Make prediction functions thread safe. (#6648 )	2021-01-28 23:29:43 +08:00
Philip Hyunsu Cho	0f2ed21a9d	[Breaking] Change default evaluation metric for binary:logitraw objective to logloss (#6647 )	2021-01-29 00:12:12 +09:00
Jiaming Yuan	1b70a323a7	Improve string view to reduce string allocation. (#6644 )	2021-01-27 19:08:52 +08:00
Jiaming Yuan	bc08e0c9d1	Remove `experimental_json_serialization` from tests. (#6640 )	2021-01-27 17:44:49 +08:00
Jiaming Yuan	d132933550	Remove type check for solaris. (#6610 )	2021-01-16 02:58:19 +08:00
ShvetsKS	7f4d3a91b9	Multiclass prediction caching for CPU Hist (#6550 ) Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>	2021-01-13 04:42:07 +08:00
Jiaming Yuan	f2f7dd87b8	Use view for `SparsePage` exclusively. (#6590 )	2021-01-11 18:04:55 +08:00
Jiaming Yuan	80065d571e	[dask] Add DaskXGBRanker (#6576 ) * Initial support for distributed LTR using dask. * Support `qid` in libxgboost. * Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code. * Define `DaskXGBRanker`. The dask ranker doesn't support group structure, instead it uses query id and converts it to group ptr internally.	2021-01-08 18:35:09 +08:00
Gorkem Ozkaya	2231940d1d	Clip small positive values in gamma-nloglik (#6537 ) For the `gamma-nloglik` eval metric, small positive values in the labels are causing `NaN`'s in the outputs, as reported here: https://github.com/dmlc/xgboost/issues/5349. This will add clipping on them, similar to what is done in other metrics like `poisson-nloglik` and `logloss`.	2020-12-22 03:11:40 +08:00
Jiaming Yuan	ca3da55de4	Support early stopping with training continuation, correct num boosted rounds. (#6506 ) * Implement early stopping with training continuation. * Add new C API for obtaining boosted rounds. * Fix off by 1 in `save_best`. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-17 19:59:19 +08:00
Philip Hyunsu Cho	ad1a527709	Enable loading model from <1.0.0 trained with objective='binary:logitraw' (#6517 ) * Enable loading model from <1.0.0 trained with objective='binary:logitraw' * Add binary:logitraw in model compatibility testing suite * Feedback from @trivialfis: Override ProbToMargin() for LogisticRaw Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-12-16 16:53:46 -08:00
Philip Hyunsu Cho	bf6cfe3b99	[Breaking] Upgrade cuDF and RMM to 0.18 nightlies; require RMM 0.18+ for RMM plugin (#6510 ) * [CI] Upgrade cuDF and RMM to 0.18 nightlies * Modify RMM plugin to be compatible with RMM 0.18 * Update src/common/device_helpers.cuh Co-authored-by: Mark Harris <mharris@nvidia.com> Co-authored-by: Mark Harris <mharris@nvidia.com>	2020-12-16 10:07:52 -08:00
Jiaming Yuan	c5876277a8	Drop saving binary format for memory snapshot. (#6513 )	2020-12-17 00:14:57 +08:00
Jiaming Yuan	347f593169	Accept numpy array for DMatrix slice index. (#6368 )	2020-12-16 14:42:52 +08:00

1327 Commits