xgboost

Author	SHA1	Message	Date
Jiaming Yuan	bbfffb444d	Fix race condition in CPU shap. (#7050)	2021-06-21 10:03:15 +08:00
Jiaming Yuan	29f8fd6fee	Support categorical split in tree model dump. (#7036)	2021-06-18 16:46:20 +08:00
Jiaming Yuan	86715e4cd4	Support categorical data for dask functional interface and DQM. (#7043) * Support categorical data for dask functional interface and DQM. * Implement categorical data support for GPU GK-merge. * Add support for dask functional interface. * Add support for DQM. * Get newer cupy.	2021-06-18 13:06:52 +08:00
Jiaming Yuan	f79cc4a7a4	Implement categorical prediction for CPU and GPU predict leaf. (#7001) * Categorical prediction with CPU predictor and GPU predict leaf. * Implement categorical prediction for CPU prediction. * Implement categorical prediction for GPU predict leaf. * Refactor the prediction functions to have a unified get next node function. Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>	2021-06-11 10:11:45 +08:00
Jiaming Yuan	a59c7323b4	Fix inplace predict missing value. (#6787)	2021-03-27 05:36:10 +08:00
Louis Desreumaux	9b530e5697	Improve OpenMP exception handling (#6680)	2021-02-25 13:56:16 +08:00
ShvetsKS	9f15b9e322	Optimize CPU prediction (#6696) Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>	2021-02-16 14:41:22 +08:00
Jiaming Yuan	e8c5c53e2f	Use `Predictor` for `dart`. (#6693) * Use normal predictor for dart booster. * Implement `inplace_predict` for dart. * Enable `dart` for dask interface now that it's thread-safe. * Categorical data should work out of the box for dart now. The implementation is not very efficient, as it has to pull back the data and apply the weight for each tree, but it is still a significant improvement over the previous implementation, since we no longer binary search for each sample. * Fix output prediction shape on dataframe.	2021-02-09 23:30:19 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668) * Add a new API function for predicting on `DMatrix`. This function aligns with the rest of the `XGBoosterPredictFrom` functions in the semantics of its arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function: I made it consistent with other prediction functions, where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	411592a347	Enhance inplace prediction. (#6653) * Accept array interface for csr and array. * Accept an optional proxy dmatrix for metainfo. This constructs an explicit `_ProxyDMatrix` type in Python. * Remove unused doc. * Add strict output.	2021-02-02 11:41:46 +08:00
Jiaming Yuan	c3c8e66fc9	Make prediction functions thread safe. (#6648)	2021-01-28 23:29:43 +08:00
Jiaming Yuan	f2f7dd87b8	Use view for `SparsePage` exclusively. (#6590)	2021-01-11 18:04:55 +08:00
Philip Hyunsu Cho	c31e3efa7c	Pass correct split_type to GPU predictor (#6491) * Pass correct split_type to GPU predictor * Add a test	2020-12-11 19:30:00 -08:00
Honza Sterba	b0036b339b	Optionally fail when gpu_id is set to invalid value (#6342)	2020-11-28 15:14:12 +08:00
Jiaming Yuan	8a17610666	Implement GPU predict leaf. (#6187)	2020-11-11 17:33:47 +08:00
ShvetsKS	d411f98d26	Simple fix for static schedule in predict (#6357) Co-authored-by: ShvetsKS <kirill.shvets@intel.com>	2020-11-09 17:01:30 +08:00
Igor Moura	5e1e972aea	Clean up warnings (#6325)	2020-10-30 23:50:29 +08:00
Jiaming Yuan	c4da967b5c	Support unity build. (#6295) * Support unity build. * Setup on Windows Jenkins. * Revert "Setup on Windows Jenkins." This reverts commit 8345cb8d2b009eec8ae9fa6f16412a7c9b6ec12c.	2020-10-28 11:49:28 -07:00
Rory Mitchell	f0c3ff313f	Update GPUTreeShap, add docs (#6281) * Update GPUTreeShap, add docs * Fix test Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-10-27 18:22:12 +13:00
Igor Moura	d1254808d5	Clean up C++ warnings (#6213)	2020-10-19 23:02:33 +08:00
ShvetsKS	a4ce0eae43	CPU predict performance improvement (#6127) Co-authored-by: ShvetsKS <kirill.shvets@intel.com>	2020-10-08 15:50:21 +03:00
Jiaming Yuan	798af22ff4	Add categorical data support to GPU predictor. (#6165)	2020-09-29 11:25:34 +08:00
Jiaming Yuan	52c0b3f100	Fix error message. (#6176)	2020-09-29 11:18:25 +08:00
Rory Mitchell	dda9e1e487	Update GPUTreeshap (#6163) * Reduce shap test duration * Test interoperability with shap package * Add feature interactions * Update GPUTreeShap	2020-09-28 09:43:47 +13:00
Jiaming Yuan	c6f2b8c841	Upgrade gputreeshap. (#6099) * Upgrade gputreeshap. Co-authored-by: Rory Mitchell <r.a.mitchell.nz@gmail.com>	2020-09-15 12:57:22 +12:00
Rory Mitchell	2e907abdb8	Updates to GPUTreeShap (#6087) * Extract paths on device * Update GPUTreeShap	2020-09-06 13:39:08 +12:00
Jiaming Yuan	80c8547147	Make binary bin search reusable. (#6058) * Move binary search row to hist util. * Remove dead code.	2020-08-26 05:05:11 +08:00
Rory Mitchell	9a4e8b1d81	GPUTreeShap (#6038)	2020-08-25 12:47:41 +12:00
Philip Hyunsu Cho	9adb812a0a	RMM integration plugin (#5873) * [CI] Add RMM as an optional dependency * Replace caching allocator with pool allocator from RMM * Revert "Replace caching allocator with pool allocator from RMM" This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038. * Use rmm::mr::get_default_resource() * Try setting default resource (doesn't work yet) * Allocate pool_mr in the heap * Prevent leaking pool_mr handle * Separate EXPECT_DEATH() in separate test suite suffixed DeathTest * Turn off death tests for RMM * Address reviewer's feedback * Prevent leaking of cuda_mr * Fix Jenkinsfile syntax * Remove unnecessary function in Jenkinsfile * [CI] Install NCCL into RMM container * Run Python tests * Try building with RMM, CUDA 10.0 * Do not use RMM for CUDA 10.0 target * Actually test for test_rmm flag * Fix TestPythonGPU * Use CNMeM allocator, since pool allocator doesn't yet support multiGPU * Use 10.0 container to build RMM-enabled XGBoost * Revert "Use 10.0 container to build RMM-enabled XGBoost" This reverts commit 789021fa31112e25b683aef39fff375403060141. * Fix Jenkinsfile * [CI] Assign larger /dev/shm to NCCL * Use 10.2 artifact to run multi-GPU Python tests * Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target * Rename Conda env rmm_test -> gpu_test * Use env var to opt into CNMeM pool for C++ tests * Use identical CUDA version for RMM builds and tests * Use Pytest fixtures to enable RMM pool in Python tests * Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM * Use per-device MR; use command arg in gtest * Set CMake prefix path to use Conda env * Use 0.15 nightly version of RMM * Remove unnecessary header * Fix a unit test when cudf is missing * Add RMM demos * Remove print() * Use HostDeviceVector in GPU predictor * Simplify pytest setup; use LocalCUDACluster fixture * Address reviewers' comments Co-authored-by: Hyunsu Cho <chohyu01@cs.wasshington.edu>	2020-08-12 01:26:02 -07:00
Jiaming Yuan	ee70a2380b	Unify CPU hist sketching (#5880)	2020-08-12 01:33:06 +08:00
boxdot	d268a2a463	Thread-safe prediction by making the prediction cache thread-local. (#5853) Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-07-30 12:33:50 +08:00
Philip Hyunsu Cho	1d22a9be1c	Revert "Reorder includes. (#5749)" (#5771) This reverts commit `d3a0efbf16`.	2020-06-09 10:29:28 -07:00
Jiaming Yuan	cacff9232a	Remove column major specialization. (#5755) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-06-05 16:19:14 +08:00
Jiaming Yuan	d3a0efbf16	Reorder includes. (#5749) * Reorder includes. * R.	2020-06-03 17:30:47 +12:00
Jiaming Yuan	6671b42dd4	Use ellpack for prediction only when sparsepage doesn't exist. (#5504)	2020-04-10 12:15:46 +08:00
Jiaming Yuan	0012f2ef93	Upgrade clang-tidy on CI. (#5469) * Correct all clang-tidy errors. * Upgrade clang-tidy to 10 on CI. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-04-05 04:42:29 +08:00
Jiaming Yuan	6601a641d7	Thread safe, inplace prediction. (#5389) Normal prediction with DMatrix is now thread safe with locks. The added inplace prediction is lock-free and thread safe. When data is on device (cupy, cudf), the returned data is also on device. * Implementation for numpy, csr, cudf and cupy. * Implementation for dask. * Remove sync in simple dmatrix.	2020-03-30 15:35:28 +08:00
Rory Mitchell	3ad4333b0e	Partial rewrite EllpackPage (#5352)	2020-03-11 10:15:53 +13:00
Jiaming Yuan	655cf17b60	Predict on Ellpack. (#5327) * Unify GPU prediction node. * Add `PageExists`. * Dispatch prediction on input data for GPU Predictor.	2020-02-23 06:27:03 +08:00
Jiaming Yuan	0110754a76	Remove update prediction cache from predictors. (#5312) Move this function into gbtree, and use only the updater to do so. As the predictor now knows exactly how many trees to predict, there's no need for it to update the prediction cache.	2020-02-17 11:35:47 +08:00
Jiaming Yuan	c35cdecddd	Move prediction cache to Learner. (#5220) * Move prediction cache into Learner. * Clean-ups - Remove duplicated cache in Learner and GBM. - Remove ad-hoc fix of invalid cache. - Remove `PredictFromCache` in predictors. - Remove prediction cache for linear altogether, as it only moves the prediction into the training process and doesn't provide any actual overall speed gain. - The cache is now unique to Learner, which means the ownership is no longer shared by any other components. * Changes - Add version to prediction cache. - Use weak ptr to check expired DMatrix. - Pass shared pointer instead of raw pointer.	2020-02-14 13:04:23 +08:00
Kodi Arfer	f100b8d878	[Breaking] Don't drop trees during DART prediction by default (#5115) * Simplify DropTrees calling logic * Add `training` parameter for prediction method. * [Breaking]: Add `training` to C API. * Change for R and Python custom objective. * Correct comment. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-01-13 21:48:30 +08:00
Jiaming Yuan	ee287808fb	Lazy initialization of device vector. (#5173) * Lazy initialization of device vector. * Fix #5162. * Disable copy constructor of HostDeviceVector. Prevents implicit copying. * Fix CPU build. * Bring back move assignment operator.	2020-01-07 11:23:05 +08:00
Jiaming Yuan	e089e16e3d	Pass pointer to model parameters. (#5101) * Pass pointer to model parameters. This PR de-duplicates most of the model parameters except the one in `tree_model.h`. One difficulty is that `base_score` is a model property but can be changed at runtime by the objective function. Hence, when performing model IO, we need to save the value provided by users instead of the one transformed by the objective. Here we created an immutable version of `LearnerModelParam` that represents the value of the model parameters after configuration.	2019-12-10 12:11:22 +08:00
Jiaming Yuan	608ebbe444	Fix GPU ID and prediction cache from pickle (#5086) * Hack for saving GPU ID. * Declare prediction cache on GBTree. * Add a simple test. * Add `auto` option for GPU Predictor.	2019-12-07 16:02:06 +08:00
Jiaming Yuan	64af1ecf86	[Breaking] Remove num roots. (#5059)	2019-12-05 21:58:43 +08:00
Kodi Arfer	f2277e7106	Use DART tree weights when computing SHAPs (#5050) This PR fixes tree weights in dart being ignored when computing contributions. * Fix ellpack page source link. * Add tree weights to compute contribution.	2019-12-03 19:55:53 +08:00
Jiaming Yuan	97abcc7ee2	Extract interaction constraint from split evaluator. (#5034) * Extract interaction constraints from split evaluator. The reason for doing so is mostly model IO, where num_feature and interaction_constraints are copied in the split evaluator. Also, interaction constraint by itself is a feature selector, acting like a column sampler, and it's inefficient to bury it deep in the evaluator chain. Lastly, removing yet another copied parameter is a win. * Enable inc for approx tree method. As the implementation is now split out from the evaluator class, it's also enabled for the approx method. * Remove obsolete code in colmaker. It was never documented nor actually used in the real world, and there isn't a single test for those code blocks. * Unify the types used for row and column. As the size of input datasets approaches billions, incorrect use of int is subject to overflow; also, signed integer overflow is undefined behaviour. This PR starts unifying the index types to unsigned integers. There are optimizations that can exploit this undefined behaviour, but after some testing I don't see them being beneficial to XGBoost.	2019-11-14 20:11:41 +08:00
KaiJin Ji	1733c9e8f7	Improve operation efficiency for single predict (#5016) * Improve operation efficiency for single predict	2019-11-10 02:01:28 +08:00
Jiaming Yuan	7663de956c	Run training with empty DMatrix. (#4990) This makes GPU Hist robust in distributed environments, as some workers might not be associated with any data in either training or evaluation. * Disable rabit mock test for now: See #5012. * Disable dask-cudf test at prediction for now: See #5003 * Launch dask job for all workers even though some might not have any data. * Check 0 rows in elementwise evaluation metrics. Using AUC and AUC-PR still throws an error. See #4663 for a robust fix. * Add tests for edge cases. * Add `LaunchKernel` wrapper handling zero sized grid. * Move some parts of allreducer into a cu file. * Don't validate feature names when the booster is empty. * Sync number of columns in DMatrix, as num_feature is required to be the same across all workers in data split mode. * Filtering in dask interface now by default syncs all boosters that are not empty, instead of using rank 0. * Fix Jenkins' GPU tests. * Install dask-cuda from source in Jenkins' test. Now all tests are actually running. * Restore GPU Hist tree synchronization test. * Check UUID of running devices. The check is only performed on CUDA version >= 10.x, as 9.x doesn't have a UUID field. * Fix CMake policy and project variables. Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13. * Fix copying data to CPU * Fix race condition in cpu predictor. * Fix duplicated DMatrix construction. * Don't download extra nccl in CI script.	2019-11-06 16:13:13 +08:00

102 Commits