xgboost

Author	SHA1	Message	Date
Jiaming Yuan	a7083d3c13	Fix dart inplace prediction with GPU input. (#6777 ) * Fix dart inplace predict with data on GPU, which might trigger a fatal check for device access right. * Avoid copying data whenever possible.	2021-03-25 12:00:32 +08:00
Jiaming Yuan	bcc0277338	Re-implement ROC-AUC. (#6747 ) * Re-implement ROC-AUC. * Binary * MultiClass * LTR * Add documents. This PR resolves a few issues: - Define a value when the dataset is invalid, which can happen if there's an empty dataset, or when the dataset contains only positive or negative values. - Define ROC-AUC for multi-class classification. - Define weighted average value for distributed setting. - A correct implementation for learning to rank task. Previous implementation is just binary classification with averaging across groups, which doesn't measure ordered learning to rank.	2021-03-20 16:52:40 +08:00
Jiaming Yuan	4ee8340e79	Support column major array. (#6765 )	2021-03-20 05:19:46 +08:00
Jiaming Yuan	4f75f514ce	Fix GPU RF (#6755 ) * Fix sampling.	2021-03-17 06:23:35 +08:00
Jiaming Yuan	325bc93e16	[dask] Use `distributed.MultiLock` (#6743 ) * [dask] Use `distributed.MultiLock` This enables training multiple models in parallel. * Conditionally import `MultiLock`. * Use async train directly in scikit learn interface. * Use `worker_client` when available.	2021-03-16 14:19:41 +08:00
Jiaming Yuan	1335db6113	[dask] Improve documents. (#6687 ) * Add tag for versions. * use autoclass in sphinx build. Made some class methods to be private to avoid exporting documents.	2021-02-09 09:20:58 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with rest of the `XGBoosterPredictFrom` functions on semantic of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, I made it to be consistent with other prediction functions where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	411592a347	Enhance inplace prediction. (#6653 ) * Accept array interface for csr and array. * Accept an optional proxy dmatrix for metainfo. This constructs an explicit `_ProxyDMatrix` type in Python. * Remove unused doc. * Add strict output.	2021-02-02 11:41:46 +08:00
Jiaming Yuan	bc08e0c9d1	Remove `experimental_json_serialization` from tests. (#6640 )	2021-01-27 17:44:49 +08:00
Jiaming Yuan	740d042255	Add base_margin for evaluation dataset. (#6591 ) * Add base margin to evaluation datasets. * Unify the code base for evaluation matrices.	2021-01-26 02:11:02 +08:00
Jiaming Yuan	8942c98054	Define metainfo and other parameters for all DMatrix interfaces. (#6601 ) This PR ensures all DMatrix types have a common interface. * Fix logic in avoiding duplicated DMatrix in sklearn. * Check for consistency between DMatrix types. * Add doc for bounds.	2021-01-25 16:06:06 +08:00
Jiaming Yuan	78f2cd83d7	Suppress hypothesis health check for dask client. (#6589 )	2021-01-11 14:11:57 +08:00
Jiaming Yuan	80065d571e	[dask] Add DaskXGBRanker (#6576 ) * Initial support for distributed LTR using dask. * Support `qid` in libxgboost. * Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code. * Define `DaskXGBRanker`. The dask ranker doesn't support group structure, instead it uses query id and convert to group ptr internally.	2021-01-08 18:35:09 +08:00
Jiaming Yuan	de8fd852a5	[dask] Add type hints. (#6519 ) * Add validate_features. * Show type hints in doc. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-29 19:41:02 +08:00
Jiaming Yuan	a30461cf87	[dask] Support all parameters in regressor and classifier. (#6471 ) * Add eval_metric. * Add callback. * Add feature weights. * Add custom objective.	2020-12-14 07:35:56 +08:00
Philip Hyunsu Cho	c31e3efa7c	Pass correct split_type to GPU predictor (#6491 ) * Pass correct split_type to GPU predictor * Add a test	2020-12-11 19:30:00 -08:00
Honza Sterba	b0036b339b	Optionally fail when gpu_id is set to invalid value (#6342 )	2020-11-28 15:14:12 +08:00
Philip Hyunsu Cho	9c9070aea2	Use pytest conventions consistently (#6337 ) * Do not derive from unittest.TestCase (not needed for pytest) * assertRaises -> pytest.raises * Simplify test_empty_dmatrix with test parametrization * setUpClass -> setup_class, tearDownClass -> teardown_class * Don't import unittest; import pytest * Use plain assert * Use parametrized tests in more places * Fix test_gpu_with_sklearn.py * Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls * Fix test_eta_decay_gpu_hist * Add parametrized tests for monotone constraints * Fix test names * Remove test parametrization * Revise test_slice to be not flaky	2020-11-19 17:00:15 -08:00
Jiaming Yuan	c1a62b5fa2	Expect gpu external memory to fail. (#6381 )	2020-11-12 19:24:48 +08:00
Jiaming Yuan	6e12c2a6f8	[dask] Support running on GKE. (#6343 ) * Avoid accessing `scheduler_info()['workers']`. * Avoid calling `client.gather` inside task. * Avoid using `client.scheduler_address`.	2020-11-11 18:04:34 +08:00
Jiaming Yuan	8a17610666	Implement GPU predict leaf. (#6187 )	2020-11-11 17:33:47 +08:00
Jiaming Yuan	43efadea2e	Deterministic data partitioning for external memory (#6317 ) * Make external memory data partitioning deterministic. * Change the meaning of `page_size` from bytes to number of rows. * Design a data pool. * Note for external memory. * Enable unity build on Windows CI. * Force garbage collect on test.	2020-11-11 06:11:06 +08:00
Jiaming Yuan	184e2eac7d	Add period to evaluation monitor. (#6348 )	2020-11-10 07:47:48 +08:00
Rory Mitchell	29745c6df2	Fix inclusive scan for large sizes (#6234 )	2020-11-03 17:01:43 +13:00
Jiaming Yuan	048acf81cd	Enable shap sparse test. (#6332 )	2020-11-01 20:59:27 +08:00
Philip Hyunsu Cho	143b278267	Mark flaky tests as XFAIL (#6299 ) * Temporarily skip TestGPUUpdaters::test_categorical * Temporarily skip test_boost_from_prediction[approx]	2020-10-28 11:50:57 -07:00
Philip Hyunsu Cho	c8ec62103a	Deprecate LabelEncoder in XGBClassifier; Enable cuDF/cuPy inputs in XGBClassifier (#6269 ) * Deprecate LabelEncoder in XGBClassifier; skip LabelEncoder for cuDF/cuPy inputs * Add unit tests for cuDF and cuPy inputs with XGBClassifier * Fix lint * Clarify warning * Move use_label_encoder option to XGBClassifier constructor * Add a test for cudf.Series * Add use_label_encoder to XGBRFClassifier doc * Address reviewer feedback	2020-10-26 13:20:51 -07:00
Jiaming Yuan	5037abeb86	Fix linear gpu input (#6255 )	2020-10-19 12:02:36 +08:00
Philip Hyunsu Cho	65ea42bd42	[CI] Reduce testing load with RMM (#6249 ) * [CI] Reduce testing load with RMM * Address reviewer's comment	2020-10-18 19:16:46 -07:00
Jiaming Yuan	ab5b35134f	Rework Python callback functions. (#6199 ) * Define a new callback interface for Python. * Deprecate the old callbacks. * Enable early stopping on dask.	2020-10-10 17:52:36 +08:00
Jiaming Yuan	b5b24354b8	More categorical tests and disable shap sparse test. (#6219 ) * Fix tree load with 32 category.	2020-10-10 16:12:37 +08:00
Jiaming Yuan	70ce5216b5	Add high level tests for categorical data. (#6179 ) * Fix unique.	2020-10-09 09:27:23 +08:00
Rory Mitchell	dda9e1e487	Update GPUTreeshap (#6163 ) * Reduce shap test duration * Test interoperability with shap package * Add feature interactions * Update GPUTreeShap	2020-09-28 09:43:47 +13:00
Jiaming Yuan	78d72ef936	Add DaskDeviceQuantileDMatrix demo. (#6156 )	2020-09-24 14:08:28 +08:00
Jiaming Yuan	2fcc4f2886	Unify evaluation functions. (#6037 )	2020-08-26 14:23:27 +08:00
Rory Mitchell	9a4e8b1d81	GPUTreeShap (#6038 )	2020-08-25 12:47:41 +12:00
Jiaming Yuan	4d99c58a5f	Feature weights (#5962 )	2020-08-18 19:55:41 +08:00
Philip Hyunsu Cho	14d5ce712c	[CI] Fix Dask Pytest fixture (#6024 )	2020-08-17 16:45:22 -07:00
Philip Hyunsu Cho	9adb812a0a	RMM integration plugin (#5873 ) * [CI] Add RMM as an optional dependency * Replace caching allocator with pool allocator from RMM * Revert "Replace caching allocator with pool allocator from RMM" This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038. * Use rmm::mr::get_default_resource() * Try setting default resource (doesn't work yet) * Allocate pool_mr in the heap * Prevent leaking pool_mr handle * Separate EXPECT_DEATH() in separate test suite suffixed DeathTest * Turn off death tests for RMM * Address reviewer's feedback * Prevent leaking of cuda_mr * Fix Jenkinsfile syntax * Remove unnecessary function in Jenkinsfile * [CI] Install NCCL into RMM container * Run Python tests * Try building with RMM, CUDA 10.0 * Do not use RMM for CUDA 10.0 target * Actually test for test_rmm flag * Fix TestPythonGPU * Use CNMeM allocator, since pool allocator doesn't yet support multiGPU * Use 10.0 container to build RMM-enabled XGBoost * Revert "Use 10.0 container to build RMM-enabled XGBoost" This reverts commit 789021fa31112e25b683aef39fff375403060141. * Fix Jenkinsfile * [CI] Assign larger /dev/shm to NCCL * Use 10.2 artifact to run multi-GPU Python tests * Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target * Rename Conda env rmm_test -> gpu_test * Use env var to opt into CNMeM pool for C++ tests * Use identical CUDA version for RMM builds and tests * Use Pytest fixtures to enable RMM pool in Python tests * Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM * Use per-device MR; use command arg in gtest * Set CMake prefix path to use Conda env * Use 0.15 nightly version of RMM * Remove unnecessary header * Fix a unit test when cudf is missing * Add RMM demos * Remove print() * Use HostDeviceVector in GPU predictor * Simplify pytest setup; use LocalCUDACluster fixture * Address reviewers' comments Co-authored-by: Hyunsu Cho <chohyu01@cs.wasshington.edu>	2020-08-12 01:26:02 -07:00
Jiaming Yuan	ee70a2380b	Unify CPU hist sketching (#5880 )	2020-08-12 01:33:06 +08:00
Philip Hyunsu Cho	bf2990e773	Add missing Pytest marks to AsyncIO unit test (#5968 )	2020-08-01 10:56:24 +08:00
Jiaming Yuan	fa3715f584	[Dask] Asyncio support. (#5862 )	2020-07-30 06:23:58 +08:00
Philip Hyunsu Cho	ac9136ee49	Further improvements and savings in Jenkins pipeline (#5904 ) * Publish artifacts only on the master and release branches * Build CUDA only for Compute Capability 7.5 when building PRs * Run all Windows jobs in a single worker image * Build nightly XGBoost4J SNAPSHOT JARs with Scala 2.12 only * Show skipped Python tests on Windows * Make Graphviz optional for Python tests * Add back C++ tests * Unstash xgboost_cpp_tests * Fix label to CUDA 10.1 * Install cuPy for CUDA 10.1 * Install jsonschema * Address reviewer's feedback	2020-07-18 03:30:40 -07:00
Jiaming Yuan	7c2686146e	Dask device dmatrix (#5901 ) * Fix softprob with empty dmatrix.	2020-07-17 13:17:43 +08:00
Jiaming Yuan	029a8b533f	Simplify the data backends. (#5893 )	2020-07-16 15:17:31 +08:00
Jiaming Yuan	a3ec964346	Accept iterator in device dmatrix. (#5783 ) * Remove Device DMatrix.	2020-07-07 21:44:48 +08:00
Jiaming Yuan	048d969be4	Implement GK sketching on GPU. (#5846 ) * Implement GK sketching on GPU. * Strong tests on quantile building. * Handle sparse dataset by binary searching the column index. * Hypothesis test on dask.	2020-07-07 12:16:21 +08:00
Rory Mitchell	abdf894fcf	Add cupy to Windows CI (#5797 ) * Add cupy to Windows CI * Update Jenkinsfile-win64 Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Update Jenkinsfile-win64 Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Update tests/python-gpu/test_gpu_prediction.py Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-06-17 21:55:09 -07:00
Rory Mitchell	b47b5ac771	Use hypothesis (#5759 ) * Use hypothesis * Allow int64 array interface for groups * Add packages to Windows CI * Add to travis * Make sure device index is set correctly * Fix dask-cudf test * appveyor	2020-06-16 12:45:59 +12:00
Rory Mitchell	359023c0fa	Speed up python test (#5752 ) * Speed up tests * Prevent DeviceQuantileDMatrix initialisation with numpy * Use joblib.memory * Use RandomState	2020-06-05 11:39:24 +12:00

218 Commits