xgboost

Author	SHA1	Message	Date
Jiaming Yuan	e88ac9cc54	[dask] Extend tree stats tests. (#7128 ) * Add tests to GPU. * Assert cover in children sums up to the parent.	2021-07-27 12:22:13 +08:00
ShvetsKS	caa9e527dd	Remove extra sync for dense data (#7120 ) Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>	2021-07-22 19:02:31 +08:00
Jiaming Yuan	ffa66aace0	Persist data in dask test. (#7077 )	2021-07-06 11:47:17 +08:00
jmoralez	25514e104a	[dask] speed up tests (#7020 )	2021-06-11 11:43:01 +08:00
Jiaming Yuan	89a49cf30e	Fix dask predict on `DaskDMatrix` with `iteration_range`. (#7005 )	2021-05-29 04:43:12 +08:00
Jiaming Yuan	44cc9c04ea	Fix multiclass auc with empty dataset. (#6947 )	2021-05-12 15:01:14 +08:00
Jiaming Yuan	05ac415780	[dask] Set dataframe index in predict. (#6944 )	2021-05-12 13:24:21 +08:00
Jiaming Yuan	a1d23f6613	Relax test for decision stump in distributed environment. (#6919 )	2021-04-30 09:04:11 +08:00
Jiaming Yuan	47b62480af	More general predict proba. (#6817 ) * Use `output_margin` for `softmax`. * Add test for dask binary cls. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-04-01 19:52:12 +08:00
Jiaming Yuan	bcc0277338	Re-implement ROC-AUC. (#6747 ) * Re-implement ROC-AUC. * Binary * MultiClass * LTR * Add documents. This PR resolves a few issues: - Define a value when the dataset is invalid, which can happen if there's an empty dataset, or when the dataset contains only positive or negative values. - Define ROC-AUC for multi-class classification. - Define weighted average value for distributed setting. - A correct implementation for learning to rank task. Previous implementation is just binary classification with averaging across groups, which doesn't measure ordered learning to rank.	2021-03-20 16:52:40 +08:00
Jiaming Yuan	325bc93e16	[dask] Use `distributed.MultiLock` (#6743 ) * [dask] Use `distributed.MultiLock` This enables training multiple models in parallel. * Conditionally import `MultiLock`. * Use async train directly in scikit learn interface. * Use `worker_client` when available.	2021-03-16 14:19:41 +08:00
Philip Hyunsu Cho	366f3cb9d8	Add use_rmm flag to global configuration (#6656 ) * Ensure RMM is 0.18 or later * Add use_rmm flag to global configuration * Modify XGBCachingDeviceAllocatorImpl to skip CUB when use_rmm=True * Update the demo * [CI] Pin NumPy to 1.19.4, since NumPy 1.19.5 doesn't work with latest Shap	2021-03-09 14:53:05 -08:00
capybara	b6167cd2ff	[dask] Use client to persist collections (#6722 ) Co-authored-by: fis <jm.yuan@outlook.com>	2021-02-25 16:40:38 +08:00
Jiaming Yuan	e8c5c53e2f	Use `Predictor` for `dart`. (#6693 ) * Use normal predictor for dart booster. * Implement `inplace_predict` for dart. * Enable `dart` for dask interface now that it's thread-safe. * categorical data should be working out of box for dart now. The implementation is not very efficient as it has to pull back the data and apply weight for each tree, but still a significant improvement over previous implementation as now we no longer binary search for each sample. * Fix output prediction shape on dataframe.	2021-02-09 23:30:19 +08:00
Jiaming Yuan	1335db6113	[dask] Improve documents. (#6687 ) * Add tag for versions. * use autoclass in sphinx build. Made some class methods to be private to avoid exporting documents.	2021-02-09 09:20:58 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with rest of the `XGBoosterPredictFrom` functions on semantic of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, I made it to be consistent with other prediction functions where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	72892cc80d	[dask] Disable gblinear and dart. (#6665 )	2021-02-04 09:13:09 +08:00
Jiaming Yuan	87ab1ad607	[dask] Accept `Future` of model for prediction. (#6650 ) This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatedly. * Document is updated to reflect functionality additions in recent changes.	2021-02-02 08:45:52 +08:00
Jiaming Yuan	d8ec7aad5a	[dask] Add a 1 line sample to infer output shape. (#6645 ) * [dask] Use a 1 line sample to infer output shape. This is for inferring shape with direct prediction (without DaskDMatrix). There are a few things that requires known output shape before carrying out actual prediction, including dask meta data, output dataframe columns. * Infer output shape based on local prediction. * Remove set param in predict function as it's not thread safe nor necessary as we now let dask to decide the parallelism. * Simplify prediction on `DaskDMatrix`.	2021-01-30 18:55:50 +08:00
Jiaming Yuan	d167892c7e	[dask] Ensure model can be pickled. (#6651 )	2021-01-28 21:47:57 +08:00
Jiaming Yuan	740d042255	Add base_margin for evaluation dataset. (#6591 ) * Add base margin to evaluation datasets. * Unify the code base for evaluation matrices.	2021-01-26 02:11:02 +08:00
Jiaming Yuan	4bf23c2391	Specify shape in prediction contrib and interaction. (#6614 )	2021-01-26 02:08:22 +08:00
Jiaming Yuan	a275f40267	[dask] Rework base margin test. (#6627 )	2021-01-22 17:49:13 +08:00
Jiaming Yuan	7bc56fa0ed	Use simple print in tracker print function. (#6609 )	2021-01-21 21:15:43 +08:00
Jiaming Yuan	d6d72de339	Revert ntree limit fix (#6616 ) The old (before fix) best_ntree_limit ignores the num_class parameter, which is incorrect. Previously we worked around it in the C++ layer to avoid possible breaking changes in other language bindings, but the Python interpretation stayed incorrect. The PR fixed that in Python to consider num_class, but didn't remove the old workaround, so the tree calculation in the predictor is incorrect; see PredictBatch in CPUPredictor.	2021-01-19 23:51:16 +08:00
Jiaming Yuan	89a00a5866	[dask] Random forest estimators (#6602 )	2021-01-13 20:59:20 +08:00
Jiaming Yuan	78f2cd83d7	Suppress hypothesis health check for dask client. (#6589 )	2021-01-11 14:11:57 +08:00
Jiaming Yuan	80065d571e	[dask] Add DaskXGBRanker (#6576 ) * Initial support for distributed LTR using dask. * Support `qid` in libxgboost. * Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code. * Define `DaskXGBRanker`. The dask ranker doesn't support group structure, instead it uses query id and convert to group ptr internally.	2021-01-08 18:35:09 +08:00
Jiaming Yuan	96d3d32265	[dask] Add shap tests. (#6575 )	2021-01-08 14:59:27 +08:00
Jiaming Yuan	f5ff90cd87	Support `_estimator_type`. (#6582 ) * Use `_estimator_type`. For more info, see: https://scikit-learn.org/stable/developers/develop.html#estimator-types * Model trained from dask can be loaded by single node skl interface.	2021-01-08 10:01:16 +08:00
Jiaming Yuan	60cfd14349	[dask, sklearn] Fix predict proba. (#6566 ) * For sklearn: - Handles user defined objective function. - Handles `softmax`. * For dask: - Use the implementation from sklearn, the previous implementation doesn't perform any extra handling.	2021-01-05 08:29:06 +08:00
Jiaming Yuan	de8fd852a5	[dask] Add type hints. (#6519 ) * Add validate_features. * Show type hints in doc. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-29 19:41:02 +08:00
Jiaming Yuan	a30461cf87	[dask] Support all parameters in regressor and classifier. (#6471 ) * Add eval_metric. * Add callback. * Add feature weights. * Add custom objective.	2020-12-14 07:35:56 +08:00
Jiaming Yuan	47b86180f6	Don't validate feature when number of rows is 0. (#6472 )	2020-12-07 18:08:51 +08:00
Philip Hyunsu Cho	fb56da5e8b	Add global configuration (#6414 ) * Add management functions for global configuration: XGBSetGlobalConfig(), XGBGetGlobalConfig(). * Add Python interface: set_config(), get_config(), and config_context(). * Add unit tests for Python * Add R interface: xgb.set.config(), xgb.get.config() * Add unit tests for R Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-12-03 00:05:18 -08:00
Jiaming Yuan	4ccf92ea34	[dask] Fix union of workers. (#6375 )	2020-11-13 16:55:05 +08:00
Philip Hyunsu Cho	e5193c21a1	[dask] Allow empty data matrix in AFT survival (#6379 ) * [dask] Allow empty data matrix in AFT survival * Add unit test	2020-11-12 17:49:58 -08:00
Jiaming Yuan	6e12c2a6f8	[dask] Support running on GKE. (#6343 ) * Avoid accessing `scheduler_info()['workers']`. * Avoid calling `client.gather` inside task. * Avoid using `client.scheduler_address`.	2020-11-11 18:04:34 +08:00
Jiaming Yuan	7756192906	[dask] Fix prediction on `DaskDMatrix` with multiple meta data. (#6333 ) * Unify the meta handling methods.	2020-11-02 19:18:44 -05:00
Jiaming Yuan	c80657b542	Fix flaky data initialization test. (#6318 )	2020-10-30 03:11:22 +08:00
Philip Hyunsu Cho	143b278267	Mark flaky tests as XFAIL (#6299 ) * Temporarily skip TestGPUUpdaters::test_categorical * Temporarily skip test_boost_from_prediction[approx]	2020-10-28 11:50:57 -07:00
Jiaming Yuan	2686d32a36	Skip dask tests on ARM. (#6267 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-10-26 15:09:05 +08:00
Jiaming Yuan	81c37c28d5	Time the CPU tests on Jenkins. (#6257 ) * Time the CPU tests on Jenkins. * Reduce thread contention. * Add doc. * Skip heavy tests on ARM.	2020-10-20 17:19:07 -07:00
Jiaming Yuan	3da5a69dc9	Fix typo in dask interface. (#6240 )	2020-10-15 15:26:29 +08:00
Jiaming Yuan	b05073bda5	[dask] Test for data initialization. (#6226 )	2020-10-13 11:08:35 +08:00
Jiaming Yuan	ab5b35134f	Rework Python callback functions. (#6199 ) * Define a new callback interface for Python. * Deprecate the old callbacks. * Enable early stopping on dask.	2020-10-10 17:52:36 +08:00
Christian Lorentzen	cf4f019ed6	[Breaking] Change default evaluation metric for classification to logloss / mlogloss (#6183 ) * Change DefaultEvalMetric of classification from error to logloss * Change default binary metric in plugin/example/custom_obj.cc * Set old error metric in python tests * Set old error metric in R tests * Fix missed eval metrics and typos in R tests * Fix setting eval_metric twice in R tests * Add warning for empty eval_metric for classification * Fix Dask tests Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-10-02 12:06:47 -07:00
Kyle Nicholson	e6a238c020	Update base margin dask (#6155 ) * Add `base-margin` * Add `output_margin` to regressor. Co-authored-by: fis <jm.yuan@outlook.com>	2020-09-26 21:30:52 +08:00
Jiaming Yuan	33d80ffad0	[dask] Support more meta data on functional interface. (#6132 ) * Add base_margin, label_(lower\|upper)_bound. * Test survival training with dask.	2020-09-21 16:56:37 +08:00
Rory Mitchell	47350f6acb	Allow kwargs in dask predict (#6117 )	2020-09-15 13:04:03 +12:00

71 Commits