xgboost

Author	SHA1	Message	Date
Jiaming Yuan	1369133916	[dask] Remove the workaround for segfault. (#7146 )	2021-07-30 03:57:53 +08:00
Gil Forsyth	92ae3abc97	[dask] Disallow importing non-dask estimators from xgboost.dask (#7133 ) * Disallow importing non-dask estimators from xgboost.dask This is mostly a style change, but also avoids a user error (that I have committed on a few occasions). Since `XGBRegressor` and `XGBClassifier` are imported as parent classes for the `dask` estimators, without defining an `__all__`, autocomplete (or muscle memory) will produce the following with little prompting: `from xgboost.dask import XGBClassifier` There's nothing inherently wrong with that, but given that `XGBClassifier` is not `dask` enabled, it can lead to confusing behavior until you figure out you should've typed `from xgboost.dask import DaskXGBClassifier` Another option is to alias import the existing non-dask estimators. * Remove base/iter class, add train predict funcs	2021-07-28 02:07:23 +08:00
Jiaming Yuan	7017dd5a26	[JVM-Packages] Use Python tracker in XGBoost for JVM package. (#7132 )	2021-07-27 16:20:42 +08:00
Jiaming Yuan	2f524e9f41	[dask] Work around segfault in prediction. (#7112 )	2021-07-16 04:27:05 +08:00
Jiaming Yuan	86715e4cd4	Support categorical data for dask functional interface and DQM. (#7043 ) * Support categorical data for dask functional interface and DQM. * Implement categorical data support for GPU GK-merge. * Add support for dask functional interface. * Add support for DQM. * Get newer cupy.	2021-06-18 13:06:52 +08:00
Jiaming Yuan	c4b9f4f622	Add `enable_categorical` to sklearn. (#7011 )	2021-06-04 02:29:14 +08:00
Jiaming Yuan	89a49cf30e	Fix dask predict on `DaskDMatrix` with `iteration_range`. (#7005 )	2021-05-29 04:43:12 +08:00
Jiaming Yuan	7e846bb965	Fix prediction on df with latest dask. (#6969 )	2021-05-19 12:23:03 +08:00
Jiaming Yuan	05ac415780	[dask] Set dataframe index in predict. (#6944 )	2021-05-12 13:24:21 +08:00
Jiaming Yuan	dee5ef2dfd	Typehint for Sklearn. (#6799 )	2021-04-14 06:55:21 +08:00
Jiaming Yuan	47b62480af	More general predict proba. (#6817 ) * Use `output_margin` for `softmax`. * Add test for dask binary cls. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-04-01 19:52:12 +08:00
James Lamb	f01af43eb0	[dask] disable work stealing explicitly for training tasks (#6794 )	2021-03-29 16:47:56 +08:00
Jiaming Yuan	325bc93e16	[dask] Use `distributed.MultiLock` (#6743 ) * [dask] Use `distributed.MultiLock` This enables training multiple models in parallel. * Conditionally import `MultiLock`. * Use async train directly in scikit learn interface. * Use `worker_client` when available.	2021-03-16 14:19:41 +08:00
capybara	b6167cd2ff	[dask] Use client to persist collections (#6722 ) Co-authored-by: fis <jm.yuan@outlook.com>	2021-02-25 16:40:38 +08:00
Jiaming Yuan	c375173dca	Support pylint 2.7.0 (#6726 )	2021-02-25 12:49:58 +08:00
James Lamb	dc97b5f19f	[dask] remove outdated comment (#6699 )	2021-02-15 18:49:11 +08:00
Jiaming Yuan	e8c5c53e2f	Use `Predictor` for `dart`. (#6693 ) * Use normal predictor for dart booster. * Implement `inplace_predict` for dart. * Enable `dart` for dask interface now that it's thread-safe. * Categorical data should now work out of the box for dart. The implementation is not very efficient, as it has to pull back the data and apply the weight for each tree, but it is still a significant improvement over the previous implementation, as we no longer binary search for each sample. * Fix output prediction shape on dataframe.	2021-02-09 23:30:19 +08:00
Jiaming Yuan	1335db6113	[dask] Improve documents. (#6687 ) * Add tag for versions. * use autoclass in sphinx build. Made some class methods to be private to avoid exporting documents.	2021-02-09 09:20:58 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with the rest of the `XGBoosterPredictFrom` functions on the semantics of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, which is now consistent with the other prediction functions, where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	72892cc80d	[dask] Disable gblinear and dart. (#6665 )	2021-02-04 09:13:09 +08:00
Jiaming Yuan	9d62b14591	Fix document. [skip ci] (#6669 )	2021-02-02 20:43:31 +08:00
Jiaming Yuan	411592a347	Enhance inplace prediction. (#6653 ) * Accept array interface for csr and array. * Accept an optional proxy dmatrix for metainfo. This constructs an explicit `_ProxyDMatrix` type in Python. * Remove unused doc. * Add strict output.	2021-02-02 11:41:46 +08:00
Jiaming Yuan	87ab1ad607	[dask] Accept `Future` of model for prediction. (#6650 ) This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatedly. * Document is updated to reflect functionality additions in recent changes.	2021-02-02 08:45:52 +08:00
Jiaming Yuan	d8ec7aad5a	[dask] Add a 1 line sample to infer output shape. (#6645 ) * [dask] Use a 1 line sample to infer output shape. This is for inferring shape with direct prediction (without DaskDMatrix). A few things require a known output shape before carrying out the actual prediction, including dask metadata and output dataframe columns. * Infer output shape based on local prediction. * Remove set param in predict function, as it's neither thread safe nor necessary now that we let dask decide the parallelism. * Simplify prediction on `DaskDMatrix`.	2021-01-30 18:55:50 +08:00
Jiaming Yuan	d167892c7e	[dask] Ensure model can be pickled. (#6651 )	2021-01-28 21:47:57 +08:00
Jiaming Yuan	740d042255	Add base_margin for evaluation dataset. (#6591 ) * Add base margin to evaluation datasets. * Unify the code base for evaluation matrices.	2021-01-26 02:11:02 +08:00
Jiaming Yuan	4bf23c2391	Specify shape in prediction contrib and interaction. (#6614 )	2021-01-26 02:08:22 +08:00
Jiaming Yuan	8942c98054	Define metainfo and other parameters for all DMatrix interfaces. (#6601 ) This PR ensures all DMatrix types have a common interface. * Fix logic in avoiding duplicated DMatrix in sklearn. * Check for consistency between DMatrix types. * Add doc for bounds.	2021-01-25 16:06:06 +08:00
Jiaming Yuan	89a00a5866	[dask] Random forest estimators (#6602 )	2021-01-13 20:59:20 +08:00
Jiaming Yuan	80065d571e	[dask] Add DaskXGBRanker (#6576 ) * Initial support for distributed LTR using dask. * Support `qid` in libxgboost. * Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code. * Define `DaskXGBRanker`. The dask ranker doesn't support group structure; instead it uses query id and converts it to group ptr internally.	2021-01-08 18:35:09 +08:00
Jiaming Yuan	60cfd14349	[dask, sklearn] Fix predict proba. (#6566 ) * For sklearn: - Handles user defined objective function. - Handles `softmax`. * For dask: - Use the implementation from sklearn, the previous implementation doesn't perform any extra handling.	2021-01-05 08:29:06 +08:00
Jiaming Yuan	de8fd852a5	[dask] Add type hints. (#6519 ) * Add validate_features. * Show type hints in doc. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-29 19:41:02 +08:00
Jiaming Yuan	610ee632cc	[Breaking] Rename `data` to `X` in `predict_proba`. (#6555 ) New Scikit-Learn version uses keyword argument, and `X` is the predefined keyword. * Use pip to install latest Python graphviz on Windows CI.	2020-12-28 21:36:03 +08:00
Philip Hyunsu Cho	125b3c0f2d	Lazy import cuDF and Dask (#6522 ) * Lazy import cuDF * Lazy import Dask Co-authored-by: PSEUDOTENSOR / Jonathan McKinney <pseudotensor@gmail.com> * Fix lint Co-authored-by: PSEUDOTENSOR / Jonathan McKinney <pseudotensor@gmail.com>	2020-12-17 01:51:35 -08:00
Jiaming Yuan	d45c0d843b	Show partition status in dask error. (#6366 )	2020-12-16 02:58:21 +08:00
Jiaming Yuan	a30461cf87	[dask] Support all parameters in regressor and classifier. (#6471 ) * Add eval_metric. * Add callback. * Add feature weights. * Add custom objective.	2020-12-14 07:35:56 +08:00
Jiaming Yuan	0ffaf0f5be	Fix dask ip resolution. (#6475 ) This adopts the solution used in dask/dask-xgboost#40 which employs the get_host_ip from dmlc-core tracker.	2020-12-07 16:36:23 -08:00
Jiaming Yuan	47b86180f6	Don't validate feature when number of rows is 0. (#6472 )	2020-12-07 18:08:51 +08:00
Jiaming Yuan	703c2d06aa	Fix global config default value. (#6470 )	2020-12-06 06:15:33 +08:00
Philip Hyunsu Cho	fb56da5e8b	Add global configuration (#6414 ) * Add management functions for global configuration: XGBSetGlobalConfig(), XGBGetGlobalConfig(). * Add Python interface: set_config(), get_config(), and config_context(). * Add unit tests for Python * Add R interface: xgb.set.config(), xgb.get.config() * Add unit tests for R Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-12-03 00:05:18 -08:00
Jiaming Yuan	a7b42adb74	Fix dask predict (#6412 )	2020-11-20 10:10:52 +08:00
Jiaming Yuan	fcd6fad822	[dask] Small cleanup. (#6391 )	2020-11-14 22:15:05 +08:00
Jiaming Yuan	4ccf92ea34	[dask] Fix union of workers. (#6375 )	2020-11-13 16:55:05 +08:00
Jiaming Yuan	fcfeb4959c	Deprecate positional arguments. (#6365 ) Deprecate positional arguments in following functions: - `__init__` for all classes in sklearn module. - `fit` method for all classes in sklearn module. - dask interface. - `set_info` for `DMatrix` class. Refactor the evaluation matrices handling.	2020-11-13 11:10:30 +08:00
Jiaming Yuan	6e12c2a6f8	[dask] Support running on GKE. (#6343 ) * Avoid accessing `scheduler_info()['workers']`. * Avoid calling `client.gather` inside task. * Avoid using `client.scheduler_address`.	2020-11-11 18:04:34 +08:00
Rory Mitchell	29745c6df2	Fix inclusive scan for large sizes (#6234 )	2020-11-03 17:01:43 +13:00
Jiaming Yuan	7756192906	[dask] Fix prediction on `DaskDMatrix` with multiple meta data. (#6333 ) * Unify the meta handling methods.	2020-11-02 19:18:44 -05:00
Jiaming Yuan	74ea82209b	Lazy import dask libraries. (#6309 ) * Lazy import dask libraries. * Lint && fix. * Use short name.	2020-10-28 15:50:11 -07:00
Jiaming Yuan	3da5a69dc9	Fix typo in dask interface. (#6240 )	2020-10-15 15:26:29 +08:00
Jiaming Yuan	bed7ae4083	Loop over `thrust::reduce`. (#6229 ) * Check input chunk size of dqdm. * Add doc for current limitation.	2020-10-14 10:40:56 +13:00
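
The `__all__` point raised in the #7133 entry above can be illustrated with a minimal sketch. It uses a hypothetical stand-in module named `fake_dask` with placeholder classes, not the actual `xgboost.dask` source. It shows that `__all__` limits what a star import exposes, while an explicit import of a name left out of `__all__` still succeeds, which may be why that entry also mentions alias imports as another option.

```python
import sys
import types

# Hypothetical stand-in for a module that imports a parent class and
# defines a subclass; the names are illustrative only.
source = '''
class XGBClassifier:  # stands in for the non-dask parent class
    pass

class DaskXGBClassifier(XGBClassifier):
    pass

__all__ = ["DaskXGBClassifier"]
'''

# Build the module in memory and register it so it can be imported.
fake = types.ModuleType("fake_dask")
exec(source, fake.__dict__)
sys.modules["fake_dask"] = fake

# A star import only picks up the names listed in __all__.
star_ns = {}
exec("from fake_dask import *", star_ns)
exported = sorted(k for k in star_ns if not k.startswith("__"))

# An explicit import of a name outside __all__ is still allowed by Python.
explicit_ns = {}
exec("from fake_dask import XGBClassifier", explicit_ns)
explicit_ok = "XGBClassifier" in explicit_ns

print(exported, explicit_ok)
```

Because `__all__` alone does not block explicit imports, binding the parent class under a different name inside the module is one way to keep the plain name out of the namespace entirely; whether the commit does exactly that is not stated in the log.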

90 Commits