xgboost

Author	SHA1	Message	Date
Jiaming Yuan	4656b09d5d	[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with rest of the `XGBoosterPredictFrom` functions on semantic of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, I made it to be consistent with other prediction functions where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	dbb5208a0a	Use __array_interface__ for creating DMatrix from CSR. (#6675 ) * Use __array_interface__ for creating DMatrix from CSR. * Add configuration.	2021-02-05 21:09:47 +08:00
Jiaming Yuan	a4101de678	Fix divide by 0 in feature importance when no split is found. (#6676 )	2021-02-05 03:39:30 +08:00
Jiaming Yuan	72892cc80d	[dask] Disable gblinear and dart. (#6665 )	2021-02-04 09:13:09 +08:00
Jiaming Yuan	411592a347	Enhance inplace prediction. (#6653 ) * Accept array interface for csr and array. * Accept an optional proxy dmatrix for metainfo. This constructs an explicit `_ProxyDMatrix` type in Python. * Remove unused doc. * Add strict output.	2021-02-02 11:41:46 +08:00
Jiaming Yuan	87ab1ad607	[dask] Accept `Future` of model for prediction. (#6650 ) This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatably. * Document is updated to reflect functionality additions in recent changes.	2021-02-02 08:45:52 +08:00
Jiaming Yuan	d8ec7aad5a	[dask] Add a 1 line sample to infer output shape. (#6645 ) * [dask] Use a 1 line sample to infer output shape. This is for inferring shape with direct prediction (without DaskDMatrix). There are a few things that requires known output shape before carrying out actual prediction, including dask meta data, output dataframe columns. * Infer output shape based on local prediction. * Remove set param in predict function as it's not thread safe nor necessary as we now let dask to decide the parallelism. * Simplify prediction on `DaskDMatrix`.	2021-01-30 18:55:50 +08:00
Jiaming Yuan	d167892c7e	[dask] Ensure model can be pickled. (#6651 )	2021-01-28 21:47:57 +08:00
Jiaming Yuan	740d042255	Add base_margin for evaluation dataset. (#6591 ) * Add base margin to evaluation datasets. * Unify the code base for evaluation matrices.	2021-01-26 02:11:02 +08:00
Jiaming Yuan	4bf23c2391	Specify shape in prediction contrib and interaction. (#6614 )	2021-01-26 02:08:22 +08:00
Jiaming Yuan	a275f40267	[dask] Rework base margin test. (#6627 )	2021-01-22 17:49:13 +08:00
Jiaming Yuan	7bc56fa0ed	Use simple print in tracker print function. (#6609 )	2021-01-21 21:15:43 +08:00
Jiaming Yuan	d6d72de339	Revert ntree limit fix (#6616 ) The old (before fix) best_ntree_limit ignores the num_class parameters, which is incorrect. In before we workarounded it in c++ layer to avoid possible breaking changes on other language bindings. But the Python interpretation stayed incorrect. The PR fixed that in Python to consider num_class, but didn't remove the old workaround, so tree calculation in predictor is incorrect, see PredictBatch in CPUPredictor.	2021-01-19 23:51:16 +08:00
Jiaming Yuan	d356b7a071	Restore unknown data support. (#6595 )	2021-01-14 04:51:16 +08:00
Jiaming Yuan	89a00a5866	[dask] Random forest estimators (#6602 )	2021-01-13 20:59:20 +08:00
Jiaming Yuan	0027220aa0	[breaking] Remove duplicated predict functions, Fix attributes IO. (#6593 ) * Fix attributes not being restored. * Rename all `data` to `X`. [breaking]	2021-01-13 16:56:49 +08:00
Jiaming Yuan	03cd087da1	Remove duplicated DMatrix. (#6592 )	2021-01-12 09:36:56 +08:00
Jiaming Yuan	c709f2aaaf	Fix evaluation result for XGBRanker. (#6594 ) * Remove duplicated code, which fixes typo `evals_result` -> `evals_result_`.	2021-01-12 09:36:41 +08:00
Jiaming Yuan	78f2cd83d7	Suppress hypothesis health check for dask client. (#6589 )	2021-01-11 14:11:57 +08:00
Jiaming Yuan	80065d571e	[dask] Add DaskXGBRanker (#6576 ) * Initial support for distributed LTR using dask. * Support `qid` in libxgboost. * Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code. * Define `DaskXGBRanker`. The dask ranker doesn't support group structure, instead it uses query id and convert to group ptr internally.	2021-01-08 18:35:09 +08:00
Jiaming Yuan	96d3d32265	[dask] Add shap tests. (#6575 )	2021-01-08 14:59:27 +08:00
Jiaming Yuan	7c9dcbedbc	Fix `best_ntree_limit` for dart and gblinear. (#6579 )	2021-01-08 10:05:39 +08:00
Jiaming Yuan	f5ff90cd87	Support `_estimator_type`. (#6582 ) * Use `_estimator_type`. For more info, see: https://scikit-learn.org/stable/developers/develop.html#estimator-types * Model trained from dask can be loaded by single node skl interface.	2021-01-08 10:01:16 +08:00
Jiaming Yuan	60cfd14349	[dask, sklearn] Fix predict proba. (#6566 ) * For sklearn: - Handles user defined objective function. - Handles `softmax`. * For dask: - Use the implementation from sklearn, the previous implementation doesn't perform any extra handling.	2021-01-05 08:29:06 +08:00
Jiaming Yuan	516a93d25c	Fix `best_ntree_limit`. (#6569 )	2021-01-03 05:58:54 +08:00
Jiaming Yuan	5e9e525223	Remove warnings in tests. (#6554 )	2020-12-31 13:41:18 +08:00
Jiaming Yuan	de8fd852a5	[dask] Add type hints. (#6519 ) * Add validate_features. * Show type hints in doc. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-29 19:41:02 +08:00
Jiaming Yuan	ca3da55de4	Support early stopping with training continuation, correct num boosted rounds. (#6506 ) * Implement early stopping with training continuation. * Add new C API for obtaining boosted rounds. * Fix off by 1 in `save_best`. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-17 19:59:19 +08:00
Philip Hyunsu Cho	ad1a527709	Enable loading model from <1.0.0 trained with objective='binary:logitraw' (#6517 ) * Enable loading model from <1.0.0 trained with objective='binary:logitraw' * Add binary:logitraw in model compatibility testing suite * Feedback from @trivialfis: Override ProbToMargin() for LogisticRaw Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-12-16 16:53:46 -08:00
Jiaming Yuan	c5876277a8	Drop saving binary format for memory snapshot. (#6513 )	2020-12-17 00:14:57 +08:00
Jiaming Yuan	347f593169	Accept numpy array for DMatrix slice index. (#6368 )	2020-12-16 14:42:52 +08:00
Jiaming Yuan	ef4a0e0aac	Fix DMatrix feature names/types IO. (#6507 ) * Fix feature names/types IO Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-16 14:24:27 +08:00
Jiaming Yuan	3c3f026ec1	Move metric configuration into booster. (#6504 )	2020-12-16 05:35:04 +08:00
ShvetsKS	8139849ab6	Fix handling of print period in EvaluationMonitor (#6499 ) Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>	2020-12-15 19:20:19 +08:00
Jiaming Yuan	a30461cf87	[dask] Support all parameters in regressor and classifier. (#6471 ) * Add eval_metric. * Add callback. * Add feature weights. * Add custom objective.	2020-12-14 07:35:56 +08:00
Jiaming Yuan	47b86180f6	Don't validate feature when number of rows is 0. (#6472 )	2020-12-07 18:08:51 +08:00
Jiaming Yuan	d6386e45e8	Fix filtering callable objects in skl xgb param. (#6466 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-05 17:20:36 +08:00
Philip Hyunsu Cho	fb56da5e8b	Add global configuration (#6414 ) * Add management functions for global configuration: XGBSetGlobalConfig(), XGBGetGlobalConfig(). * Add Python interface: set_config(), get_config(), and config_context(). * Add unit tests for Python * Add R interface: xgb.set.config(), xgb.get.config() * Add unit tests for R Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-12-03 00:05:18 -08:00
Jiaming Yuan	927c316aeb	Fix period in evaluation monitor. (#6441 )	2020-11-29 03:18:33 +08:00
Jiaming Yuan	f4ff1c53fd	Fix CLI ranking demo. (#6439 ) Save model at final round.	2020-11-29 03:12:06 +08:00
Jiaming Yuan	2ce2a1a4d8	[SKL] Propagate parameters to booster during set_param. (#6416 )	2020-11-20 20:37:35 +08:00
Philip Hyunsu Cho	9c9070aea2	Use pytest conventions consistently (#6337 ) * Do not derive from unittest.TestCase (not needed for pytest) * assertRaises -> pytest.raises * Simplify test_empty_dmatrix with test parametrization * setUpClass -> setup_class, tearDownClass -> teardown_class * Don't import unittest; import pytest * Use plain assert * Use parametrized tests in more places * Fix test_gpu_with_sklearn.py * Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls * Fix test_eta_decay_gpu_hist * Add parametrized tests for monotone constraints * Fix test names * Remove test parametrization * Revise test_slice to be not flaky	2020-11-19 17:00:15 -08:00
Jiaming Yuan	4ccf92ea34	[dask] Fix union of workers. (#6375 )	2020-11-13 16:55:05 +08:00
Jiaming Yuan	fcfeb4959c	Deprecate positional arguments. (#6365 ) Deprecate positional arguments in following functions: - `__init__` for all classes in sklearn module. - `fit` method for all classes in sklearn module. - dask interface. - `set_info` for `DMatrix` class. Refactor the evaluation matrices handling.	2020-11-13 11:10:30 +08:00
Philip Hyunsu Cho	e5193c21a1	[dask] Allow empty data matrix in AFT survival (#6379 ) * [dask] Allow empty data matrix in AFT survival * Add unit test	2020-11-12 17:49:58 -08:00
Jiaming Yuan	c90f968d92	Update Python documents. (#6376 )	2020-11-12 17:51:32 +08:00
Jiaming Yuan	6e12c2a6f8	[dask] Supoort running on GKE. (#6343 ) * Avoid accessing `scheduler_info()['workers']`. * Avoid calling `client.gather` inside task. * Avoid using `client.scheduler_address`.	2020-11-11 18:04:34 +08:00
Jiaming Yuan	8a17610666	Implement GPU predict leaf. (#6187 )	2020-11-11 17:33:47 +08:00
Jiaming Yuan	184e2eac7d	Add period to evaluation monitor. (#6348 )	2020-11-10 07:47:48 +08:00
Jiaming Yuan	2cc9662005	Support slicing tree model (#6302 ) This PR is meant the end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests, asking users to understand how to set ntree_limit is difficult and error prone. * Implement the save_best option in early stopping. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-11-02 23:27:39 -08:00

... 3 4 5 6 7 ...

518 Commits