- Remove parameter serialization in the scikit-learn interface.
The scikit-learn interface `save_model` will save only the model and discard all
hyper-parameters. This aligns with the native XGBoost interface, which distinguishes
between hyper-parameters and model parameters.
With the scikit-learn interface, model parameters are attributes of the estimator. For
instance, `n_features_in_` and `n_classes_` are always accessible as
`estimator.n_features_in_` and `estimator.n_classes_`, but they are not returned by
`estimator.get_params()` (a short sketch follows this list).
- Define a `load_model` method for classifier to load its own attributes.
- Set n_estimators to None by default.
- The new implementation is stricter: only binary labels are accepted. The previous implementation converted values greater than 1 to 1.
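A minimal sketch of the behaviour described above; the file name and dataset are illustrative, assuming a release with this `save_model`/`load_model` behaviour.

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

clf = xgb.XGBClassifier(n_estimators=10)
clf.fit(X, y)
clf.save_model("clf.json")  # saves the model; hyper-parameters are discarded

loaded = xgb.XGBClassifier()
loaded.load_model("clf.json")

# Model parameters are restored as estimator attributes ...
assert loaded.n_features_in_ == 4
assert loaded.n_classes_ == 2
# ... but they do not appear in the hyper-parameter dict.
assert "n_classes_" not in loaded.get_params()
```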
- Deterministic results on GPU (no atomic add).
- Fix top-k handling.
- Precise definition of MAP (there are other variants of how to handle top-k; one common definition is sketched after this list).
- Refactor GPU ranking tests.
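For reference, an illustrative sketch of one common MAP@k definition. This is not necessarily the exact variant implemented here; as noted above, top-k handling (for example the normalisation term) differs between variants.

```python
import numpy as np

def average_precision_at_k(relevance, k):
    """AP@k for one query: `relevance` is a 0/1 array ordered by predicted score."""
    relevance = np.asarray(relevance)[:k]
    if relevance.sum() == 0:
        return 0.0
    precision_at_i = np.cumsum(relevance) / (np.arange(len(relevance)) + 1)
    # Variant choice: normalise by the number of relevant documents within top-k.
    return float((precision_at_i * relevance).sum() / relevance.sum())

def mean_average_precision(queries, k):
    """MAP@k: mean of AP@k over all queries."""
    return float(np.mean([average_precision_at_k(r, k) for r in queries]))

# Two queries, documents already ordered by predicted score.
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 1]], k=3))
```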
* Support sklearn cross validation for ranker.
- Add a convention for X to include a special `qid` column.
sklearn utilities consider only `X`, `y` and `sample_weight` for supervised learning
algorithms, but we need an additional qid array for ranking.
Supporting sklearn's cross validation function is important, since the other tuning
utilities such as grid search are built on top of cross validation.
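A hedged sketch of how the `qid` column convention can be used with sklearn's cross validation. The data is illustrative, and it assumes a version where `XGBRanker` picks up the `qid` column from `X` and implements `score` for sklearn utilities, as described above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
import xgboost as xgb

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((120, 4)), columns=[f"f{i}" for i in range(4)])
# The special `qid` column carries the query id, so sklearn utilities that only
# pass X, y and sample_weight can still drive a ranker.
X["qid"] = np.sort(rng.integers(0, 6, size=120))
y = rng.integers(0, 4, size=120)

ranker = xgb.XGBRanker(tree_method="hist", objective="rank:ndcg")
scores = cross_val_score(ranker, X, y, cv=3)
print(scores)
```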
* Replace all uses of deprecated function sklearn.datasets.load_boston
* More renaming
* Fix bad name
* Update assertion
* Fix the number of boosted rounds.
* Avoid over-regularization.
* Rebase.
* Avoid over-regularization.
* Whac-a-mole
Co-authored-by: fis <jm.yuan@outlook.com>
This PR rewrites the approx tree method to use the codebase from hist for better performance and code sharing.
The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for monotone constraints.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance on most datasets (often many times faster).
- Support for prediction cache.
- Significantly better performance for external memory.
- Unifies the code base between approx and hist.
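A small usage sketch of the features listed above. Parameter values are arbitrary and only meant to illustrate the options, assuming a version where the rewritten `approx` method is available.

```python
import numpy as np
import pandas as pd
import xgboost as xgb

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "num": rng.random(256),
    "cat": pd.Categorical(rng.integers(0, 4, size=256)),  # categorical feature
})
y = rng.random(256)

# enable_categorical lets the tree method consume the categorical column directly.
dtrain = xgb.DMatrix(X, label=y, enable_categorical=True)
params = {
    "tree_method": "approx",
    "grow_policy": "lossguide",  # grow policy is now honoured by approx
    "max_depth": 6,              # depth-based limit
    "max_leaves": 32,            # leaf-based limit, usable alongside max_depth
    "max_bin": 128,              # simpler bin configuration
}
booster = xgb.train(params, dtrain, num_boost_round=10)
```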
* Add a num-target model parameter, which is configured from the input labels.
* Change elementwise metric and indexing for weights.
* Add demo.
* Add tests.
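A minimal sketch of multi-target training as described above, assuming a version where the number of targets is inferred from a 2-D label array; shapes and data are illustrative.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((200, 5))
Y = rng.random((200, 3))  # three targets; the num-target parameter is configured from the label shape

reg = xgb.XGBRegressor(tree_method="hist", n_estimators=10)
reg.fit(X, Y)

pred = reg.predict(X)
assert pred.shape == (200, 3)  # one column per target
```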
This is already partially supported but was never properly tested, so the only way to use it was to call `numpy.ndarray.flatten` on `base_margin` before passing it into XGBoost. This PR adds proper support for most of the data types, along with tests.
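A hedged sketch, assuming the paragraph above refers to multi-dimensional `base_margin` (for example one margin column per class), which previously had to be flattened by hand; data and shapes are illustrative.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n_samples, n_classes = 100, 3
X = rng.random((n_samples, 4))
y = rng.integers(0, n_classes, size=n_samples)

# One margin value per (sample, class); no manual .flatten() needed anymore.
margin = rng.random((n_samples, n_classes))
dtrain = xgb.DMatrix(X, label=y, base_margin=margin)

params = {"objective": "multi:softprob", "num_class": n_classes}
booster = xgb.train(params, dtrain, num_boost_round=5)
```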
A new parameter `custom_metric` is added to `train` and `cv` to distinguish its behaviour from the old `feval`, which is now deprecated. The new `custom_metric` receives the transformed prediction when a built-in objective is used. This enables XGBoost to use cost functions from other libraries like scikit-learn directly, without going through the definition of the link function.
`eval_metric` and `early_stopping_rounds` in the sklearn interface are moved from `fit` to `__init__` and are now saved as part of the scikit-learn model. The old parameters in the `fit` function are deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.
Added more detailed documentation for the behaviour of custom objectives and metrics.
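A sketch of the new parameter: because `custom_metric` receives the transformed prediction for built-in objectives, a scikit-learn metric can be plugged in directly. The metric name and data are illustrative.

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.integers(0, 2, size=200)
dtrain = xgb.DMatrix(X, label=y)

def sk_logloss(predt, dmatrix):
    # With the built-in `binary:logistic` objective, predt is already a probability
    # (the link function is applied), so scikit-learn's log_loss can be used as-is.
    return "sk_logloss", log_loss(dmatrix.get_label(), predt)

booster = xgb.train(
    {"objective": "binary:logistic"},
    dtrain,
    num_boost_round=10,
    custom_metric=sk_logloss,
    evals=[(dtrain, "train")],
)
```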
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
The old (before this fix) `best_ntree_limit` ignored the `num_class` parameter, which is incorrect. Previously we worked around it in the C++ layer to avoid possible breaking changes in other language bindings, but the Python interpretation remained incorrect. This PR fixes that in Python to take `num_class` into account, but doesn't remove the old workaround, so the tree calculation in the predictor is incorrect; see `PredictBatch` in `CPUPredictor`.
* Initial support for distributed LTR using dask.
* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
to avoid duplicated code.
* Define `DaskXGBRanker`.
The dask ranker doesn't support the group structure; instead it uses query ids and
converts them to group pointers internally.
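A small sketch of the dask ranker as described above: query ids are passed as a dask collection and converted to group pointers internally. The cluster setup and data are illustrative.

```python
import numpy as np
import dask.array as da
from dask.distributed import Client, LocalCluster
from xgboost import dask as dxgb

if __name__ == "__main__":
    with LocalCluster(n_workers=2, threads_per_worker=1) as cluster, Client(cluster) as client:
        rng = np.random.default_rng(0)
        X = da.from_array(rng.random((1000, 10)), chunks=(250, 10))
        y = da.from_array(rng.integers(0, 4, size=1000), chunks=250)
        # Query ids instead of an explicit group array; the group pointer is built internally.
        qid = da.from_array(np.sort(rng.integers(0, 20, size=1000)), chunks=250)

        ranker = dxgb.DaskXGBRanker(tree_method="hist", n_estimators=10)
        ranker.fit(X, y, qid=qid)
        print(ranker.predict(X).compute()[:5])
```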
* For sklearn:
- Handles user-defined objective functions.
- Handles `softmax`.
* For dask:
- Use the implementation from sklearn; the previous implementation didn't perform any extra handling.
* Implement early stopping with training continuation.
* Add new C API for obtaining boosted rounds.
* Fix off-by-one in `save_best`.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
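A hedged sketch combining the pieces above: early stopping via a callback with `save_best`, then continuing training from the returned booster, with round counts read back through the boosted-rounds query. Data and numbers are illustrative.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.random((500, 10)), rng.random(500)
dtrain = xgb.DMatrix(X[:400], label=y[:400])
dvalid = xgb.DMatrix(X[400:], label=y[400:])

# Early stopping; save_best keeps only the best iteration in the returned booster.
early_stop = xgb.callback.EarlyStopping(rounds=5, save_best=True)
booster = xgb.train(
    {"objective": "reg:squarederror"},
    dtrain,
    num_boost_round=100,
    evals=[(dvalid, "valid")],
    callbacks=[early_stop],
)
print(booster.num_boosted_rounds())  # rounds kept after early stopping

# Training continuation: resume boosting from the existing model.
booster = xgb.train(
    {"objective": "reg:squarederror"},
    dtrain,
    num_boost_round=20,
    evals=[(dvalid, "valid")],
    xgb_model=booster,
)
print(booster.num_boosted_rounds())
```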
* Do not derive from unittest.TestCase (not needed for pytest)
* assertRaises -> pytest.raises
* Simplify test_empty_dmatrix with test parametrization
* setUpClass -> setup_class, tearDownClass -> teardown_class
* Don't import unittest; import pytest
* Use plain assert
* Use parametrized tests in more places
* Fix test_gpu_with_sklearn.py
* Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls
* Fix test_eta_decay_gpu_hist
* Add parametrized tests for monotone constraints
* Fix test names
* Remove test parametrization
* Revise test_slice to be not flaky
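An illustrative before/after for the migration pattern in the list above; the test body is made up.

```python
# Before: unittest style.
#
#     class TestTreeMethods(unittest.TestCase):
#         def test_hist(self):
#             self.assertEqual(run_training("hist"), 2)
#         def test_approx(self):
#             self.assertEqual(run_training("approx"), 2)

# After: plain functions with pytest parametrization and plain asserts.
import numpy as np
import pytest
import xgboost as xgb

@pytest.mark.parametrize("tree_method", ["hist", "approx"])
def test_train(tree_method):
    X, y = np.random.rand(32, 4), np.random.rand(32)
    booster = xgb.train(
        {"tree_method": tree_method}, xgb.DMatrix(X, label=y), num_boost_round=2
    )
    assert booster.num_boosted_rounds() == 2
```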