xgboost

Author	SHA1	Message	Date
Jiaming Yuan	827d0e8edb	[breaking] Bump Python requirement to 3.10. (#10434 ) - Bump the Python requirement. - Fix type hints. - Use loky to avoid deadlock. - Workaround cupy-numpy compatibility issue on Windows caused by the `safe` casting rule. - Simplify the repartitioning logic to avoid dask errors.	2024-07-30 17:31:06 +08:00
Jiaming Yuan	65d7bf2dfe	Handle np integer in model slice and prediction. (#10007 )	2024-01-26 04:58:48 +08:00
Jiaming Yuan	2f57bbde3c	Additional tests for attributes and model boosted rounds. (#9962 )	2024-01-09 09:54:39 +08:00
Jiaming Yuan	38dd91f491	Save model in ubj as the default. (#9947 )	2024-01-05 17:53:36 +08:00
Jiaming Yuan	621348abb3	Fix multi-output with alternating strategies. (#9933 ) --------- Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2024-01-04 16:41:13 +08:00
Jiaming Yuan	a7226c0222	Fix feature names with special characters. (#9923 )	2023-12-28 22:45:13 +08:00
Jiaming Yuan	05d7000096	Handle special characters in JSON model dump. (#9474 )	2023-08-14 15:49:00 +08:00
Jiaming Yuan	1f9a57d17b	[Breaking] Require format to be specified in input URI. (#9077 ) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning.	2023-04-28 19:45:15 +08:00
Jiaming Yuan	2c8d735cb3	Fix tests with pandas 2.0. (#9014 ) * Fix tests with pandas 2.0. - `is_categorical` is replaced by `is_categorical_dtype`. - one hot encoding returns boolean type instead of integer type.	2023-04-11 00:17:34 +08:00
Jiaming Yuan	bac22734fb	Remove ntree limit in python package. (#8345 ) - Remove `ntree_limit`. The parameter has been deprecated since 1.4.0. - The SHAP package compatibility is broken.	2023-03-31 19:01:55 +08:00
Jiaming Yuan	acc110c251	[MT-TREE] Support prediction cache and model slicing. (#8968 ) - Fix prediction range. - Support prediction cache in mt-hist. - Support model slicing. - Make the booster a Python iterable by defining `__iter__`. - Cleanup removed/deprecated parameters. - A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.	2023-03-27 23:10:54 +08:00
Jiaming Yuan	151882dd26	Initial support for multi-target tree. (#8616 ) * Implement multi-target for hist. - Add new hist tree builder. - Move data fetchers for tests. - Dispatch function calls in gbm base on the tree type.	2023-03-22 23:49:56 +08:00
Jiaming Yuan	910ce580c8	Clear all cache after model load. (#8904 )	2023-03-14 22:09:36 +08:00
Jiaming Yuan	cf70864fa3	Move Python testing utilities into xgboost module. (#8379 ) - Add typehints. - Fixes for pylint. Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-26 16:56:11 +08:00
Jiaming Yuan	13b15e07e8	Handle formatted JSON input. (#7953 )	2022-06-01 16:20:58 +08:00
Jiaming Yuan	332380479b	Avoid warning in np primitive type tests. (#7833 )	2022-04-23 02:07:01 +08:00
Jiaming Yuan	3c9b04460a	Move `num_parallel_tree` to model parameter. (#7751 ) The size of forest should be a property of model itself instead of a training hyper-parameter.	2022-03-29 02:32:42 +08:00
Jiaming Yuan	7366d3b20c	Ensure models with categorical splits don't use old binary format. (#7666 )	2022-02-19 08:05:28 +08:00
Jiaming Yuan	dac9eb13bd	Implement new `save_raw` in Python. (#7572 ) * Expose the new C API function to Python. * Remove old document and helper script. * Small optimization to the `save_raw` and Json ctors.	2022-01-19 02:27:51 +08:00
Jiaming Yuan	a1bcd33a3b	[breaking] Change internal model serialization to UBJSON. (#7556 ) * Use typed array for models. * Change the memory snapshot format. * Add new C API for saving to raw format.	2022-01-16 02:11:53 +08:00
Jiaming Yuan	d7e1fa7664	Fix feature names and types in output model slice. (#7078 )	2021-07-06 11:47:49 +08:00
Jiaming Yuan	72f9daf9b6	Fix `gpu_id` with custom objective. (#7015 )	2021-06-09 14:51:17 +08:00
Jiaming Yuan	9da2287ab8	[breaking] Save booster feature info in JSON, remove feature name generation. (#6605 ) * Save feature info in booster in JSON model. * [breaking] Remove automatic feature name generation in `DMatrix`. This PR is to enable reliable feature validation in Python package.	2021-02-25 18:54:16 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with rest of the `XGBoosterPredictFrom` functions on semantic of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, I made it to be consistent with other prediction functions where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	d6d72de339	Revert ntree limit fix (#6616 ) The old (before fix) best_ntree_limit ignores the num_class parameters, which is incorrect. Previously we worked around it in the C++ layer to avoid possible breaking changes on other language bindings. But the Python interpretation stayed incorrect. The PR fixed that in Python to consider num_class, but didn't remove the old workaround, so tree calculation in predictor is incorrect, see PredictBatch in CPUPredictor.	2021-01-19 23:51:16 +08:00
Jiaming Yuan	0027220aa0	[breaking] Remove duplicated predict functions, Fix attributes IO. (#6593 ) * Fix attributes not being restored. * Rename all `data` to `X`. [breaking]	2021-01-13 16:56:49 +08:00
Jiaming Yuan	ca3da55de4	Support early stopping with training continuation, correct num boosted rounds. (#6506 ) * Implement early stopping with training continuation. * Add new C API for obtaining boosted rounds. * Fix off by 1 in `save_best`. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-17 19:59:19 +08:00
Philip Hyunsu Cho	9c9070aea2	Use pytest conventions consistently (#6337 ) * Do not derive from unittest.TestCase (not needed for pytest) * assertRaises -> pytest.raises * Simplify test_empty_dmatrix with test parametrization * setUpClass -> setup_class, tearDownClass -> teardown_class * Don't import unittest; import pytest * Use plain assert * Use parametrized tests in more places * Fix test_gpu_with_sklearn.py * Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls * Fix test_eta_decay_gpu_hist * Add parametrized tests for monotone constraints * Fix test names * Remove test parametrization * Revise test_slice to be not flaky	2020-11-19 17:00:15 -08:00
Jiaming Yuan	2cc9662005	Support slicing tree model (#6302 ) This PR is meant to end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests, asking users to understand how to set ntree_limit is difficult and error prone. * Implement the save_best option in early stopping. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-11-02 23:27:39 -08:00
Jiaming Yuan	ab5b35134f	Rework Python callback functions. (#6199 ) * Define a new callback interface for Python. * Deprecate the old callbacks. * Enable early stopping on dask.	2020-10-10 17:52:36 +08:00
Christian Lorentzen	cf4f019ed6	[Breaking] Change default evaluation metric for classification to logloss / mlogloss (#6183 ) * Change DefaultEvalMetric of classification from error to logloss * Change default binary metric in plugin/example/custom_obj.cc * Set old error metric in python tests * Set old error metric in R tests * Fix missed eval metrics and typos in R tests * Fix setting eval_metric twice in R tests * Add warning for empty eval_metric for classification * Fix Dask tests Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-10-02 12:06:47 -07:00
Jiaming Yuan	b9ebbffc57	Fix plotting test. (#6040 ) Previously the test loads a model generated by `test_basic.py`, now we generate the model explicitly. * Cleanup saved files for basic tests.	2020-08-22 13:18:48 +08:00
Jiaming Yuan	8599f87597	Update JSON schema. (#5982 ) * Update JSON schema for pseudo huber. * Update JSON model schema.	2020-08-05 15:21:11 +08:00
Jiaming Yuan	9c93531709	Update Python custom objective demo. (#5981 )	2020-08-05 12:27:19 +08:00
Jiaming Yuan	bc1d3ee230	Fix r early stop with custom objective. (#5923 ) * Specify `ntreelimit`.	2020-07-23 03:28:17 +08:00
Jiaming Yuan	535479e69f	Add JSON schema to model dump. (#5660 )	2020-05-15 10:18:43 +08:00
Jiaming Yuan	0fd455e162	Restore loading model from buffer. (#5360 )	2020-02-26 11:30:13 +08:00
Jiaming Yuan	e433a379e4	Fix changing locale. (#5314 ) * Fix changing locale. * Don't use locale guard. As number parsing is implemented in house, we don't need locale. * Update doc.	2020-02-17 11:31:13 +08:00
Jiaming Yuan	911a902835	Merge model compatibility fixes from 1.0rc branch. (#5305 ) * Port test model compatibility. * Port logit model fix. https://github.com/dmlc/xgboost/pull/5248 https://github.com/dmlc/xgboost/pull/5281	2020-02-13 20:41:58 +08:00
Jiaming Yuan	472ded549d	Save Scikit-Learn attributes into learner attributes. (#5245 ) * Remove the recommendation for pickle. * Save skl attributes in booster.attr * Test loading scikit-learn model with native booster.	2020-01-30 16:00:18 +08:00
Jiaming Yuan	ef19480eda	Add dart to JSON schema. (#5218 ) * Add dart to JSON schema. * Use spaces instead of tab.	2020-01-28 13:29:09 +08:00
Kodi Arfer	f100b8d878	[Breaking] Don't drop trees during DART prediction by default (#5115 ) * Simplify DropTrees calling logic * Add `training` parameter for prediction method. * [Breaking]: Add `training` to C API. * Change for R and Python custom objective. * Correct comment. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-01-13 21:48:30 +08:00
Jiaming Yuan	298ebe68ac	[Breaking] Remove `learning_rates` in Python. (#5155 ) * Remove `learning_rates`. It's been deprecated since we have callback. * Set `before_iteration` of `reset_learning_rate` to False to preserve the initial learning rate, and comply to the term "reset". Closes #4709. * Tests for various `tree_method`.	2019-12-24 14:25:48 +08:00
Jiaming Yuan	0202e04a8e	Add base margin to sklearn interface. (#5151 )	2019-12-24 09:43:41 +08:00
Jiaming Yuan	1d0ca49761	Example JSON model parser and Schema. (#5137 )	2019-12-23 19:47:35 +08:00
Jiaming Yuan	3136185bc5	JSON configuration IO. (#5111 ) * Add saving/loading JSON configuration. * Implement Python pickle interface with new IO routines. * Basic tests for training continuation.	2019-12-15 17:31:53 +08:00
Jiaming Yuan	ad4a1c732c	Small refinements for JSON model. (#5112 ) * Naming consistency. * Remove duplicated test.	2019-12-11 19:49:01 +08:00
Jiaming Yuan	208ab3b1ff	Model IO in JSON. (#5110 )	2019-12-11 11:20:40 +08:00
Jiaming Yuan	f0064c07ab	Refactor configuration [Part II]. (#4577 ) * Refactor configuration [Part II]. * General changes: Remove `Init` methods to avoid ambiguity. Remove `Configure(std::map<>)` to avoid redundant copying and prepare for parameter validation. (`std::vector` is returned from `InitAllowUnknown`). ** Add name to tree updaters for easier debugging. * Learner changes: Make `LearnerImpl` the only source of configuration. All configurations are stored and carried out by `LearnerImpl::Configure()`. Remove booster in C API. Originally kept for "compatibility reason", but did not state why. So here we just remove it. Add a `metric_names_` field in `LearnerImpl`. Remove `LazyInit`. Configuration will always be lazy. ** Run `Configure` before every iteration. * Predictor changes: Allocate both cpu and gpu predictor. Remove cpu_predictor from gpu_predictor. `GBTree` is now used to dispatch the predictor. ** Remove some GPU Predictor tests. * IO No IO changes. The binary model format stability is tested by comparing hashing value of save models between two commits	2019-07-20 08:34:56 -04:00
Jiaming Yuan	29a1356669	Deprecate `reg:linear' in favor of `reg:squarederror'. (#4267 ) * Deprecate `reg:linear' in favor of `reg:squarederror'. * Replace the use of `reg:linear'. * Replace the use of `silent`.	2019-03-17 17:55:04 +08:00

60 Commits