xgboost

Author	SHA1	Message	Date
Jiaming Yuan	1094d6015d	[py] Use the first found native library. (#9860 )	2023-12-08 17:23:16 +08:00
Jiaming Yuan	39c637ee19	Use array interface in Python prediction return. (#9855 )	2023-12-08 03:42:14 +08:00
Jiaming Yuan	0715ab3c10	Use `dlopen` to load NCCL. (#9796 ) This PR adds optional support for loading nccl with `dlopen` as an alternative of compile time linking. This is to address the size bloat issue with the PyPI binary release. - Add CMake option to load `nccl` at runtime. - Add an NCCL stub. After this, `nccl` will be fetched from PyPI when using pip to install XGBoost, either by a user or by `pyproject.toml`. Others who want to link the nccl at compile time can continue to do so without any change. At the moment, this is Linux only since we only support MNMG on Linux.	2023-11-22 19:27:31 +08:00
Rong Ou	da6803b75b	Support column-wise data split with in-memory inputs (#9628 ) Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2023-10-17 12:16:39 +08:00
Jiaming Yuan	60526100e3	Support arrow through pandas ext types. (#9612 ) - Use pandas extension type for pyarrow support. - Additional support for QDM. - Additional support for inplace_predict.	2023-09-28 17:00:16 +08:00
Jiaming Yuan	a90d204942	Use array interface for testing numpy arrays. (#9602 )	2023-09-23 03:13:48 +08:00
Jiaming Yuan	972730cde0	Use matrix for gradient. (#9508 ) - Use the `linalg::Matrix` for storing gradients. - New API for the custom objective. - Custom objective for multi-class/multi-target is now required to return the correct shape. - Custom objective for Python can accept arrays with any strides. (row-major, column-major)	2023-08-24 05:29:52 +08:00
Jiaming Yuan	044fea1281	Drop support for loading remote files. (#9504 )	2023-08-21 23:34:05 +08:00
Jiaming Yuan	5188e27513	Fix version parsing with rc release. (#9493 )	2023-08-16 22:44:58 +08:00
Jiaming Yuan	f05a23b41c	Use `weakref` instead of `id` for `DataIter` cache. (#9445 ) - Fix case where Python reuses id from freed objects. - Small optimization to column matrix with QDM by using `realloc` instead of copying data.	2023-08-10 00:40:06 +08:00
Jiaming Yuan	7129988847	Accept only keyword arguments in data iterator. (#9431 )	2023-08-03 12:44:16 +08:00
Jiaming Yuan	851cba931e	Define `best_iteration` only if early stopping is used. (#9403 ) This is the behavior specified by the document but not honored in the actual code. - Don't set the attributes if there's no early stopping. - Clean up the code for callbacks, and replace assertions with proper exceptions. - Assign the attributes when early stopping `save_best` is used. - Turn the attributes into Python properties. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2023-07-24 12:43:35 +08:00
Jiaming Yuan	01e00efc53	[breaking] Remove support for single string feature info. (#9401 ) - Input must be a sequence of strings. - Improve validation error message.	2023-07-24 11:06:30 +08:00
Jiaming Yuan	6e18d3a290	[pyspark] Handle the `device` parameter in pyspark. (#9390 ) - Handle the new `device` parameter in PySpark. - Deprecate the old `use_gpu` parameter.	2023-07-18 08:47:03 +08:00
Jiaming Yuan	b342ef951b	Make feature validation immutable. (#9388 )	2023-07-16 06:52:55 +08:00
Jiaming Yuan	16eb41936d	Handle the new `device` parameter in dask and demos. (#9386 ) - Check no ordinal is specified in the dask interface. - Update demos. - Update dask doc. - Update the condition for QDM.	2023-07-15 19:11:20 +08:00
Jiaming Yuan	9da5050643	Turn warning messages into Python warnings. (#9387 )	2023-07-15 07:46:43 +08:00
Jiaming Yuan	04aff3af8e	Define the new `device` parameter. (#9362 )	2023-07-13 19:30:25 +08:00
Jiaming Yuan	20c52f07d2	Support exporting cut values (#9356 )	2023-07-08 15:32:41 +08:00
Jiaming Yuan	39390cc2ee	[breaking] Remove the `predictor` param, allow fallback to prediction using `DMatrix`. (#9129 ) - A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter. - The `predictor` parameter is removed. - Fallback to `DMatrix` when `inplace_predict` is not available. - The heuristic for choosing a predictor is only used during training.	2023-07-03 19:23:54 +08:00
michael-gendy-mention-me	c5677a2b2c	Remove `type: ignore` hints (#9197 )	2023-05-27 07:48:28 +08:00
Jiaming Yuan	720a8c3273	[doc] Remove parameter type in Python doc strings. (#9005 )	2023-04-01 04:04:30 +08:00
Jiaming Yuan	bac22734fb	Remove ntree limit in python package. (#8345 ) - Remove `ntree_limit`. The parameter has been deprecated since 1.4.0. - The SHAP package compatibility is broken.	2023-03-31 19:01:55 +08:00
Jiaming Yuan	acc110c251	[MT-TREE] Support prediction cache and model slicing. (#8968 ) - Fix prediction range. - Support prediction cache in mt-hist. - Support model slicing. - Make the booster a Python iterable by defining `__iter__`. - Cleanup removed/deprecated parameters. - A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.	2023-03-27 23:10:54 +08:00
Jiaming Yuan	c2b3a13e70	[breaking][skl] Remove parameter serialization. (#8963 ) - Remove parameter serialization in the scikit-learn interface. The scikit-learn interface `save_model` will save only the model and discard all hyper-parameters. This is to align with the native XGBoost interface, which distinguishes the hyper-parameter and model parameters. With the scikit-learn interface, model parameters are attributes of the estimator. For instance, `n_features_in_`, `n_classes_` are always accessible with `estimator.n_features_in_` and `estimator.n_classes_`, but not with the `estimator.get_params`. - Define a `load_model` method for classifier to load its own attributes. - Set n_estimators to None by default.	2023-03-27 21:34:10 +08:00
Jiaming Yuan	7eba285a1e	Support sklearn cross validation for ranker. (#8859 ) - Add a convention for X to include a special `qid` column. sklearn utilities consider only `X`, `y` and `sample_weight` for supervised learning algorithms, but we need an additional qid array for ranking. It's important to be able to support the cross validation function in sklearn since all other tuning functions like grid search are based on cross validation.	2023-03-07 00:22:08 +08:00
Jiaming Yuan	cce4af4acf	Initial support for quantile loss. (#8750 ) - Add support for Python. - Add objective.	2023-02-16 02:30:18 +08:00
Jiaming Yuan	c4802bfcd0	Cleanup booster param types. (#8756 )	2023-02-07 15:52:19 +08:00
Jiaming Yuan	0f37a01dd9	Require black formatter for the python package. (#8748 )	2023-02-07 01:53:33 +08:00
Jiaming Yuan	c1786849e3	Use array interface for CSC matrix. (#8672 ) Use array interface for CSC matrix and align the interface with CSR and dense. - Fix nthread issue in the R package DMatrix. - Unify the behavior of handling `missing` with other inputs. - Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix. - Expose `num_non_missing` to the JVM interface. - Deprecate old CSR and CSC constructors.	2023-02-05 01:59:46 +08:00
Jiaming Yuan	31b9cbab3d	Make sure input numpy array is aligned. (#8690 ) - use `np.require` to specify that the alignment is required. - scipy csr as well. - validate input pointer in `ArrayInterface`.	2023-01-18 08:12:13 +08:00
Jiaming Yuan	247946a875	Cache transformed in QuantileDMatrix for efficiency. (#8666 )	2023-01-17 06:02:40 +08:00
Jiaming Yuan	1b58d81315	[doc] Document Python inputs. (#8643 )	2023-01-10 15:39:32 +08:00
Rong Ou	3ceeb8c61c	Add data split mode to DMatrix MetaInfo (#8568 )	2022-12-25 20:37:37 +08:00
Jiaming Yuan	f6effa1734	Support `Series` and Python primitives in `inplace_predict` and QDM (#8547 )	2022-12-17 00:15:15 +08:00
Jiaming Yuan	001e663d42	Set `enable_categorical` to True in predict. (#8592 )	2022-12-15 05:27:06 +08:00
James Lamb	06ea6c7e79	[python] remove unnecessary conversions between data structures (#8546 ) Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2022-12-14 18:32:02 +08:00
Jiaming Yuan	d666ba775e	Support all pandas nullable integer types. (#8480 ) - Enumerate all pandas integer types. - Tests for `None`, `nan`, and `pd.NA`	2022-11-28 22:38:16 +08:00
Jiaming Yuan	0d3da9869c	Require isort on all Python files. (#8420 )	2022-11-08 12:59:06 +08:00
Yizhi Liu	5699f60a88	Type fix for WebAssembly: use bst_ulong instead of size_t for ncol in CSR conversion. (#8369 )	2022-10-26 19:21:45 +08:00
Jiaming Yuan	c884b9e888	Validate features for inplace predict. (#8359 )	2022-10-19 23:05:36 +08:00
Jiaming Yuan	fcddbc9264	Fix incorrect function name. (#8346 )	2022-10-17 19:28:20 +08:00
Jiaming Yuan	97a5b088a5	[pyspark] Use quantile dmatrix. (#8284 )	2022-10-12 20:38:53 +08:00
Jiaming Yuan	5545c49cfc	Require keyword args for data iterator. (#8327 )	2022-10-10 17:47:13 +08:00
Jiaming Yuan	e47b3a3da3	Upgrade mypy. (#8302 ) Some breaking changes were made in mypy.	2022-10-05 14:31:59 +08:00
Jiaming Yuan	97c3a80a34	Add C document to sphinx, fix arrow. (#8300 ) - Group C API. - Add C API sphinx doc. - Consistent use of `OptionalArg` and the parameter name `config`. - Remove call to deprecated functions in demo. - Fix some formatting errors. - Add links to c examples in the document (only visible with doxygen pages) - Fix arrow.	2022-10-05 09:52:15 +08:00
Jiaming Yuan	55cf24cc32	Obtain CSR matrix from DMatrix. (#8269 )	2022-09-29 20:41:43 +08:00
Jiaming Yuan	6925b222e0	Fix mixed types with cuDF. (#8280 )	2022-09-29 00:57:52 +08:00
Jiaming Yuan	f835368bcf	Mark next release as 1.7 instead of 2.0 (#8281 )	2022-09-28 14:33:37 +08:00
Jiaming Yuan	570f8ae4ba	Use black on more Python files. (#8137 )	2022-08-11 01:38:11 +08:00


290 Commits