xgboost

Author	SHA1	Message	Date
Jiaming Yuan	e8a962575a	[EM] Allow staging ellpack on host for GPU external memory. (#10488) - New parameter `on_host`. - Abstract format creation and stream creation into policy classes.	2024-06-28 04:42:18 +08:00
Philip Hyunsu Cho	bc3747bdce	[CI] Migrate to rockylinux8 / manylinux_2_28_x86_64 (#10399) * [CI] Migrate to rockylinux8 / manylinux_2_28_x86_64 * Scrub all references to CentOS 7 * Fix * Remove use of yum * Use gcc-10 in cpu * Temporarily disable -Werror * Use GCC 9 for now * Roll back gRPC * Scrub all references to manylinux2014_x86_64 * Revise rename_whl.py to handle no-op rename * Change JDK_VERSION back to 8 * Reviewer's comment * Use GCC 10 * Use Spark 3.5.1, same as in pom.xml * Fix JAR install	2024-06-17 12:07:49 -07:00
Jiaming Yuan	a5a58102e5	Revamp the rabit implementation. (#10112) This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features: - Federated learning for both CPU and GPU. - NCCL. - More data types. - A unified interface for all the underlying implementations. - Improved timeout handling for both tracker and workers. - Exhausted tests with metrics (fixed a couple of bugs along the way). - A reusable tracker for Python and JVM packages.	2024-05-20 11:56:23 +08:00
Jiaming Yuan	73afef1a6e	Fixes for numpy 2.0. (#10252)	2024-05-07 03:54:32 +08:00
github-actions[bot]	2925cebdca	[CI] Use latest RAPIDS; Pandas 2.0 compatibility fix (#10175) * [CI] Update RAPIDS to latest stable * [CI] Use rapidsai stable channel; fix syntax errors in Dockerfile.gpu * Don't combine astype() with loc() * Work around https://github.com/dmlc/xgboost/issues/10181 * Fix formatting * Fix test --------- Co-authored-by: hcho3 <hcho3@users.noreply.github.com> Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2024-04-15 13:38:53 -07:00
Jiaming Yuan	ca4801f81d	Work with IPv6 in the new tracker. (#10125)	2024-03-20 05:19:23 +08:00
Jiaming Yuan	8ea705e4d5	Support sample weight in sklearn custom objective. (#10050)	2024-02-21 00:43:14 +08:00
Jiaming Yuan	54b71c8fba	Fix with black 24.1.1. (#10014)	2024-01-30 17:24:11 +08:00
Jiaming Yuan	38dd91f491	Save model in ubj as the default. (#9947)	2024-01-05 17:53:36 +08:00
Jiaming Yuan	5f7b5a6921	Add tests for pickling with custom obj and metric. (#9943)	2024-01-04 14:52:48 +08:00
Jiaming Yuan	3ca06ac51e	[doc] Mention data consistency for categorical features. (#9678)	2023-10-24 10:11:33 +08:00
Rong Ou	6fbe6248f4	More in-memory input support for column split (#9685)	2023-10-20 16:02:36 +08:00
Rong Ou	da6803b75b	Support column-wise data split with in-memory inputs (#9628) --------- Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2023-10-17 12:16:39 +08:00
Jiaming Yuan	ccfc90e4c6	[rabit] Improved connection handling. (#9531) - Enable timeout. - Report connection error from the system. - Handle retry for both tracker connection and peer connection.	2023-08-30 13:00:04 +08:00
Jiaming Yuan	972730cde0	Use matrix for gradient. (#9508) - Use the `linalg::Matrix` for storing gradients. - New API for the custom objective. - Custom objective for multi-class/multi-target is now required to return the correct shape. - Custom objective for Python can accept arrays with any strides. (row-major, column-major)	2023-08-24 05:29:52 +08:00
Jiaming Yuan	f05a23b41c	Use `weakref` instead of `id` for `DataIter` cache. (#9445) - Fix case where Python reuses id from freed objects. - Small optimization to column matrix with QDM by using `realloc` instead of copying data.	2023-08-10 00:40:06 +08:00
Jiaming Yuan	7129988847	Accept only keyword arguments in data iterator. (#9431)	2023-08-03 12:44:16 +08:00
Jiaming Yuan	04aff3af8e	Define the new `device` parameter. (#9362)	2023-07-13 19:30:25 +08:00
Jiaming Yuan	20c52f07d2	Support exporting cut values (#9356)	2023-07-08 15:32:41 +08:00
Jiaming Yuan	39390cc2ee	[breaking] Remove the `predictor` param, allow fallback to prediction using `DMatrix`. (#9129) - A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter. - The `predictor` parameter is removed. - Fallback to `DMatrix` when `inplace_predict` is not available. - The heuristic for choosing a predictor is only used during training.	2023-07-03 19:23:54 +08:00
Jiaming Yuan	ee6809e642	Use mmap for external memory. (#9282) - Have basic infrastructure for mmap. - Release file write handle.	2023-06-19 18:52:55 +08:00
Jiaming Yuan	3913ff470f	Import data lazily during tests. (#9176)	2023-05-23 03:58:31 +08:00
Jiaming Yuan	08ce495b5d	Use Booster context in DMatrix. (#8896) - Pass context from booster to DMatrix. - Use context instead of integer for `n_threads`. - Check the consistency configuration for `max_bin`. - Test for all combinations of initialization options.	2023-04-28 21:47:14 +08:00
Jiaming Yuan	1f9a57d17b	[Breaking] Require format to be specified in input URI. (#9077) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning.	2023-04-28 19:45:15 +08:00
Jiaming Yuan	e206b899ef	Rework MAP and Pairwise for LTR. (#9075)	2023-04-28 02:39:12 +08:00
Jiaming Yuan	151882dd26	Initial support for multi-target tree. (#8616) * Implement multi-target for hist. - Add new hist tree builder. - Move data fetchers for tests. - Dispatch function calls in gbm base on the tree type.	2023-03-22 23:49:56 +08:00
Jiaming Yuan	5891f752c8	Rework the MAP metric. (#8931) - The new implementation is more strict as only binary labels are accepted. The previous implementation converts values greater than 1 to 1. - Deterministic GPU. (no atomic add). - Fix top-k handling. - Precise definition of MAP. (There are other variants on how to handle top-k). - Refactor GPU ranking tests.	2023-03-22 17:45:20 +08:00
Jiaming Yuan	6a892ce281	Specify src path for isort. (#8867)	2023-03-06 17:30:27 +08:00
Jiaming Yuan	8a16944664	Fix ranking with quantile dmatrix and group weight. (#8762)	2023-02-10 20:32:35 +08:00
Jiaming Yuan	cfa994d57f	Multi-target support for L1 error. (#8652) - Add matrix support to the median function. - Iterate through each target for quantile computation.	2023-01-11 05:51:14 +08:00
Jiaming Yuan	40343c8ee1	Test dask demos. (#8557) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2022-12-13 18:37:31 +08:00
Jiaming Yuan	d666ba775e	Support all pandas nullable integer types. (#8480) - Enumerate all pandas integer types. - Tests for `None`, `nan`, and `pd.NA`	2022-11-28 22:38:16 +08:00
Jiaming Yuan	f2209c1fe4	Don't shuffle columns in categorical tests. (#8446)	2022-11-28 20:28:06 +08:00
Jiaming Yuan	0d3da9869c	Require isort on all Python files. (#8420)	2022-11-08 12:59:06 +08:00
James Lamb	bf8de227a9	[CI] remove unused import in python tests (#8409)	2022-11-03 22:27:25 +08:00
Jiaming Yuan	a408c34558	Update JSON parser demo with categorical feature. (#8401) - Parse categorical features in the Python example. - Add tests. - Update document.	2022-10-28 20:57:43 +08:00
Jiaming Yuan	cfd2a9f872	Extract dask and spark test into distributed test. (#8395) - Move test files. - Run spark and dask separately to prevent conflicts. - Gather common code into the testing module.	2022-10-28 16:24:32 +08:00
Jiaming Yuan	cf70864fa3	Move Python testing utilities into xgboost module. (#8379) - Add typehints. - Fixes for pylint. Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-26 16:56:11 +08:00

38 Commits