xgboost

Author	SHA1	Message	Date
Jiaming Yuan	19b59938b7	Convert input to str for hypothesis note. (#9480)	2023-08-15 02:27:58 +08:00
Jiaming Yuan	912e341d57	Initial GPU support for the approx tree method. (#9414)	2023-07-31 15:50:28 +08:00
Jiaming Yuan	20c52f07d2	Support exporting cut values (#9356)	2023-07-08 15:32:41 +08:00
Jiaming Yuan	39390cc2ee	[breaking] Remove the `predictor` param, allow fallback to prediction using `DMatrix`. (#9129) - A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter. - The `predictor` parameter is removed. - Fall back to `DMatrix` when `inplace_predict` is not available. - The heuristic for choosing a predictor is only used during training.	2023-07-03 19:23:54 +08:00
Jiaming Yuan	3913ff470f	Import data lazily during tests. (#9176)	2023-05-23 03:58:31 +08:00
Jiaming Yuan	08ce495b5d	Use Booster context in DMatrix. (#8896) - Pass context from booster to DMatrix. - Use context instead of integer for `n_threads`. - Check configuration consistency for `max_bin`. - Test for all combinations of initialization options.	2023-04-28 21:47:14 +08:00
Jiaming Yuan	151882dd26	Initial support for multi-target tree. (#8616) * Implement multi-target for hist. - Add new hist tree builder. - Move data fetchers for tests. - Dispatch function calls in gbm based on the tree type.	2023-03-22 23:49:56 +08:00
Jiaming Yuan	6a892ce281	Specify src path for isort. (#8867)	2023-03-06 17:30:27 +08:00
Rory Mitchell	69a50248b7	Fix scope of feature set pointers (#8850) Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2023-03-02 12:37:14 +08:00
Jiaming Yuan	cce4af4acf	Initial support for quantile loss. (#8750) - Add support for Python. - Add objective.	2023-02-16 02:30:18 +08:00
Jiaming Yuan	badeff1d74	Init estimation for regression. (#8272)	2023-01-11 02:04:56 +08:00
Jiaming Yuan	0d3da9869c	Require isort on all Python files. (#8420)	2022-11-08 12:59:06 +08:00
Jiaming Yuan	cfd2a9f872	Extract dask and spark test into distributed test. (#8395) - Move test files. - Run spark and dask separately to prevent conflicts. - Gather common code into the testing module.	2022-10-28 16:24:32 +08:00
Jiaming Yuan	cf70864fa3	Move Python testing utilities into xgboost module. (#8379) - Add typehints. - Fixes for pylint. Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-26 16:56:11 +08:00
Jiaming Yuan	2176e511fc	Disable pytest-timeout for now. (#8348)	2022-10-17 23:06:10 +08:00
Rory Mitchell	ce0382dcb0	[CI] Refactor tests to reduce CI time. (#8312)	2022-10-12 11:32:06 +02:00
Jiaming Yuan	fffb1fca52	Calculate `base_score` based on input labels for mae. (#8107) Fit an intercept as base score for abs loss.	2022-09-20 20:53:54 +08:00
Jiaming Yuan	b5eb36f1af	Add `max_cat_threshold` to GPU and handle missing cat values. (#8212)	2022-09-07 00:57:51 +08:00
Jiaming Yuan	0ce80b7bcf	Mitigate flaky GPU test. (#8078) The flakiness is caused by the global random engine, which will take some time to fix.	2022-07-16 13:45:32 +08:00
Jiaming Yuan	647d3844dd	Make test for categorical data deterministic. (#8080)	2022-07-15 14:48:39 +08:00
Jiaming Yuan	b90c6d25e8	Implement `max_cat_threshold` for CPU. (#7957)	2022-06-04 11:02:46 +08:00
Jiaming Yuan	bde4f25794	Handle missing categorical value in CPU evaluator. (#7948)	2022-05-27 14:15:47 +08:00
Jiaming Yuan	474366c020	Add convergence test for sparse datasets. (#7922)	2022-05-23 18:07:26 +08:00
Jiaming Yuan	46e0bce212	Use maximum category in sketch. (#7853)	2022-05-05 19:56:49 +08:00
Rory Mitchell	90cce38236	Remove single_precision_histogram for gpu_hist (#7828)	2022-05-03 14:53:19 +02:00
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812) Support adaptive tree, a feature supported by both sklearn and lightgbm. After construction, each tree leaf is recomputed from the residuals between labels and predictions. For l1 error, the optimal value is the median (50th percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required for exact, the pruner, and optimization of the quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00
Jiaming Yuan	8b3ecfca25	Mitigate flaky tests. (#7749) * Skip non-increasing test with external memory when subsample is used. * Increase bin numbers for boost from prediction test. This mitigates the effect of non-deterministic partitioning.	2022-03-28 21:20:50 +08:00
Jiaming Yuan	7366d3b20c	Ensure models with categorical splits don't use old binary format. (#7666)	2022-02-19 08:05:28 +08:00
Jiaming Yuan	2369d55e9a	Add tests for prediction cache. (#7650) * Extract the test from approx for other tree methods. * Add note on how it works.	2022-02-15 00:28:00 +08:00
Jiaming Yuan	deab0e32ba	Validate out of range categorical value. (#7576) * Use float in CPU categorical set to preserve the input value. * Check out of range values.	2022-01-18 20:16:19 +08:00
Jiaming Yuan	001503186c	Rewrite approx (#7214) This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing. The rewrite has many benefits: - Support for both `max_leaves` and `max_depth`. - Support for `grow_policy`. - Support for mono constraint. - Support for feature weights. - Support for easier bin configuration (`max_bin`). - Support for categorical data. - Faster performance for most of the datasets. (many times faster) - Support for prediction cache. - Significantly better performance for external memory. - Unites the code base between approx and hist.	2022-01-10 21:15:05 +08:00
Jiaming Yuan	a55d43ccfd	Add test for invalid categorical data values. (#7380) * Add test for invalid categorical data values. * Add check during sketching.	2021-11-02 18:00:52 +08:00
Jiaming Yuan	e6088366df	Export Python Interface for external memory. (#7070) * Add Python iterator interface. * Add tests. * Add demo. * Add documents. * Handle empty dataset.	2021-07-22 15:15:53 +08:00
Jiaming Yuan	a5d222fcdb	Handle categorical split in model histogram and dataframe. (#7065) * Error on get_split_value_histogram when feature is categorical * Add a category column to output dataframe	2021-07-02 13:10:36 +08:00
Jiaming Yuan	29f8fd6fee	Support categorical split in tree model dump. (#7036)	2021-06-18 16:46:20 +08:00
Jiaming Yuan	d9799b09d0	Categorical data support for cuDF. (#7042) * Add support in DMatrix. * Add support in DQM, except for iterator.	2021-06-17 13:54:33 +08:00
Jiaming Yuan	bc08e0c9d1	Remove `experimental_json_serialization` from tests. (#6640)	2021-01-27 17:44:49 +08:00
Jiaming Yuan	c1a62b5fa2	Expect gpu external memory to fail. (#6381)	2020-11-12 19:24:48 +08:00
Jiaming Yuan	43efadea2e	Deterministic data partitioning for external memory (#6317) * Make external memory data partitioning deterministic. * Change the meaning of `page_size` from bytes to number of rows. * Design a data pool. * Note for external memory. * Enable unity build on Windows CI. * Force garbage collect on test.	2020-11-11 06:11:06 +08:00
Philip Hyunsu Cho	143b278267	Mark flaky tests as XFAIL (#6299) * Temporarily skip TestGPUUpdaters::test_categorical * Temporarily skip test_boost_from_prediction[approx]	2020-10-28 11:50:57 -07:00
Jiaming Yuan	b5b24354b8	More categorical tests and disable shap sparse test. (#6219) * Fix tree load with 32 categories.	2020-10-10 16:12:37 +08:00
Jiaming Yuan	70ce5216b5	Add high level tests for categorical data. (#6179) * Fix unique.	2020-10-09 09:27:23 +08:00
Rory Mitchell	b47b5ac771	Use hypothesis (#5759) * Use hypothesis * Allow int64 array interface for groups * Add packages to Windows CI * Add to travis * Make sure device index is set correctly * Fix dask-cudf test * appveyor	2020-06-16 12:45:59 +12:00
Rory Mitchell	359023c0fa	Speed up python test (#5752) * Speed up tests * Prevent DeviceQuantileDMatrix initialisation with numpy * Use joblib.memory * Use RandomState	2020-06-05 11:39:24 +12:00
Rory Mitchell	13b10a6370	Device dmatrix (#5420)	2020-03-28 14:42:21 +13:00
Rory Mitchell	24ad9dec0b	Testing hist_util (#5251) * Rank tests * Remove categorical split specialisation * Extend tests to multiple features, switch to WQSketch * Add tests for SparseCuts * Add external memory quantile tests, fix some existing tests	2020-02-14 14:36:43 +13:00
Rong Ou	0afcc55d98	Support multiple batches in gpu_hist (#5014) * Initial external memory training support for GPU Hist tree method.	2019-11-16 14:50:20 +08:00
Jiaming Yuan	7663de956c	Run training with empty DMatrix. (#4990) This makes GPU Hist robust in a distributed environment, as some workers might not be associated with any data in either training or evaluation. * Disable rabit mock test for now: See #5012. * Disable dask-cudf test at prediction for now: See #5003. * Launch dask job for all workers even though they might not have any data. * Check 0 rows in elementwise evaluation metrics. Using AUC and AUC-PR still throws an error. See #4663 for a robust fix. * Add tests for edge cases. * Add `LaunchKernel` wrapper handling zero sized grid. * Move some parts of allreducer into a cu file. * Don't validate feature names when the booster is empty. * Sync number of columns in DMatrix, as num_feature is required to be the same across all workers in data split mode. * Filtering in dask interface now by default syncs all boosters that are not empty, instead of using rank 0. * Fix Jenkins' GPU tests. * Install dask-cuda from source in Jenkins' test. Now all tests are actually running. * Restore GPU Hist tree synchronization test. * Check UUID of running devices. The check is only performed on CUDA version >= 10.x, as 9.x doesn't have a UUID field. * Fix CMake policy and project variables. Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13. * Fix copying data to CPU * Fix race condition in cpu predictor. * Fix duplicated DMatrix construction. * Don't download extra nccl in CI script.	2019-11-06 16:13:13 +08:00
Rong Ou	38ab79f889	Make HostDeviceVector single gpu only (#4773) * Make HostDeviceVector single gpu only	2019-08-26 09:51:13 +12:00
Rong Ou	c5b229632d	[BREAKING] prevent multi-gpu usage (#4749) * prevent multi-gpu usage * fix distributed test * combine gpu predictor tests * set upper bound on n_gpus	2019-08-13 09:11:35 +12:00


67 Commits