xgboost

Author	SHA1	Message	Date
Rong Ou	a2686543a9	Common interface for collective communication (#8057 ) * implement broadcast for federated communicator * implement allreduce * add communicator factory * add device adapter * add device communicator to factory * add rabit communicator * add rabit communicator to the factory * add nccl device communicator * add synchronize to device communicator * add back print and getprocessorname * add python wrapper and c api * clean up types * fix non-gpu build * try to fix ci * fix std::size_t * portable string compare ignore case * c style size_t * fix lint errors * cross platform setenv * fix memory leak * fix lint errors * address review feedback * add python test for rabit communicator * fix failing gtest * use json to configure communicators * fix lint error * get rid of factories * fix cpu build * fix include * fix python import * don't export collective.py yet * skip collective communicator pytest on windows * add review feedback * update documentation * remove mpi communicator type * fix tests * shutdown the communicator separately Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2022-09-12 15:21:12 -07:00
Jiaming Yuan	b5eb36f1af	Add `max_cat_threshold` to GPU and handle missing cat values. (#8212 )	2022-09-07 00:57:51 +08:00
WeichenXu	d03794ce7a	[pyspark] Add param validation for "objective" and "eval_metric" param, and remove invalid booster params (#8173 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-24 15:29:43 +08:00
WeichenXu	f4628c22a4	[pyspark] Implement SparkXGBRanker estimator (#8172 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-23 02:35:19 +08:00
WeichenXu	53d2a733b0	[pyspark] Make Xgboost estimator support using sparse matrix as optimization (#8145 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-19 01:57:28 +08:00
Jiaming Yuan	36e7c5364d	[dask] Deterministic rank assignment. (#8018 )	2022-08-11 19:17:58 +08:00
Jiaming Yuan	570f8ae4ba	Use black on more Python files. (#8137 )	2022-08-11 01:38:11 +08:00
Jiaming Yuan	446d536c23	Fix loading DMatrix binary in distributed env. (#8149 ) - Try to load DMatrix binary before trying to parse text input. - Remove some unmaintained code.	2022-08-10 22:53:16 +08:00
Jiaming Yuan	9ae547f994	Use config_context in sklearn interface. (#8141 )	2022-08-09 14:48:54 +08:00
Bobby Wang	03cc3b359c	[pyspark] support a list of feature column names (#8117 )	2022-08-08 17:05:27 +08:00
Jiaming Yuan	d87f69215e	Quantile DMatrix for CPU. (#8130 ) - Add a new `QuantileDMatrix` that works for both CPU and GPU. - Deprecate `DeviceQuantileDMatrix`.	2022-08-02 15:51:23 +08:00
Jiaming Yuan	546de5efd2	[pyspark] Cleanup data processing. (#8088 ) - Use numpy stack for handling list of arrays. - Reuse concat function from dask. - Prepare for `QuantileDMatrix`. - Remove unused code. - Use iterator for prediction to avoid initializing xgboost model	2022-07-26 15:00:52 +08:00
Bobby Wang	f801d3cf15	[PySpark] change the returning model type to string from binary (#8085 ) XGBoost pyspark can be accelerated by RAPIDS Accelerator seamlessly by changing the returning model type from binary to string.	2022-07-19 18:39:20 +08:00
Jiaming Yuan	2365f82750	[dask] Mitigate non-deterministic test. (#8077 )	2022-07-19 16:55:59 +08:00
Jiaming Yuan	647d3844dd	Make test for categorical data deterministic. (#8080 )	2022-07-15 14:48:39 +08:00
Jiaming Yuan	dae7a41baa	Update Python requirement to >=3.8. (#8071 ) Additional changes: - Use mamba for CPU test on Jenkins. - Cleanup CPU test dependencies. - Restore some of the modin tests	2022-07-14 18:01:47 +08:00
WeichenXu	176fec8789	PySpark XGBoost integration (#8020 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2022-07-13 13:11:18 +08:00
Jiaming Yuan	8959622836	[dask] Use an invalid port for test. (#8064 )	2022-07-13 11:59:02 +08:00
Jiaming Yuan	4a87ea49b8	Reduce regularization for CPU gblinear. (#8013 )	2022-06-21 01:05:27 +08:00
Jiaming Yuan	b90c6d25e8	Implement `max_cat_threshold` for CPU. (#7957 )	2022-06-04 11:02:46 +08:00
Jiaming Yuan	13b15e07e8	Handle formatted JSON input. (#7953 )	2022-06-01 16:20:58 +08:00
Jiaming Yuan	bde4f25794	Handle missing categorical value in CPU evaluator. (#7948 )	2022-05-27 14:15:47 +08:00
Jiaming Yuan	18cbebaeb9	Unify the cat split storage for CPU. (#7937 ) * Unify the cat split storage for CPU. * Cleanup. * Workaround.	2022-05-26 04:14:40 -07:00
Jiaming Yuan	606be9e663	Handle missing values in one hot splits. (#7934 )	2022-05-24 20:48:41 +08:00
Jiaming Yuan	474366c020	Add convergence test for sparse datasets. (#7922 )	2022-05-23 18:07:26 +08:00
Jiaming Yuan	f93a727869	Address remaining mypy errors in python package. (#7914 )	2022-05-18 22:46:15 +08:00
Rong Ou	77d4a53c32	use RabitContext instead of init/finalize (#7911 )	2022-05-17 12:15:41 +08:00
Jiaming Yuan	db80671d6b	Fix monotone constraint with tuple input. (#7891 )	2022-05-13 04:00:03 +08:00
Jiaming Yuan	46e0bce212	Use maximum category in sketch. (#7853 )	2022-05-05 19:56:49 +08:00
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812 ) Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on the residuals of labels and predictions after construction. For l1 error, the optimal value is the median (50th percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00
Jiaming Yuan	332380479b	Avoid warning in np primitive type tests. (#7833 )	2022-04-23 02:07:01 +08:00
Jiaming Yuan	c70fa502a5	Expose `feature_types` to sklearn interface. (#7821 )	2022-04-21 20:23:35 +08:00
Jiaming Yuan	52d4eda786	Deprecate `use_label_encoder` in XGBClassifier. (#7822 ) * Deprecate `use_label_encoder` in XGBClassifier. * We have removed the encoder, now prepare to remove the indicator.	2022-04-21 13:14:02 +08:00
Jiaming Yuan	9150fdbd4d	Support pandas nullable types. (#7760 )	2022-03-30 08:51:52 +08:00
Jiaming Yuan	a50b84244e	Cleanup configuration for constraints. (#7758 )	2022-03-29 04:22:46 +08:00
Jiaming Yuan	3c9b04460a	Move `num_parallel_tree` to model parameter. (#7751 ) The size of forest should be a property of model itself instead of a training hyper-parameter.	2022-03-29 02:32:42 +08:00
Jiaming Yuan	8b3ecfca25	Mitigate flaky tests. (#7749 ) * Skip non-increasing test with external memory when subsample is used. * Increase bin numbers for boost from prediction test. This mitigates the effect of non-deterministic partitioning.	2022-03-28 21:20:50 +08:00
Jiaming Yuan	4d81c741e9	External memory support for hist (#7531 ) * Generate column matrix from gHistIndex. * Avoid synchronization with the sparse page once the cache is written. * Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist. * Remove pruner.	2022-03-22 00:13:20 +08:00
Xiaochang Wu	613ec36c5a	Support building SimpleDMatrix from Arrow data format (#7512 ) * Integrate with Arrow C data API. * Support Arrow dataset. * Support Arrow table. Co-authored-by: Xiaochang Wu <xiaochang.wu@intel.com> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> Co-authored-by: Zhang Zhang <zhang.zhang@intel.com>	2022-03-15 13:25:19 +08:00
Jiaming Yuan	a62a3d991d	[dask] prediction with categorical data. (#7708 )	2022-03-10 00:21:48 +08:00
Cheng Li	a92e0f6240	multi groups in the constraints (#7711 )	2022-03-01 18:10:15 +08:00
Jiaming Yuan	83a66b4994	Support categorical data for hist. (#7695 ) * Extract partitioner from hist. * Implement categorical data support by passing the gradient index directly into the partitioner. * Organize/update document. * Remove code for negative hessian.	2022-02-25 03:47:14 +08:00
Jiaming Yuan	f08c5dcb06	Cleanup some pylint errors. (#7667 ) * Cleanup some pylint errors. * Cleanup pylint errors in rabit modules. * Make data iter an abstract class and cleanup private access. * Cleanup no-self-use for booster.	2022-02-19 18:53:12 +08:00
Jiaming Yuan	7366d3b20c	Ensure models with categorical splits don't use old binary format. (#7666 )	2022-02-19 08:05:28 +08:00
Jiaming Yuan	0d0abe1845	Support optimal partitioning for GPU hist. (#7652 ) * Implement `MaxCategory` in quantile. * Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function. * Extract an evaluator from GPU Hist to store the needed states. * Added some CUDA stream/event utilities. * Update document with references. * Fixed a bug in approx evaluator where the number of data points is less than the number of categories.	2022-02-15 03:03:12 +08:00
Jiaming Yuan	5cd1f71b51	[dask] Improve configuration for port. (#7645 ) - Try port 0 to let the OS return the available port. - Add port configuration.	2022-02-14 21:34:34 +08:00
Jiaming Yuan	3e693e4f97	[dask] Fix nthread config with dask sklearn wrapper. (#7633 )	2022-02-08 06:38:32 +08:00
Philip Hyunsu Cho	c621775f34	Replace all uses of deprecated function sklearn.datasets.load_boston (#7373 ) * Replace all uses of deprecated function sklearn.datasets.load_boston * More renaming * Fix bad name * Update assertion * Fix n boosted rounds. * Avoid over regularization. * Rebase. * Avoid over regularization. * Whac-a-mole Co-authored-by: fis <jm.yuan@outlook.com>	2022-01-30 04:27:57 -08:00
Philip Hyunsu Cho	b4340abf56	Add special handling for multi:softmax in sklearn predict (#7607 ) * Add special handling for multi:softmax in sklearn predict * Add test coverage	2022-01-29 15:54:49 -08:00
Jiaming Yuan	24789429fd	Support latest pandas Index type. (#7595 )	2022-01-26 18:20:10 +08:00

459 Commits