xgboost

Author	SHA1	Message	Date
Rong Ou	3632242e0b	Support column split with GPU quantile (#9370 )	2023-07-11 12:15:56 +08:00
Jiaming Yuan	20c52f07d2	Support exporting cut values (#9356 )	2023-07-08 15:32:41 +08:00
Jiaming Yuan	59787b23af	Allow empty page in external memory. (#9361 )	2023-07-08 09:24:35 +08:00
Jiaming Yuan	41c6813496	Preserve order of saved updaters config. (#9355 ) - Save the updater sequence as an array instead of object. - Warn only once. The compatibility is kept, but we should be able to break it as the config is not loaded in pickle model and it's declared to be not stable.	2023-07-05 20:20:07 +08:00
Jiaming Yuan	645037e376	Improve test coverage with predictor configuration. (#9354 ) - Test with ext memory. - Test with QDM. - Test with dart.	2023-07-05 15:17:22 +08:00
Jiaming Yuan	39390cc2ee	[breaking] Remove the `predictor` param, allow fallback to prediction using `DMatrix`. (#9129 ) - A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter. - The `predictor` parameter is removed. - Fallback to `DMatrix` when `inplace_predict` is not available. - The heuristic for choosing a predictor is only used during training.	2023-07-03 19:23:54 +08:00
Jiaming Yuan	bc267dd729	Use ptr from `mmap` for `GHistIndexMatrix` and `ColumnMatrix`. (#9315 ) - Define a resource for holding various types of memory pointers. - Define ref vector for holding resources. - Swap the underlying resources for GHist and ColumnM. - Add documentation for current status. - s390x support is removed. It should work if you can compile XGBoost; all the old workaround code did was get GCC to compile.	2023-06-27 19:05:46 +08:00
Jiaming Yuan	54da4b3185	Cleanup to prepare for using mmap pointer in external memory. (#9317 ) - Update SparseDMatrix comment. - Use a pointer in the bitfield. We will replace the `std::vector<bool>` in `ColumnMatrix` with bitfield. - Clean up the page source. The timer is removed as it's inaccurate once we swap the mmap pointer into the page.	2023-06-22 06:43:11 +08:00
Jiaming Yuan	ee6809e642	Use mmap for external memory. (#9282 ) - Have basic infrastructure for mmap. - Release file write handle.	2023-06-19 18:52:55 +08:00
Jiaming Yuan	0cba2cdbb0	Support linalg data structures in check device. (#9243 )	2023-06-06 09:47:24 +08:00
Jiaming Yuan	9fbde21e9d	Rework the precision metric. (#9222 ) - Rework the precision metric for both CPU and GPU. - Mention it in the document. - Cleanup old support code for GPU ranking metric. - Deterministic GPU implementation. * Drop support for classification. * type. * use batch shape. * lint. * cpu build. * cpu build. * lint. * Tests. * Fix. * Cleanup error message.	2023-06-02 20:49:43 +08:00
Jiaming Yuan	17fd3f55e9	Optimize adapter element counting on GPU. (#9209 ) - Implement a simple `IterSpan` for passing iterators with size. - Use shared memory for column size counts. - Use one thread for each sample in row count to reduce atomic operations.	2023-05-30 23:28:43 +08:00
Jiaming Yuan	097f11b6e0	Support CUDA f16 without transformation. (#9207 ) - Support f16 from cupy. - Include CUDA header explicitly. - Cleanup cmake nvtx support.	2023-05-30 20:54:31 +08:00
Jiaming Yuan	053aababd4	Avoid thrust logical operation. (#9199 ) Thrust's implementation of `thrust::all_of/any_of/none_of` adopts an early-stopping strategy, dividing the input into small batches so it can bail out early. This is not ideal for data validation, as we expect all data to be valid. The strategy leads to excessive kernel launches and stream synchronization. * Use reduce from dh instead.	2023-05-27 01:36:58 +08:00
Rong Ou	5b69534b43	Support column split in multi-target `hist` (#9171 )	2023-05-26 16:56:05 +08:00
Rong Ou	52311dcec9	Fix multi-threaded gtests (#9148 )	2023-05-10 19:15:32 +08:00
Jiaming Yuan	85988a3178	Wait for data CUDA stream instead of sync. (#9144 ) --------- Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2023-05-09 09:52:21 +08:00
Jiaming Yuan	08ce495b5d	Use Booster context in DMatrix. (#8896 ) - Pass context from booster to DMatrix. - Use context instead of integer for `n_threads`. - Check the consistency configuration for `max_bin`. - Test for all combinations of initialization options.	2023-04-28 21:47:14 +08:00
Jiaming Yuan	1f9a57d17b	[Breaking] Require format to be specified in input URI. (#9077 ) Previously, we used `libsvm` as the default when the format was not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is an undefined format. In addition, we will recommend that users use other data loaders instead. We will continue maintaining the parsers, as they are currently used for many internal tests, including federated learning.	2023-04-28 19:45:15 +08:00
Jiaming Yuan	17ff471616	Optimize array interface input. (#9090 )	2023-04-28 18:01:58 +08:00
Jiaming Yuan	0e470ef606	Optimize prediction with QuantileDMatrix. (#9096 ) - Reduce overhead in `FVecDrop`. - Reduce overhead caused by `HostVector()` calls.	2023-04-28 00:51:41 +08:00
Rong Ou	a320b402a5	More refactoring to take advantage of collective aggregators (#9081 )	2023-04-26 03:36:09 +08:00
Rong Ou	ff26cd3212	More tests for column split and vertical federated learning (#8985 ) Added some more tests for the learner and fit_stump, for both column-wise distributed learning and vertical federated learning. Also moved the `IsRowSplit` and `IsColumnSplit` methods from the `DMatrix` to the `MetaInfo` since in some places we only have access to the `MetaInfo`. Added a new convenience method `IsVerticalFederatedLearning`. Some refactoring of the testing fixtures.	2023-03-28 16:40:26 +08:00
Jiaming Yuan	151882dd26	Initial support for multi-target tree. (#8616 ) * Implement multi-target for hist. - Add new hist tree builder. - Move data fetchers for tests. - Dispatch function calls in gbm based on the tree type.	2023-03-22 23:49:56 +08:00
Rong Ou	b240f055d3	Support vertical federated learning (#8932 )	2023-03-22 14:25:26 +08:00
Jiaming Yuan	f186c87cf9	Check inf in data for all types of DMatrix. (#8911 )	2023-03-15 11:24:35 +08:00
Jiaming Yuan	36a7396658	Replace dmlc any with std any. (#8892 )	2023-03-11 06:11:04 +08:00
Jiaming Yuan	4d665b3fb0	Restore clang tidy test. (#8861 )	2023-03-03 13:47:04 -08:00
Rong Ou	7cbaee9916	Support column split in `approx` tree method (#8847 )	2023-03-02 03:59:07 +08:00
Rong Ou	d9688f93c7	Support column-split in row partitioner (#8828 )	2023-02-26 04:43:35 +08:00
Rong Ou	a65ad0bd9c	Support column split in histogram builder (#8811 )	2023-02-17 22:37:01 +08:00
Jiaming Yuan	c0afdb6786	Fix CPU bin compression with categorical data. (#8809 ) * The bug causes the maximum category to be less than 256 or the maximum number of bins when the input data is dense.	2023-02-16 04:20:34 +08:00
Jiaming Yuan	282b1729da	Specify the number of threads for parallel sort. (#8735 ) - Pass context object into argsort. - Replace macros with inline functions.	2023-02-16 00:20:19 +08:00
Jiaming Yuan	594371e35b	Fix CPP lint. (#8807 )	2023-02-15 20:16:35 +08:00
Jiaming Yuan	31d3ec07af	Extract device algorithms. (#8789 )	2023-02-13 20:53:53 +08:00
Jiaming Yuan	d11a0044cf	Generalize prediction cache. (#8783 ) * Extract most of the functionality into `DMatrixCache`. * Move API entry to independent file to reduce dependency on `predictor.h` file. * Add test. --------- Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2023-02-13 12:36:43 +08:00
Jiaming Yuan	8a16944664	Fix ranking with quantile dmatrix and group weight. (#8762 )	2023-02-10 20:32:35 +08:00
Jiaming Yuan	a2e433a089	Fix empty DMatrix with categorical features. (#8739 )	2023-02-07 00:40:11 +08:00
Rong Ou	66191e9926	Support cpu quantile sketch with column-wise data split (#8742 )	2023-02-05 14:26:24 +08:00
Jiaming Yuan	c1786849e3	Use array interface for CSC matrix. (#8672 ) Use array interface for CSC matrix and align the interface with CSR and dense. - Fix nthread issue in the R package DMatrix. - Unify the behavior of handling `missing` with other inputs. - Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix. - Expose `num_non_missing` to the JVM interface. - Deprecate old CSR and CSC constructors.	2023-02-05 01:59:46 +08:00
Jiaming Yuan	3760cede0f	Consistent use of context to specify number of threads. (#8733 ) - Use context in all tests. - Use context in R. - Use context in C API DMatrix initialization. (0 threads is used as the default.)	2023-01-30 15:25:31 +08:00
Jiaming Yuan	7a068af1a3	Workaround CUDA warning. (#8696 )	2023-01-19 09:16:08 +08:00
Jiaming Yuan	31b9cbab3d	Make sure input numpy array is aligned. (#8690 ) - use `np.require` to specify that the alignment is required. - scipy csr as well. - validate input pointer in `ArrayInterface`.	2023-01-18 08:12:13 +08:00
Jiaming Yuan	07cf3d3e53	Fix threads in DMatrix slice. (#8667 )	2023-01-14 07:16:57 +08:00
Jiaming Yuan	badeff1d74	Init estimation for regression. (#8272 )	2023-01-11 02:04:56 +08:00
Rong Ou	3ceeb8c61c	Add data split mode to DMatrix MetaInfo (#8568 )	2022-12-25 20:37:37 +08:00
Jiaming Yuan	c6a8754c62	Define CUDA Context. (#8604 ) We will transition to non-default and non-blocking CUDA stream.	2022-12-20 15:15:07 +08:00
Rong Ou	15a88ceef0	Fix deprecated CUB calls in CUDA 12.0 (#8578 )	2022-12-12 17:02:30 +08:00
Jiaming Yuan	3e26107a9c	Rename and extract `Context`. (#8528 ) * Rename `GenericParameter` to `Context`. * Rename header file to reflect the change. * Rename all references.	2022-12-07 04:58:54 +08:00
Jiaming Yuan	e3bf5565ab	Extract transform iterator. (#8498 )	2022-12-05 21:37:07 +08:00

314 Commits