xgboost

Author	SHA1	Message	Date
Jiaming Yuan	c03a4d5088	Check support status for categorical features. (#9946 )	2024-01-04 16:51:33 +08:00
Jiaming Yuan	06bdc15e9b	[coll] Pass context to various functions. (#9772 ) In the future, the `Context` object will be required for collective operations. This PR passes the context object to some of the functions that require it, in preparation for swapping out the implementation.	2023-11-08 09:54:05 +08:00
Rong Ou	0ecb4de963	[breaking] Change DMatrix construction to be distributed (#9623 ) * Change column-split DMatrix construction to be distributed * remove splitting code for row split	2023-10-10 23:35:57 +08:00
Jiaming Yuan	8c676c889d	Remove internal use of gpu_id. (#9568 )	2023-09-20 23:29:51 +08:00
Jiaming Yuan	a196443a07	Implement sketching with Hessian on GPU. (#9399 ) - Prepare for implementing approx on GPU. - Unify the code path between weighted and uniform sketching on DMatrix.	2023-07-24 15:43:03 +08:00
Jiaming Yuan	20c52f07d2	Support exporting cut values (#9356 )	2023-07-08 15:32:41 +08:00
Jiaming Yuan	08ce495b5d	Use Booster context in DMatrix. (#8896 ) - Pass context from booster to DMatrix. - Use context instead of integer for `n_threads`. - Check the consistency configuration for `max_bin`. - Test for all combinations of initialization options.	2023-04-28 21:47:14 +08:00
Jiaming Yuan	1f9a57d17b	[Breaking] Require format to be specified in input URI. (#9077 ) Previously, `libsvm` was used as the default when the format was not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is an undefined format. Along with this change, we recommend that users use other data loaders instead. We will continue to maintain the parsers, as they are currently used for many internal tests, including federated learning.	2023-04-28 19:45:15 +08:00
Rong Ou	a320b402a5	More refactoring to take advantage of collective aggregators (#9081 )	2023-04-26 03:36:09 +08:00
Rong Ou	ff26cd3212	More tests for column split and vertical federated learning (#8985 ) Added some more tests for the learner and fit_stump, for both column-wise distributed learning and vertical federated learning. Also moved the `IsRowSplit` and `IsColumnSplit` methods from the `DMatrix` to the `MetaInfo` since in some places we only have access to the `MetaInfo`. Added a new convenience method `IsVerticalFederatedLearning`. Some refactoring of the testing fixtures.	2023-03-28 16:40:26 +08:00
Rong Ou	b240f055d3	Support vertical federated learning (#8932 )	2023-03-22 14:25:26 +08:00
Rong Ou	7cbaee9916	Support column split in `approx` tree method (#8847 )	2023-03-02 03:59:07 +08:00
Rong Ou	a65ad0bd9c	Support column split in histogram builder (#8811 )	2023-02-17 22:37:01 +08:00
Jiaming Yuan	282b1729da	Specify the number of threads for parallel sort. (#8735 ) - Pass context object into argsort. - Replace macros with inline functions.	2023-02-16 00:20:19 +08:00
Rong Ou	66191e9926	Support cpu quantile sketch with column-wise data split (#8742 )	2023-02-05 14:26:24 +08:00
Rong Ou	3ceeb8c61c	Add data split mode to DMatrix MetaInfo (#8568 )	2022-12-25 20:37:37 +08:00
Jiaming Yuan	3e26107a9c	Rename and extract `Context`. (#8528 ) * Rename `GenericParameter` to `Context`. * Rename header file to reflect the change. * Rename all references.	2022-12-07 04:58:54 +08:00
Rong Ou	78d65a1928	Initial support for column-wise data split (#8468 )	2022-12-04 01:37:51 +08:00
Jiaming Yuan	3fc1046fd3	Reduce compiler warnings on CPU-only build. (#8483 )	2022-11-29 00:04:16 +08:00
Rong Ou	30b1a26fc0	Remove unused page size constant (#8457 )	2022-11-17 11:41:39 +08:00
Rong Ou	8e76f5f595	Use `DataSplitMode` to configure data loading (#8434 )	2022-11-08 16:21:50 +08:00
Jiaming Yuan	55cf24cc32	Obtain CSR matrix from DMatrix. (#8269 )	2022-09-29 20:41:43 +08:00
Jiaming Yuan	2c70751d1e	Implement iterative DMatrix for CPU. (#8116 )	2022-07-26 22:34:21 +08:00
Jiaming Yuan	4083440690	Small cleanups to various data types. (#8086 ) - Use `bst_bin_t` in batch param constructor. - Use `StringView` to avoid `std::string` when appropriate. - Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.	2022-07-18 22:39:36 +08:00
Haoming Chen	b37ff3d492	Fix cox objective test by using XGBOOST_PARALLEL_STABLE_SORT (#7756 )	2022-03-26 17:58:30 +08:00
Jiaming Yuan	64575591d8	Use context in `SetInfo`. (#7687 ) * Use the name `Context`. * Pass a context object into `SetInfo`. * Add context to proxy matrix. * Add context to iterative DMatrix. This is to remove the use of the default number of threads during `SetInfo` as a follow-up on removing the global omp variable while preparing for CUDA stream semantics. Currently, XGBoost uses the legacy CUDA stream; we will gradually remove its uses in the future in favor of non-blocking streams.	2022-03-24 22:16:26 +08:00
Jiaming Yuan	e78a38b837	Sort sparse page index when constructing DMatrix. (#7731 )	2022-03-16 18:01:05 +08:00
Jiaming Yuan	2775c2a1ab	Prepare external memory support for hist. (#7638 ) This PR prepares the GHistIndexMatrix to host the column matrix, which is used by the hist tree method, by accepting a `sparse_threshold` parameter. Some cleanups are made to ensure the correct batch param is passed into DMatrix, along with some additional tests for the correctness of SimpleDMatrix.	2022-02-10 16:58:02 +08:00
Jiaming Yuan	81210420c6	Remove `omp_get_max_threads` (#7608 ) This is the last PR for removing the omp global variable. * Add context object to the `DMatrix`. This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 . * Require context to be available at the construction time of booster. * Add `n_threads` support for R csc DMatrix constructor. * Remove `omp_get_max_threads` in R glue code. * Remove threading utilities that rely on omp global variable.	2022-01-28 16:09:22 +08:00
Jiaming Yuan	5d7818e75d	Remove `omp_get_max_threads` in tree updaters. (#7590 )	2022-01-26 19:55:47 +08:00
Jiaming Yuan	5817840858	Remove `omp_get_max_threads` in data. (#7588 )	2022-01-24 02:44:07 +08:00
Jiaming Yuan	a1bcd33a3b	[breaking] Change internal model serialization to UBJSON. (#7556 ) * Use typed array for models. * Change the memory snapshot format. * Add new C API for saving to raw format.	2022-01-16 02:11:53 +08:00
Jiaming Yuan	5b1161bb64	Convert labels into tensor. (#7456 ) * Add a new ctor to tensor for `initializer_list`. * Change labels from host device vector to tensor. * Rename the field from `labels_` to `labels` since it's a public member.	2021-12-17 00:58:35 +08:00
Jiaming Yuan	557ffc4bf5	Reduce base margin to 2 dim for now. (#7455 )	2021-11-27 00:46:13 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405 ) This PR changes base_margin into a 3-dim array, with one of the dimensions reserved for multi-target classification. Also, a breaking change is made to binary serialization due to the extra dimension, along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on the array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	b0015fda96	Fix R CRAN failures. (#7404 ) * Remove hist builder dtor. * Initialize values. * Tolerance. * Remove the use of nthread in col maker.	2021-11-16 10:51:12 +08:00
Jiaming Yuan	55ee272ea8	Extend array interface to handle ndarray. (#7434 ) The `ArrayInterface` class is extended to support multi-dim array inputs. Previously, this class handled only 2-dim inputs (a vector is also a matrix). This PR specifies the expected dimension at compile time, and the array interface can perform various checks automatically for input data. Also, adapters like CSR are more rigorous about their input. Lastly, row vectors and column vectors are handled without intervention from the caller.	2021-11-16 09:52:15 +08:00
Jiaming Yuan	d4274bc556	Fix typo. (#7433 )	2021-11-15 01:28:11 +08:00
Jiaming Yuan	3515931305	Initial support for external memory in gradient index. (#7183 ) * Add hessian to batch param in preparation of new approx impl. * Extract a push method for gradient index matrix. * Use span instead of vector ref for hessian in sketching. * Create a binary format for gradient index.	2021-09-13 12:40:56 +08:00
Jiaming Yuan	bd1f3a38f0	Rewrite sparse dmatrix using callbacks. (#7092 ) - Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves. - Remove use of threaded iterator and IO queue. - Remove `page_size`. - Make sure the number of pages in memory is bounded. - Make sure the cache can not be violated. - Provide an interface for internal algorithms to process data asynchronously.	2021-07-16 12:33:31 +08:00
Jiaming Yuan	1cd20efe68	Move `GHistIndex` into `DMatrix`. (#7064 )	2021-07-01 00:44:49 +08:00
Louis Desreumaux	9b530e5697	Improve OpenMP exception handling (#6680 )	2021-02-25 13:56:16 +08:00
Jiaming Yuan	f2f7dd87b8	Use view for `SparsePage` exclusively. (#6590 )	2021-01-11 18:04:55 +08:00
Jiaming Yuan	43efadea2e	Deterministic data partitioning for external memory (#6317 ) * Make external memory data partitioning deterministic. * Change the meaning of `page_size` from bytes to number of rows. * Design a data pool. * Note for external memory. * Enable unity build on Windows CI. * Force garbage collect on test.	2020-11-11 06:11:06 +08:00
Igor Moura	5e1e972aea	Clean up warnings (#6325 )	2020-10-30 23:50:29 +08:00
Jiaming Yuan	2fcc4f2886	Unify evaluation functions. (#6037 )	2020-08-26 14:23:27 +08:00
Jiaming Yuan	20c95be625	Expand categorical node. (#6028 ) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-25 18:53:57 +08:00
Jiaming Yuan	4d99c58a5f	Feature weights (#5962 )	2020-08-18 19:55:41 +08:00
Jiaming Yuan	674c409e9d	Remove rabit dependency on public headers. (#6005 )	2020-08-13 08:26:20 +08:00
Jiaming Yuan	ee70a2380b	Unify CPU hist sketching (#5880 )	2020-08-12 01:33:06 +08:00

119 Commits