- Avoid the use of size_t in the partitioner.
- Use `Span` instead of `Elem` where `node_id` is not needed.
- Remove the `const_cast`.
- Make sure constness is not removed in `Elem` by making it reference-only.
`size_t` is implementation-defined, which causes issues when we want to pass a pointer or span.
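For illustration, here is a minimal, self-contained sketch of the pattern described above, using C++20's `std::span` as a stand-in for the project's span type; `RowIndex` and `CountRowsBelow` are hypothetical names, not the partitioner's actual API.

```cpp
#include <cstdint>
#include <iostream>
#include <span>
#include <vector>

// Fixed-width row index instead of the implementation-defined std::size_t.
using RowIndex = std::uint64_t;

// A helper that does not need the node id can take a read-only span of row
// indices directly; the span of const elements preserves constness, so no
// const_cast is required anywhere.
RowIndex CountRowsBelow(std::span<RowIndex const> rows, RowIndex split) {
  RowIndex n_left = 0;
  for (RowIndex r : rows) {
    if (r < split) {
      ++n_left;
    }
  }
  return n_left;
}

int main() {
  std::vector<RowIndex> const rows{0, 3, 5, 7, 9};
  // A const vector converts to std::span<RowIndex const> directly.
  std::cout << CountRowsBelow(rows, 6) << "\n";  // prints 3
}
```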
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL support.
- More data types.
- A unified interface for all the underlying implementations (see the sketch after this list).
- Improved timeout handling for both tracker and workers.
- Exhaustive tests with metrics (a couple of bugs were fixed along the way).
- A reusable tracker for Python and JVM packages.
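To make the "unified interface" item concrete, below is a hypothetical sketch of the pattern: one abstract communicator with interchangeable backends. The class and method names are invented for this illustration and are not XGBoost's actual collective API.

```cpp
#include <cstdint>
#include <iostream>
#include <memory>
#include <vector>

// Minimal communicator interface that every backend would implement.
class Communicator {
 public:
  virtual ~Communicator() = default;
  virtual std::int32_t Rank() const = 0;
  virtual std::int32_t WorldSize() const = 0;
  // Sum-allreduce over doubles; a real implementation supports more types/ops.
  virtual void AllreduceSum(std::vector<double>* values) = 0;
};

// Single-process backend, handy for tests: allreduce is a no-op.
class InMemoryCommunicator : public Communicator {
 public:
  std::int32_t Rank() const override { return 0; }
  std::int32_t WorldSize() const override { return 1; }
  void AllreduceSum(std::vector<double>*) override {}
};

// A federated or NCCL-backed communicator would implement the same interface,
// so calling code (tree methods, metrics) never needs to know which one it got.
void ReportSum(Communicator* comm, std::vector<double> local) {
  comm->AllreduceSum(&local);
  std::cout << "rank " << comm->Rank() << " sum[0]=" << local[0] << "\n";
}

int main() {
  auto comm = std::make_unique<InMemoryCommunicator>();
  ReportSum(comm.get(), {1.0, 2.0});
}
```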
- Use `std::uint64_t` instead of `size_t` to avoid an implementation-defined type.
- Rename it to `bst_idx_t` to account for other types of indexing.
- Small cleanup to the base header.
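A minimal sketch of the alias described in the bullets above; only the `using` declaration reflects the change itself, the rest is illustrative scaffolding.

```cpp
#include <cstdint>
#include <iostream>

// A fixed 64-bit index type regardless of platform, instead of the
// implementation-defined std::size_t.
using bst_idx_t = std::uint64_t;

static_assert(sizeof(bst_idx_t) == 8, "indices are always 64-bit");

int main() {
  bst_idx_t n_rows = 1024;
  std::cout << "rows: " << n_rows << "\n";
}
```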
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides (row-major or column-major).
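To make the stride point concrete, here is a small self-contained sketch of a strided 2-D view over gradients. `GradView` is a hypothetical type for illustration only, not `linalg::Matrix` or the actual objective interface; it shows why shape plus strides are enough to read gradients laid out either row-major or column-major.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

struct GradView {
  double const* data;
  std::size_t n_samples, n_targets;
  std::size_t stride_row, stride_col;  // element strides, not bytes

  double operator()(std::size_t i, std::size_t j) const {
    return data[i * stride_row + j * stride_col];
  }
};

int main() {
  // Two samples, three targets, stored in the two common layouts.
  std::vector<double> row_major{1, 2, 3, 4, 5, 6};  // sample-contiguous
  std::vector<double> col_major{1, 4, 2, 5, 3, 6};  // target-contiguous

  GradView a{row_major.data(), 2, 3, /*stride_row=*/3, /*stride_col=*/1};
  GradView b{col_major.data(), 2, 3, /*stride_row=*/1, /*stride_col=*/2};

  // Both views expose the same logical (n_samples, n_targets) matrix.
  std::cout << a(1, 2) << " == " << b(1, 2) << "\n";  // 6 == 6
}
```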
- Pass context from booster to DMatrix.
- Use the context instead of an integer for `n_threads`.
- Check the configuration consistency for `max_bin` (see the sketch after this list).
- Test for all combinations of initialization options.
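A minimal sketch of the two ideas above, under invented names (`Context` and `CachedIndex` here are illustrative stand-ins, not the real classes): the thread count travels inside a context object instead of a bare integer, and a cached quantile index is checked against the currently configured `max_bin`.

```cpp
#include <cstdint>
#include <iostream>
#include <stdexcept>

struct Context {
  std::int32_t n_threads{1};
  std::int32_t max_bin{256};
  std::int32_t Threads() const { return n_threads; }
};

struct CachedIndex {
  std::int32_t built_with_max_bin;
};

// Reject configurations where the cached histogram index was built with a
// different max_bin than the one currently requested.
void CheckMaxBin(Context const& ctx, CachedIndex const& cache) {
  if (cache.built_with_max_bin != ctx.max_bin) {
    throw std::invalid_argument(
        "max_bin of the cached index does not match the current configuration");
  }
}

int main() {
  Context ctx{/*n_threads=*/4, /*max_bin=*/256};
  CachedIndex cache{/*built_with_max_bin=*/256};
  CheckMaxBin(ctx, cache);  // passes: both use 256 bins
  std::cout << "threads: " << ctx.Threads() << "\n";
}
```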
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner (see the sketch after this list).
* Organize/update document.
* Remove code for negative hessian.
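A self-contained sketch of the idea behind the categorical-support item above: once the quantized gradient index is available inside the partitioner, a categorical split can send a row left or right by testing the row's bin (category code) against the split's category set. The types and the set representation are hypothetical, chosen for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <unordered_set>
#include <vector>

using bst_bin_t = std::int32_t;

// One quantized feature column: bin_of_row[i] is the bin index of row i,
// which for a categorical feature is the category code.
struct GradientIndexColumn {
  std::vector<bst_bin_t> bin_of_row;
};

// Rows whose category is in the split's left set go to the left child.
bool GoesLeft(GradientIndexColumn const& column, std::size_t row,
              std::unordered_set<bst_bin_t> const& left_categories) {
  return left_categories.count(column.bin_of_row[row]) > 0;
}

int main() {
  GradientIndexColumn column{{0, 2, 1, 2, 0}};
  std::unordered_set<bst_bin_t> left_categories{0, 1};
  for (std::size_t row = 0; row < column.bin_of_row.size(); ++row) {
    std::cout << "row " << row
              << (GoesLeft(column, row, left_categories) ? " -> left\n"
                                                         : " -> right\n");
  }
}
```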
This PR prepares the GHistIndexMatrix to host the column matrix used by the hist tree method by accepting a `sparse_threshold` parameter.
Some cleanups ensure that the correct batch parameter is passed into the DMatrix, along with additional tests for the correctness of SimpleDMatrix.
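As a rough sketch of the role `sparse_threshold` plays when the column matrix is built, the decision below stores a column densely only when enough of its rows have a value; the exact comparison is illustrative and may not match XGBoost's actual rule.

```cpp
#include <cstddef>
#include <iostream>

enum class ColumnType { kDense, kSparse };

// Illustrative rule: treat the column as dense when at least
// (1 - sparse_threshold) of its rows carry a value.
ColumnType ChooseRepresentation(std::size_t n_nonmissing, std::size_t n_rows,
                                double sparse_threshold) {
  double density =
      static_cast<double>(n_nonmissing) / static_cast<double>(n_rows);
  return density >= 1.0 - sparse_threshold ? ColumnType::kDense
                                           : ColumnType::kSparse;
}

int main() {
  double const sparse_threshold = 0.2;
  auto const name = [](ColumnType t) {
    return t == ColumnType::kDense ? "dense" : "sparse";
  };
  std::cout << name(ChooseRepresentation(90, 100, sparse_threshold)) << "\n";
  std::cout << name(ChooseRepresentation(5, 100, sparse_threshold)) << "\n";
}
```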
Aside from modularizing the split evaluation function, this PR removes a few more functions, including `InitNewNodes` and `BuildNodeStats`, along with some unused variables. Scattered code such as setting leaf weights is grouped into the split evaluator, and `NodeEntry` is simplified and made private. Another subtle difference from the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traverse upward.
* Removed some warnings
* Rebase with master
* Solved C++ Google Test errors introduced by the refactoring done to remove warnings
* Undo renaming path -> path_
* Fix style check
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>