xgboost

Author	SHA1	Message	Date
Jiaming Yuan	282b1729da	Specify the number of threads for parallel sort. (#8735 ) * Specify the number of threads for parallel sort. - Pass context object into argsort. - Replace macros with inline functions.	2023-02-16 00:20:19 +08:00
Jiaming Yuan	31d3ec07af	Extract device algorithms. (#8789 )	2023-02-13 20:53:53 +08:00
Jiaming Yuan	457f704e3d	Add quantile metric. (#8761 )	2023-02-13 19:07:40 +08:00
Rong Ou	ed91e775ec	Fix quantile tests running on multi-gpus (#8775 ) * Fix quantile tests running on multi-gpus * Run some gtests with multiple GPUs * fix mgpu test naming * Instruct NCCL to print extra logs * Allocate extra space in /dev/shm to enable NCCL * use gtest_skip to skip mgpu tests --------- Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2023-02-12 17:00:26 -08:00
Jiaming Yuan	17b709acb9	Rename ranking utils to threading utils. (#8785 )	2023-02-12 05:41:18 +08:00
Jiaming Yuan	5f76edd296	Extract make metric name from ranking metric. (#8768 ) - Extract the metric parsing routine from ranking. - Add a test. - Accept null for string view.	2023-02-09 18:30:21 +08:00
Jiaming Yuan	48cefa012e	Support multiple alphas for segmented quantile. (#8758 )	2023-02-07 17:17:59 +08:00
Jiaming Yuan	28bb01aa22	Extract optional weight. (#8747 ) - Extract optional weight from common.h to reduce dependency on this header. - Add test.	2023-02-07 03:11:53 +08:00
Rong Ou	66191e9926	Support cpu quantile sketch with column-wise data split (#8742 )	2023-02-05 14:26:24 +08:00
Jiaming Yuan	c1786849e3	Use array interface for CSC matrix. (#8672 ) * Use array interface for CSC matrix. Use array interface for CSC matrix and align the interface with CSR and dense. - Fix nthread issue in the R package DMatrix. - Unify the behavior of handling `missing` with other inputs. - Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix. - Expose `num_non_missing` to the JVM interface. - Deprecate old CSR and CSC constructors.	2023-02-05 01:59:46 +08:00
Jiaming Yuan	3760cede0f	Consistent use of context to specify number of threads. (#8733 ) - Use context in all tests. - Use context in R. - Use context in C API DMatrix initialization. (0 threads is used as the default.)	2023-01-30 15:25:31 +08:00
Rong Ou	8af98e30fc	Use in-memory communicator to test quantile (#8710 )	2023-01-27 23:28:28 +08:00
Jiaming Yuan	4416452f94	Return single thread from context when called inside omp region. (#8693 )	2023-01-18 09:23:37 +08:00
Jiaming Yuan	43152657d4	Extract JSON type check. (#8677 ) - Reuse it in `GetMissing`. - Add test.	2023-01-17 03:11:07 +08:00
Jiaming Yuan	cfa994d57f	Multi-target support for L1 error. (#8652 ) - Add matrix support to the median function. - Iterate through each target for quantile computation.	2023-01-11 05:51:14 +08:00
Jiaming Yuan	8d545ab2a2	Implement fit stump. (#8607 )	2023-01-04 04:14:51 +08:00
Rong Ou	3ceeb8c61c	Add data split mode to DMatrix MetaInfo (#8568 )	2022-12-25 20:37:37 +08:00
Jiaming Yuan	38887a1876	Fix windows build on buildkite. (#8602 )	2022-12-16 21:12:24 +08:00
Jiaming Yuan	43a647a4dd	Fix inference with categorical feature. (#8591 )	2022-12-15 17:57:26 +08:00
Jiaming Yuan	3e26107a9c	Rename and extract `Context`. (#8528 ) * Rename `GenericParameter` to `Context`. * Rename header file to reflect the change. * Rename all references.	2022-12-07 04:58:54 +08:00
Jiaming Yuan	e3bf5565ab	Extract transform iterator. (#8498 )	2022-12-05 21:37:07 +08:00
Rong Ou	8e76f5f595	Use `DataSplitMode` to configure data loading (#8434 ) * Use `DataSplitMode` to configure data loading	2022-11-08 16:21:50 +08:00
Jiaming Yuan	3ef1703553	Allow using string view to find JSON value. (#8332 ) - Allow comparison between string and string view. - Fix compiler warnings.	2022-10-13 17:10:13 +08:00
Rong Ou	668b8a0ea4	[Breaking] Switch from rabit to the collective communicator (#8257 ) * Switch from rabit to the collective communicator * fix size_t specialization * really fix size_t * try again * add include * more include * fix lint errors * remove rabit includes * fix pylint error * return dict from communicator context * fix communicator shutdown * fix dask test * reset communicator mocklist * fix distributed tests * do not save device communicator * fix jvm gpu tests * add python test for federated communicator * Update gputreeshap submodule Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-05 14:39:01 -08:00
Rory Mitchell	d686bf52a6	Reduce time for some multi-gpu tests (#8288 ) * Faster dask tests * Reuse AllReducer objects in tests. * Faster boost from prediction tests. * Use rmm dask fixture. * Speed up dask demo. * mypy * Format with black. * mypy * Clang-tidy Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-04 02:49:33 -08:00
Jiaming Yuan	6d1452074a	Remove MGPU cpp tests. (#8276 ) Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-09-27 21:18:23 +08:00
Rory Mitchell	8f77677193	Use quantised gradients in gpu_hist histograms (#8246 )	2022-09-26 17:35:35 +02:00
Jiaming Yuan	fffb1fca52	Calculate `base_score` based on input labels for mae. (#8107 ) Fit an intercept as base score for abs loss.	2022-09-20 20:53:54 +08:00
Jiaming Yuan	bc818316f2	Prepare for improving Windows networking compatibility. (#8234 ) * Prepare for improving Windows networking compatibility. * Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which conflicts with winsock2.h * Define `NOMINMAX` conditionally. * Link the winsock library when msys32 is used. * Add config file for Read the Docs.	2022-09-10 15:16:49 +08:00
Jiaming Yuan	441ffc017a	Copy data from Ellpack to GHist. (#8215 )	2022-09-06 23:05:49 +08:00
Jiaming Yuan	7785d65c8a	Fix feature weights with multiple column sampling. (#8100 )	2022-07-22 20:23:05 +08:00
Jiaming Yuan	4083440690	Small cleanups to various data types. (#8086 ) - Use `bst_bin_t` in batch param constructor. - Use `StringView` to avoid `std::string` when appropriate. - Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.	2022-07-18 22:39:36 +08:00
Jiaming Yuan	f0c1b842bf	Implement sketching with adapter. (#8019 )	2022-06-23 00:03:02 +08:00
Jiaming Yuan	142a208a90	Fix compiler warnings. (#8022 ) - Remove/fix unused parameters - Remove deprecated code in rabit. - Update dmlc-core.	2022-06-22 21:29:10 +08:00
Jiaming Yuan	8f8bd8147a	Fix LTR with weighted Quantile DMatrix. (#7975 ) * Fix LTR with weighted Quantile DMatrix. * Better tests.	2022-06-09 01:33:41 +08:00
Jiaming Yuan	1a33b50a0d	Fix compiler warnings. (#7974 ) - Remove unused parameters. There are still many warnings that are not yet addressed. Currently, the warnings in dmlc-core dominate the error log. - Remove `distributed` parameter from metric. - Fixes some warnings about signed comparison.	2022-06-06 22:56:25 +08:00
Jiaming Yuan	d48123d23b	Fix rmm build (#7973 ) - Optionally switch to c++17 - Use rmm CMake target. - Workaround compiler errors. - Fix GPUMetric inheritance. - Run death tests even if it's built with RMM support. Co-authored-by: jakirkham <jakirkham@gmail.com>	2022-06-06 20:18:32 +08:00
Rong Ou	80339c3427	Enable distributed GPU training over Rabit (#7930 )	2022-05-31 04:09:45 +08:00
Jiaming Yuan	18a38f7ca0	Refactor for GHistIndex. (#7923 ) * Pass sparse page as adapter, which prepares for quantile dmatrix. * Remove old external memory code like `rbegin` and extra `Init` function. * Simplify type dispatch.	2022-05-23 23:04:53 +08:00
Jiaming Yuan	19775ffe15	Use adapter to initialize column matrix. (#7912 )	2022-05-18 16:15:12 +08:00
Jiaming Yuan	4fcfd9c96e	Fix and cleanup for column matrix. (#7901 ) * Fix missed type dispatching for dense columns with missing values. * Code cleanup to reduce special cases. * Reduce memory usage.	2022-05-16 21:11:50 +08:00
Jiaming Yuan	1baad8650c	Small cleanup to Column. (#7898 ) * Define forward iterator to hide the internal state.	2022-05-15 12:39:10 +08:00
Jiaming Yuan	1b6538b4e5	[breaking] Drop single precision histogram (#7892 ) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2022-05-13 19:54:55 +08:00
Jiaming Yuan	11d65fcb21	Extract partial sum into an independent function. (#7889 )	2022-05-13 14:30:35 +08:00
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812 ) Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on the residual of labels and predictions after construction. For l1 error, the optimal value is the median (50th percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00
Jiaming Yuan	64575591d8	Use context in `SetInfo`. (#7687 ) * Use the name `Context`. * Pass a context object into `SetInfo`. * Add context to proxy matrix. * Add context to iterative DMatrix. This is to remove the use of the default number of threads during `SetInfo` as a follow-up on removing the global omp variable while preparing for CUDA stream semantic. Currently, XGBoost uses the legacy CUDA stream, we will gradually remove them in the future in favor of non-blocking streams.	2022-03-24 22:16:26 +08:00
Jiaming Yuan	4d81c741e9	External memory support for hist (#7531 ) * Generate column matrix from gHistIndex. * Avoid synchronization with the sparse page once the cache is written. * Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist. * Remove pruner.	2022-03-22 00:13:20 +08:00
Jiaming Yuan	98d6faefd6	Implement slope for Pseudo-Huber. (#7727 ) * Add objective and metric. * Some refactoring for CPU/GPU dispatching using linalg module.	2022-03-14 21:42:38 +08:00
Jiaming Yuan	6762c45494	Small cleanup to gradient index and hist. (#7668 ) * Code comments. * Const accessor to index. * Remove some weird variables in the `Index` class. * Simplify the `MemStackAllocator`.	2022-02-23 11:37:21 +08:00
Jiaming Yuan	0d0abe1845	Support optimal partitioning for GPU hist. (#7652 ) * Implement `MaxCategory` in quantile. * Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function. * Extract an evaluator from GPU Hist to store the needed states. * Added some CUDA stream/event utilities. * Update document with references. * Fixed a bug in approx evaluator where the number of data points is less than the number of categories.	2022-02-15 03:03:12 +08:00

223 Commits