xgboost

Author	SHA1	Message	Date
Jiaming Yuan	031d66ec27	Configuration for init estimation. (#8343) * Configuration for init estimation. * Check whether the model needs configuration based on const attribute `ModelFitted` instead of a mutable state. * Add parameter `boost_from_average` to tell whether the user has specified base score. * Add tests.	2022-10-18 01:52:24 +08:00
Jiaming Yuan	3ef1703553	Allow using string view to find JSON value. (#8332) - Allow comparison between string and string view. - Fix compiler warnings.	2022-10-13 17:10:13 +08:00
Philip Hyunsu Cho	2faa744aba	[CI] Test federated learning plugin in the CI (#8325)	2022-10-12 13:57:39 -07:00
Rong Ou	39afdac3be	Better error message when world size and rank are set as strings (#8316) Co-authored-by: jiamingy <jm.yuan@outlook.com>	2022-10-12 15:53:25 +08:00
Rory Mitchell	210915c985	Use integer gradients in gpu_hist split evaluation (#8274)	2022-10-11 12:16:27 +02:00
Rong Ou	668b8a0ea4	[Breaking] Switch from rabit to the collective communicator (#8257) * Switch from rabit to the collective communicator * fix size_t specialization * really fix size_t * try again * add include * more include * fix lint errors * remove rabit includes * fix pylint error * return dict from communicator context * fix communicator shutdown * fix dask test * reset communicator mocklist * fix distributed tests * do not save device communicator * fix jvm gpu tests * add python test for federated communicator * Update gputreeshap submodule Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-05 14:39:01 -08:00
Rory Mitchell	d686bf52a6	Reduce time for some multi-gpu tests (#8288) * Faster dask tests * Reuse AllReducer objects in tests. * Faster boost from prediction tests. * Use rmm dask fixture. * Speed up dask demo. * mypy * Format with black. * mypy * Clang-tidy Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-04 02:49:33 -08:00
Jiaming Yuan	6d1452074a	Remove MGPU cpp tests. (#8276) Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-09-27 21:18:23 +08:00
Rory Mitchell	8f77677193	Use quantised gradients in gpu_hist histograms (#8246)	2022-09-26 17:35:35 +02:00
Jiaming Yuan	3fd331f8f2	Add checks to C pointer arguments. (#8254)	2022-09-22 19:02:22 +08:00
Dmitry Razdoburdin	eb7bbee2c9	Optional by-column histogram build. (#8233) Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>	2022-09-22 05:16:13 +08:00
Jiaming Yuan	b791446623	Initial support for IPv6 (#8225) - Merge rabit socket into XGBoost. - Dask interface support. - Add test to the socket.	2022-09-21 18:06:50 +08:00
Jiaming Yuan	fffb1fca52	Calculate `base_score` based on input labels for mae. (#8107) Fit an intercept as base score for abs loss.	2022-09-20 20:53:54 +08:00
Rong Ou	a2686543a9	Common interface for collective communication (#8057) * implement broadcast for federated communicator * implement allreduce * add communicator factory * add device adapter * add device communicator to factory * add rabit communicator * add rabit communicator to the factory * add nccl device communicator * add synchronize to device communicator * add back print and getprocessorname * add python wrapper and c api * clean up types * fix non-gpu build * try to fix ci * fix std::size_t * portable string compare ignore case * c style size_t * fix lint errors * cross platform setenv * fix memory leak * fix lint errors * address review feedback * add python test for rabit communicator * fix failing gtest * use json to configure communicators * fix lint error * get rid of factories * fix cpu build * fix include * fix python import * don't export collective.py yet * skip collective communicator pytest on windows * add review feedback * update documentation * remove mpi communicator type * fix tests * shutdown the communicator separately Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2022-09-12 15:21:12 -07:00
Jiaming Yuan	bc818316f2	Prepare for improving Windows networking compatibility. (#8234) * Prepare for improving Windows networking compatibility. * Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which conflicts with winsock2.h * Define `NOMINMAX` conditionally. * Link the winsock library when mysys32 is used. * Add config file for read the doc.	2022-09-10 15:16:49 +08:00
Jiaming Yuan	b5eb36f1af	Add `max_cat_threshold` to GPU and handle missing cat values. (#8212)	2022-09-07 00:57:51 +08:00
Jiaming Yuan	441ffc017a	Copy data from Ellpack to GHist. (#8215)	2022-09-06 23:05:49 +08:00
Jiaming Yuan	16bca5d4a1	Support CPU input for device `QuantileDMatrix`. (#8136) - Copy `GHistIndexMatrix` to `Ellpack` when needed.	2022-08-11 21:21:26 +08:00
Jiaming Yuan	2c70751d1e	Implement iterative DMatrix for CPU. (#8116)	2022-07-26 22:34:21 +08:00
Jiaming Yuan	7785d65c8a	Fix feature weights with multiple column sampling. (#8100)	2022-07-22 20:23:05 +08:00
Jiaming Yuan	4a4e5c7c18	Prepare gradient index for Quantile DMatrix. (#8103) * Prepare gradient index for Quantile DMatrix. - Implement push batch with adapter batch. - Implement `GetFvalue` for prediction.	2022-07-22 17:26:33 +08:00
Rory Mitchell	1be09848a7	Refactor split valuation kernel (#8073)	2022-07-21 15:41:50 +02:00
Jiaming Yuan	ef11b024e8	Cleanup data generator. (#8094) - Avoid duplicated definition of data shape. - Explicitly define numpy iterator for CPU data.	2022-07-20 13:48:52 +08:00
Jiaming Yuan	4083440690	Small cleanups to various data types. (#8086) - Use `bst_bin_t` in batch param constructor. - Use `StringView` to avoid `std::string` when appropriate. - Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.	2022-07-18 22:39:36 +08:00
Jiaming Yuan	abaa593aa0	Fix compiler warnings. (#8059) - Remove unused parameters. - Avoid comparison of different signedness.	2022-07-14 05:29:56 +08:00
Rory Mitchell	794cbaa60a	Fuse split evaluation kernels (#8026)	2022-07-05 10:24:31 +02:00
Jiaming Yuan	8746f9cddf	Rename `IterativeDMatrix`. (#8045)	2022-07-04 18:52:31 +08:00
Rory Mitchell	bc4f802b17	Batch UpdatePosition using cudaMemcpy (#7964)	2022-06-30 17:52:40 +02:00
Jiaming Yuan	f0c1b842bf	Implement sketching with adapter. (#8019)	2022-06-23 00:03:02 +08:00
Jiaming Yuan	142a208a90	Fix compiler warnings. (#8022) - Remove/fix unused parameters - Remove deprecated code in rabit. - Update dmlc-core.	2022-06-22 21:29:10 +08:00
Jiaming Yuan	9b0eb66b78	Fix GPU driver test. (#8008) * Initialize the training parameter.	2022-06-20 19:37:31 +08:00
Jiaming Yuan	8f8bd8147a	Fix LTR with weighted Quantile DMatrix. (#7975) * Fix LTR with weighted Quantile DMatrix. * Better tests.	2022-06-09 01:33:41 +08:00
Jiaming Yuan	1a33b50a0d	Fix compiler warnings. (#7974) - Remove unused parameters. There are still many warnings that are not yet addressed. Currently, the warnings in dmlc-core dominate the error log. - Remove `distributed` parameter from metric. - Fixes some warnings about signed comparison.	2022-06-06 22:56:25 +08:00
Jiaming Yuan	d48123d23b	Fix rmm build (#7973) - Optionally switch to c++17 - Use rmm CMake target. - Workaround compiler errors. - Fix GPUMetric inheritance. - Run death tests even if it's built with RMM support. Co-authored-by: jakirkham <jakirkham@gmail.com>	2022-06-06 20:18:32 +08:00
Rong Ou	80339c3427	Enable distributed GPU training over Rabit (#7930)	2022-05-31 04:09:45 +08:00
Jiaming Yuan	bde4f25794	Handle missing categorical value in CPU evaluator. (#7948)	2022-05-27 14:15:47 +08:00
Jiaming Yuan	18a38f7ca0	Refactor for GHistIndex. (#7923) * Pass sparse page as adapter, which prepares for quantile dmatrix. * Remove old external memory code like `rbegin` and extra `Init` function. * Simplify type dispatch.	2022-05-23 23:04:53 +08:00
Jiaming Yuan	765097d514	Simplify inplace-predict. (#7910) Pass the `X` as part of Proxy DMatrix instead of an independent `dmlc::any`.	2022-05-18 17:52:00 +08:00
Jiaming Yuan	19775ffe15	Use adapter to initialize column matrix. (#7912)	2022-05-18 16:15:12 +08:00
Rory Mitchell	71d3b2e036	Fuse gpu_hist all-reduce calls where possible (#7867)	2022-05-17 13:27:50 +02:00
Jiaming Yuan	4fcfd9c96e	Fix and cleanup for column matrix. (#7901) * Fix missed type dispatching for dense columns with missing values. * Code cleanup to reduce special cases. * Reduce memory usage.	2022-05-16 21:11:50 +08:00
Jiaming Yuan	1baad8650c	Small cleanup to Column. (#7898) * Define forward iterator to hide the internal state.	2022-05-15 12:39:10 +08:00
Jiaming Yuan	1b6538b4e5	[breaking] Drop single precision histogram (#7892) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2022-05-13 19:54:55 +08:00
Jiaming Yuan	11d65fcb21	Extract partial sum into an independent function. (#7889)	2022-05-13 14:30:35 +08:00
Rory Mitchell	7ef54e39ec	Small refactor to categoricals (#7858)	2022-05-05 17:47:02 +02:00
Rong Ou	14ef38b834	Initial support for federated learning (#7831) Federated learning plugin for xgboost: * A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers. * A Rabit engine for the federated environment. * Integration test to simulate federated learning. Additional followups are needed to address GPU support, better security, and privacy, etc.	2022-05-05 21:49:22 +08:00
Jiaming Yuan	317d7be6ee	Always use partition based categorical splits. (#7857)	2022-05-03 22:30:32 +08:00
Rory Mitchell	90cce38236	Remove single_precision_histogram for gpu_hist (#7828)	2022-05-03 14:53:19 +02:00
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812) Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on residue of labels and predictions after construction. For l1 error, the optimal value is the median (50 percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00
Jiaming Yuan	3c9b04460a	Move `num_parallel_tree` to model parameter. (#7751) The size of forest should be a property of model itself instead of a training hyper-parameter.	2022-03-29 02:32:42 +08:00

726 Commits