xgboost

Author	SHA1	Message	Date
Jiaming Yuan	11d65fcb21	Extract partial sum into an independent function. (#7889 )	2022-05-13 14:30:35 +08:00
Rory Mitchell	7ef54e39ec	Small refactor to categoricals (#7858 )	2022-05-05 17:47:02 +02:00
Rong Ou	14ef38b834	Initial support for federated learning (#7831 ) Federated learning plugin for xgboost: * A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers. * A Rabit engine for the federated environment. * Integration test to simulate federated learning. Additional followups are needed to address GPU support, better security, and privacy, etc.	2022-05-05 21:49:22 +08:00
Jiaming Yuan	317d7be6ee	Always use partition based categorical splits. (#7857 )	2022-05-03 22:30:32 +08:00
Rory Mitchell	90cce38236	Remove single_precision_histogram for gpu_hist (#7828 )	2022-05-03 14:53:19 +02:00
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812 ) Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on the residuals between labels and predictions after construction. For l1 error, the optimal value is the median (50th percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required, for exact, pruner, and optimization for the quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00
Jiaming Yuan	3c9b04460a	Move `num_parallel_tree` to model parameter. (#7751 ) The size of forest should be a property of model itself instead of a training hyper-parameter.	2022-03-29 02:32:42 +08:00
Jiaming Yuan	64575591d8	Use context in `SetInfo`. (#7687 ) * Use the name `Context`. * Pass a context object into `SetInfo`. * Add context to proxy matrix. * Add context to iterative DMatrix. This is to remove the use of the default number of threads during `SetInfo` as a follow-up on removing the global omp variable while preparing for CUDA stream semantics. Currently, XGBoost uses the legacy CUDA stream; we will gradually remove its uses in the future in favor of non-blocking streams.	2022-03-24 22:16:26 +08:00
Jiaming Yuan	4d81c741e9	External memory support for hist (#7531 ) * Generate column matrix from gHistIndex. * Avoid synchronization with the sparse page once the cache is written. * Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist. * Remove pruner.	2022-03-22 00:13:20 +08:00
Jiaming Yuan	996cc705af	Small cleanup to hist tree method. (#7735 ) * Remove special optimization using number of bins. * Remove 1-based index for column sampling. * Remove data layout. * Unify update prediction cache.	2022-03-20 03:44:55 +08:00
Jiaming Yuan	e78a38b837	Sort sparse page index when constructing DMatrix. (#7731 )	2022-03-16 18:01:05 +08:00
Jiaming Yuan	98d6faefd6	Implement slope for Pseudo-Huber. (#7727 ) * Add objective and metric. * Some refactoring for CPU/GPU dispatching using linalg module.	2022-03-14 21:42:38 +08:00
Haoming Chen	04fc575c0e	Run tests in a temporary directory (#7723 ) Fix some tests to run in a temporary directory in case the root directory is not writable. Note that most tests already run in a temporary directory, so this PR just makes them consistent.	2022-03-12 21:24:36 +08:00
Jiaming Yuan	18a4af63aa	Update documents and tests. (#7659 ) * Revise documents after recent refactoring and cat support. * Add tests for behavior of max_depth and max_leaves.	2022-02-26 03:57:47 +08:00
Jiaming Yuan	83a66b4994	Support categorical data for hist. (#7695 ) * Extract partitioner from hist. * Implement categorical data support by passing the gradient index directly into the partitioner. * Organize/update document. * Remove code for negative hessian.	2022-02-25 03:47:14 +08:00
Jiaming Yuan	6762c45494	Small cleanup to gradient index and hist. (#7668 ) * Code comments. * Const accessor to index. * Remove some weird variables in the `Index` class. * Simplify the `MemStackAllocator`.	2022-02-23 11:37:21 +08:00
Jiaming Yuan	0d0abe1845	Support optimal partitioning for GPU hist. (#7652 ) * Implement `MaxCategory` in quantile. * Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function. * Extract an evaluator from GPU Hist to store the needed states. * Added some CUDA stream/event utilities. * Update document with references. * Fixed a bug in approx evaluator where the number of data points is less than the number of categories.	2022-02-15 03:03:12 +08:00
Jiaming Yuan	2369d55e9a	Add tests for prediction cache. (#7650 ) * Extract the test from approx for other tree methods. * Add note on how it works.	2022-02-15 00:28:00 +08:00
Jiaming Yuan	2775c2a1ab	Prepare external memory support for hist. (#7638 ) This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter. Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.	2022-02-10 16:58:02 +08:00
Jiaming Yuan	81210420c6	Remove `omp_get_max_threads` (#7608 ) This is the last PR for removing the omp global variable. * Add context object to the `DMatrix`. This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 . * Require context to be available at the construction time of booster. * Add `n_threads` support for R csc DMatrix constructor. * Remove `omp_get_max_threads` in R glue code. * Remove threading utilities that rely on omp global variable.	2022-01-28 16:09:22 +08:00
Jiaming Yuan	5d7818e75d	Remove `omp_get_max_threads` in tree updaters. (#7590 )	2022-01-26 19:55:47 +08:00
Jiaming Yuan	6967ef7267	Remove `omp_get_max_threads` in objective. (#7589 )	2022-01-24 04:35:49 +08:00
Jiaming Yuan	5817840858	Remove `omp_get_max_threads` in data. (#7588 )	2022-01-24 02:44:07 +08:00
Jiaming Yuan	cc06fab9a7	Support distributed CPU env for categorical data. (#7575 ) * Add support for cat data in sketch allreduce. * Share tests between CPU and GPU.	2022-01-18 21:56:07 +08:00
Jiaming Yuan	465dc63833	Fix tree param feature type. (#7565 )	2022-01-16 04:46:29 +08:00
Jiaming Yuan	a1bcd33a3b	[breaking] Change internal model serialization to UBJSON. (#7556 ) * Use typed array for models. * Change the memory snapshot format. * Add new C API for saving to raw format.	2022-01-16 02:11:53 +08:00
Jiaming Yuan	52277cc3da	Rename build info function to be consistent with rest of the API. (#7553 )	2022-01-14 00:39:28 +08:00
Jiaming Yuan	e5e47c3c99	Clarify the behavior of invalid categorical value handling. (#7529 )	2022-01-13 16:11:52 +08:00
Jiaming Yuan	c635d4c46a	Implement ubjson. (#7549 ) * Implement ubjson. This is a partial implementation of UBJSON with support for typed arrays. Some missing features are `f64`, typed object, and the no-op.	2022-01-10 23:24:23 +08:00
Jiaming Yuan	001503186c	Rewrite approx (#7214 ) This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing. The rewrite has many benefits: - Support for both `max_leaves` and `max_depth`. - Support for `grow_policy`. - Support for mono constraint. - Support for feature weights. - Support for easier bin configuration (`max_bin`). - Support for categorical data. - Faster performance for most of the datasets. (many times faster) - Support for prediction cache. - Significantly better performance for external memory. - Unites the code base between approx and hist.	2022-01-10 21:15:05 +08:00
Jiaming Yuan	91c1a1c52f	Fix index type for bitfield. (#7541 )	2022-01-05 19:23:29 +08:00
Jiaming Yuan	68cdbc9c16	Remove `omp_get_max_threads` in CPU predictor. (#7519 ) This is part of the ongoing effort to remove the dependency on global omp variables.	2022-01-04 22:12:15 +08:00
Jiaming Yuan	7f399eac8b	Use double for GPU Hist node sum. (#7507 )	2021-12-22 08:41:35 +08:00
Jiaming Yuan	58a6723eb1	Initial support for multioutput regression. (#7514 ) * Add num target model parameter, which is configured from input labels. * Change elementwise metric and indexing for weights. * Add demo. * Add tests.	2021-12-18 09:28:38 +08:00
Jiaming Yuan	9ab73f737e	Extract Sketch Entry from hist maker. (#7503 ) * Extract Sketch Entry from hist maker. * Add a new sketch container for sorted inputs. * Optimize bin search.	2021-12-18 05:36:56 +08:00
Jiaming Yuan	5b1161bb64	Convert labels into tensor. (#7456 ) * Add a new ctor to tensor for `initializer_list`. * Change labels from host device vector to tensor. * Rename the field from `labels_` to `labels` since it's a public member.	2021-12-17 00:58:35 +08:00
Jiaming Yuan	70b12d898a	[dask] Fix ddqdm with empty partition. (#7510 ) * Fix empty partition. * war.	2021-12-16 20:37:29 +08:00
Jiaming Yuan	eee527d264	Add approx partitioner. (#7467 )	2021-11-27 15:22:06 +08:00
Jiaming Yuan	85cbd32c5a	Add range-based slicing to tensor view. (#7453 )	2021-11-27 13:42:36 +08:00
Jiaming Yuan	557ffc4bf5	Reduce base margin to 2 dim for now. (#7455 )	2021-11-27 00:46:13 +08:00
Jiaming Yuan	bf7bb575b4	Test CPU histogram with cat data. (#7465 )	2021-11-27 00:43:28 +08:00
Jiaming Yuan	176110a22d	Support external memory in CPU histogram building. (#7372 )	2021-11-23 01:13:33 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405 ) This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	9fb4338964	Add test for eta and mitigate float error. (#7446 ) * Add eta test. * Don't skip test.	2021-11-18 20:42:48 +08:00
Philip Hyunsu Cho	2adf222fb2	[CI] CI cost saving (#7407 ) * [CI] Drop CUDA 10.1; Require 11.0 * Change NCCL version * Use CUDA 10.1 for clang-tidy, for now * Remove JDK 11 and 12 * Fix NCCL version * Don't require 11.0 just yet, until clang-tidy is fixed * Skip MultiClassesSerializationTest.GpuHist	2021-11-17 21:02:20 -08:00
Jiaming Yuan	55ee272ea8	Extend array interface to handle ndarray. (#7434 ) * Extend array interface to handle ndarray. The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this class handled only 2-dim input (a vector is also a matrix). This PR specifies the expected dimension at compile-time and the array interface can perform various checks automatically for input data. Also, adapters like CSR are more rigorous about their input. Lastly, row vector and column vector are handled without intervention from the caller.	2021-11-16 09:52:15 +08:00
Jiaming Yuan	a7057fa64c	Implement typed storage for tensor. (#7429 ) * Add `Tensor` class. * Add elementwise kernel for CPU and GPU. * Add unravel index. * Move some computation to compile time.	2021-11-14 18:53:13 +08:00
Jiaming Yuan	46726ec176	Expose build info (#7399 )	2021-11-12 18:22:46 +08:00
Jiaming Yuan	937fa282b5	Extract string view. (#7416 ) * Add equality operators. * Return a view in substr. * Add proper iterator types.	2021-11-12 18:22:30 +08:00
Jiaming Yuan	ca6f980932	Check number of trees in inplace predict. (#7409 )	2021-11-12 18:20:23 +08:00


533 Commits