xgboost

Author	SHA1	Message	Date
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812 ) Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on residue of labels and predictions after construction. For l1 error, the optimal value is the median (50 percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00
Jiaming Yuan	64575591d8	Use context in `SetInfo`. (#7687 ) * Use the name `Context`. * Pass a context object into `SetInfo`. * Add context to proxy matrix. * Add context to iterative DMatrix. This is to remove the use of the default number of threads during `SetInfo` as a follow-up on removing the global omp variable while preparing for CUDA stream semantic. Currently, XGBoost uses the legacy CUDA stream, we will gradually remove them in the future in favor of non-blocking streams.	2022-03-24 22:16:26 +08:00
Jiaming Yuan	4d81c741e9	External memory support for hist (#7531 ) * Generate column matrix from gHistIndex. * Avoid synchronization with the sparse page once the cache is written. * Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist. * Remove pruner.	2022-03-22 00:13:20 +08:00
Jiaming Yuan	98d6faefd6	Implement slope for Pseduo-Huber. (#7727 ) * Add objective and metric. * Some refactoring for CPU/GPU dispatching using linalg module.	2022-03-14 21:42:38 +08:00
Jiaming Yuan	6762c45494	Small cleanup to gradient index and hist. (#7668 ) * Code comments. * Const accessor to index. * Remove some weird variables in the `Index` class. * Simplify the `MemStackAllocator`.	2022-02-23 11:37:21 +08:00
Jiaming Yuan	0d0abe1845	Support optimal partitioning for GPU hist. (#7652 ) * Implement `MaxCategory` in quantile. * Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function. * Extract an evaluator from GPU Hist to store the needed states. * Added some CUDA stream/event utilities. * Update document with references. * Fixed a bug in approx evaluator where the number of data points is less than the number of categories.	2022-02-15 03:03:12 +08:00
Jiaming Yuan	2775c2a1ab	Prepare external memory support for hist. (#7638 ) This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter. Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.	2022-02-10 16:58:02 +08:00
Jiaming Yuan	81210420c6	Remove `omp_get_max_threads` (#7608 ) This is the one last PR for removing omp global variable. * Add context object to the `DMatrix`. This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 . * Require context to be available at the construction time of booster. * Add `n_threads` support for R csc DMatrix constructor. * Remove `omp_get_max_threads` in R glue code. * Remove threading utilities that rely on omp global variable.	2022-01-28 16:09:22 +08:00
Jiaming Yuan	5d7818e75d	Remove `omp_get_max_threads` in tree updaters. (#7590 )	2022-01-26 19:55:47 +08:00
Jiaming Yuan	6967ef7267	Remove `omp_get_max_threads` in objective. (#7589 )	2022-01-24 04:35:49 +08:00
Jiaming Yuan	5817840858	Remove `omp_get_max_threads` in data. (#7588 )	2022-01-24 02:44:07 +08:00
Jiaming Yuan	cc06fab9a7	Support distributed CPU env for categorical data. (#7575 ) * Add support for cat data in sketch allreduce. * Share tests between CPU and GPU.	2022-01-18 21:56:07 +08:00
Jiaming Yuan	e5e47c3c99	Clarify the behavior of invalid categorical value handling. (#7529 )	2022-01-13 16:11:52 +08:00
Jiaming Yuan	c635d4c46a	Implement ubjson. (#7549 ) * Implement ubjson. This is a partial implementation of UBJSON with support for typed arrays. Some missing features are `f64`, typed object, and the no-op.	2022-01-10 23:24:23 +08:00
Jiaming Yuan	91c1a1c52f	Fix index type for bitfield. (#7541 )	2022-01-05 19:23:29 +08:00
Jiaming Yuan	9ab73f737e	Extract Sketch Entry from hist maker. (#7503 ) * Extract Sketch Entry from hist maker. * Add a new sketch container for sorted inputs. * Optimize bin search.	2021-12-18 05:36:56 +08:00
Jiaming Yuan	85cbd32c5a	Add range-based slicing to tensor view. (#7453 )	2021-11-27 13:42:36 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405 ) This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	a7057fa64c	Implement typed storage for tensor. (#7429 ) * Add `Tensor` class. * Add elementwise kernel for CPU and GPU. * Add unravel index. * Move some computation to compile time.	2021-11-14 18:53:13 +08:00
Jiaming Yuan	937fa282b5	Extract string view. (#7416 ) * Add equality operators. * Return a view in substr. * Add proper iterator types.	2021-11-12 18:22:30 +08:00
Jiaming Yuan	d7d1b6e3a6	CPU evaluation for cat data. (#7393 ) * Implementation for one hot based. * Implementation for partition based. (LightGBM)	2021-11-06 14:41:35 +08:00
Jiaming Yuan	b06040b6d0	Implement a general array view. (#7365 ) * Replace existing matrix and vector view. This is to prepare for handling higher dimension data and prediction when we support multi-target models.	2021-11-05 04:16:11 +08:00
Jiaming Yuan	6295dc3b67	Fix span reverse iterator. (#7387 ) * Fix span reverse iterator. * Disable `rbegin` on device code to avoid calling host function. * Add `trbegin` and friends.	2021-11-02 13:35:59 +08:00
Jiaming Yuan	ac9bfaa4f2	Handle missing values in dataframe with category dtype. (#7331 ) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching.	2021-10-28 03:33:54 +08:00
Jiaming Yuan	d8a549e6ac	Avoid thread block with sparse data. (#7255 )	2021-09-25 13:11:34 +08:00
Jiaming Yuan	31c1e13f90	Categorical data support in CPU sketching. (#7221 )	2021-09-17 04:37:09 +08:00
Jiaming Yuan	2942dc68e4	Fix mixed types in GPU sketching. (#7228 )	2021-09-16 00:10:25 +08:00
Jiaming Yuan	b12e7f7edd	Add noexcept to JSON objects. (#7205 )	2021-09-07 13:56:48 +08:00
Jiaming Yuan	7a1d67f9cb	[breaking] Use integer atomic for GPU histogram. (#7180 ) On GPU we use rouding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to fixed point number with exponent aligned with rounding factor. [breaking] Drop non-deterministic histogram. Use fixed point for shared memory. This PR is to improve the performance of GPU Hist. Co-authored-by: Andy Adinets <aadinets@nvidia.com>	2021-08-28 05:17:05 +08:00
Jiaming Yuan	7ee7a95b84	Use upstream URI in distributed quantile tests. (#7129 ) * Use upstream URI in distributed quantile tests. * Fix test cv `PytestAssertRewriteWarning`.	2021-07-27 14:09:49 +08:00
Jiaming Yuan	bd1f3a38f0	Rewrite sparse dmatrix using callbacks. (#7092 ) - Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves. - Remove use of threaded iterator and IO queue. - Remove `page_size`. - Make sure the number of pages in memory is bounded. - Make sure the cache can not be violated. - Provide an interface for internal algorithms to process data asynchronously.	2021-07-16 12:33:31 +08:00
Jiaming Yuan	345796825f	Optional find dependency in installed cmake config. (#7099 ) * Find dependency only when xgboost is built as static library. * Resolve msvc warning. * Add test for linking shared library.	2021-07-11 17:20:55 +08:00
Jiaming Yuan	77f6cf2d13	Support hessian in host sketch container. (#7081 ) Prepare for migrating approx onto hist's codebase.	2021-07-08 16:33:58 +08:00
Jiaming Yuan	1cd20efe68	Move `GHistIndex` into `DMatrix`. (#7064 )	2021-07-01 00:44:49 +08:00
Jiaming Yuan	1c8fdf2218	Remove use of `device_idx` in `dh::LaunchN`. (#7063 ) It's an unused parameter, removing it can make the CI log more readable.	2021-06-29 11:37:26 +08:00
ShvetsKS	2567404ab6	Simplify sparse and dense CPU hist kernels (#7029 ) * Simplify sparse and dense kernels * Extract row partitioner. Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>	2021-06-11 18:26:30 +08:00
Jiaming Yuan	556a83022d	Implement unified update prediction cache for (gpu_)hist. (#6860 ) * Implement utilites for linalg. * Unify the update prediction cache functions. * Implement update prediction cache for multi-class gpu hist.	2021-04-17 00:29:34 +08:00
Jiaming Yuan	3039dd194b	Don't estimate sketch batch size when rmm is used. (#6807 )	2021-03-31 15:29:56 +08:00
Jiaming Yuan	a59c7323b4	Fix inplace predict missing value. (#6787 )	2021-03-27 05:36:10 +08:00
Jiaming Yuan	bcc0277338	Re-implement ROC-AUC. (#6747 ) * Re-implement ROC-AUC. * Binary * MultiClass * LTR * Add documents. This PR resolves a few issues: - Define a value when the dataset is invalid, which can happen if there's an empty dataset, or when the dataset contains only positive or negative values. - Define ROC-AUC for multi-class classification. - Define weighted average value for distributed setting. - A correct implementation for learning to rank task. Previous implementation is just binary classification with averaging across groups, which doesn't measure ordered learning to rank.	2021-03-20 16:52:40 +08:00
Jiaming Yuan	1a73a28511	Add device argsort. (#6749 ) This is part of https://github.com/dmlc/xgboost/pull/6747 .	2021-03-16 16:05:22 +08:00
Jiaming Yuan	218a5fb6dd	Simplify Span checks. (#6685 ) * Stop printing out message. * Remove R specialization. The printed message is not really useful anyway, without a reproducible example there's no way to fix it. But if there's a reproducible example, we can always obtain these information by a debugger. Removing the `printf` function avoids creating the context in kernel.	2021-02-09 08:12:58 +08:00
Jiaming Yuan	1b70a323a7	Improve string view to reduce string allocation. (#6644 )	2021-01-27 19:08:52 +08:00
Jiaming Yuan	f2f7dd87b8	Use view for `SparsePage` exclusively. (#6590 )	2021-01-11 18:04:55 +08:00
Jiaming Yuan	886486a519	Support categorical data in GPU weighted sketching. (#6508 )	2020-12-16 14:23:28 +08:00
ShvetsKS	956beead70	Thread local memory allocation for BuildHist (#6358 ) * thread mem locality * fix apply * cleanup * fix lint * fix tests * simple try * fix * fix * apply comments * fix comments * fix * apply simple comment Co-authored-by: ShvetsKS <kirill.shvets@intel.com>	2020-11-25 17:50:12 +03:00
Jiaming Yuan	43efadea2e	Deterministic data partitioning for external memory (#6317 ) * Make external memory data partitioning deterministic. * Change the meaning of `page_size` from bytes to number of rows. * Design a data pool. * Note for external memory. * Enable unity build on Windows CI. * Force garbage collect on test.	2020-11-11 06:11:06 +08:00
Jiaming Yuan	c4da967b5c	Support unity build. (#6295 ) * Support unity build. * Setup on Windows Jenkins. * Revert "Setup on Windows Jenkins." This reverts commit 8345cb8d2b009eec8ae9fa6f16412a7c9b6ec12c.	2020-10-28 11:49:28 -07:00
Jiaming Yuan	ddf37cca30	Unify thread configuration. (#6186 )	2020-10-19 16:05:42 +08:00
Jiaming Yuan	bed7ae4083	Loop over `thrust::reduce`. (#6229 ) * Check input chunk size of dqdm. * Add doc for current limitation.	2020-10-14 10:40:56 +13:00

1 2 3 4

179 Commits