xgboost

Author	SHA1	Message	Date
Jiaming Yuan	001503186c	Rewrite approx (#7214 ) This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing. The rewrite has many benefits: - Support for both `max_leaves` and `max_depth`. - Support for `grow_policy`. - Support for mono constraint. - Support for feature weights. - Support for easier bin configuration (`max_bin`). - Support for categorical data. - Faster performance for most of the datasets. (many times faster) - Support for prediction cache. - Significantly better performance for external memory. - Unites the code base between approx and hist.	2022-01-10 21:15:05 +08:00
Jiaming Yuan	91c1a1c52f	Fix index type for bitfield. (#7541 )	2022-01-05 19:23:29 +08:00
Jiaming Yuan	9ab73f737e	Extract Sketch Entry from hist maker. (#7503 ) * Extract Sketch Entry from hist maker. * Add a new sketch container for sorted inputs. * Optimize bin search.	2021-12-18 05:36:56 +08:00
Jiaming Yuan	5b1161bb64	Convert labels into tensor. (#7456 ) * Add a new ctor to tensor for `initializer_list`. * Change labels from host device vector to tensor. * Rename the field from `labels_` to `labels` since it's a public member.	2021-12-17 00:58:35 +08:00
Jiaming Yuan	eee527d264	Add approx partitioner. (#7467 )	2021-11-27 15:22:06 +08:00
Jiaming Yuan	176110a22d	Support external memory in CPU histogram building. (#7372 )	2021-11-23 01:13:33 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405 ) This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	55ee272ea8	Extend array interface to handle ndarray. (#7434 ) * Extend array interface to handle ndarray. The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this class handled only 2-dim inputs (a vector is also treated as a matrix). This PR specifies the expected dimension at compile-time and the array interface can perform various checks automatically for input data. Also, adapters like CSR are more rigorous about their input. Lastly, row vector and column vector are handled without intervention from the caller.	2021-11-16 09:52:15 +08:00
Jiaming Yuan	a7057fa64c	Implement typed storage for tensor. (#7429 ) * Add `Tensor` class. * Add elementwise kernel for CPU and GPU. * Add unravel index. * Move some computation to compile time.	2021-11-14 18:53:13 +08:00
Jiaming Yuan	937fa282b5	Extract string view. (#7416 ) * Add equality operators. * Return a view in substr. * Add proper iterator types.	2021-11-12 18:22:30 +08:00
Jiaming Yuan	d7d1b6e3a6	CPU evaluation for cat data. (#7393 ) * Implementation for one hot based. * Implementation for partition based. (LightGBM)	2021-11-06 14:41:35 +08:00
Jiaming Yuan	ccdabe4512	Support building gradient index with cat data. (#7371 )	2021-11-03 22:37:37 +08:00
Jiaming Yuan	57a4b4ff64	Handle `OMP_THREAD_LIMIT`. (#7390 )	2021-11-03 15:44:38 +08:00
Jiaming Yuan	a55d43ccfd	Add test for invalid categorical data values. (#7380 ) * Add test for invalid categorical data values. * Add check during sketching.	2021-11-02 18:00:52 +08:00
Jiaming Yuan	32e673d8c4	Support building with CTK11.5. (#7379 ) * Support building with CTK11.5. * Require system cub installation for CTK11.4+. * Check thrust version for segmented sort.	2021-11-02 16:22:26 +08:00
Jiaming Yuan	6295dc3b67	Fix span reverse iterator. (#7387 ) * Fix span reverse iterator. * Disable `rbegin` on device code to avoid calling host function. * Add `trbegin` and friends.	2021-11-02 13:35:59 +08:00
Jiaming Yuan	ac9bfaa4f2	Handle missing values in dataframe with category dtype. (#7331 ) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching.	2021-10-28 03:33:54 +08:00
Jiaming Yuan	d4349426d8	Re-implement PR-AUC. (#7297 ) * Support binary/multi-class classification, ranking. * Add documents. * Handle missing data.	2021-10-26 13:07:50 +08:00
Jiaming Yuan	ca17f8a5fc	Dispatch thrust versions and upgrade rmm. (#7254 ) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-09-25 03:43:23 +08:00
Jiaming Yuan	c311a8c1d8	Enable compiling with system cub. (#7232 ) - Tested with all CUDA 11.x. - Workaround cub scan by using discard iterator in AUC. - Limit the size of Argsort when compiled with CUDA cub.	2021-09-17 14:28:18 +08:00
Jiaming Yuan	31c1e13f90	Categorical data support in CPU sketching. (#7221 )	2021-09-17 04:37:09 +08:00
Jiaming Yuan	2942dc68e4	Fix mixed types in GPU sketching. (#7228 )	2021-09-16 00:10:25 +08:00
Jiaming Yuan	3515931305	Initial support for external memory in gradient index. (#7183 ) * Add hessian to batch param in preparation of new approx impl. * Extract a push method for gradient index matrix. * Use span instead of vector ref for hessian in sketching. * Create a binary format for gradient index.	2021-09-13 12:40:56 +08:00
Jiaming Yuan	b12e7f7edd	Add noexcept to JSON objects. (#7205 )	2021-09-07 13:56:48 +08:00
Jiaming Yuan	ba69244a94	Restore the custom double atomic add. (#7198 )	2021-08-28 18:30:42 +08:00
Jiaming Yuan	7a1d67f9cb	[breaking] Use integer atomic for GPU histogram. (#7180 ) On GPU we use a rounding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to a fixed point number with exponent aligned with the rounding factor. [breaking] Drop non-deterministic histogram. Use fixed point for shared memory. This PR is to improve the performance of GPU Hist. Co-authored-by: Andy Adinets <aadinets@nvidia.com>	2021-08-28 05:17:05 +08:00
Jiaming Yuan	e7d7ab6bc3	Better error message for `ncclUnhandledCudaError`. (#7190 )	2021-08-27 10:29:22 +08:00
Jiaming Yuan	9600ca83f3	Remove synchronization in monitor. (#7164 ) * Remove synchronization in monitor. Calling rabit functions during destruction is flaky. * Add xgboost prefix to nvtx marker.	2021-08-11 16:33:53 +08:00
Jiaming Yuan	149f209af6	Extract histogram builder from CPU Hist. (#7152 ) * Extract the CPU histogram builder. * Fix tests. * Reduce number of histograms being built.	2021-08-09 21:15:21 +08:00
Robert Maynard	1a75f43304	Allow compilation with nvcc 11.4 (#7131 ) * Use type aliases for discard iterators * update to include host_vector as thrust 1.12 doesn't bring it in as a side-effect * cub::DispatchRadixSort requires signed offset types	2021-07-27 20:05:33 +08:00
Taewoo Kim	41e882f80b	Check input value is duplicated when quantile queue is full (#7091 ) Co-authored-by: Taewoo Kim <taewoo@layer6.com>	2021-07-23 03:07:01 +08:00
farfarawayzyt	e64ee6592f	fix typo in src/common/hist.cc BuildHistKernel (#7116 )	2021-07-21 19:53:05 +08:00
farfarawayzyt	d7c14496d2	fix typo in arguments of PartitionBuilder::Init (#7113 ) Co-authored-by: Yuntian Zhang <zhangyt@lamda.nju.edu.cn>	2021-07-16 15:46:22 +08:00
Jiaming Yuan	bd1f3a38f0	Rewrite sparse dmatrix using callbacks. (#7092 ) - Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves. - Remove use of threaded iterator and IO queue. - Remove `page_size`. - Make sure the number of pages in memory is bounded. - Make sure the cache can not be violated. - Provide an interface for internal algorithms to process data asynchronously.	2021-07-16 12:33:31 +08:00
Jiaming Yuan	77f6cf2d13	Support hessian in host sketch container. (#7081 ) Prepare for migrating approx onto hist's codebase.	2021-07-08 16:33:58 +08:00
Jiaming Yuan	1cd20efe68	Move `GHistIndex` into `DMatrix`. (#7064 )	2021-07-01 00:44:49 +08:00
Jiaming Yuan	1c8fdf2218	Remove use of `device_idx` in `dh::LaunchN`. (#7063 ) It's an unused parameter, removing it can make the CI log more readable.	2021-06-29 11:37:26 +08:00
Jiaming Yuan	8fa32fdda2	Implement categorical data support for SHAP. (#7053 ) * Add CPU implementation. * Update GPUTreeSHAP. * Add GPU implementation by defining custom split condition.	2021-06-25 19:02:46 +08:00
Jiaming Yuan	86715e4cd4	Support categorical data for dask functional interface and DQM. (#7043 ) * Support categorical data for dask functional interface and DQM. * Implement categorical data support for GPU GK-merge. * Add support for dask functional interface. * Add support for DQM. * Get newer cupy.	2021-06-18 13:06:52 +08:00
ShvetsKS	2567404ab6	Simplify sparse and dense CPU hist kernels (#7029 ) * Simplify sparse and dense kernels * Extract row partitioner. Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>	2021-06-11 18:26:30 +08:00
TP Boudreau	bd2ca543c4	Fix BinarySearchBin() argument types (#7026 )	2021-06-08 19:05:46 +08:00
ShvetsKS	5cdaac00c1	Remove feature grouping (#7018 ) Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>	2021-06-03 04:35:26 +08:00
Andrew Ziem	3e7e426b36	Fix spelling in documents (#6948 ) * Update roxygen2 doc. Co-authored-by: fis <jm.yuan@outlook.com>	2021-05-11 20:44:36 +08:00
Jiaming Yuan	a2ecbdaa31	Add an API guard to prevent global variables being changed. (#6891 )	2021-04-23 10:27:57 +08:00
Jiaming Yuan	1b26a2a561	Copy output data for argsort. (#6866 ) Fix GPU AUC.	2021-04-16 21:05:01 +08:00
Jiaming Yuan	f294c4e023	Use constexpr in `dh::CopyIf`. (#6828 )	2021-04-08 07:37:47 +08:00
Jiaming Yuan	7bcc8b3e5c	Use batched copy if. (#6826 )	2021-04-06 10:34:04 +08:00
Jiaming Yuan	3039dd194b	Don't estimate sketch batch size when rmm is used. (#6807 )	2021-03-31 15:29:56 +08:00
ShvetsKS	8825670c9c	Memory consumption fix for row-major adapters (#6779 ) Co-authored-by: Kirill Shvets <kirill.shvets@intel.com> Co-authored-by: fis <jm.yuan@outlook.com>	2021-03-26 08:44:30 +08:00
Jiaming Yuan	a7083d3c13	Fix dart inplace prediction with GPU input. (#6777 ) * Fix dart inplace predict with data on GPU, which might trigger a fatal check for device access right. * Avoid copying data whenever possible.	2021-03-25 12:00:32 +08:00

495 Commits