xgboost

Author	SHA1	Message	Date
Jiaming Yuan	bf7bb575b4	Test CPU histogram with cat data. (#7465 )	2021-11-27 00:43:28 +08:00
Jiaming Yuan	176110a22d	Support external memory in CPU histogram building. (#7372 )	2021-11-23 01:13:33 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405 ) This PR changes base_margin into a 3-dim array, with one of the dimensions reserved for multi-target classification. Also, a breaking change is made for binary serialization due to the extra dimension, along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	9fb4338964	Add test for eta and mitigate float error. (#7446 ) * Add eta test. * Don't skip test.	2021-11-18 20:42:48 +08:00
Philip Hyunsu Cho	2adf222fb2	[CI] CI cost saving (#7407 ) * [CI] Drop CUDA 10.1; Require 11.0 * Change NCCL version * Use CUDA 10.1 for clang-tidy, for now * Remove JDK 11 and 12 * Fix NCCL version * Don't require 11.0 just yet, until clang-tidy is fixed * Skip MultiClassesSerializationTest.GpuHist	2021-11-17 21:02:20 -08:00
Jiaming Yuan	55ee272ea8	Extend array interface to handle ndarray. (#7434 ) * Extend array interface to handle ndarray. The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this class handled only 2-dim inputs (a vector was treated as a matrix). This PR specifies the expected dimension at compile time, so the array interface can perform various checks on input data automatically. Also, adapters like CSR are more rigorous about their input. Lastly, row vectors and column vectors are handled without intervention from the caller.	2021-11-16 09:52:15 +08:00
Jiaming Yuan	a7057fa64c	Implement typed storage for tensor. (#7429 ) * Add `Tensor` class. * Add elementwise kernel for CPU and GPU. * Add unravel index. * Move some computation to compile time.	2021-11-14 18:53:13 +08:00
Jiaming Yuan	46726ec176	Expose build info (#7399 )	2021-11-12 18:22:46 +08:00
Jiaming Yuan	937fa282b5	Extract string view. (#7416 ) * Add equality operators. * Return a view in substr. * Add proper iterator types.	2021-11-12 18:22:30 +08:00
Jiaming Yuan	ca6f980932	Check number of trees in inplace predict. (#7409 )	2021-11-12 18:20:23 +08:00
Jiaming Yuan	d7d1b6e3a6	CPU evaluation for cat data. (#7393 ) * Implementation for one hot based. * Implementation for partition based. (LightGBM)	2021-11-06 14:41:35 +08:00
Jiaming Yuan	6ede12412c	Update dmlc-core and use data iter for GPU sampling tests. (#7398 ) * Update dmlc-core. * New parquet parser in dmlc-core. * Use data iter for GPU sampling tests.	2021-11-06 05:12:49 +08:00
Jiaming Yuan	c968217ca8	[R] Fix global feature importance and predict with 1 sample. (#7394 ) * [R] Fix global feature importance. * Add implementation for tree index. The parameter is not documented in the C API, since we should work on porting the model slicing to R instead of supporting more use of tree index. * Fix the difference between "gain" and "total_gain". * debug. * Fix prediction.	2021-11-05 10:07:00 +08:00
Jiaming Yuan	b06040b6d0	Implement a general array view. (#7365 ) * Replace existing matrix and vector view. This is to prepare for handling higher dimension data and prediction when we support multi-target models.	2021-11-05 04:16:11 +08:00
Jiaming Yuan	4100827971	Pass information about the objective to tree methods. (#7385 ) * Define the `ObjInfo` and pass it down to every tree updater.	2021-11-04 01:52:44 +08:00
Jiaming Yuan	ccdabe4512	Support building gradient index with cat data. (#7371 )	2021-11-03 22:37:37 +08:00
Jiaming Yuan	a13321148a	Support multi-class with base margin. (#7381 ) This was already partially supported but never properly tested, so the only way to use it was to call `numpy.ndarray.flatten` on `base_margin` before passing it into XGBoost. This PR adds proper support for most of the data types along with tests.	2021-11-02 13:38:00 +08:00
Jiaming Yuan	6295dc3b67	Fix span reverse iterator. (#7387 ) * Fix span reverse iterator. * Disable `rbegin` on device code to avoid calling host function. * Add `trbegin` and friends.	2021-11-02 13:35:59 +08:00
Jiaming Yuan	ac9bfaa4f2	Handle missing values in dataframe with category dtype. (#7331 ) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching.	2021-10-28 03:33:54 +08:00
Jiaming Yuan	d4349426d8	Re-implement PR-AUC. (#7297 ) * Support binary/multi-class classification, ranking. * Add documents. * Handle missing data.	2021-10-26 13:07:50 +08:00
Jiaming Yuan	d1f00fb0b7	Stricter validation for group. (#7345 )	2021-10-21 12:13:33 +08:00
Jiaming Yuan	8d7c6366d7	Accept histogram cut instead of gradient index in evaluation. (#7336 )	2021-10-20 18:04:46 +08:00
Jiaming Yuan	fb1a9e6bc5	Avoid omp reduction in coordinate descent and aft metrics. (#7316 ) Aside from the omp issue, parameter configuration for aft metric is simplified.	2021-10-17 15:55:49 +08:00
Jiaming Yuan	8e619010d0	Extract CPUExpandEntry and HistParam. (#7321 ) * Remove kRootNid. * Check for empty hessian.	2021-10-17 14:22:25 +08:00
Jiaming Yuan	4ddf8d001c	Deterministic result for element-wise/mclass metrics. (#7303 ) Remove openmp reduction.	2021-10-13 14:22:40 +08:00
Jiaming Yuan	130df8cdda	Add tests for tree grow policy. (#7302 )	2021-10-12 15:04:06 +08:00
Jiaming Yuan	298af6f409	Fix weighted samples in multi-class AUC. (#7300 )	2021-10-11 15:12:29 +08:00
Jiaming Yuan	d8a549e6ac	Avoid thread block with sparse data. (#7255 )	2021-09-25 13:11:34 +08:00
Jiaming Yuan	31c1e13f90	Categorical data support in CPU sketching. (#7221 )	2021-09-17 04:37:09 +08:00
Jiaming Yuan	0ed979b096	Support more input types for categorical data. (#7220 ) * Support more input types for categorical data. * Shorten the type name from "categorical" to "c". * Tests for np/cp array and scipy csr/csc/coo. * Specify the type for feature info.	2021-09-16 20:39:30 +08:00
Jiaming Yuan	2942dc68e4	Fix mixed types in GPU sketching. (#7228 )	2021-09-16 00:10:25 +08:00
Jiaming Yuan	3515931305	Initial support for external memory in gradient index. (#7183 ) * Add hessian to batch param in preparation of new approx impl. * Extract a push method for gradient index matrix. * Use span instead of vector ref for hessian in sketching. * Create a binary format for gradient index.	2021-09-13 12:40:56 +08:00
Jiaming Yuan	b12e7f7edd	Add noexcept to JSON objects. (#7205 )	2021-09-07 13:56:48 +08:00
Jiaming Yuan	7a1d67f9cb	[breaking] Use integer atomic for GPU histogram. (#7180 ) On GPU we use a rounding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to a fixed-point number with the exponent aligned with the rounding factor. [breaking] Drop non-deterministic histogram. Use fixed point for shared memory. This PR is to improve the performance of GPU Hist. Co-authored-by: Andy Adinets <aadinets@nvidia.com>	2021-08-28 05:17:05 +08:00
Jiaming Yuan	149f209af6	Extract histogram builder from CPU Hist. (#7152 ) * Extract the CPU histogram builder. * Fix tests. * Reduce number of histograms being built.	2021-08-09 21:15:21 +08:00
Jiaming Yuan	d080b5a953	Fix model slicing. (#7149 ) * Use correct pointer. * Remove best_iteration/best_score.	2021-08-03 11:51:56 +08:00
Jiaming Yuan	7ee7a95b84	Use upstream URI in distributed quantile tests. (#7129 ) * Use upstream URI in distributed quantile tests. * Fix test cv `PytestAssertRewriteWarning`.	2021-07-27 14:09:49 +08:00
Jiaming Yuan	e6088366df	Export Python Interface for external memory. (#7070 ) * Add Python iterator interface. * Add tests. * Add demo. * Add documents. * Handle empty dataset.	2021-07-22 15:15:53 +08:00
Jiaming Yuan	bd1f3a38f0	Rewrite sparse dmatrix using callbacks. (#7092 ) - Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves. - Remove use of threaded iterator and IO queue. - Remove `page_size`. - Make sure the number of pages in memory is bounded. - Make sure the cache can not be violated. - Provide an interface for internal algorithms to process data asynchronously.	2021-07-16 12:33:31 +08:00
Jiaming Yuan	345796825f	Optional find dependency in installed cmake config. (#7099 ) * Find dependency only when xgboost is built as static library. * Resolve msvc warning. * Add test for linking shared library.	2021-07-11 17:20:55 +08:00
Jiaming Yuan	77f6cf2d13	Support hessian in host sketch container. (#7081 ) Prepare for migrating approx onto hist's codebase.	2021-07-08 16:33:58 +08:00
Jiaming Yuan	84d359efb8	Support host data in proxy DMatrix. (#7087 )	2021-07-08 11:35:48 +08:00
Jiaming Yuan	5d7cdf2e36	[Breaking] Rename Quantile DMatrix C API. (#7082 ) The role of ProxyDMatrix is going beyond what it was designed for. Now it's used by both QuantileDeviceDMatrix and inplace prediction. After the refactoring of sparse DMatrix it will also be used for external memory. Renaming the C API to extract it from QuantileDeviceDMatrix.	2021-07-08 11:34:14 +08:00
Jiaming Yuan	c766f143ab	Refactor external memory formats. (#7089 ) * Save base_rowid. * Return write size. * Remove unused function.	2021-07-08 04:04:51 +08:00
Jiaming Yuan	615ab2b03e	Extract evaluate splits from CPU hist. (#7079 ) Other than modularizing the split evaluation function, this PR also removes some more functions including `InitNewNodes` and `BuildNodeStats` among some other unused variables. Also, scattered code like setting leaf weights is grouped into the split evaluator and `NodeEntry` is simplified and made private. Another subtle difference with the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traverse upward.	2021-07-07 15:16:25 +08:00
Jiaming Yuan	1cd20efe68	Move `GHistIndex` into `DMatrix`. (#7064 )	2021-07-01 00:44:49 +08:00
Jiaming Yuan	1c8fdf2218	Remove use of `device_idx` in `dh::LaunchN`. (#7063 ) It's an unused parameter, removing it can make the CI log more readable.	2021-06-29 11:37:26 +08:00
Jiaming Yuan	8fa32fdda2	Implement categorical data support for SHAP. (#7053 ) * Add CPU implementation. * Update GPUTreeSHAP. * Add GPU implementation by defining custom split condition.	2021-06-25 19:02:46 +08:00
Jiaming Yuan	29f8fd6fee	Support categorical split in tree model dump. (#7036 )	2021-06-18 16:46:20 +08:00
Jiaming Yuan	7968c0d051	Test on s390x. (#7038 ) * Fix && remove unused parameter.	2021-06-18 14:55:08 +08:00
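Commit 7a1d67f9cb (#7180) relies on the fact that floating-point addition is not associative. When GPU threads add gradients to a histogram bin atomically, they do so in an unpredictable order, so the float result can change from run to run. Integer addition is associative, so converting each gradient to a fixed-point integer first makes the sum order-independent. The Python sketch below is a minimal illustration of that idea, not XGBoost's CUDA implementation. Several choices in it are assumptions made for this example: the helper names are hypothetical, the rounding factor is taken as the next power of two above `max|g| * n`, and 62 fractional bits are used.

```python
import math
import random


def rounding_factor(grads):
    """Power of two bounding |sum of any subset of grads| (assumed scheme)."""
    bound = max(abs(g) for g in grads) * len(grads)
    if bound == 0.0:
        return 1.0
    return 2.0 ** math.ceil(math.log2(bound))


def to_fixed(g, rounding, frac_bits=62):
    """Scale g / rounding (within [-1, 1]) by 2**frac_bits and truncate to an int."""
    return int(g / rounding * 2.0 ** frac_bits)


def fixed_sum(grads, frac_bits=62):
    """Sum gradients as integers; the result does not depend on order."""
    rounding = rounding_factor(grads)
    total = sum(to_fixed(g, rounding, frac_bits) for g in grads)
    # |total| <= 2**frac_bits, so a signed 64-bit accumulator would not overflow.
    assert abs(total) <= 2 ** frac_bits
    return total, rounding


def from_fixed(total, rounding, frac_bits=62):
    """Convert an accumulated fixed-point value back to a float."""
    return total / 2.0 ** frac_bits * rounding


def float_sum(grads):
    """Plain sequential float accumulation, sensitive to order."""
    acc = 0.0
    for g in grads:
        acc += g
    return acc


grads = [0.1, 0.2, 0.3]
# Reordering the inputs can change the floating-point result.
print(float_sum(grads), float_sum(grads[::-1]))

shuffled = grads[:]
random.shuffle(shuffled)
# The fixed-point sums agree regardless of order.
print(fixed_sum(grads)[0] == fixed_sum(shuffled)[0])
```

Truncating to a fixed grid loses some precision, below `rounding / 2**62` per value in this sketch. In exchange, any accumulation order produces bit-identical results.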


493 Commits