xgboost

Author	SHA1	Message	Date
Jiaming Yuan	2775c2a1ab	Prepare external memory support for hist. (#7638) This PR prepares `GHistIndexMatrix` to host the column matrix used by the hist tree method by accepting a `sparse_threshold` parameter. Some cleanups ensure the correct batch parameter is passed into `DMatrix`, along with additional tests for the correctness of `SimpleDMatrix`.	2022-02-10 16:58:02 +08:00
Jiaming Yuan	81210420c6	Remove `omp_get_max_threads` (#7608) This is the last PR for removing the omp global variable. * Add a context object to `DMatrix`. This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308. * Require the context to be available when the booster is constructed. * Add `n_threads` support for the R CSC DMatrix constructor. * Remove `omp_get_max_threads` in R glue code. * Remove threading utilities that rely on the omp global variable.	2022-01-28 16:09:22 +08:00
Jiaming Yuan	e060519d4f	Avoid regenerating the gradient index for approx. (#7591)	2022-01-26 21:41:30 +08:00
Jiaming Yuan	5d7818e75d	Remove `omp_get_max_threads` in tree updaters. (#7590)	2022-01-26 19:55:47 +08:00
Jiaming Yuan	6967ef7267	Remove `omp_get_max_threads` in objective. (#7589)	2022-01-24 04:35:49 +08:00
Jiaming Yuan	5817840858	Remove `omp_get_max_threads` in data. (#7588)	2022-01-24 02:44:07 +08:00
Jiaming Yuan	dac9eb13bd	Implement new `save_raw` in Python. (#7572) * Expose the new C API function to Python. * Remove old document and helper script. * Small optimization to the `save_raw` and Json ctors.	2022-01-19 02:27:51 +08:00
Jiaming Yuan	bb56bb9a13	Fix merge conflict. (#7577)	2022-01-18 23:01:34 +08:00
Jiaming Yuan	cc06fab9a7	Support distributed CPU env for categorical data. (#7575) * Add support for cat data in sketch allreduce. * Share tests between CPU and GPU.	2022-01-18 21:56:07 +08:00
Jiaming Yuan	deab0e32ba	Validate out of range categorical value. (#7576) * Use float in CPU categorical set to preserve the input value. * Check out of range values.	2022-01-18 20:16:19 +08:00
Jiaming Yuan	a1bcd33a3b	[breaking] Change internal model serialization to UBJSON. (#7556) * Use typed array for models. * Change the memory snapshot format. * Add new C API for saving to raw format.	2022-01-16 02:11:53 +08:00
Jiaming Yuan	52277cc3da	Rename build info function to be consistent with rest of the API. (#7553)	2022-01-14 00:39:28 +08:00
Jiaming Yuan	e94b766310	Fix early stopping with linear model. (#7554)	2022-01-13 21:53:06 +08:00
Jiaming Yuan	e5e47c3c99	Clarify the behavior of invalid categorical value handling. (#7529)	2022-01-13 16:11:52 +08:00
Philip Hyunsu Cho	20c0d60ac7	Restore functionality of max_depth=0 in hist (#7551) * Restore functionality of max_depth=0 in hist * Add test case	2022-01-11 01:37:44 +08:00
Jiaming Yuan	2db808021d	Silence some warnings for unused variables. (#7548)	2022-01-11 01:16:26 +08:00
Jiaming Yuan	c635d4c46a	Implement ubjson. (#7549) * Implement ubjson. This is a partial implementation of UBJSON with support for typed arrays. Some missing features are `f64`, typed object, and the no-op.	2022-01-10 23:24:23 +08:00
Jiaming Yuan	001503186c	Rewrite approx (#7214) This PR rewrites the approx tree method to use the codebase from hist for better performance and code sharing. The rewrite has many benefits: - Support for both `max_leaves` and `max_depth`. - Support for `grow_policy`. - Support for monotone constraints. - Support for feature weights. - Support for easier bin configuration (`max_bin`). - Support for categorical data. - Faster performance on most datasets (many times faster). - Support for prediction cache. - Significantly better performance for external memory. - Unites the code base between approx and hist.	2022-01-10 21:15:05 +08:00
Jiaming Yuan	91c1a1c52f	Fix index type for bitfield. (#7541)	2022-01-05 19:23:29 +08:00
Jiaming Yuan	0df2ae63c7	Fix num_boosted_rounds for linear model. (#7538) * Add note. * Fix n boosted rounds.	2022-01-05 03:29:33 +08:00
Jiaming Yuan	28af6f9abb	Remove `omp_get_max_threads` in gbm and linear. (#7537) * Use ctx in gbm. * Use ctx threads in gbm and linear.	2022-01-05 03:28:52 +08:00
Jiaming Yuan	eea094e1bc	Remove some warnings from clang. (#7533) * Unused variable. * Unnecessary virtual function.	2022-01-05 03:28:21 +08:00
Jiaming Yuan	68cdbc9c16	Remove `omp_get_max_threads` in CPU predictor. (#7519) This is part of the ongoing effort to remove the dependency on global omp variables.	2022-01-04 22:12:15 +08:00
Ikko Ashimine	5516281881	Fix typo in tree_model.cc (#7539) occurance -> occurrence	2021-12-30 20:12:25 +08:00
Ginko Balboa	29bfa94bb6	Fix external memory with gpu_hist and subsampling combination bug. (#7481) Instead of accessing data from the `original_page_`, access the data from the first page of the available batch. fix #7476 Co-authored-by: jiamingy <jm.yuan@outlook.com>	2021-12-24 11:15:35 +08:00
Jiaming Yuan	7f399eac8b	Use double for GPU Hist node sum. (#7507)	2021-12-22 08:41:35 +08:00
Jiaming Yuan	58a6723eb1	Initial support for multioutput regression. (#7514) * Add num target model parameter, which is configured from input labels. * Change elementwise metric and indexing for weights. * Add demo. * Add tests.	2021-12-18 09:28:38 +08:00
Jiaming Yuan	9ab73f737e	Extract Sketch Entry from hist maker. (#7503) * Extract Sketch Entry from hist maker. * Add a new sketch container for sorted inputs. * Optimize bin search.	2021-12-18 05:36:56 +08:00
Jiaming Yuan	5b1161bb64	Convert labels into tensor. (#7456) * Add a new ctor to tensor for `initializer_list`. * Change labels from host device vector to tensor. * Rename the field from `labels_` to `labels` since it's a public member.	2021-12-17 00:58:35 +08:00
Jiaming Yuan	70b12d898a	[dask] Fix ddqdm with empty partition. (#7510) * Fix empty partition. * war.	2021-12-16 20:37:29 +08:00
Jiaming Yuan	01152f89ee	Remove unused parameters. (#7499)	2021-12-09 14:24:51 +08:00
Jiaming Yuan	eee527d264	Add approx partitioner. (#7467)	2021-11-27 15:22:06 +08:00
Jiaming Yuan	85cbd32c5a	Add range-based slicing to tensor view. (#7453)	2021-11-27 13:42:36 +08:00
Jiaming Yuan	557ffc4bf5	Reduce base margin to 2 dim for now. (#7455)	2021-11-27 00:46:13 +08:00
Jiaming Yuan	5262e933f7	Remove unnecessary constexpr. (#7466)	2021-11-23 16:42:08 +08:00
Jiaming Yuan	176110a22d	Support external memory in CPU histogram building. (#7372)	2021-11-23 01:13:33 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405) This PR changes base_margin into a 3-dim array, with one dimension reserved for multi-target classification. Also, a breaking change is made to binary serialization due to the extra dimension, along with a fix for saving the feature weights. Lastly, it unifies prediction initialization between CPU and GPU. After this PR, the meta info setter in Python is based on the array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	b0015fda96	Fix R CRAN failures. (#7404) * Remove hist builder dtor. * Initialize values. * Tolerance. * Remove the use of nthread in col maker.	2021-11-16 10:51:12 +08:00
Jiaming Yuan	55ee272ea8	Extend array interface to handle ndarray. (#7434) * Extend array interface to handle ndarray. The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this class handled only 2-dim inputs (a vector is also treated as a matrix). This PR specifies the expected dimension at compile time, so the array interface can perform various checks on input data automatically. Also, adapters like CSR are more rigorous about their input. Lastly, row vectors and column vectors are handled without intervention from the caller.	2021-11-16 09:52:15 +08:00
Jiaming Yuan	d4274bc556	Fix typo. (#7433)	2021-11-15 01:28:11 +08:00
Jiaming Yuan	a7057fa64c	Implement typed storage for tensor. (#7429) * Add `Tensor` class. * Add elementwise kernel for CPU and GPU. * Add unravel index. * Move some computation to compile time.	2021-11-14 18:53:13 +08:00
Jiaming Yuan	46726ec176	Expose build info (#7399)	2021-11-12 18:22:46 +08:00
Jiaming Yuan	937fa282b5	Extract string view. (#7416) * Add equality operators. * Return a view in substr. * Add proper iterator types.	2021-11-12 18:22:30 +08:00
Jiaming Yuan	ca6f980932	Check number of trees in inplace predict. (#7409)	2021-11-12 18:20:23 +08:00
Jiaming Yuan	d7d1b6e3a6	CPU evaluation for cat data. (#7393) * Implementation for one hot based. * Implementation for partition based. (LightGBM)	2021-11-06 14:41:35 +08:00
Jiaming Yuan	6ede12412c	Update dmlc-core and use data iter for GPU sampling tests. (#7398) * Update dmlc-core. * New parquet parser in dmlc-core. * Use data iter for GPU sampling tests.	2021-11-06 05:12:49 +08:00
Jiaming Yuan	c968217ca8	[R] Fix global feature importance and predict with 1 sample. (#7394) * [R] Fix global feature importance. * Add implementation for tree index. The parameter is not documented in C API since we should work on porting the model slicing to R instead of supporting more use of tree index. * Fix the difference between "gain" and "total_gain". * debug. * Fix prediction.	2021-11-05 10:07:00 +08:00
Jiaming Yuan	b06040b6d0	Implement a general array view. (#7365) * Replace existing matrix and vector view. This is to prepare for handling higher dimension data and prediction when we support multi-target models.	2021-11-05 04:16:11 +08:00
Jiaming Yuan	4100827971	Pass information about objective to tree methods. (#7385) * Define the `ObjInfo` and pass it down to every tree updater.	2021-11-04 01:52:44 +08:00
Jiaming Yuan	ccdabe4512	Support building gradient index with cat data. (#7371)	2021-11-03 22:37:37 +08:00
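
Commit c635d4c46a describes a partial UBJSON implementation with typed arrays, and a1bcd33a3b switches internal model serialization to UBJSON. As a rough illustration of what a typed array means in UBJSON, the Python sketch below encodes a float32 array in the specification's optimized container form: `[`, then `$d` (all elements are float32), then `#` followed by an integer count, then the raw big-endian payload with no per-element type markers and no closing `]`. The function names are hypothetical; this is not XGBoost's implementation (which is C++) and only follows the public UBJSON specification.

```python
import struct

# UBJSON integer type markers and their big-endian struct formats.
_INT_FORMATS = {b"i": ">b", b"U": ">B", b"I": ">h", b"l": ">i", b"L": ">q"}


def encode_int(n: int) -> bytes:
    """Encode an integer using the smallest UBJSON integer type that fits."""
    if -128 <= n <= 127:
        return b"i" + struct.pack(">b", n)
    if 0 <= n <= 255:
        return b"U" + struct.pack(">B", n)
    if -32768 <= n <= 32767:
        return b"I" + struct.pack(">h", n)
    if -2**31 <= n < 2**31:
        return b"l" + struct.pack(">i", n)
    return b"L" + struct.pack(">q", n)


def encode_f32_array(values) -> bytes:
    """Hypothetical helper: encode floats as a typed, counted UBJSON array.

    Layout: '[' '$' 'd' '#' <count> <payload>. Because the count is given,
    the optimized container has no closing ']'.
    """
    header = b"[$d#" + encode_int(len(values))
    payload = struct.pack(f">{len(values)}f", *values)  # big-endian float32
    return header + payload


def decode_int(buf: bytes, pos: int):
    """Read a typed UBJSON integer at pos; return (value, next position)."""
    fmt = _INT_FORMATS[buf[pos:pos + 1]]
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + struct.calcsize(fmt)


def decode_f32_array(buf: bytes):
    """Hypothetical helper: decode the output of encode_f32_array."""
    if buf[:4] != b"[$d#":
        raise ValueError("not a typed float32 UBJSON array")
    count, pos = decode_int(buf, 4)
    # The payload can be read in one call since every element has the same type.
    return list(struct.unpack_from(f">{count}f", buf, pos))


if __name__ == "__main__":
    encoded = encode_f32_array([1.0, 2.5])
    print(encoded)
    print(decode_f32_array(encoded))
```

Because elements carry no individual type markers, a reader that sees `$d#` can copy the whole payload in a single pass. Values not exactly representable in float32 (such as 0.1) come back rounded after a round trip.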

1466 Commits