xgboost

Author	SHA1	Message	Date
Jiaming Yuan	3c9b04460a	Move `num_parallel_tree` to model parameter. (#7751 ) The size of forest should be a property of model itself instead of a training hyper-parameter.	2022-03-29 02:32:42 +08:00
Jiaming Yuan	81210420c6	Remove `omp_get_max_threads` (#7608 ) This is the one last PR for removing omp global variable. * Add context object to the `DMatrix`. This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 . * Require context to be available at the construction time of booster. * Add `n_threads` support for R csc DMatrix constructor. * Remove `omp_get_max_threads` in R glue code. * Remove threading utilities that rely on omp global variable.	2022-01-28 16:09:22 +08:00
Jiaming Yuan	a1bcd33a3b	[breaking] Change internal model serialization to UBJSON. (#7556 ) * Use typed array for models. * Change the memory snapshot format. * Add new C API for saving to raw format.	2022-01-16 02:11:53 +08:00
Jiaming Yuan	e94b766310	Fix early stopping with linear model. (#7554 )	2022-01-13 21:53:06 +08:00
Jiaming Yuan	001503186c	Rewrite approx (#7214 ) This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing. The rewrite has many benefits: - Support for both `max_leaves` and `max_depth`. - Support for `grow_policy`. - Support for mono constraint. - Support for feature weights. - Support for easier bin configuration (`max_bin`). - Support for categorical data. - Faster performance for most of the datasets. (many times faster) - Support for prediction cache. - Significantly better performance for external memory. - Unites the code base between approx and hist.	2022-01-10 21:15:05 +08:00
Jiaming Yuan	0df2ae63c7	Fix num_boosted_rounds for linear model. (#7538 ) * Add note. * Fix n boosted rounds.	2022-01-05 03:29:33 +08:00
Jiaming Yuan	28af6f9abb	Remove `omp_get_max_threads` in gbm and linear. (#7537 ) * Use ctx in gbm. * Use ctx threads in gbm and linear.	2022-01-05 03:28:52 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405 ) This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	ca6f980932	Check number of trees in inplace predict. (#7409 )	2021-11-12 18:20:23 +08:00
Jiaming Yuan	c968217ca8	[R] Fix global feature importance and predict with 1 sample. (#7394 ) * [R] Fix global feature importance. * Add implementation for tree index. The parameter is not documented in C API since we should work on porting the model slicing to R instead of supporting more use of tree index. * Fix the difference between "gain" and "total_gain". * debug. * Fix prediction.	2021-11-05 10:07:00 +08:00
Jiaming Yuan	b06040b6d0	Implement a general array view. (#7365 ) * Replace existing matrix and vector view. This is to prepare for handling higher dimension data and prediction when we support multi-target models.	2021-11-05 04:16:11 +08:00
Jiaming Yuan	4100827971	Pass infomation about objective to tree methods. (#7385 ) * Define the `ObjInfo` and pass it down to every tree updater.	2021-11-04 01:52:44 +08:00
Jiaming Yuan	3a4f51f39f	Avoid calling CUDA code on CPU for linear model. (#7154 )	2021-09-01 10:45:31 +08:00
Jiaming Yuan	d080b5a953	Fix model slicing. (#7149 ) * Use correct pointer. * Remove best_iteration/best_score.	2021-08-03 11:51:56 +08:00
Jiaming Yuan	1c8fdf2218	Remove use of `device_idx` in `dh::LaunchN`. (#7063 ) It's an unused parameter, removing it can make the CI log more readable.	2021-06-29 11:37:26 +08:00
Jiaming Yuan	663136aa08	Implement feature score for linear model. (#7048 ) * Add feature score support for linear model. * Port R interface to the new implementation. * Add linear model support in Python. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-06-25 14:34:02 +08:00
Jiaming Yuan	7dd29ffd47	Implement feature score in GBTree. (#7041 ) * Categorical data support. * Eliminate text parsing during feature score computation.	2021-06-18 11:53:16 +08:00
Jiaming Yuan	5c2d7a18c9	Parallel model dump for trees. (#7040 )	2021-06-15 14:08:26 +08:00
Jiaming Yuan	72f9daf9b6	Fix `gpu_id` with custom objective. (#7015 )	2021-06-09 14:51:17 +08:00
Jiaming Yuan	816b789bf0	Add predictor to skl constructor. (#7000 )	2021-05-29 04:52:56 +08:00
Jiaming Yuan	86e60e3ba8	Guard against index error in prediction. (#6982 ) * Remove `best_ntree_limit` from documents.	2021-05-25 23:24:59 +08:00
Livius	a4886c404a	Fix compilation error on x86 (#6964 ) Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2021-05-14 13:31:49 +08:00
Andrew Ziem	3e7e426b36	Fix spelling in documents (#6948 ) * Update roxygen2 doc. Co-authored-by: fis <jm.yuan@outlook.com>	2021-05-11 20:44:36 +08:00
Jiaming Yuan	556a83022d	Implement unified update prediction cache for (gpu_)hist. (#6860 ) * Implement utilites for linalg. * Unify the update prediction cache functions. * Implement update prediction cache for multi-class gpu hist.	2021-04-17 00:29:34 +08:00
Jiaming Yuan	79b8b560d2	Optimize dart inplace predict perf. (#6804 )	2021-03-31 15:20:54 +08:00
Jiaming Yuan	a7083d3c13	Fix dart inplace prediction with GPU input. (#6777 ) * Fix dart inplace predict with data on GPU, which might trigger a fatal check for device access right. * Avoid copying data whenever possible.	2021-03-25 12:00:32 +08:00
Louis Desreumaux	9b530e5697	Improve OpenMP exception handling (#6680 )	2021-02-25 13:56:16 +08:00
ShvetsKS	9a0399e898	Removed unnecessary PredictBatch calls (#6700 ) Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>	2021-02-10 20:15:14 +08:00
Jiaming Yuan	e8c5c53e2f	Use `Predictor` for `dart`. (#6693 ) * Use normal predictor for dart booster. * Implement `inplace_predict` for dart. * Enable `dart` for dask interface now that it's thread-safe. * categorical data should be working out of box for dart now. The implementation is not very efficient as it has to pull back the data and apply weight for each tree, but still a significant improvement over previous implementation as now we no longer binary search for each sample. * Fix output prediction shape on dataframe.	2021-02-09 23:30:19 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with rest of the `XGBoosterPredictFrom` functions on semantic of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, I made it to be consistent with other prediction functions where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	411592a347	Enhance inplace prediction. (#6653 ) * Accept array interface for csr and array. * Accept an optional proxy dmatrix for metainfo. This constructs an explicit `_ProxyDMatrix` type in Python. * Remove unused doc. * Add strict output.	2021-02-02 11:41:46 +08:00
Jiaming Yuan	d132933550	Remove type check for solaris. (#6610 )	2021-01-16 02:58:19 +08:00
ShvetsKS	7f4d3a91b9	Multiclass prediction caching for CPU Hist (#6550 ) Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>	2021-01-13 04:42:07 +08:00
Jiaming Yuan	f2f7dd87b8	Use view for `SparsePage` exclusively. (#6590 )	2021-01-11 18:04:55 +08:00
Jiaming Yuan	ca3da55de4	Support early stopping with training continuation, correct num boosted rounds. (#6506 ) * Implement early stopping with training continuation. * Add new C API for obtaining boosted rounds. * Fix off by 1 in `save_best`. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-17 19:59:19 +08:00
Jiaming Yuan	8a17610666	Implement GPU predict leaf. (#6187 )	2020-11-11 17:33:47 +08:00
Jack Dunn	51e6531315	Fix missing space in warning message (#6340 )	2020-11-04 06:03:16 -05:00
Jiaming Yuan	2cc9662005	Support slicing tree model (#6302 ) This PR is meant the end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests, asking users to understand how to set ntree_limit is difficult and error prone. * Implement the save_best option in early stopping. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-11-02 23:27:39 -08:00
Igor Moura	5e1e972aea	Clean up warnings (#6325 )	2020-10-30 23:50:29 +08:00
Jiaming Yuan	3310e208fd	Fix inplace prediction interval. (#6259 ) * Add back the interval in call. * Make the interval non-optional.	2020-10-28 13:13:59 +08:00
Jiaming Yuan	cc76724762	Reduce warning. (#6273 )	2020-10-27 12:24:19 -07:00
Igor Moura	d1254808d5	Clean up C++ warnings (#6213 )	2020-10-19 23:02:33 +08:00
Jiaming Yuan	5037abeb86	Fix linear gpu input (#6255 )	2020-10-19 12:02:36 +08:00
Rory Mitchell	dda9e1e487	Update GPUTreeshap (#6163 ) * Reduce shap test duration * Test interoperability with shap package * Add feature interactions * Update GPUTreeShap	2020-09-28 09:43:47 +13:00
Rory Mitchell	9a4e8b1d81	GPUTreeShap (#6038 )	2020-08-25 12:47:41 +12:00
Qi Zhang	989ddd036f	Swap byte-order in binary serializer to support big-endian arch (#5813 ) * fixed some endian issues * Use dmlc::ByteSwap() to simplify code * Fix lint check * [CI] Add test for s390x * Download latest CMake on s390x * Fix a bug in my code * Save magic number in dmatrix with byteswap on big-endian machine * Save version in binary with byteswap on big-endian machine * Load scalar with byteswap in MetaInfo * Add a debugging message * Handle arrays correctly when byteswapping * EOF can also be 255 * Handle magic number in MetaInfo carefully * Skip Tree.Load test for big-endian, since the test manually builds little-endian binary model * Handle missing packages in Python tests * Don't use boto3 in model compatibility tests * Add s390 Docker file for local testing * Add model compatibility tests * Add R compatibility test * Revert "Add R compatibility test" This reverts commit c2d2bdcb7dbae133cbb927fcd20f7e83ee2b18a8. Co-authored-by: Qi Zhang <q.zhang@ibm.com> Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-18 14:47:17 -07:00
Vladislav Epifanov	388f975cf5	Introducing DPC++-based plugin (predictor, objective function) supporting oneAPI programming model (#5825 ) * Added plugin with DPC++-based predictor and objective function * Update CMakeLists.txt * Update regression_obj_oneapi.cc * Added README.md for OneAPI plugin * Added OneAPI predictor support to gbtree * Update README.md * Merged kernels in gradient computation. Enabled multiple loss functions with DPC++ backend * Aligned plugin CMake files with latest master changes. Fixed whitespace typos * Removed debug output * [CI] Make oneapi_plugin a CMake target * Added tests for OneAPI plugin for predictor and obj. functions * Temporarily switched to default selector for device dispacthing in OneAPI plugin to enable execution in environments without gpus * Updated readme file. * Fixed USM usage in predictor * Removed workaround with explicit templated names for DPC++ kernels * Fixed warnings in plugin tests * Fix CMake build of gtest Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-08 18:40:40 -07:00
Jiaming Yuan	9c6e791e64	Enforce tree order in JSON. (#5974 ) * Make JSON model IO more future proof by using tree id in model loading.	2020-08-05 16:44:52 +08:00
Philip Hyunsu Cho	1d22a9be1c	Revert "Reorder includes. (#5749 )" (#5771 ) This reverts commit `d3a0efbf16`.	2020-06-09 10:29:28 -07:00
Jiaming Yuan	d3a0efbf16	Reorder includes. (#5749 ) * Reorder includes. * R.	2020-06-03 17:30:47 +12:00

1 2 3 4

194 Commits