xgboost

Author	SHA1	Message	Date
amdsc21	acad01afc9	sync Mar 29	2023-03-30 00:46:50 +02:00
Rong Ou	ff26cd3212	More tests for column split and vertical federated learning (#8985 ) Added some more tests for the learner and fit_stump, for both column-wise distributed learning and vertical federated learning. Also moved the `IsRowSplit` and `IsColumnSplit` methods from the `DMatrix` to the `MetaInfo` since in some places we only have access to the `MetaInfo`. Added a new convenience method `IsVerticalFederatedLearning`. Some refactoring of the testing fixtures.	2023-03-28 16:40:26 +08:00
amdsc21	204d0c9a53	add hip tests	2023-03-11 00:38:16 +01:00
Jiaming Yuan	d11a0044cf	Generalize prediction cache. (#8783 ) * Extract most of the functionality into `DMatrixCache`. * Move API entry to independent file to reduce dependency on `predictor.h` file. * Add test. --------- Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2023-02-13 12:36:43 +08:00
Jiaming Yuan	8d545ab2a2	Implement fit stump. (#8607 )	2023-01-04 04:14:51 +08:00
Rong Ou	3ceeb8c61c	Add data split mode to DMatrix MetaInfo (#8568 )	2022-12-25 20:37:37 +08:00
Jiaming Yuan	3e26107a9c	Rename and extract `Context`. (#8528 ) * Rename `GenericParameter` to `Context`. * Rename header file to reflect the change. * Rename all references.	2022-12-07 04:58:54 +08:00
Rong Ou	8e76f5f595	Use `DataSplitMode` to configure data loading (#8434 ) * Use `DataSplitMode` to configure data loading	2022-11-08 16:21:50 +08:00
Jiaming Yuan	031d66ec27	Configuration for init estimation. (#8343 ) * Configuration for init estimation. * Check whether the model needs configuration based on const attribute `ModelFitted` instead of a mutable state. * Add parameter `boost_from_average` to tell whether the user has specified base score. * Add tests.	2022-10-18 01:52:24 +08:00
Jiaming Yuan	fffb1fca52	Calculate `base_score` based on input labels for mae. (#8107 ) Fit an intercept as base score for abs loss.	2022-09-20 20:53:54 +08:00
Jiaming Yuan	bc818316f2	Prepare for improving Windows networking compatibility. (#8234 ) * Prepare for improving Windows networking compatibility. * Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which conflicts with winsock2.h * Define `NOMINMAX` conditionally. * Link the winsock library when mysys32 is used. * Add config file for read the doc.	2022-09-10 15:16:49 +08:00
Jiaming Yuan	64575591d8	Use context in `SetInfo`. (#7687 ) * Use the name `Context`. * Pass a context object into `SetInfo`. * Add context to proxy matrix. * Add context to iterative DMatrix. This is to remove the use of the default number of threads during `SetInfo` as a follow-up on removing the global omp variable while preparing for CUDA stream semantic. Currently, XGBoost uses the legacy CUDA stream, we will gradually remove them in the future in favor of non-blocking streams.	2022-03-24 22:16:26 +08:00
Jiaming Yuan	98d6faefd6	Implement slope for Pseudo-Huber. (#7727 ) * Add objective and metric. * Some refactoring for CPU/GPU dispatching using linalg module.	2022-03-14 21:42:38 +08:00
Jiaming Yuan	81210420c6	Remove `omp_get_max_threads` (#7608 ) This is the one last PR for removing omp global variable. * Add context object to the `DMatrix`. This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 . * Require context to be available at the construction time of booster. * Add `n_threads` support for R csc DMatrix constructor. * Remove `omp_get_max_threads` in R glue code. * Remove threading utilities that rely on omp global variable.	2022-01-28 16:09:22 +08:00
Jiaming Yuan	58a6723eb1	Initial support for multioutput regression. (#7514 ) * Add num target model parameter, which is configured from input labels. * Change elementwise metric and indexing for weights. * Add demo. * Add tests.	2021-12-18 09:28:38 +08:00
Jiaming Yuan	5b1161bb64	Convert labels into tensor. (#7456 ) * Add a new ctor to tensor for `initializer_list`. * Change labels from host device vector to tensor. * Rename the field from `labels_` to `labels` since it's a public member.	2021-12-17 00:58:35 +08:00
Jiaming Yuan	bd1f3a38f0	Rewrite sparse dmatrix using callbacks. (#7092 ) - Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves. - Remove use of threaded iterator and IO queue. - Remove `page_size`. - Make sure the number of pages in memory is bounded. - Make sure the cache can not be violated. - Provide an interface for internal algorithms to process data asynchronously.	2021-07-16 12:33:31 +08:00
Jiaming Yuan	556a83022d	Implement unified update prediction cache for (gpu_)hist. (#6860 ) * Implement utilities for linalg. * Unify the update prediction cache functions. * Implement update prediction cache for multi-class gpu hist.	2021-04-17 00:29:34 +08:00
Jiaming Yuan	f6fe15d11f	Improve parameter validation (#6769 ) * Add quotes to unused parameters. * Check for whitespace.	2021-03-20 01:56:55 +08:00
Jiaming Yuan	9da2287ab8	[breaking] Save booster feature info in JSON, remove feature name generation. (#6605 ) * Save feature info in booster in JSON model. * [breaking] Remove automatic feature name generation in `DMatrix`. This PR is to enable reliable feature validation in Python package.	2021-02-25 18:54:16 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with the rest of the `XGBoosterPredictFrom` functions on semantic of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, I made it to be consistent with other prediction functions where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	c3c8e66fc9	Make prediction functions thread safe. (#6648 )	2021-01-28 23:29:43 +08:00
Jiaming Yuan	519cee115a	Avoid resetting seed for every configuration. (#6349 )	2020-11-06 10:28:35 +08:00
Jiaming Yuan	2cc9662005	Support slicing tree model (#6302 ) This PR is meant to end the confusion around best_ntree_limit and unify model slicing. We have multi-class and random forests, asking users to understand how to set ntree_limit is difficult and error prone. * Implement the save_best option in early stopping. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-11-02 23:27:39 -08:00
Jiaming Yuan	9c6e791e64	Enforce tree order in JSON. (#5974 ) * Make JSON model IO more future proof by using tree id in model loading.	2020-08-05 16:44:52 +08:00
boxdot	d268a2a463	Thread-safe prediction by making the prediction cache thread-local. (#5853 ) Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-07-30 12:33:50 +08:00
Jiaming Yuan	21ed1f0c6d	Support 64bit seed. (#5643 )	2020-05-07 14:52:38 +08:00
Jiaming Yuan	6671b42dd4	Use ellpack for prediction only when sparsepage doesn't exist. (#5504 )	2020-04-10 12:15:46 +08:00
Jiaming Yuan	0012f2ef93	Upgrade clang-tidy on CI. (#5469 ) * Correct all clang-tidy errors. * Upgrade clang-tidy to 10 on CI. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-04-05 04:42:29 +08:00
Jiaming Yuan	4942da64ae	Refactor tests with data generator. (#5439 )	2020-03-27 06:44:44 +08:00
Rory Mitchell	b0ed3f0a66	Remove unnecessary DMatrix methods (#5324 )	2020-02-25 12:40:39 +13:00
Jiaming Yuan	911a902835	Merge model compatibility fixes from 1.0rc branch. (#5305 ) * Port test model compatibility. * Port logit model fix. https://github.com/dmlc/xgboost/pull/5248 https://github.com/dmlc/xgboost/pull/5281	2020-02-13 20:41:58 +08:00
Jiaming Yuan	29eeea709a	Pass shared pointer instead of raw pointer to Learner. (#5302 ) Extracted from https://github.com/dmlc/xgboost/pull/5220 .	2020-02-11 14:16:38 +08:00
Jiaming Yuan	3eb1279bbf	Config for linear updaters. (#5222 )	2020-01-25 11:26:46 +08:00
Kodi Arfer	f100b8d878	[Breaking] Don't drop trees during DART prediction by default (#5115 ) * Simplify DropTrees calling logic * Add `training` parameter for prediction method. * [Breaking]: Add `training` to C API. * Change for R and Python custom objective. * Correct comment. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-01-13 21:48:30 +08:00
Jiaming Yuan	7b65698187	Enforce correct data shape. (#5191 ) * Fix syncing DMatrix columns. * notes for tree method. * Enable feature validation for all interfaces except for jvm. * Better tests for boosting from predictions. * Disable validation on JVM.	2020-01-13 15:48:17 +08:00
Jiaming Yuan	ebc86a3afa	Disable parameter validation for Scikit-Learn interface. (#5167 ) * Disable parameter validation for now. Scikit-Learn passes all parameters down to XGBoost, whether they are used or not. * Add option `validate_parameters`.	2020-01-07 11:17:31 +08:00
Jiaming Yuan	f3d7877802	Parameter validation (#5157 ) * Unused code. * Split up old colmaker parameters from train param. * Fix dart. * Better name.	2019-12-26 11:59:05 +08:00
Jiaming Yuan	ad4a1c732c	Small refinements for JSON model. (#5112 ) * Naming consistency. * Remove duplicated test.	2019-12-11 19:49:01 +08:00
Jiaming Yuan	208ab3b1ff	Model IO in JSON. (#5110 )	2019-12-11 11:20:40 +08:00
Jiaming Yuan	f24be2efb4	Use configure_file() to configure version only (#4974 ) * Avoid writing build_config.h * Remove build_config.h all together. * Lint.	2019-10-22 23:47:00 -07:00
Jiaming Yuan	a5f232feb8	Fix calling GPU predictor (#4836 ) * Fix calling GPU predictor	2019-09-05 19:09:38 -04:00
Rong Ou	38ab79f889	Make HostDeviceVector single gpu only (#4773 ) * Make HostDeviceVector single gpu only	2019-08-26 09:51:13 +12:00
Rong Ou	c5b229632d	[BREAKING] prevent multi-gpu usage (#4749 ) * prevent multi-gpu usage * fix distributed test * combine gpu predictor tests * set upper bound on n_gpus	2019-08-13 09:11:35 +12:00
Bobby	3e2c472944	Fix model parameter recovery (#4738 )	2019-08-07 02:32:10 -04:00
Rong Ou	851b5b3808	Remove gpu_exact tree method (#4742 )	2019-08-07 11:43:20 +12:00
Jiaming Yuan	4fe0d8203e	Specify version macro in CMake. (#4730 ) * Specify version macro in CMake. * Use `XGBOOST_DEFINITIONS` instead.	2019-08-04 06:04:04 -04:00
Jiaming Yuan	f0064c07ab	Refactor configuration [Part II]. (#4577 ) * Refactor configuration [Part II]. * General changes: Remove `Init` methods to avoid ambiguity. Remove `Configure(std::map<>)` to avoid redundant copying and prepare for parameter validation. (`std::vector` is returned from `InitAllowUnknown`). ** Add name to tree updaters for easier debugging. * Learner changes: Make `LearnerImpl` the only source of configuration. All configurations are stored and carried out by `LearnerImpl::Configure()`. Remove booster in C API. Originally kept for "compatibility reason", but did not state why. So here we just remove it. Add a `metric_names_` field in `LearnerImpl`. Remove `LazyInit`. Configuration will always be lazy. ** Run `Configure` before every iteration. * Predictor changes: Allocate both cpu and gpu predictor. Remove cpu_predictor from gpu_predictor. `GBTree` is now used to dispatch the predictor. ** Remove some GPU Predictor tests. * IO No IO changes. The binary model format stability is tested by comparing hashing value of save models between two commits	2019-07-20 08:34:56 -04:00
Jiaming Yuan	c5719cc457	Offload some configurations into GBM. (#4553 ) This is part 1 of refactoring configuration. * Move tree heuristic configurations. * Split up declarations and definitions for GBTree. * Implement UseGPU in gbm.	2019-06-14 09:18:51 +08:00
Jiaming Yuan	c589eff941	De-duplicate GPU parameters. (#4454 ) * Only define `gpu_id` and `n_gpus` in `LearnerTrainParam` * Pass LearnerTrainParam through XGBoost via factory method. * Disable all GPU usage when GPU related parameters are not specified (fixes XGBoost choosing GPU over aggressively). * Test learner train param io. * Fix gpu pickling.	2019-05-29 11:55:57 +08:00


59 Commits