xgboost

Author	SHA1	Message	Date
Jiaming Yuan	8d06878bf9	Deterministic GPU histogram. (#5361 ) * Use pre-rounding based method to obtain reproducible floating point summation. * GPU Hist for regression and classification are bit-by-bit reproducible. * Add doc. * Switch to thrust reduce for `node_sum_gradient`.	2020-03-04 15:13:28 +08:00
sriramch	5dc8e894c9	Fixes and changes to the ranking metrics computed on cpu (#5380 ) * - fixes and changes to the ranking metrics computed on cpu - auc/aucpr ranking metric accelerated on cpu - fixes to the auc/aucpr metrics	2020-03-03 15:56:36 +13:00
Egor Smirnov	1b97eaf7a7	Optimized ApplySplit, BuildHist and UpdatePredictCache functions on CPU (#5244 ) * Split up sparse and dense build hist kernels. * Add `PartitionBuilder`.	2020-02-29 16:11:42 +08:00
sriramch	b81f8cbbc0	Move segment sorter to common (#5378 ) - move segment sorter to common - this is the first of a handful of pr's that splits the larger pr #5326 - it moves this facility to common (from ranking objective class), so that it can be used for metric computation - it also wraps all the bald device pointers into span.	2020-02-29 15:42:07 +08:00
Jiaming Yuan	f2b8cd2922	Add number of columns to native data iterator. (#5202 ) * Change native data iter into an adapter.	2020-02-25 23:42:01 +08:00
Jiaming Yuan	e0509b3307	Fix pruner. (#5335 ) * Honor the tree depth. * Prevent pruning pruned node.	2020-02-25 08:32:46 +08:00
Rory Mitchell	b0ed3f0a66	Remove unnecessary DMatrix methods (#5324 )	2020-02-25 12:40:39 +13:00
Jiaming Yuan	655cf17b60	Predict on Ellpack. (#5327 ) * Unify GPU prediction node. * Add `PageExists`. * Dispatch prediction on input data for GPU Predictor.	2020-02-23 06:27:03 +08:00
Philip Hyunsu Cho	7ac7e8778f	Port patches from 1.0.0 branch (#5336 ) * Remove f-string, since it's not supported by Python 3.5 (#5330) * Remove f-string, since it's not supported by Python 3.5 * Add Python 3.5 to CI, to ensure compatibility * Remove duplicated matplotlib * Show deprecation notice for Python 3.5 * Fix lint * Fix lint * Fix a unit test that mistook MINOR ver for PATCH ver * Enforce only major version in JSON model schema * Bump version to 1.1.0-SNAPSHOT	2020-02-21 13:13:21 -08:00
Rory Mitchell	bc96ceb8b2	Refactor SparsePageSource, delete cache files after use (#5321 ) * Refactor sparse page source * Delete temporary cache files * Log fatal if cache exists * Log fatal if multiple threads used with prefetcher	2020-02-19 16:43:41 +13:00
Rory Mitchell	b2b2c4e231	Remove SimpleCSRSource (#5315 )	2020-02-18 16:49:17 +13:00
Jiaming Yuan	0110754a76	Remove update prediction cache from predictors. (#5312 ) Move this function into gbtree, and uses only updater for doing so. As now the predictor knows exactly how many trees to predict, there's no need for it to update the prediction cache.	2020-02-17 11:35:47 +08:00
Jiaming Yuan	c35cdecddd	Move prediction cache to Learner. (#5220 ) * Move prediction cache into Learner. * Clean-ups - Remove duplicated cache in Learner and GBM. - Remove ad-hoc fix of invalid cache. - Remove `PredictFromCache` in predictors. - Remove prediction cache for linear altogether, as it's only moving the prediction into training process but doesn't provide any actual overall speed gain. - The cache is now unique to Learner, which means the ownership is no longer shared by any other components. * Changes - Add version to prediction cache. - Use weak ptr to check expired DMatrix. - Pass shared pointer instead of raw pointer.	2020-02-14 13:04:23 +08:00
Rory Mitchell	24ad9dec0b	Testing hist_util (#5251 ) * Rank tests * Remove categorical split specialisation * Extend tests to multiple features, switch to WQSketch * Add tests for SparseCuts * Add external memory quantile tests, fix some existing tests	2020-02-14 14:36:43 +13:00
Jiaming Yuan	911a902835	Merge model compatibility fixes from 1.0rc branch. (#5305 ) * Port test model compatibility. * Port logit model fix. https://github.com/dmlc/xgboost/pull/5248 https://github.com/dmlc/xgboost/pull/5281	2020-02-13 20:41:58 +08:00
Jiaming Yuan	29eeea709a	Pass shared pointer instead of raw pointer to Learner. (#5302 ) Extracted from https://github.com/dmlc/xgboost/pull/5220 .	2020-02-11 14:16:38 +08:00
Rong Ou	e4b74c4d22	Gradient based sampling for GPU Hist (#5093 ) * Implement gradient based sampling for GPU Hist tree method. * Add samplers and handle compacted page in GPU Hist.	2020-02-04 10:31:27 +08:00
Jiaming Yuan	fe8d72b50b	Cleanup warnings. (#5247 ) From clang-tidy-9 and gcc-7: Invalid case style, narrowing definition, wrong initialization order, unused variables.	2020-01-31 14:52:15 +08:00
Egor Smirnov	c67163250e	Optimized BuildHist function (#5156 )	2020-01-29 23:32:57 -08:00
Jiaming Yuan	3eb1279bbf	Config for linear updaters. (#5222 )	2020-01-25 11:26:46 +08:00
Philip Hyunsu Cho	44469a0ca9	Extensible binary serialization format for DMatrix::MetaInfo (#5187 ) * Turn xgboost::DataType into C++11 enum class * New binary serialization format for DMatrix::MetaInfo * Fix clang-tidy * Fix c++ test * Implement new format proposal * Move helper functions to anonymous namespace; remove unneeded field * Fix lint * Add shape. * Keep only roundtrip test. * Fix test. * various fixes * Update data.cc Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-01-23 11:33:17 -08:00
Rory Mitchell	9c56480c61	Support dmatrix construction from cupy array (#5206 )	2020-01-22 13:15:27 +13:00
Philip Hyunsu Cho	0184f2e9f7	Explicitly use UTF-8 codepage when using MSVC (#5197 ) * Explicitly use UTF-8 codepage when using MSVC * Fix build with CUDA enabled	2020-01-14 13:30:34 -08:00
Rory Mitchell	a73e25e15f	Implement slice via adapters (#5198 )	2020-01-14 12:55:41 +13:00
Kodi Arfer	f100b8d878	[Breaking] Don't drop trees during DART prediction by default (#5115 ) * Simplify DropTrees calling logic * Add `training` parameter for prediction method. * [Breaking]: Add `training` to C API. * Change for R and Python custom objective. * Correct comment. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-01-13 21:48:30 +08:00
Jiaming Yuan	7b65698187	Enforce correct data shape. (#5191 ) * Fix syncing DMatrix columns. * notes for tree method. * Enable feature validation for all interfaces except for jvm. * Better tests for boosting from predictions. * Disable validation on JVM.	2020-01-13 15:48:17 +08:00
Rory Mitchell	8cbcc53ccb	Remove old cudf constructor code (#5194 )	2020-01-10 16:35:23 +13:00
Rory Mitchell	87ebfc1315	Implement cudf construction with adapters. (#5189 )	2020-01-09 20:23:06 +13:00
Jiaming Yuan	ee287808fb	Lazy initialization of device vector. (#5173 ) * Lazy initialization of device vector. * Fix #5162. * Disable copy constructor of HostDeviceVector. Prevents implicit copying. * Fix CPU build. * Bring back move assignment operator.	2020-01-07 11:23:05 +08:00
Jiaming Yuan	ebc86a3afa	Disable parameter validation for Scikit-Learn interface. (#5167 ) * Disable parameter validation for now. Scikit-Learn passes all parameters down to XGBoost, whether they are used or not. * Add option `validate_parameters`.	2020-01-07 11:17:31 +08:00
Egor Smirnov	7b17e76c5b	Optimized EvaluateSplut function (#5138 ) * Add block based threading utilities.	2019-12-31 18:18:42 +08:00
Jiaming Yuan	04db125699	Quick fix for memory leak in CPU Hist. (#5153 ) Closes https://github.com/dmlc/xgboost/issues/3579 . * Don't use map.	2019-12-31 14:05:53 +08:00
Jiaming Yuan	61286c6e8f	Fix wrapping GPU ID and prevent data copying. (#5160 ) * Removed some data copying. * Make sure gpu_id is valid before any configuration is carried out.	2019-12-27 16:51:08 +08:00
sriramch	ee81ba8e1f	implementation of map ranking algorithm on gpu (#5129 ) * - implementation of map ranking algorithm - also effected necessary suggestions mentioned in the earlier ranking pr's - made some performance improvements to the ndcg algo as well	2019-12-27 12:05:37 +13:00
Philip Hyunsu Cho	9b0af6e882	Enable OpenMP with Apple Clang (Mac default compiler) (#5146 ) * Add OpenMP as CMake target * Require CMake 3.12, to allow linking OpenMP target to objxgboost * Specify OpenMP compiler flag for CUDA host compiler * Require CMake 3.16+ if the OS is Mac OSX * Use AppleClang in Mac tests. * Update dmlc-core	2019-12-26 16:53:12 +08:00
Jiaming Yuan	f3d7877802	Parameter validation (#5157 ) * Unused code. * Split up old colmaker parameters from train param. * Fix dart. * Better name.	2019-12-26 11:59:05 +08:00
Jiaming Yuan	c8bdb652c4	Add check for length of weights. (#4872 )	2019-12-21 11:30:58 +08:00
Rory Mitchell	3d04a8cc97	Use dynamic types for array interface columns instead of templates (#5108 )	2019-12-21 16:08:10 +13:00
Jiaming Yuan	27b3646d29	Tests and documents for new JSON routines. (#5120 )	2019-12-18 08:44:27 +08:00
Jiaming Yuan	3136185bc5	JSON configuration IO. (#5111 ) * Add saving/loading JSON configuration. * Implement Python pickle interface with new IO routines. * Basic tests for training continuation.	2019-12-15 17:31:53 +08:00
Jiaming Yuan	ad4a1c732c	Small refinements for JSON model. (#5112 ) * Naming consistency. * Remove duplicated test.	2019-12-11 19:49:01 +08:00
Jiaming Yuan	208ab3b1ff	Model IO in JSON. (#5110 )	2019-12-11 11:20:40 +08:00
Rory Mitchell	c7cc657a4d	Use adapters for SparsePageDMatrix (#5092 )	2019-12-11 15:59:23 +13:00
Jiaming Yuan	e089e16e3d	Pass pointer to model parameters. (#5101 ) * Pass pointer to model parameters. This PR de-duplicates most of the model parameters except the one in `tree_model.h`. One difficulty is `base_score` is a model property but can be changed at runtime by objective function. Hence when performing model IO, we need to save the one provided by users, instead of the one transformed by objective. Here we created an immutable version of `LearnerModelParam` that represents the value of model parameter after configuration.	2019-12-10 12:11:22 +08:00
Rory Mitchell	979f74d51a	Group builder modified for incremental building (#5098 )	2019-12-10 14:33:56 +13:00
Jiaming Yuan	608ebbe444	Fix GPU ID and prediction cache from pickle (#5086 ) * Hack for saving GPU ID. * Declare prediction cache on GBTree. * Add a simple test. * Add `auto` option for GPU Predictor.	2019-12-07 16:02:06 +08:00
Jiaming Yuan	7ef5b78003	Implement JSON IO for updaters (#5094 ) * Implement JSON IO for updaters. * Remove parameters in split evaluator.	2019-12-07 00:24:00 +08:00
Jiaming Yuan	2dcb62ddfb	Add IO utilities. (#5091 ) * Add fixed size stream for reading model stream. * Add file extension.	2019-12-05 22:15:34 +08:00
Jiaming Yuan	64af1ecf86	[Breaking] Remove num roots. (#5059 )	2019-12-05 21:58:43 +08:00
Jiaming Yuan	df9bdbbcb9	Fix parsing empty vector in parameter. (#5087 )	2019-12-05 11:42:01 +08:00

1 2 3 4 5

234 Commits