222 Commits

Author SHA1 Message Date
Jiaming Yuan
c35cdecddd
Move prediction cache to Learner. (#5220)
* Move prediction cache into Learner.

* Clean-ups

- Remove duplicated cache in Learner and GBM.
- Remove ad-hoc fix of invalid cache.
- Remove `PredictFromCache` in predictors.
- Remove prediction cache for linear altogether, as it only moves prediction
  into the training process but doesn't provide any actual overall speed gain.
- The cache is now unique to Learner, which means its ownership is no longer
  shared with any other components.

* Changes

- Add version to prediction cache.
- Use weak ptr to check expired DMatrix.
- Pass shared pointer instead of raw pointer.
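
Below is a minimal sketch of this caching scheme (names and layout are illustrative, not the actual Learner code): entries are keyed by the DMatrix address, hold a weak_ptr so an expired DMatrix can be detected, and carry a version counter that is bumped when the cached predictions become stale.

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Illustrative stand-in for xgboost::DMatrix.
class DMatrix {};

struct PredictionCacheEntry {
  std::weak_ptr<DMatrix> ref;      // detects whether the DMatrix is still alive
  std::uint64_t version{0};        // bumped whenever cached predictions go stale
  std::vector<float> predictions;  // cached prediction values
};

class PredictionContainer {
 public:
  // Look up (or create) the entry for `m`, dropping entries whose DMatrix has expired.
  PredictionCacheEntry& Entry(std::shared_ptr<DMatrix> const& m) {
    for (auto it = cache_.begin(); it != cache_.end();) {
      if (it->second.ref.expired()) {
        it = cache_.erase(it);
      } else {
        ++it;
      }
    }
    auto& e = cache_[m.get()];
    e.ref = m;  // weak reference only; ownership stays with the caller
    return e;
  }

 private:
  std::unordered_map<DMatrix*, PredictionCacheEntry> cache_;
};
```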
2020-02-14 13:04:23 +08:00
Rory Mitchell
24ad9dec0b
Testing hist_util (#5251)
* Rank tests

* Remove categorical split specialisation

* Extend tests to multiple features, switch to WQSketch

* Add tests for SparseCuts

* Add external memory quantile tests, fix some existing tests
2020-02-14 14:36:43 +13:00
Jiaming Yuan
911a902835
Merge model compatibility fixes from 1.0rc branch. (#5305)
* Port test model compatibility.
* Port logit model fix.

https://github.com/dmlc/xgboost/pull/5248
https://github.com/dmlc/xgboost/pull/5281
2020-02-13 20:41:58 +08:00
Jiaming Yuan
29eeea709a
Pass shared pointer instead of raw pointer to Learner. (#5302)
Extracted from https://github.com/dmlc/xgboost/pull/5220 .
2020-02-11 14:16:38 +08:00
Rong Ou
e4b74c4d22
Gradient based sampling for GPU Hist (#5093)
* Implement gradient based sampling for GPU Hist tree method.
* Add samplers and handle compacted page in GPU Hist.
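
As a rough illustration of the general idea (one common formulation of gradient-based sampling; the exact weighting used in #5093 may differ): rows are kept with probability proportional to the magnitude of their gradient pair, so low-gradient rows can be dropped while the influential ones are retained.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct GradientPair {
  float grad;
  float hess;
};

// Keep each row with probability proportional to the magnitude of its
// gradient pair, aiming for roughly `sample_rows` selected rows.
std::vector<std::size_t> GradientBasedSample(
    std::vector<GradientPair> const& gpairs, std::size_t sample_rows,
    std::mt19937* rng) {
  double total = 0.0;
  for (auto const& g : gpairs) {
    total += std::sqrt(g.grad * g.grad + g.hess * g.hess);
  }
  std::vector<std::size_t> selected;
  if (total == 0.0) {
    return selected;  // no gradient mass, nothing to sample
  }
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  for (std::size_t i = 0; i < gpairs.size(); ++i) {
    double norm = std::sqrt(gpairs[i].grad * gpairs[i].grad +
                            gpairs[i].hess * gpairs[i].hess);
    double p = std::min(1.0, sample_rows * norm / total);  // sampling probability
    if (unif(*rng) < p) {
      selected.push_back(i);
    }
  }
  return selected;
}
```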
2020-02-04 10:31:27 +08:00
Jiaming Yuan
fe8d72b50b
Cleanup warnings. (#5247)
From clang-tidy-9 and gcc-7: Invalid case style, narrowing definition, wrong
initialization order, unused variables.
2020-01-31 14:52:15 +08:00
Egor Smirnov
c67163250e
Optimized BuildHist function (#5156) 2020-01-29 23:32:57 -08:00
Jiaming Yuan
3eb1279bbf
Config for linear updaters. (#5222) 2020-01-25 11:26:46 +08:00
Philip Hyunsu Cho
44469a0ca9
Extensible binary serialization format for DMatrix::MetaInfo (#5187)
* Turn xgboost::DataType into C++11 enum class

* New binary serialization format for DMatrix::MetaInfo

* Fix clang-tidy

* Fix c++ test

* Implement new format proposal

* Move helper functions to anonymous namespace; remove unneeded field

* Fix lint

* Add shape.

* Keep only roundtrip test.

* Fix test.

* various fixes

* Update data.cc

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-23 11:33:17 -08:00
Rory Mitchell
9c56480c61
Support dmatrix construction from cupy array (#5206) 2020-01-22 13:15:27 +13:00
Philip Hyunsu Cho
0184f2e9f7
Explicitly use UTF-8 codepage when using MSVC (#5197)
* Explicitly use UTF-8 codepage when using MSVC

* Fix build with CUDA enabled
2020-01-14 13:30:34 -08:00
Rory Mitchell
a73e25e15f
Implement slice via adapters (#5198) 2020-01-14 12:55:41 +13:00
Kodi Arfer
f100b8d878 [Breaking] Don't drop trees during DART prediction by default (#5115)
* Simplify DropTrees calling logic

* Add `training` parameter for prediction method.

* [Breaking]: Add `training` to C API.

* Change for R and Python custom objective.

* Correct comment.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-13 21:48:30 +08:00
Jiaming Yuan
7b65698187
Enforce correct data shape. (#5191)
* Fix syncing DMatrix columns.
* notes for tree method.
* Enable feature validation for all interfaces except for jvm.
* Better tests for boosting from predictions.
* Disable validation on JVM.
2020-01-13 15:48:17 +08:00
Rory Mitchell
8cbcc53ccb
Remove old cudf constructor code (#5194) 2020-01-10 16:35:23 +13:00
Rory Mitchell
87ebfc1315
Implement cudf construction with adapters. (#5189) 2020-01-09 20:23:06 +13:00
Jiaming Yuan
ee287808fb
Lazy initialization of device vector. (#5173)
* Lazy initialization of device vector.

* Fix #5162.

* Disable copy constructor of HostDeviceVector.  Prevents implicit copying.

* Fix CPU build.

* Bring back move assignment operator.
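
An illustrative sketch of the pattern described by these bullets (not the real HostDeviceVector): copies are disabled so device memory is never duplicated implicitly, moves stay available for transferring ownership, and the backing storage is only allocated on first access.

```cpp
#include <memory>
#include <vector>

template <typename T>
class DeviceVector {  // stand-in for a host/device buffer pair
 public:
  DeviceVector() = default;

  // No implicit copies: duplicating device memory must be explicit.
  DeviceVector(DeviceVector const&) = delete;
  DeviceVector& operator=(DeviceVector const&) = delete;

  // Moves remain available so ownership can be handed over cheaply.
  DeviceVector(DeviceVector&&) noexcept = default;
  DeviceVector& operator=(DeviceVector&&) noexcept = default;

  // Lazy initialization: the underlying storage is only allocated on first use.
  std::vector<T>& HostVector() {
    if (!data_) {
      data_ = std::make_unique<std::vector<T>>();
    }
    return *data_;
  }

 private:
  std::unique_ptr<std::vector<T>> data_;
};
```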
2020-01-07 11:23:05 +08:00
Jiaming Yuan
ebc86a3afa
Disable parameter validation for Scikit-Learn interface. (#5167)
* Disable parameter validation for now.

Scikit-Learn passes all parameters down to XGBoost, whether they are used or
not.

* Add option `validate_parameters`.
2020-01-07 11:17:31 +08:00
Egor Smirnov
7b17e76c5b Optimized EvaluateSplit function (#5138)
* Add block based threading utilities.
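
A hedged sketch of what such block based threading utilities typically look like (the names here are made up, not the ones added in #5138): a 1D range is cut into fixed-size blocks and the blocks are distributed across OpenMP threads, which balances work when iterations have uneven cost.

```cpp
#include <omp.h>

#include <algorithm>
#include <cstddef>

// Apply `func(begin, end)` to fixed-size blocks of [0, n) in parallel.
// Assumes block_size > 0.
template <typename Func>
void ParallelForBlocked(std::size_t n, std::size_t block_size, Func func) {
  std::size_t const n_blocks = (n + block_size - 1) / block_size;
#pragma omp parallel for schedule(dynamic)
  for (std::ptrdiff_t b = 0; b < static_cast<std::ptrdiff_t>(n_blocks); ++b) {
    std::size_t const begin = static_cast<std::size_t>(b) * block_size;
    std::size_t const end = std::min(begin + block_size, n);
    func(begin, end);  // each block is processed by one thread
  }
}
```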
2019-12-31 18:18:42 +08:00
Jiaming Yuan
04db125699
Quick fix for memory leak in CPU Hist. (#5153)
Closes https://github.com/dmlc/xgboost/issues/3579 .

* Don't use map.
2019-12-31 14:05:53 +08:00
Jiaming Yuan
61286c6e8f
Fix wrapping GPU ID and prevent data copying. (#5160)
* Removed some data copying.

* Make sure gpu_id is valid before any configuration is carried out.
2019-12-27 16:51:08 +08:00
sriramch
ee81ba8e1f Implementation of MAP ranking algorithm on GPU (#5129)
* - Implementation of the MAP ranking algorithm
  - Also addressed the suggestions from the earlier ranking PRs
  - Made some performance improvements to the NDCG algorithm as well
2019-12-27 12:05:37 +13:00
Philip Hyunsu Cho
9b0af6e882 Enable OpenMP with Apple Clang (Mac default compiler) (#5146)
* Add OpenMP as CMake target

* Require CMake 3.12, to allow linking OpenMP target to objxgboost

* Specify OpenMP compiler flag for CUDA host compiler

* Require CMake 3.16+ if the OS is Mac OSX

* Use AppleClang in Mac tests.

* Update dmlc-core
2019-12-26 16:53:12 +08:00
Jiaming Yuan
f3d7877802
Parameter validation (#5157)
* Unused code.

* Split up old colmaker parameters from train param.

* Fix dart.

* Better name.
2019-12-26 11:59:05 +08:00
Jiaming Yuan
c8bdb652c4
Add check for length of weights. (#4872) 2019-12-21 11:30:58 +08:00
Rory Mitchell
3d04a8cc97
Use dynamic types for array interface columns instead of templates (#5108) 2019-12-21 16:08:10 +13:00
Jiaming Yuan
27b3646d29
Tests and documents for new JSON routines. (#5120) 2019-12-18 08:44:27 +08:00
Jiaming Yuan
3136185bc5
JSON configuration IO. (#5111)
* Add saving/loading JSON configuration.
* Implement Python pickle interface with new IO routines.
* Basic tests for training continuation.
2019-12-15 17:31:53 +08:00
Jiaming Yuan
ad4a1c732c
Small refinements for JSON model. (#5112)
* Naming consistency.

* Remove duplicated test.
2019-12-11 19:49:01 +08:00
Jiaming Yuan
208ab3b1ff
Model IO in JSON. (#5110) 2019-12-11 11:20:40 +08:00
Rory Mitchell
c7cc657a4d
Use adapters for SparsePageDMatrix (#5092) 2019-12-11 15:59:23 +13:00
Jiaming Yuan
e089e16e3d
Pass pointer to model parameters. (#5101)
* Pass pointer to model parameters.

This PR de-duplicates most of the model parameters except the one in
`tree_model.h`.  One difficulty is that `base_score` is a model property but can be
changed at runtime by the objective function.  Hence when performing model IO, we
need to save the value provided by the user instead of the one transformed by the
objective.  Here we created an immutable version of `LearnerModelParam` that
represents the values of the model parameters after configuration.
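
A simplified sketch of that split (struct names are hypothetical; only `base_score` and `num_feature` come from the description above): the mutable, user-configured parameter is what gets saved, while an immutable copy built after configuration carries the value actually used at runtime.

```cpp
#include <cstdint>

// Mutable, user-configurable parameter: this is what gets saved and loaded.
struct LearnerTrainParamSketch {
  float base_score{0.5f};
  std::uint32_t num_feature{0};
};

// Immutable view used at runtime, built once configuration is complete.
// The objective may transform base_score (e.g. apply a link function),
// but the transformed value never flows back into the saved parameter.
struct LearnerModelParamSketch {
  float const base_score;
  std::uint32_t const num_feature;

  LearnerModelParamSketch(LearnerTrainParamSketch const& user,
                          float transformed_base_score)
      : base_score{transformed_base_score}, num_feature{user.num_feature} {}
};
```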
2019-12-10 12:11:22 +08:00
Rory Mitchell
979f74d51a
Group builder modified for incremental building (#5098) 2019-12-10 14:33:56 +13:00
Jiaming Yuan
608ebbe444
Fix GPU ID and prediction cache from pickle (#5086)
* Hack for saving GPU ID.

* Declare prediction cache on GBTree.

* Add a simple test.

* Add `auto` option for GPU Predictor.
2019-12-07 16:02:06 +08:00
Jiaming Yuan
7ef5b78003
Implement JSON IO for updaters (#5094)
* Implement JSON IO for updaters.

* Remove parameters in split evaluator.
2019-12-07 00:24:00 +08:00
Jiaming Yuan
2dcb62ddfb
Add IO utilities. (#5091)
* Add fixed size stream for reading model stream.
* Add file extension.
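
An illustrative, self-contained take on a fixed size stream (not the class added in this PR): a reader over an in-memory buffer that never reads past a fixed number of bytes, which is useful when a model blob is embedded inside a larger stream.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Reads at most `size` bytes starting at `data`; never overruns the buffer.
class FixedSizeReader {
 public:
  FixedSizeReader(char const* data, std::size_t size)
      : data_{data}, size_{size} {}

  // Copy up to `nbytes` into `out`, returning how many bytes were actually read.
  std::size_t Read(void* out, std::size_t nbytes) {
    std::size_t const n = std::min(nbytes, size_ - pos_);
    std::memcpy(out, data_ + pos_, n);
    pos_ += n;
    return n;
  }

 private:
  char const* data_;
  std::size_t size_;
  std::size_t pos_{0};
};
```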
2019-12-05 22:15:34 +08:00
Jiaming Yuan
64af1ecf86
[Breaking] Remove num roots. (#5059) 2019-12-05 21:58:43 +08:00
Jiaming Yuan
df9bdbbcb9
Fix parsing empty vector in parameter. (#5087) 2019-12-05 11:42:01 +08:00
Jiaming Yuan
f0ca53d9ec
Convenient methods for JSON integer. (#5089)
* Fix parsing empty object.
2019-12-05 11:01:12 +08:00
Rory Mitchell
e3c34c79be
External data adapters (#5044)
* Use external data adapters as a lightweight intermediate layer between external data and DMatrix
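
A rough sketch of the adapter idea (interface and type names are illustrative): each external format is wrapped in a small adapter exposing rows of (row, column, value) elements, so a single DMatrix construction routine can consume dense buffers, CSR data, cuDF columns, and so on without format-specific code.

```cpp
#include <cstddef>
#include <vector>

// One non-zero element of the input data.
struct AdapterElement {
  std::size_t row;
  std::size_t column;
  float value;
};

// Minimal adapter over a dense row-major buffer; a CSR adapter, a cuDF
// adapter, etc. would expose the same Row() interface.
class DenseAdapterSketch {
 public:
  DenseAdapterSketch(float const* data, std::size_t n_rows, std::size_t n_cols)
      : data_{data}, n_rows_{n_rows}, n_cols_{n_cols} {}

  std::size_t NumRows() const { return n_rows_; }

  std::vector<AdapterElement> Row(std::size_t i) const {
    std::vector<AdapterElement> out;
    for (std::size_t j = 0; j < n_cols_; ++j) {
      out.push_back({i, j, data_[i * n_cols_ + j]});
    }
    return out;
  }

 private:
  float const* data_;
  std::size_t n_rows_;
  std::size_t n_cols_;
};
```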
2019-12-04 10:56:17 +13:00
Jiaming Yuan
d667ea9335
[CI] Fix Travis tests. (#5062)
- Install wget explicitly to match openssl.
- Install CMake explicitly.
- Use newer miniconda link.
- Reenable unittests.
- gcc@9 + xcode@10 for OSX due to missing <_stdio.h>.  Other versions of gcc should also work, but since Homebrew pours gcc@9 by default after the update, I just stick with the latest version.
- Disabled one external memory test for OSX.  Not sure about the thread implementation there, and fixing external memory is beyond the scope of this PR.
- Use Python3 with conda in jvm package.
2019-11-25 03:32:10 +08:00
Rong Ou
0afcc55d98 Support multiple batches in gpu_hist (#5014)
* Initial external memory training support for GPU Hist tree method.
2019-11-16 14:50:20 +08:00
Jiaming Yuan
97abcc7ee2
Extract interaction constraint from split evaluator. (#5034)
*  Extract interaction constraints from split evaluator.

The reason for doing so is mostly model IO, where num_feature and interaction_constraints are duplicated in the split evaluator. Also, the interaction constraint is by itself a feature selector, acting like the column sampler, so it's inefficient to bury it deep in the evaluator chain. Lastly, removing another duplicated parameter is a win.

*  Enable interaction constraints for the approx tree method.

Now that the implementation is split out from the evaluator class, it's also enabled for the approx method.

*  Remove obsolete code in colmaker.

It was never documented nor actually used in the real world, and there isn't a single test covering those code blocks.

*  Unify the types used for row and column indices.

As input datasets march toward a billion rows, incorrect use of int is subject to overflow, and signed integer overflow is undefined behaviour. This PR starts the process of unifying the index types to unsigned integers. There is an optimization that can exploit this undefined behaviour, but after some testing I don't see it benefiting XGBoost.
2019-11-14 20:11:41 +08:00
sriramch
2abe69d774 - NDCG LTR implementation on GPU (#5004)
* - NDCG LTR implementation on GPU
  - This is a follow-up to the pairwise LTR implementation
2019-11-13 11:21:04 +13:00
Philip Hyunsu Cho
f4e7b707c9
Revert #4529 (#5008)
* Revert " Optimize ‘hist’ for multi-core CPU (#4529)"

This reverts commit 4d6590be3c9a043d44d9e4fe0a456a9f8179ec72.

* Fix build
2019-11-12 09:35:03 -08:00
Jiaming Yuan
7663de956c
Run training with empty DMatrix. (#4990)
This makes GPU Hist robust in a distributed environment, as some workers might not
be associated with any data in either training or evaluation.

* Disable rabit mock test for now: See #5012 .

* Disable dask-cudf test at prediction for now: See #5003

* Launch the dask job for all workers even though some might not have any data.
* Check 0 rows in elementwise evaluation metrics.

   Using AUC and AUC-PR still throws an error.  See #4663 for a robust fix.

* Add tests for edge cases.
* Add `LaunchKernel` wrapper handling zero sized grid.
* Move some parts of allreducer into a cu file.
* Don't validate feature names when the booster is empty.

* Sync number of columns in DMatrix.

  As num_feature is required to be the same across all workers in data split
  mode.

* Filtering in the dask interface now by default syncs all boosters that are not
empty, instead of using rank 0.

* Fix Jenkins' GPU tests.

* Install dask-cuda from source in Jenkins' test.

  Now all tests are actually running.

* Restore GPU Hist tree synchronization test.

* Check UUID of running devices.

  The check is only performed on CUDA versions >= 10.x, as 9.x doesn't have the UUID field.

* Fix CMake policy and project variables.

  Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13.

* Fix copying data to CPU

* Fix race condition in cpu predictor.

* Fix duplicated DMatrix construction.

* Don't download extra nccl in CI script.
2019-11-06 16:13:13 +08:00
Jiaming Yuan
ac457c56a2
Use `UpdateAllowUnknown` for non-model-related parameters. (#4961)
* Use `UpdateAllowUnknown` for non-model-related parameters.

Model parameters cannot pack an additional boolean value due to the binary IO
format.  This commit deals only with non-model-related parameter configuration
(a conceptual sketch follows below).

* Add tidy command line arg for use-dmlc-gtest.
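
Regarding `UpdateAllowUnknown` above, here is a conceptual sketch of the pattern (all names are hypothetical, not the dmlc-core API): an update-style call applies the keys it recognizes, leaves previously configured values untouched, and reports unknown keys back to the caller instead of failing.

```cpp
#include <map>
#include <string>
#include <vector>

struct TrainConfigSketch {
  std::map<std::string, std::string> values;

  // Apply known keys, keep existing values untouched, return unknown keys.
  std::vector<std::string> UpdateAllowUnknown(
      std::map<std::string, std::string> const& kwargs,
      std::vector<std::string> const& known_keys) {
    std::vector<std::string> unknown;
    for (auto const& kv : kwargs) {
      bool recognized = false;
      for (auto const& k : known_keys) {
        if (k == kv.first) { recognized = true; break; }
      }
      if (recognized) {
        values[kv.first] = kv.second;  // update, preserving other keys
      } else {
        unknown.push_back(kv.first);   // report rather than error out
      }
    }
    return unknown;
  }
};
```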
2019-10-23 05:50:12 -04:00
Jiaming Yuan
f24be2efb4 Use configure_file() to configure version only (#4974)
* Avoid writing build_config.h

* Remove build_config.h all together.

* Lint.
2019-10-22 23:47:00 -07:00
Rong Ou
5b1715d97c Write ELLPACK pages to disk (#4879)
* add ellpack source
* add batch param
* extract function to parse cache info
* construct ellpack info separately
* push batch to ellpack page
* write ellpack page.
* make sparse page source reusable
2019-10-22 23:44:32 -04:00
sriramch
310fe60b35 Pairwise ranking objective implementation on GPU (#4873)
* - Pairwise ranking objective implementation on GPU
   - There are a couple more algorithms (NDCG and MAP) for which support will be added
     in follow-up PRs
   - With no label groups defined, GetGradient is 90x faster on GPU (120M-instance
     mortgage dataset)
   - It can perform an order of magnitude faster with ~10 groups (and adequate cores
     for the CPU implementation)

* Add JSON config to rank obj.
2019-10-22 23:40:07 -04:00