xgboost

Author	SHA1	Message	Date
Jiaming Yuan	ebc86a3afa	Disable parameter validation for Scikit-Learn interface. (#5167 ) * Disable parameter validation for now. Scikit-Learn passes all parameters down to XGBoost, whether they are used or not. * Add option `validate_parameters`.	2020-01-07 11:17:31 +08:00
Jiaming Yuan	2b9a62a806	Throw error when not compiled with NCCL. (#5170 )	2020-01-07 11:15:19 +08:00
Egor Smirnov	7b17e76c5b	Optimized EvaluateSplit function (#5138 ) * Add block based threading utilities.	2019-12-31 18:18:42 +08:00
Jiaming Yuan	04db125699	Quick fix for memory leak in CPU Hist. (#5153 ) Closes https://github.com/dmlc/xgboost/issues/3579 . * Don't use map.	2019-12-31 14:05:53 +08:00
Jiaming Yuan	139ccc9902	Fix num_roots to be 1. (#5165 )	2019-12-30 02:18:45 +08:00
Jiaming Yuan	61286c6e8f	Fix wrapping GPU ID and prevent data copying. (#5160 ) * Removed some data copying. * Make sure gpu_id is valid before any configuration is carried out.	2019-12-27 16:51:08 +08:00
sriramch	ee81ba8e1f	implementation of map ranking algorithm on gpu (#5129 ) * - implementation of map ranking algorithm - also effected necessary suggestions mentioned in the earlier ranking pr's - made some performance improvements to the ndcg algo as well	2019-12-27 12:05:37 +13:00
Philip Hyunsu Cho	9b0af6e882	Enable OpenMP with Apple Clang (Mac default compiler) (#5146 ) * Add OpenMP as CMake target * Require CMake 3.12, to allow linking OpenMP target to objxgboost * Specify OpenMP compiler flag for CUDA host compiler * Require CMake 3.16+ if the OS is Mac OSX * Use AppleClang in Mac tests. * Update dmlc-core	2019-12-26 16:53:12 +08:00
Jiaming Yuan	f3d7877802	Parameter validation (#5157 ) * Unused code. * Split up old colmaker parameters from train param. * Fix dart. * Better name.	2019-12-26 11:59:05 +08:00
Jiaming Yuan	c8bdb652c4	Add check for length of weights. (#4872 )	2019-12-21 11:30:58 +08:00
Rory Mitchell	3d04a8cc97	Use dynamic types for array interface columns instead of templates (#5108 )	2019-12-21 16:08:10 +13:00
Jiaming Yuan	27b3646d29	Tests and documents for new JSON routines. (#5120 )	2019-12-18 08:44:27 +08:00
Jiaming Yuan	2fdb34ed2e	Fix metric name loading. (#5122 )	2019-12-16 10:14:02 +08:00
Jiaming Yuan	3136185bc5	JSON configuration IO. (#5111 ) * Add saving/loading JSON configuration. * Implement Python pickle interface with new IO routines. * Basic tests for training continuation.	2019-12-15 17:31:53 +08:00
Jiaming Yuan	ad4a1c732c	Small refinements for JSON model. (#5112 ) * Naming consistency. * Remove duplicated test.	2019-12-11 19:49:01 +08:00
Jiaming Yuan	208ab3b1ff	Model IO in JSON. (#5110 )	2019-12-11 11:20:40 +08:00
Rory Mitchell	c7cc657a4d	Use adapters for SparsePageDMatrix (#5092 )	2019-12-11 15:59:23 +13:00
Jiaming Yuan	e089e16e3d	Pass pointer to model parameters. (#5101 ) * Pass pointer to model parameters. This PR de-duplicates most of the model parameters except the one in `tree_model.h`. One difficulty is `base_score` is a model property but can be changed at runtime by objective function. Hence when performing model IO, we need to save the one provided by users, instead of the one transformed by objective. Here we created an immutable version of `LearnerModelParam` that represents the value of model parameter after configuration.	2019-12-10 12:11:22 +08:00
Rory Mitchell	979f74d51a	Group builder modified for incremental building (#5098 )	2019-12-10 14:33:56 +13:00
Jiaming Yuan	1cb6bcc382	Remove dead code in colmaker. (#5105 )	2019-12-10 09:32:37 +08:00
Egor Smirnov	b1789b0346	added tracking execution time for UpdatePredictionCache function (#5107 )	2019-12-10 01:32:56 +08:00
Jiaming Yuan	608ebbe444	Fix GPU ID and prediction cache from pickle (#5086 ) * Hack for saving GPU ID. * Declare prediction cache on GBTree. * Add a simple test. * Add `auto` option for GPU Predictor.	2019-12-07 16:02:06 +08:00
Jiaming Yuan	7ef5b78003	Implement JSON IO for updaters (#5094 ) * Implement JSON IO for updaters. * Remove parameters in split evaluator.	2019-12-07 00:24:00 +08:00
Jiaming Yuan	2dcb62ddfb	Add IO utilities. (#5091 ) * Add fixed size stream for reading model stream. * Add file extension.	2019-12-05 22:15:34 +08:00
Jiaming Yuan	64af1ecf86	[Breaking] Remove num roots. (#5059 )	2019-12-05 21:58:43 +08:00
Jiaming Yuan	f3d8536702	Don't use 0 for "fresh leaf". (#5084 ) * Allow using right child as marker for Exact tree_method.	2019-12-05 11:50:51 +08:00
Jiaming Yuan	df9bdbbcb9	Fix parsing empty vector in parameter. (#5087 )	2019-12-05 11:42:01 +08:00
Jiaming Yuan	f5e13dcb9b	Implement training observer. (#5088 )	2019-12-05 11:12:20 +08:00
Jiaming Yuan	f0ca53d9ec	Convenient methods for JSON integer. (#5089 ) * Fix parsing empty object.	2019-12-05 11:01:12 +08:00
Rory Mitchell	e3c34c79be	External data adapters (#5044 ) * Use external data adapters as lightweight intermediate layer between external data and DMatrix	2019-12-04 10:56:17 +13:00
Kodi Arfer	f2277e7106	Use DART tree weights when computing SHAPs (#5050 ) This PR fixes tree weights in dart being ignored when computing contributions. * Fix ellpack page source link. * Add tree weights to compute contribution.	2019-12-03 19:55:53 +08:00
Rong Ou	0afcc55d98	Support multiple batches in gpu_hist (#5014 ) * Initial external memory training support for GPU Hist tree method.	2019-11-16 14:50:20 +08:00
Jiaming Yuan	97abcc7ee2	Extract interaction constraint from split evaluator. (#5034 ) * Extract interaction constraints from split evaluator. The reason for doing so is mostly for model IO, where num_feature and interaction_constraints are copied in split evaluator. Also, an interaction constraint is by itself a feature selector, acting like a column sampler, and it's inefficient to bury it deep in the evaluator chain. Lastly, removing yet another copied parameter is a win. * Enable inc for approx tree method. As the implementation is now split out of the evaluator class, it's also enabled for the approx method. * Remove obsolete code in colmaker. These code blocks were never documented nor actually used in the real world. Also there isn't a single test for them. * Unify the types used for row and column. As the size of input datasets is marching toward billions, incorrect use of int is subject to overflow; also, signed integer overflow is undefined behaviour. This PR starts the procedure for unifying the index type to unsigned integers. There are optimizations that can utilize this undefined behaviour, but after some testing I don't see that they are beneficial to XGBoost.	2019-11-14 20:11:41 +08:00
Sebastian	886bf93ba4	fix: reset hit counter for next batch (#5035 )	2019-11-14 10:22:02 +08:00
sriramch	2abe69d774	- ndcg ltr implementation on gpu (#5004 ) * - ndcg ltr implementation on gpu - this is a follow-up to the pairwise ltr implementation	2019-11-13 11:21:04 +13:00
Philip Hyunsu Cho	f4e7b707c9	Revert #4529 (#5008 ) * Revert " Optimize ‘hist’ for multi-core CPU (#4529)" This reverts commit 4d6590be3c9a043d44d9e4fe0a456a9f8179ec72. * Fix build	2019-11-12 09:35:03 -08:00
KaiJin Ji	1733c9e8f7	Improve operation efficiency for single predict (#5016 ) * Improve operation efficiency for single predict	2019-11-10 02:01:28 +08:00
Jiaming Yuan	7663de956c	Run training with empty DMatrix. (#4990 ) This makes GPU Hist robust in distributed environment as some workers might not be associated with any data in either training or evaluation. * Disable rabit mock test for now: See #5012 . * Disable dask-cudf test at prediction for now: See #5003 * Launch dask job for all workers even though they might not have any data. * Check 0 rows in elementwise evaluation metrics. Using AUC and AUC-PR still throws an error. See #4663 for a robust fix. * Add tests for edge cases. * Add `LaunchKernel` wrapper handling zero sized grid. * Move some parts of allreducer into a cu file. * Don't validate feature names when the booster is empty. * Sync number of columns in DMatrix, as num_feature is required to be the same across all workers in data split mode. * Filtering in dask interface now by default syncs all boosters that are not empty, instead of using rank 0. * Fix Jenkins' GPU tests. * Install dask-cuda from source in Jenkins' test. Now all tests are actually running. * Restore GPU Hist tree synchronization test. * Check UUID of running devices. The check is only performed on CUDA version >= 10.x, as 9.x doesn't have UUID field. * Fix CMake policy and project variables. Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13. * Fix copying data to CPU * Fix race condition in cpu predictor. * Fix duplicated DMatrix construction. * Don't download extra nccl in CI script.	2019-11-06 16:13:13 +08:00
Christopher Cowden	807a244517	Fix repeated split and 0 cover nodes (#5010 )	2019-11-06 14:57:22 +08:00
Jiaming Yuan	755a606201	Fix dart usegpu. (#4984 )	2019-10-28 06:12:04 -04:00
Jiaming Yuan	6ec7e300bd	Fix external memory race in colmaker. (#4980 ) * Move `GetColDensity` out of omp parallel block.	2019-10-25 04:11:13 -04:00
Jiaming Yuan	ac457c56a2	Use `UpdateAllowUnknown` for non-model related parameter. (#4961 ) * Use `UpdateAllowUnknown` for non-model related parameter. Model parameter can not pack an additional boolean value due to binary IO format. This commit deals only with non-model related parameter configuration. * Add tidy command line arg for use-dmlc-gtest.	2019-10-23 05:50:12 -04:00
Jiaming Yuan	f24be2efb4	Use configure_file() to configure version only (#4974 ) * Avoid writing build_config.h * Remove build_config.h altogether. * Lint.	2019-10-22 23:47:00 -07:00
Rong Ou	5b1715d97c	Write ELLPACK pages to disk (#4879 ) * add ellpack source * add batch param * extract function to parse cache info * construct ellpack info separately * push batch to ellpack page * write ellpack page. * make sparse page source reusable	2019-10-22 23:44:32 -04:00
sriramch	310fe60b35	Pairwise ranking objective implementation on gpu (#4873 ) * - pairwise ranking objective implementation on gpu - there are couple of more algorithms (ndcg and map) for which support will be added as follow-up pr's - with no label groups defined, get gradient is 90x faster on gpu (120m instance mortgage dataset) - it can perform by an order of magnitude faster with ~ 10 groups (and adequate cores for the cpu implementation) * Add JSON config to rank obj.	2019-10-22 23:40:07 -04:00
Jiaming Yuan	5620322a48	[Breaking] Add global versioning. (#4936 ) * Use CMake config file for representing version. * Generate c and Python version file with CMake. The generated file is written into source tree. But unless XGBoost upgrades its version, there will be no actual modification. This retains compatibility with Makefiles for R. * Add XGBoost version to the DMatrix binaries. * Simplify prefetch detection in CMakeLists.txt	2019-10-22 23:27:26 -04:00
Jiaming Yuan	7e477a2adb	Fix data loading (#4862 ) * Fix loading text data. * Fix config regex. * Try to explain the error better in exception. * Update doc.	2019-10-22 12:33:14 -04:00
Jiaming Yuan	4771bb0d41	Catch exception in transform function omp context. (#4960 )	2019-10-21 17:03:38 +08:00
Jiaming Yuan	31030a8d3a	Set correct file permission. (#4964 )	2019-10-18 12:54:29 -04:00
Jiaming Yuan	ae536756ae	Add Model and Configurable interface. (#4945 ) * Apply Configurable to objective functions. * Apply Model to Learner and Regtree, gbm. * Add Load/SaveConfig to objs. * Refactor obj tests to use smart pointer. * Dummy methods for Save/Load Model.	2019-10-18 01:56:02 -04:00

892 Commits