xgboost

Author	SHA1	Message	Date
sriramch	a2042b685a	- training with external memory - part 2 of 2 (#4526 ) * - training with external memory - part 2 of 2 - when external memory support is enabled, building of histogram indices is done incrementally for every sparse page - the entire set of input data is divided across multiple gpus and the relative row positions within each device are tracked when building the compressed histogram buffer - this was tested using a mortgage dataset containing ~ 670m rows before 4xt4's could be saturated	2019-06-12 09:52:56 +12:00
Jiaming Yuan	4591039eba	Remove remaining reg:linear. (#4544 )	2019-06-11 16:04:09 +08:00
Jiaming Yuan	2f1319f273	Add `rmsle` metric and `reg:squaredlogerror` objective (#4541 )	2019-06-11 05:48:27 +08:00
Rory Mitchell	9683fd433e	Overload device memory allocation (#4532 ) * Group source files, include headers in source files * Overload device memory allocation	2019-06-10 11:35:13 +12:00
Jiaming Yuan	da21ac0cc2	Fix tweedie metric string. (#4543 )	2019-06-09 09:52:29 +08:00
Jiaming Yuan	afa99e6d9d	Use yaml.safe_load. (#4537 )	2019-06-07 03:39:25 +08:00
Philip Hyunsu Cho	3f2fe25a32	Fix C++11 config parser (#4521 ) * Fix C++11 config parser * Use raw strings to improve readability of regex * Fix compilation for GCC 5.x Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2019-06-03 22:18:16 +08:00
Rory Mitchell	23a10c8339	Refactor histogram building code for gpu_hist (#4528 )	2019-06-03 09:50:10 +12:00
Rory Mitchell	fbbae3386a	Smarter choice of histogram construction for distributed gpu_hist (#4519 ) * Smarter choice of histogram construction for distributed gpu_hist * Limit omp team size in ExecuteShards	2019-05-31 14:11:34 +12:00
Jiaming Yuan	b48f895027	Fix prediction from loaded pickle. (#4516 )	2019-05-30 15:05:09 +08:00
sriramch	fed665ae8a	- training with external memory part 1 of 2 (#4486 ) * - training with external memory part 1 of 2 - this pr focuses on computing the quantiles using multiple gpus on a dataset that uses the external cache capabilities - there will be a follow-up pr soon after this that will support creation of histogram indices on large datasets as well - both of these changes are required to support training with external memory - the sparse pages in dmatrix are taken in batches and the cut matrices are incrementally built - also snuck in some (perf) changes related to sketches aggregation amongst multiple features across multiple sparse page batches. instead of aggregating the summary inside each device and merging later, it is aggregated in-place when the device is working on different rows but the same feature	2019-05-30 08:18:34 +12:00
sriramch	6e16900711	Fix crash with approx tree method on cpu (#4510 )	2019-05-30 01:11:29 +08:00
Jiaming Yuan	c589eff941	De-duplicate GPU parameters. (#4454 ) * Only define `gpu_id` and `n_gpus` in `LearnerTrainParam` * Pass LearnerTrainParam through XGBoost via factory method. * Disable all GPU usage when GPU related parameters are not specified (fixes XGBoost choosing GPU over aggressively). * Test learner train param io. * Fix gpu pickling.	2019-05-29 11:55:57 +08:00
sriramch	a3fedbeaa8	- fix issues with training with external memory on cpu (#4487 ) * - fix issues with training with external memory on cpu - use the batch size to determine the correct number of rows in a batch - use the right number of threads in omp parallelization if the batch size is less than the default omp max threads (applicable for the last batch) * - handle scenarios where last batch size is < available number of threads - augment tests such that we can test all scenarios (batch size <, >, = number of threads)	2019-05-29 12:31:30 +12:00
Rory Mitchell	09b90d9329	Add native support for Dask (#4473 ) * Add native support for Dask * Add multi-GPU demo * Add sklearn example	2019-05-27 13:29:28 +12:00
Bryan Woods	278562db13	Add support for cross-validation using query ID (#4474 ) * adding support for matrix slicing with query ID for cross-validation * hail mary test of unrar installation for windows tests * trying to modify tests to run in Github CI * Remove dependency on wget and unrar * Save error log from R test * Relax assertion in test_training * Use int instead of bool in C function interface * Revise R interface * Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility	2019-05-23 10:45:02 -07:00
Philip Hyunsu Cho	515f5f5c47	[RFC] Version 0.90 release candidate (#4475 ) * Release 0.90 * Add script to automatically generate acknowledgment * Update NEWS.md	2019-05-20 01:02:44 -07:00
Philip Hyunsu Cho	cf2400036e	[CI] Add Python and C++ tests for Windows GPU target (#4469 ) * Add CMake option to use bundled gtest from dmlc-core, so that it is easy to build XGBoost with gtest on Windows * Consistently apply OpenMP flag to all targets. Force enable OpenMP when USE_CUDA is turned on. * Insert vcomp140.dll into Windows wheels * Add C++ and Python tests for CPU and GPU targets (CUDA 9.0, 10.0, 10.1) * Prevent spurious msbuild failure * Add GPU tests * Upgrade dmlc-core	2019-05-16 01:06:46 +00:00
Rong Ou	df2cdaca50	add cuda 10.1 support (#4468 )	2019-05-14 18:30:58 +00:00
Philip Hyunsu Cho	b5f7cbfadf	[CI] Cache two R build Docker containers (#4458 )	2019-05-11 10:54:00 -07:00
Rong Ou	be0f346ec9	mgpu predictor using explicit offsets (#4438 ) * mgpu prediction using explicit sharding	2019-05-11 09:35:06 +12:00
Philip Hyunsu Cho	6ff994126a	[BLOCKING][CI] Upgrade to Spark 2.4.3 (#4414 ) * [CI] Upgrade to Spark 2.4.2 * Pass Spark version to build script * Allow multiple --build-arg in ci_build.sh * Fix syntax * Fix container name * Update pom.xml * Fix container name * Update Jenkinsfile * Update pom.xml * Update Dockerfile.jvm_cross	2019-05-09 21:36:59 -07:00
Xin Yin	8d1098a983	In AUC and AUCPR metrics, detect whether weights are per-instance or per-group (#4216 ) * In AUC and AUCPR metrics, detect whether weights are per-instance or per-group * Fix C++ style check * Add a test for weighted AUC	2019-05-04 00:53:04 -07:00
Philip Hyunsu Cho	9252b686ae	Make AUCPR work with multiple query groups (#4436 ) * Make AUCPR work with multiple query groups * Check AUCPR <= 1.0 in distributed setting	2019-05-03 10:34:44 -07:00
Rong Ou	feb6ae3e18	Initial support for external memory in gpu_predictor (#4284 )	2019-05-03 13:01:27 +12:00
Philip Hyunsu Cho	bfddc2c42c	Make CMakeLists.txt compatible with CMake 3.3 (#4420 ) * Make CMakeLists.txt compatible with CMake 3.3; require CMake 3.11 for MSVC * Use CMake 3.12 when sanitizer is enabled * Disable funroll-loops for MSVC * Use cmake version in container name * Add missing arg * Fix egrep use in ci_build.sh * Display CMake version * Do not set OpenMP_CXX_LIBRARIES for MSVC * Use cmake_minimum_required()	2019-05-02 11:49:32 +08:00
Philip Hyunsu Cho	ba98e0cdf2	Add additional Python tests to test training under constraints (#4426 )	2019-04-30 18:23:39 -07:00
Rong Ou	eaab364a63	More explicit sharding methods for device memory (#4396 ) * Rename the Reshard method to Shard * Add a new Reshard method for sharding a vector that's already sharded	2019-05-01 11:47:22 +12:00
Rory Mitchell	5e582b0fa7	Combine thread launches into single launch per tree for gpu_hist (#4343 ) * Combine thread launches into single launch per tree for gpu_hist algorithm. * Address deprecation warning * Add manual column sampler constructor * Turn off omp dynamic to get a guaranteed number of threads * Enable openmp in cuda code	2019-04-29 09:58:34 +12:00
Jiaming Yuan	77c03538b0	Fix node reuse. (#4404 ) * Reinitialize `_sindex` when reallocating a deleted node.	2019-04-27 13:03:23 +08:00
Nan Zhu	37dc82c3ff	[jvm-packages] allow partial evaluation of dataframe before prediction (#4407 ) * allow partial evaluation of dataframe before prediction * resume spark test * comments * Run unit tests after building JVM packages	2019-04-26 21:02:40 -07:00
Philip Hyunsu Cho	ea850ecd20	[CI] Refactor Jenkins CI pipeline + migrate all Linux tests to Jenkins (#4401 ) * All Linux tests are now in Jenkins CI * Tests are now de-coupled from builds. We can now build XGBoost with one version of CUDA/JDK and test it with another version of CUDA/JDK * Builds (compilation) are significantly faster because 1) They use C5 instances with faster CPU cores; and 2) build environment setup is cached using Docker containers	2019-04-26 18:39:12 -07:00
Rong Ou	2c61f02add	fix broken python test (#4395 )	2019-04-23 16:01:23 -07:00
Philip Hyunsu Cho	bbe0dbd7ec	Migrate pylint check to Python 3 (#4381 ) * Migrate lint to Python 3 * Fix lint errors * Use Miniconda3 to use Python 3.7 * Use latest pylint and astroid	2019-04-21 01:01:54 -07:00
Jiaming Yuan	207f058711	Refactor CMake scripts. (#4323 ) * Refactor CMake scripts. * Remove CMake CUDA wrapper. * Bump CMake version for CUDA. * Use CMake to handle Doxygen. * Split up CMakeList. * Export install target. * Use modern CMake. * Remove build.sh * Workaround for gpu_hist test. * Use cmake 3.12. * Revert machine.conf. * Move CLI test to gpu. * Small cleanup. * Support using XGBoost as submodule. * Fix windows * Fix cpp tests on Windows * Remove duplicated find_package.	2019-04-15 10:08:12 -07:00
Jiaming Yuan	84d992babc	GPU multiclass metrics (#4368 ) * Port multi classes metrics to CUDA.	2019-04-15 17:47:47 +08:00
James Lamb	edae664afb	[r-package] cut CI-time dependency on craigcitro/r-travis (fixes #4348 ) (#4353 ) * [r-package] cut CI-time dependency on craigcitro/r-travis (fixes #4348) * Install R * Install R on OSX * Remove gfortran symlink * Specify CRAN repo * added more R dependencies needed for testing * removed heavy R dependencies in CI * fixed bug in env var, removed unnecessary apt installs of R * fix to R installs	2019-04-12 00:22:48 -07:00
Rong Ou	f4521bf6aa	refactor tests to get rid of duplication (#4358 ) * refactor tests to get rid of duplication * address review comments	2019-04-12 00:21:48 -07:00
Jiaming Yuan	5c2575535f	Fix Histogram allocation. (#4347 ) * Fix Histogram allocation. nidx_map is cleared after `Reset`, but histogram data size isn't changed, hence histogram recycling is used in later iterations. After a reset (building a new tree), newly allocated nodes will start from 0, while recycling always chooses the node with the smallest index, which happens to be our newly allocated node 0.	2019-04-10 19:21:26 +08:00
Rong Ou	81c1cd40ca	add a test for cpu predictor using external memory (#4308 ) * add a test for cpu predictor using external memory * allow different page size for testing	2019-04-10 13:25:10 +12:00
Philip Hyunsu Cho	70be1e38c2	[CI] Optimize external Docker build cache (#4334 ) * When building pull requests, use Docker cache for master branch Docker build caches are per-branch, so new pull requests will initially have no build cache, causing the Docker containers to be built from scratch. New pull requests should use the cache associated with the master branch. This makes sense, since most pull requests do not modify the Dockerfile. * Add comments	2019-04-04 15:59:07 -07:00
Philip Hyunsu Cho	37c75aac41	[CI] Add external Docker build cache (#4331 )	2019-04-04 13:36:39 -07:00
sriramch	2f7087eba1	Improve HostDeviceVector exception safety (#4301 ) * make the assignments of HostDeviceVector exception safe. * storing a dummy GPUDistribution instance in HDV for CPU based code. * change testxgboost binary location to build directory.	2019-03-31 22:48:58 +08:00
Philip Hyunsu Cho	7aed8f3d48	[CI] Upgrade to GCC 5.3.1, CMake 3.6.0 (#4306 ) * Upgrade to GCC 5.3.1, CMake 3.6.0 * <regex> is now okay	2019-03-28 00:21:21 -07:00
Rory Mitchell	3f312e30db	Retire DVec class in favour of c++20 style span for device memory. (#4293 )	2019-03-28 13:59:58 +13:00
Jiaming Yuan	c85181dd8a	Remove remaining `silent` and `debug_verbose`. (#4299 )	2019-03-28 03:30:46 +08:00
Rory Mitchell	6d5b34d824	Further optimisations for gpu_hist. (#4283 ) - Fuse final update position functions into a single more efficient kernel - Refactor gpu_hist with a more explicit ellpack matrix representation	2019-03-24 17:17:22 +13:00
Rong Ou	5aa42b5f11	jenkins build for cuda 10.0 (#4281 ) * jenkins build for cuda 10.0 * yum install nccl2 for cuda 10.0	2019-03-22 22:35:18 -07:00
Rory Mitchell	00465d243d	Optimisations for gpu_hist. (#4248 ) * Optimisations for gpu_hist. * Use streams to overlap operations. * ColumnSampler now uses HostDeviceVector to prevent repeatedly copying feature vectors to the device.	2019-03-20 13:30:06 +13:00
Jiaming Yuan	29a1356669	Deprecate `reg:linear` in favor of `reg:squarederror`. (#4267 ) * Deprecate `reg:linear` in favor of `reg:squarederror`. * Replace the use of `reg:linear`. * Replace the use of `silent`.	2019-03-17 17:55:04 +08:00


1264 Commits
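
Commit 2f1319f273 above adds an `rmsle` metric alongside the `reg:squaredlogerror` objective. As a rough illustration of what that metric measures, here is a minimal sketch of root mean squared logarithmic error using its standard textbook definition. The function name `rmsle` and its list-based signature are hypothetical and chosen for this example; this is not code from the XGBoost source, and the library's handling of edge cases may differ.

```python
import math


def rmsle(preds, labels):
    """Root mean squared log error over two equal-length sequences.

    Hypothetical helper for illustration only, using the usual definition:
    sqrt(mean((log(1 + pred) - log(1 + label)) ** 2)).
    """
    if len(preds) != len(labels) or len(preds) == 0:
        raise ValueError("preds and labels must be non-empty and equal in length")

    total = 0.0
    for p, y in zip(preds, labels):
        # log1p(x) is only defined for x > -1.
        if p <= -1 or y <= -1:
            raise ValueError("all values must be greater than -1")
        diff = math.log1p(p) - math.log1p(y)
        total += diff * diff

    return math.sqrt(total / len(preds))
```

Because the error is taken on a log scale, the same absolute miss costs less on a large target than on a small one, which is the usual reason to prefer it for targets spanning several orders of magnitude.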