xgboost

Author	SHA1	Message	Date
Jiaming Yuan	17fd3f55e9	Optimize adapter element counting on GPU. (#9209 ) - Implement a simple `IterSpan` for passing iterators with size. - Use shared memory for column size counts. - Use one thread for each sample in row count to reduce atomic operations.	2023-05-30 23:28:43 +08:00
Jiaming Yuan	4d665b3fb0	Restore clang tidy test. (#8861 )	2023-03-03 13:47:04 -08:00
Jiaming Yuan	ac9bfaa4f2	Handle missing values in dataframe with category dtype. (#7331 ) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching.	2021-10-28 03:33:54 +08:00
Jiaming Yuan	2942dc68e4	Fix mixed types in GPU sketching. (#7228 )	2021-09-16 00:10:25 +08:00
Jiaming Yuan	1c8fdf2218	Remove use of `device_idx` in `dh::LaunchN`. (#7063 ) It's an unused parameter, removing it can make the CI log more readable.	2021-06-29 11:37:26 +08:00
Jiaming Yuan	86715e4cd4	Support categorical data for dask functional interface and DQM. (#7043 ) * Support categorical data for dask functional interface and DQM. * Implement categorical data support for GPU GK-merge. * Add support for dask functional interface. * Add support for DQM. * Get newer cupy.	2021-06-18 13:06:52 +08:00
Jiaming Yuan	3039dd194b	Don't estimate sketch batch size when rmm is used. (#6807 )	2021-03-31 15:29:56 +08:00
Jiaming Yuan	a7083d3c13	Fix dart inplace prediction with GPU input. (#6777 ) * Fix dart inplace predict with data on GPU, which might trigger a fatal check for device access right. * Avoid copying data whenever possible.	2021-03-25 12:00:32 +08:00
Jiaming Yuan	886486a519	Support categorical data in GPU weighted sketching. (#6508 )	2020-12-16 14:23:28 +08:00
Jiaming Yuan	2241563f23	Handle duplicated values in sketching. (#6178 ) * Accumulate weights in duplicated values. * Fix device id in iterative dmatrix.	2020-10-10 19:32:44 +08:00
Jiaming Yuan	70ce5216b5	Add high level tests for categorical data. (#6179 ) * Fix unique.	2020-10-09 09:27:23 +08:00
Jiaming Yuan	f0c63902ff	Use default allocator in sketching. (#6182 )	2020-09-30 14:55:59 +08:00
Jiaming Yuan	444131a2e6	Add categorical data support to GPU Hist. (#6164 )	2020-09-29 11:27:25 +08:00
Jiaming Yuan	210c131ce7	Support categorical data in GPU sketching. (#6137 )	2020-09-21 13:53:06 +08:00
Jiaming Yuan	e319b63f9e	Merge extract cuts into QuantileContainer. (#6125 ) * Use pruning for initial summary construction.	2020-09-18 16:36:39 +08:00
Jiaming Yuan	ee70a2380b	Unify CPU hist sketching (#5880 )	2020-08-12 01:33:06 +08:00
Jiaming Yuan	e471056ec4	Fix sketch size calculation. (#5898 )	2020-07-17 08:33:16 +08:00
Jiaming Yuan	dd445af56e	Cleanup on device sketch. (#5874 ) * Remove old functions. * Merge weighted and un-weighted into a common interface.	2020-07-14 10:15:54 +08:00
Rong Ou	06320729d4	fix device sketch with weights in external memory mode (#5870 )	2020-07-08 08:44:07 +08:00
Jiaming Yuan	048d969be4	Implement GK sketching on GPU. (#5846 ) * Implement GK sketching on GPU. * Strong tests on quantile building. * Handle sparse dataset by binary searching the column index. * Hypothesis test on dask.	2020-07-07 12:16:21 +08:00
Jiaming Yuan	3028fa6b42	Implement weighted sketching for adapter. (#5760 ) * Bounded memory tests. * Fixed memory estimation.	2020-06-12 06:20:39 +08:00
Philip Hyunsu Cho	1d22a9be1c	Revert "Reorder includes. (#5749 )" (#5771 ) This reverts commit `d3a0efbf16`.	2020-06-09 10:29:28 -07:00
Jiaming Yuan	d3a0efbf16	Reorder includes. (#5749 ) * Reorder includes. * R.	2020-06-03 17:30:47 +12:00
Jiaming Yuan	e533908922	Expose device sketching in header. (#5747 )	2020-06-02 13:02:53 +08:00
Jiaming Yuan	29a4cfe400	Group aware GPU sketching. (#5551 ) * Group aware GPU weighted sketching. * Distribute group weights to each data point. * Relax the test. * Validate input meta info. * Fix metainfo copy ctor.	2020-04-20 17:18:52 +08:00
Rory Mitchell	13b10a6370	Device dmatrix (#5420 )	2020-03-28 14:42:21 +13:00
Rory Mitchell	b745b7acce	Fix memory usage of device sketching (#5407 )	2020-03-14 13:43:24 +13:00
Rory Mitchell	a38e7bd19c	Sketching from adapters (#5365 ) * Sketching from adapters * Add weights test	2020-03-07 21:07:58 +13:00
Rory Mitchell	7e32af5c21	Wide dataset quantile performance improvement (#5306 )	2020-02-16 10:24:42 +13:00
Rory Mitchell	24ad9dec0b	Testing hist_util (#5251 ) * Rank tests * Remove categorical split specialisation * Extend tests to multiple features, switch to WQSketch * Add tests for SparseCuts * Add external memory quantile tests, fix some existing tests	2020-02-14 14:36:43 +13:00
Rong Ou	0afcc55d98	Support multiple batches in gpu_hist (#5014 ) * Initial external memory training support for GPU Hist tree method.	2019-11-16 14:50:20 +08:00
Jiaming Yuan	7663de956c	Run training with empty DMatrix. (#4990 ) This makes GPU Hist robust in distributed environment as some workers might not be associated with any data in either training or evaluation. * Disable rabit mock test for now: See #5012 . * Disable dask-cudf test at prediction for now: See #5003 * Launch dask job for all workers despite they might not have any data. * Check 0 rows in elementwise evaluation metrics. Using AUC and AUC-PR still throws an error. See #4663 for a robust fix. * Add tests for edge cases. * Add `LaunchKernel` wrapper handling zero sized grid. * Move some parts of allreducer into a cu file. * Don't validate feature names when the booster is empty. * Sync number of columns in DMatrix. As num_feature is required to be the same across all workers in data split mode. * Filtering in dask interface now by default syncs all booster that's not empty, instead of using rank 0. * Fix Jenkins' GPU tests. * Install dask-cuda from source in Jenkins' test. Now all tests are actually running. * Restore GPU Hist tree synchronization test. * Check UUID of running devices. The check is only performed on CUDA version >= 10.x, as 9.x doesn't have UUID field. * Fix CMake policy and project variables. Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13. * Fix copying data to CPU * Fix race condition in cpu predictor. * Fix duplicated DMatrix construction. * Don't download extra nccl in CI script.	2019-11-06 16:13:13 +08:00
Jiaming Yuan	095de3bf5f	Export c++ headers in CMake installation. (#4897 ) * Move get transpose into cc. * Clean up headers in host device vector, remove thrust dependency. * Move span and host device vector into public. * Install c++ headers. * Short notes for c and c++. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-10-06 23:53:09 -04:00
Rong Ou	562bb0ae31	remove device shards (#4867 )	2019-09-25 13:15:46 +08:00
Rong Ou	125bcec62e	Move ellpack page construction into DMatrix (#4833 )	2019-09-16 23:50:55 -04:00
Rong Ou	38ab79f889	Make HostDeviceVector single gpu only (#4773 ) * Make HostDeviceVector single gpu only	2019-08-26 09:51:13 +12:00
Rong Ou	6edddd7966	Refactor DMatrix to return batches of different page types (#4686 ) * Use explicit template parameter for specifying page type.	2019-08-03 15:10:34 -04:00
Jiaming Yuan	f0064c07ab	Refactor configuration [Part II]. (#4577 ) * Refactor configuration [Part II]. * General changes: Remove `Init` methods to avoid ambiguity. Remove `Configure(std::map<>)` to avoid redundant copying and prepare for parameter validation. (`std::vector` is returned from `InitAllowUnknown`). ** Add name to tree updaters for easier debugging. * Learner changes: Make `LearnerImpl` the only source of configuration. All configurations are stored and carried out by `LearnerImpl::Configure()`. Remove booster in C API. Originally kept for "compatibility reason", but did not state why. So here we just remove it. Add a `metric_names_` field in `LearnerImpl`. Remove `LazyInit`. Configuration will always be lazy. ** Run `Configure` before every iteration. * Predictor changes: Allocate both cpu and gpu predictor. Remove cpu_predictor from gpu_predictor. `GBTree` is now used to dispatch the predictor. ** Remove some GPU Predictor tests. * IO No IO changes. The binary model format stability is tested by comparing hashing value of save models between two commits	2019-07-20 08:34:56 -04:00
Jiaming Yuan	d9a47794a5	Fix CPU hist init for sparse dataset. (#4625 ) * Fix CPU hist init for sparse dataset. * Implement sparse histogram cut. * Allow empty features. * Fix windows build, don't use sparse in distributed environment. * Comments. * Smaller threshold. * Fix windows omp. * Fix msvc lambda capture. * Fix MSVC macro. * Fix MSVC initialization list. * Fix MSVC initialization list x2. * Preserve categorical feature behavior. * Rename matrix to sparse cuts. * Reuse UseGroup. * Check for categorical data when adding cut. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Sanity check. * Fix comments. * Fix comment.	2019-07-04 16:27:03 -07:00
sriramch	90f683b25b	Set the appropriate device before freeing device memory... (#4566 ) * - set the appropriate device before freeing device memory... - pr #4532 added a global memory tracker/logger to keep track of number of (de)allocations and peak memory usage on a per device basis. - this pr adds the appropriate check to make sure that the (de)allocation counts and memory usages makes sense for the device. since verbosity is typically increased on debug/non-retail builds. * - pre-create cub allocators and reuse them - create them once and not resize them dynamically. we need to ensure that these allocators are created and destroyed exactly once so that the appropriate device id's are set	2019-06-18 14:58:05 +12:00
Rory Mitchell	9683fd433e	Overload device memory allocation (#4532 ) * Group source files, include headers in source files * Overload device memory allocation	2019-06-10 11:35:13 +12:00
sriramch	fed665ae8a	- training with external memory part 1 of 2 (#4486 ) * - training with external memory part 1 of 2 - this pr focuses on computing the quantiles using multiple gpus on a dataset that uses the external cache capabilities - there will be a follow-up pr soon after this that will support creation of histogram indices on large dataset as well - both of these changes are required to support training with external memory - the sparse pages in dmatrix are taken in batches and the cut matrices are incrementally built - also snuck in some (perf) changes related to sketches aggregation amongst multiple features across multiple sparse page batches. instead of aggregating the summary inside each device and merged later, it is aggregated in-place when the device is working on different rows but the same feature	2019-05-30 08:18:34 +12:00
Jiaming Yuan	c589eff941	De-duplicate GPU parameters. (#4454 ) * Only define `gpu_id` and `n_gpus` in `LearnerTrainParam` * Pass LearnerTrainParam through XGBoost via factory method. * Disable all GPU usage when GPU related parameters are not specified (fixes XGBoost choosing GPU over aggressively). * Test learner train param io. * Fix gpu pickling.	2019-05-29 11:55:57 +08:00
Jiaming Yuan	7b9043cf71	Fix clang-tidy warnings. (#4149 ) * Upgrade gtest for clang-tidy. * Use CMake to install GTest instead of mv. * Don't enforce clang-tidy to return 0 due to errors in thrust. * Add a small test for tidy itself. * Reformat.	2019-03-13 02:25:51 +08:00
Rory Mitchell	4eeeded7d1	Remove various synchronisations from cuda API calls, instrument monitor (#4205 ) * Remove various synchronisations from cuda API calls, instrument monitor with nvtx profiler ranges.	2019-03-10 15:01:23 +13:00
Rory Mitchell	93f9ce9ef9	Single precision histograms on GPU (#3965 ) * Allow single precision histogram summation in gpu_hist * Add python test, reduce run-time of gpu_hist tests * Update documentation	2018-12-10 10:55:30 +13:00
Rory Mitchell	a9d684db18	GPU performance logging/improvements (#3945 ) - Improved GPU performance logging - Only use one execute shards function - Revert performance regression on multi-GPU - Use threads to launch NCCL AllReduce	2018-11-29 14:36:51 +13:00
Jiaming Yuan	f1275f52c1	Fix specifying gpu_id, add tests. (#3851 ) * Rewrite gpu_id related code. * Remove normalised/unnormalised operations. * Address difference between `Index' and `Device ID'. * Modify doc for `gpu_id'. * Better LOG for GPUSet. * Check specified n_gpus. * Remove inappropriate `device_idx' term. * Clarify GpuIdType and size_t.	2018-11-06 18:17:53 +13:00
trivialfis	5a7f7e7d49	Implement devices to devices reshard. (#3721 ) * Force clearing device memory before Reshard. * Remove calculating row_segments for gpu_hist and gpu_sketch. * Guard against changing device.	2018-09-28 17:40:23 +12:00
Andy Adinets	72cd1517d6	Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446 ) * Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. - added distributions to HostDeviceVector - using HostDeviceVector for labels, weights and base margins in MetaInfo - using HostDeviceVector for offset and data in SparsePage - other necessary refactoring * Added const version of HostDeviceVector API calls. - const versions added to calls that can trigger data transfers, e.g. DevicePointer() - updated the code that uses HostDeviceVector - objective functions now accept const HostDeviceVector<bst_float>& for predictions * Updated src/linear/updater_gpu_coordinate.cu. * Added read-only state for HostDeviceVector sync. - this means no copies are performed if both host and devices access the HostDeviceVector read-only * Fixed linter and test errors. - updated the lz4 plugin - added ConstDeviceSpan to HostDeviceVector - using device % dh::NVisibleDevices() for the physical device number, e.g. in calls to cudaSetDevice() * Fixed explicit template instantiation errors for HostDeviceVector. - replaced HostDeviceVector<unsigned int> with HostDeviceVector<int> * Fixed HostDeviceVector tests that require multiple GPUs. - added a mock set device handler; when set, it is called instead of cudaSetDevice()	2018-08-30 14:28:47 +12:00

53 Commits