xgboost

Author	SHA1	Message	Date
Rory Mitchell	8cbcc53ccb	Remove old cudf constructor code (#5194 )	2020-01-10 16:35:23 +13:00
Rory Mitchell	87ebfc1315	Implement cudf construction with adapters. (#5189 )	2020-01-09 20:23:06 +13:00
Jiaming Yuan	61286c6e8f	Fix wrapping GPU ID and prevent data copying. (#5160 ) * Removed some data copying. * Make sure gpu_id is valid before any configuration is carried out.	2019-12-27 16:51:08 +08:00
Rory Mitchell	3d04a8cc97	Use dynamic types for array interface columns instead of templates (#5108 )	2019-12-21 16:08:10 +13:00
Rory Mitchell	c7cc657a4d	Use adapters for SparsePageDMatrix (#5092 )	2019-12-11 15:59:23 +13:00
Rory Mitchell	979f74d51a	Group builder modified for incremental building (#5098 )	2019-12-10 14:33:56 +13:00
Jiaming Yuan	64af1ecf86	[Breaking] Remove num roots. (#5059 )	2019-12-05 21:58:43 +08:00
Rory Mitchell	e3c34c79be	External data adapters (#5044 ) * Use external data adapters as lightweight intermediate layer between external data and DMatrix	2019-12-04 10:56:17 +13:00
Jiaming Yuan	d667ea9335	[CI] Fix Travis tests. (#5062 ) - Install wget explicitly to match openssl. - Install CMake explicitly. - Use newer miniconda link. - Reenable unittests. - gcc@9 + xcode@10 for osx due to missing <_stdio.h>. Other versions of gcc should also work. But as homebrew pour gcc@9 after update by default, so I just stick with latest version. - Disabled one external memory test for OSX. Not sure about the thread implementation in there and fixing external memory is beyond the scope of this PR. - Use Python3 with conda in jvm package.	2019-11-25 03:32:10 +08:00
Rong Ou	0afcc55d98	Support multiple batches in gpu_hist (#5014 ) * Initial external memory training support for GPU Hist tree method.	2019-11-16 14:50:20 +08:00
Jiaming Yuan	97abcc7ee2	Extract interaction constraint from split evaluator. (#5034 ) * Extract interaction constraints from split evaluator. The reason for doing so is mostly for model IO, where num_feature and interaction_constraints are copied in split evaluator. Also interaction constraint by itself is a feature selector, acting like column sampler and it's inefficient to bury it deep in the evaluator chain. Lastly removing one another copied parameter is a win. * Enable inc for approx tree method. As now the implementation is spited up from evaluator class, it's also enabled for approx method. * Removing obsoleted code in colmaker. They are never documented nor actually used in real world. Also there isn't a single test for those code blocks. * Unifying the types used for row and column. As the size of input dataset is marching to billion, incorrect use of int is subject to overflow, also singed integer overflow is undefined behaviour. This PR starts the procedure for unifying used index type to unsigned integers. There's optimization that can utilize this undefined behaviour, but after some testings I don't see the optimization is beneficial to XGBoost.	2019-11-14 20:11:41 +08:00
Rong Ou	5b1715d97c	Write ELLPACK pages to disk (#4879 ) * add ellpack source * add batch param * extract function to parse cache info * construct ellpack info separately * push batch to ellpack page * write ellpack page. * make sparse page source reusable	2019-10-22 23:44:32 -04:00
Jiaming Yuan	5620322a48	[Breaking] Add global versioning. (#4936 ) * Use CMake config file for representing version. * Generate c and Python version file with CMake. The generated file is written into source tree. But unless XGBoost upgrades its version, there will be no actual modification. This retains compatibility with Makefiles for R. * Add XGBoost version the DMatrix binaries. * Simplify prefetch detection in CMakeLists.txt	2019-10-22 23:27:26 -04:00
Jiaming Yuan	7e477a2adb	Fix data loading (#4862 ) * Fix loading text data. * Fix config regex. * Try to explain the error better in exception. * Update doc.	2019-10-22 12:33:14 -04:00
Jiaming Yuan	b61d534472	Span: use `size_t' for index_type, add` front' and `back'. (#4935 ) * Use `size_t' for index_type. Add `front' and `back'. * Remove a batch of `static_cast'.	2019-10-14 09:13:33 -04:00
Jiaming Yuan	3d46bd0fa5	Ignore columnar alignment requirement. (#4928 ) * Better error message for wrong type. * Fix stride size.	2019-10-13 06:41:43 -04:00
Jiaming Yuan	d30e63a0a5	Support feature names/types for cudf. (#4902 ) * Implement most of the pandas procedure for cudf except for type conversion. * Requires an array of interfaces in metainfo.	2019-09-29 15:07:51 -04:00
Jiaming Yuan	5374f52531	Complete cudf support. (#4850 ) * Handles missing value. * Accept all floating point and integer types. * Move to cudf 9.0 API. * Remove requirement on `null_count`. * Arbitrary column types support.	2019-09-16 23:52:00 -04:00
Rong Ou	125bcec62e	Move ellpack page construction into DMatrix (#4833 )	2019-09-16 23:50:55 -04:00
Jiaming Yuan	9700776597	Cudf support. (#4745 ) * Initial support for cudf integration. * Add two C APIs for consuming data and metainfo. * Add CopyFrom for SimpleCSRSource as a generic function to consume the data. * Add FromDeviceColumnar for consuming device data. * Add new MetaInfo::SetInfo for consuming label, weight etc.	2019-08-19 16:51:40 +12:00
Rong Ou	19f9fd5de9	remove the qids_ field in MetaInfo (#4744 )	2019-08-08 10:01:59 +08:00
Rong Ou	6edddd7966	Refactor DMatrix to return batches of different page types (#4686 ) * Use explicit template parameter for specifying page type.	2019-08-03 15:10:34 -04:00
Jiaming Yuan	d9a47794a5	Fix CPU hist init for sparse dataset. (#4625 ) * Fix CPU hist init for sparse dataset. * Implement sparse histogram cut. * Allow empty features. * Fix windows build, don't use sparse in distributed environment. * Comments. * Smaller threshold. * Fix windows omp. * Fix msvc lambda capture. * Fix MSVC macro. * Fix MSVC initialization list. * Fix MSVC initialization list x2. * Preserve categorical feature behavior. * Rename matrix to sparse cuts. * Reuse UseGroup. * Check for categorical data when adding cut. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Sanity check. * Fix comments. * Fix comment.	2019-07-04 16:27:03 -07:00
Jiaming Yuan	45876bf41b	Fix external memory for get column batches. (#4622 ) * Fix external memory for get column batches. This fixes two bugs: * Use PushCSC for get column batches. * Don't remove the created temporary directory before finishing test. * Check all pages.	2019-06-30 09:56:49 +08:00
sriramch	6e16900711	Fix crash with approx tree method on cpu (#4510 )	2019-05-30 01:11:29 +08:00
Rong Ou	feb6ae3e18	Initial support for external memory in gpu_predictor (#4284 )	2019-05-03 13:01:27 +12:00
Rong Ou	f4521bf6aa	refactor tests to get rid of duplication (#4358 ) * refactor tests to get rid of duplication * address review comments	2019-04-12 00:21:48 -07:00
Rong Ou	81c1cd40ca	add a test for cpu predictor using external memory (#4308 ) * add a test for cpu predictor using external memory * allow different page size for testing	2019-04-10 13:25:10 +12:00
Jiaming Yuan	7b9043cf71	Fix clang-tidy warnings. (#4149 ) * Upgrade gtest for clang-tidy. * Use CMake to install GTest instead of mv. * Don't enforce clang-tidy to return 0 due to errors in thrust. * Add a small test for tidy itself. * Reformat.	2019-03-13 02:25:51 +08:00
Jiaming Yuan	7ea5675679	Add PushCSC for SparsePage. (#4193 ) * Add PushCSC for SparsePage. * Move Push* definitions into cc file. * Add std:: prefix to `size_t` make clang++ happy. * Address monitor count == 0.	2019-03-02 01:58:08 +08:00
Philip Hyunsu Cho	abf2f661be	Fix #3708 : Use dmlc::TemporaryDirectory to handle temporaries in cross-platform way (#3783 ) * Fix #3708: Use dmlc::TemporaryDirectory to handle temporaries in cross-platform way Also install git inside NVIDIA GPU container * Update dmlc-core	2018-10-18 10:16:04 -07:00
Rory Mitchell	70d208d68c	Dmatrix refactor stage 2 (#3395 ) * DMatrix refactor 2 * Remove buffered rowset usage where possible * Transition to c++11 style iterators for row access * Transition column iterators to C++ 11	2018-10-01 01:29:03 +13:00
Andy Adinets	72cd1517d6	Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446 ) * Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. - added distributions to HostDeviceVector - using HostDeviceVector for labels, weights and base margings in MetaInfo - using HostDeviceVector for offset and data in SparsePage - other necessary refactoring * Added const version of HostDeviceVector API calls. - const versions added to calls that can trigger data transfers, e.g. DevicePointer() - updated the code that uses HostDeviceVector - objective functions now accept const HostDeviceVector<bst_float>& for predictions * Updated src/linear/updater_gpu_coordinate.cu. * Added read-only state for HostDeviceVector sync. - this means no copies are performed if both host and devices access the HostDeviceVector read-only * Fixed linter and test errors. - updated the lz4 plugin - added ConstDeviceSpan to HostDeviceVector - using device % dh::NVisibleDevices() for the physical device number, e.g. in calls to cudaSetDevice() * Fixed explicit template instantiation errors for HostDeviceVector. - replaced HostDeviceVector<unsigned int> with HostDeviceVector<int> * Fixed HostDeviceVector tests that require multiple GPUs. - added a mock set device handler; when set, it is called instead of cudaSetDevice()	2018-08-30 14:28:47 +12:00
Rory Mitchell	78bea0d204	Add google test for a column sampling, restore metainfo tests (#3637 ) * Add google test for a column sampling, restore metainfo tests * Update metainfo test for visual studio * Fix multi-GPU bug introduced in #3635	2018-08-28 16:10:26 +12:00
trivialfis	cf2d86a4f6	Add travis sanitizers tests. (#3557 ) * Add travis sanitizers tests. * Add gcc-7 in Travis. * Add SANITIZER_PATH for CMake. * Enable sanitizer tests in Travis. * Fix memory leaks in tests. * Fix all memory leaks reported by Address Sanitizer. * tests/cpp/helpers.h/CreateDMatrix now returns raw pointer.	2018-08-19 16:40:30 +12:00
trivialfis	2c502784ff	Span class. (#3548 ) * Add basic Span class based on ISO++20. * Use Span<Entry const> instead of Inst in SparsePage. * Add DeviceSpan in HostDeviceVector, use it in regression obj.	2018-08-14 17:58:11 +12:00
Rory Mitchell	bbb771f32e	Refactor parts of fast histogram utilities (#3564 ) * Refactor parts of fast histogram utilities * Removed byte packing from column matrix	2018-08-09 17:59:57 +12:00
liuliang01	0cf88d036f	Add qid like ranklib format (#2749 ) * add qid for https://github.com/dmlc/xgboost/issues/2748 * change names * change spaces * change qid to bst_uint type * change qid type to size_t * change qid first to SIZE_MAX * change qid type from size_t to uint64_t * update dmlc-core * fix qids name error * fix group_ptr_ error * Style fix * Add qid handling logic to SparsePage * New MetaInfo format + backward compatibility fix Old MetaInfo format (1.0) doesn't contain qid field. We still want to be able to read from MetaInfo files saved in old format. Also, define a new format (2.0) that contains the qid field. This way, we can distinguish files that contain qid and those that do not. * Update MetaInfo test * Simply group assignment logic * Explicitly set qid=nullptr in NativeDataIter NativeDataIter's callback does not support qid field. Users of NativeDataIter will need to call setGroup() function separately to set group information. * Save qids_ in SaveBinary() * Upgrade dmlc-core submodule * Add a test for reading qid * Add contributor * Check the size of qids_ * Document qid format	2018-06-30 20:24:03 +00:00
Rory Mitchell	a96039141a	Dmatrix refactor stage 1 (#3301 ) * Use sparse page as singular CSR matrix representation * Simplify dmatrix methods * Reduce statefullness of batch iterators * BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.	2018-06-07 10:25:58 +12:00
Rory Mitchell	ccf80703ef	Clang-tidy static analysis (#3222 ) * Clang-tidy static analysis * Modernise checks * Google coding standard checks * Identifier renaming according to Google style	2018-04-19 18:57:13 +12:00
Rory Mitchell	10eb05a63a	Refactor linear modelling and add new coordinate descent updater (#3103 ) * Refactor linear modelling and add new coordinate descent updater * Allow unsorted column iterator * Add prediction cacheing to gblinear	2018-02-17 09:17:01 +13:00
Thejaswi	85b2fb3eee	[GPU-Plugin] Integration of a faster version of grow_gpu plugin into mainstream (#2360 ) * Integrating a faster version of grow_gpu plugin 1. Removed the older files to reduce duplication 2. Moved all of the grow_gpu files under 'exact' folder 3. All of them are inside 'exact' namespace to avoid any conflicts 4. Fixed a bug in benchmark.py while running only 'grow_gpu' plugin 5. Added cub and googletest submodules to ease integration and unit-testing 6. Updates to CMakeLists.txt to directly build cuda objects into libxgboost * Added support for building gpu plugins through make flow 1. updated makefile and config.mk to add right targets 2. added unit-tests for gpu exact plugin code * 1. Added support for building gpu plugin using 'make' flow as well 2. Updated instructions for building and testing gpu plugin * Fix travis-ci errors for PR#2360 1. lint errors on unit-tests 2. removed googletest, instead depended upon dmlc-core provide gtest cache * Some more fixes to travis-ci lint failures PR#2360 * Added Rory's copyrights to the files containing code from both. * updated copyright statement as per Rory's request * moved the static datasets into a script to generate them at runtime * 1. memory usage print when silent=0 2. tests/ and test/ folder organization 3. removal of the dependency of googletest for just building xgboost 4. coding style updates for .cuh as well * Fixes for compilation warnings * add cuda object files as well when JVM_BINDINGS=ON	2017-06-06 09:39:53 +12:00
AbdealiJK	d6407c3746	tests/cpp: Add tests for SparsePageDMatrix The SparsePageDMatrix or external memory DMatrix reads data from the file IO rather than load it into RAM.	2016-12-04 11:25:57 -08:00
AbdealiJK	c3629c91d3	tests/cpp: Add tests for SimpleCSRSource Test the binary format saved and read by a SimpleDMatrix, which is internally the SimpleCSRSource.	2016-12-04 11:25:57 -08:00
AbdealiJK	be0f55d563	tests/cpp: Add tests for SimpleDMatrix	2016-12-04 11:25:57 -08:00
AbdealiJK	ef7fe06cf8	tests/cpp/test_metainfo: Add tests to save and load	2016-12-04 11:25:57 -08:00
AbdealiJK	1f2ad36bad	Add make commands for tests This adds the make commands required to build and run tests.	2016-12-04 11:25:57 -08:00

47 Commits