xgboost

Author	SHA1	Message	Date
Jiaming Yuan	93c44a9a64	Move feature names and types of DMatrix from Python to C++. (#5858 ) * Add thread local return entry for DMatrix. * Save feature name and feature type in binary file. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-07-07 09:40:13 +08:00
Jiaming Yuan	1a0801238e	Implement iterative DMatrix. (#5837 )	2020-07-03 11:44:52 +08:00
Jiaming Yuan	90a9c68874	Implement a DMatrix Proxy. (#5803 )	2020-06-29 15:03:10 +08:00
Jiaming Yuan	47c89775d6	Accept string for ArrayInterface constructor. (#5799 )	2020-06-27 00:06:54 +08:00
Jiaming Yuan	c4d721200a	Implement extend method for meta info. (#5800 ) * Implement extend for host device vector.	2020-06-20 03:32:03 +08:00
Jiaming Yuan	38ee514787	Implement fast number serialization routines. (#5772 ) * Implement ryu algorithm. * Implement integer printing. * Full coverage roundtrip test.	2020-06-17 12:39:23 +08:00
fis	7c3a168ffd	Revert "Accept string for ArrayInterface constructor." This reverts commit `e8ecafb8dc`.	2020-06-16 20:02:35 +08:00
fis	e8ecafb8dc	Accept string for ArrayInterface constructor.	2020-06-16 20:00:24 +08:00
Rory Mitchell	b47b5ac771	Use hypothesis (#5759 ) * Use hypothesis * Allow int64 array interface for groups * Add packages to Windows CI * Add to travis * Make sure device index is set correctly * Fix dask-cudf test * appveyor	2020-06-16 12:45:59 +12:00
Jiaming Yuan	306e38ff31	Avoid including `c_api.h` in header files. (#5782 )	2020-06-12 16:24:24 +08:00
Jiaming Yuan	cacff9232a	Remove column major specialization. (#5755 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-06-05 16:19:14 +08:00
Jiaming Yuan	8438c7d0e4	Fix IsDense. (#5702 )	2020-05-26 08:24:37 +08:00
Jiaming Yuan	eaf2a00b5c	Enhance nvtx support. (#5636 )	2020-05-06 22:54:24 +08:00
Jiaming Yuan	e726dd9902	Set device in device dmatrix. (#5596 )	2020-04-25 13:42:53 +08:00
Jiaming Yuan	29a4cfe400	Group aware GPU sketching. (#5551 ) * Group aware GPU weighted sketching. * Distribute group weights to each data point. * Relax the test. * Validate input meta info. * Fix metainfo copy ctor.	2020-04-20 17:18:52 +08:00
Jiaming Yuan	e1f22baf8c	Fix slice and get info. (#5552 )	2020-04-18 18:00:13 +08:00
Rory Mitchell	ca4e05660e	Purge device_helpers.cuh (#5534 ) * Simplifications with caching_device_vector * Purge device helpers	2020-04-15 21:51:56 +12:00
Jiaming Yuan	6671b42dd4	Use ellpack for prediction only when sparsepage doesn't exist. (#5504 )	2020-04-10 12:15:46 +08:00
Jiaming Yuan	0012f2ef93	Upgrade clang-tidy on CI. (#5469 ) * Correct all clang-tidy errors. * Upgrade clang-tidy to 10 on CI. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-04-05 04:42:29 +08:00
Jiaming Yuan	459b175dc6	Split up test helpers header. (#5455 )	2020-04-03 10:36:53 +08:00
Jiaming Yuan	29c6ad943a	Prevent copying SimpleDMatrix. (#5453 ) * Set default dtor for SimpleDMatrix to initialize default copy ctor, which is deleted due to unique ptr. * Remove commented code. * Remove warning for calling host function (std::max). * Remove warning for initialization order. * Remove warning for unused variables.	2020-04-02 07:01:49 +08:00
Rory Mitchell	13b10a6370	Device dmatrix (#5420 )	2020-03-28 14:42:21 +13:00
Jiaming Yuan	4942da64ae	Refactor tests with data generator. (#5439 )	2020-03-27 06:44:44 +08:00
Rory Mitchell	b745b7acce	Fix memory usage of device sketching (#5407 )	2020-03-14 13:43:24 +13:00
Rory Mitchell	3ad4333b0e	Partial rewrite EllpackPage (#5352 )	2020-03-11 10:15:53 +13:00
Rory Mitchell	a38e7bd19c	Sketching from adapters (#5365 ) * Sketching from adapters * Add weights test	2020-03-07 21:07:58 +13:00
Jiaming Yuan	f2b8cd2922	Add number of columns to native data iterator. (#5202 ) * Change native data iter into an adapter.	2020-02-25 23:42:01 +08:00
Rory Mitchell	b0ed3f0a66	Remove unnecessary DMatrix methods (#5324 )	2020-02-25 12:40:39 +13:00
Jiaming Yuan	655cf17b60	Predict on Ellpack. (#5327 ) * Unify GPU prediction node. * Add `PageExists`. * Dispatch prediction on input data for GPU Predictor.	2020-02-23 06:27:03 +08:00
Rory Mitchell	bc96ceb8b2	Refactor SparsePageSource, delete cache files after use (#5321 ) * Refactor sparse page source * Delete temporary cache files * Log fatal if cache exists * Log fatal if multiple threads used with prefetcher	2020-02-19 16:43:41 +13:00
Rory Mitchell	b2b2c4e231	Remove SimpleCSRSource (#5315 )	2020-02-18 16:49:17 +13:00
Rong Ou	e4b74c4d22	Gradient based sampling for GPU Hist (#5093 ) * Implement gradient based sampling for GPU Hist tree method. * Add samplers and handle compacted page in GPU Hist.	2020-02-04 10:31:27 +08:00
Jiaming Yuan	fe8d72b50b	Cleanup warnings. (#5247 ) From clang-tidy-9 and gcc-7: Invalid case style, narrowing definition, wrong initialization order, unused variables.	2020-01-31 14:52:15 +08:00
Philip Hyunsu Cho	44469a0ca9	Extensible binary serialization format for DMatrix::MetaInfo (#5187 ) * Turn xgboost::DataType into C++11 enum class * New binary serialization format for DMatrix::MetaInfo * Fix clang-tidy * Fix c++ test * Implement new format proposal * Move helper functions to anonymous namespace; remove unneeded field * Fix lint * Add shape. * Keep only roundtrip test. * Fix test. * various fixes * Update data.cc Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-01-23 11:33:17 -08:00
Rory Mitchell	9c56480c61	Support dmatrix construction from cupy array (#5206 )	2020-01-22 13:15:27 +13:00
Rory Mitchell	a73e25e15f	Implement slice via adapters (#5198 )	2020-01-14 12:55:41 +13:00
Rory Mitchell	8cbcc53ccb	Remove old cudf constructor code (#5194 )	2020-01-10 16:35:23 +13:00
Rory Mitchell	87ebfc1315	Implement cudf construction with adapters. (#5189 )	2020-01-09 20:23:06 +13:00
Jiaming Yuan	61286c6e8f	Fix wrapping GPU ID and prevent data copying. (#5160 ) * Removed some data copying. * Make sure gpu_id is valid before any configuration is carried out.	2019-12-27 16:51:08 +08:00
Rory Mitchell	3d04a8cc97	Use dynamic types for array interface columns instead of templates (#5108 )	2019-12-21 16:08:10 +13:00
Rory Mitchell	c7cc657a4d	Use adapters for SparsePageDMatrix (#5092 )	2019-12-11 15:59:23 +13:00
Rory Mitchell	979f74d51a	Group builder modified for incremental building (#5098 )	2019-12-10 14:33:56 +13:00
Jiaming Yuan	64af1ecf86	[Breaking] Remove num roots. (#5059 )	2019-12-05 21:58:43 +08:00
Rory Mitchell	e3c34c79be	External data adapters (#5044 ) * Use external data adapters as lightweight intermediate layer between external data and DMatrix	2019-12-04 10:56:17 +13:00
Jiaming Yuan	d667ea9335	[CI] Fix Travis tests. (#5062 ) - Install wget explicitly to match openssl. - Install CMake explicitly. - Use newer miniconda link. - Reenable unittests. - gcc@9 + xcode@10 for osx due to missing <_stdio.h>. Other versions of gcc should also work. But as Homebrew pours gcc@9 after update by default, I just stick with the latest version. - Disabled one external memory test for OSX. Not sure about the thread implementation in there and fixing external memory is beyond the scope of this PR. - Use Python3 with conda in jvm package.	2019-11-25 03:32:10 +08:00
Rong Ou	0afcc55d98	Support multiple batches in gpu_hist (#5014 ) * Initial external memory training support for GPU Hist tree method.	2019-11-16 14:50:20 +08:00
Jiaming Yuan	97abcc7ee2	Extract interaction constraint from split evaluator. (#5034 ) * Extract interaction constraints from split evaluator. The reason for doing so is mostly for model IO, where num_feature and interaction_constraints are copied in split evaluator. Also interaction constraint by itself is a feature selector, acting like column sampler and it's inefficient to bury it deep in the evaluator chain. Lastly removing another copied parameter is a win. * Enable inc for approx tree method. As now the implementation is split up from evaluator class, it's also enabled for approx method. * Removing obsoleted code in colmaker. They are never documented nor actually used in real world. Also there isn't a single test for those code blocks. * Unifying the types used for row and column. As the size of input dataset is marching to billion, incorrect use of int is subject to overflow, also signed integer overflow is undefined behaviour. This PR starts the procedure for unifying used index type to unsigned integers. There's optimization that can utilize this undefined behaviour, but after some testing I don't see the optimization is beneficial to XGBoost.	2019-11-14 20:11:41 +08:00
Rong Ou	5b1715d97c	Write ELLPACK pages to disk (#4879 ) * add ellpack source * add batch param * extract function to parse cache info * construct ellpack info separately * push batch to ellpack page * write ellpack page. * make sparse page source reusable	2019-10-22 23:44:32 -04:00
Jiaming Yuan	5620322a48	[Breaking] Add global versioning. (#4936 ) * Use CMake config file for representing version. * Generate c and Python version file with CMake. The generated file is written into source tree. But unless XGBoost upgrades its version, there will be no actual modification. This retains compatibility with Makefiles for R. * Add XGBoost version the DMatrix binaries. * Simplify prefetch detection in CMakeLists.txt	2019-10-22 23:27:26 -04:00
Jiaming Yuan	7e477a2adb	Fix data loading (#4862 ) * Fix loading text data. * Fix config regex. * Try to explain the error better in exception. * Update doc.	2019-10-22 12:33:14 -04:00

83 Commits