xgboost

Author	SHA1	Message	Date
Rong Ou	5b1715d97c	Write ELLPACK pages to disk (#4879 ) * add ellpack source * add batch param * extract function to parse cache info * construct ellpack info separately * push batch to ellpack page * write ellpack page. * make sparse page source reusable	2019-10-22 23:44:32 -04:00
sriramch	310fe60b35	Pairwise ranking objective implementation on gpu (#4873 ) * - pairwise ranking objective implementation on gpu - there are couple of more algorithms (ndcg and map) for which support will be added as follow-up pr's - with no label groups defined, get gradient is 90x faster on gpu (120m instance mortgage dataset) - it can perform by an order of magnitude faster with ~ 10 groups (and adequate cores for the cpu implementation) * Add JSON config to rank obj.	2019-10-22 23:40:07 -04:00
Jiaming Yuan	5620322a48	[Breaking] Add global versioning. (#4936 ) * Use CMake config file for representing version. * Generate c and Python version file with CMake. The generated file is written into source tree. But unless XGBoost upgrades its version, there will be no actual modification. This retains compatibility with Makefiles for R. * Add XGBoost version the DMatrix binaries. * Simplify prefetch detection in CMakeLists.txt	2019-10-22 23:27:26 -04:00
Jiaming Yuan	7e477a2adb	Fix data loading (#4862 ) * Fix loading text data. * Fix config regex. * Try to explain the error better in exception. * Update doc.	2019-10-22 12:33:14 -04:00
Jiaming Yuan	4771bb0d41	Catch exception in transform function omp context. (#4960 )	2019-10-21 17:03:38 +08:00
Jiaming Yuan	31030a8d3a	Set correct file permission. (#4964 )	2019-10-18 12:54:29 -04:00
Jiaming Yuan	ae536756ae	Add Model and Configurable interface. (#4945 ) * Apply Configurable to objective functions. * Apply Model to Learner and Regtree, gbm. * Add Load/SaveConfig to objs. * Refactor obj tests to use smart pointer. * Dummy methods for Save/Load Model.	2019-10-18 01:56:02 -04:00
Rory Mitchell	60748b2071	Use heuristic to select histogram node, avoid rabit call (#4951 )	2019-10-18 11:33:54 +13:00
Jiaming Yuan	b61d534472	Span: use `size_t' for index_type, add `front' and `back'. (#4935 ) * Use `size_t' for index_type. Add `front' and `back'. * Remove a batch of `static_cast'.	2019-10-14 09:13:33 -04:00
Jiaming Yuan	3d46bd0fa5	Ignore columnar alignment requirement. (#4928 ) * Better error message for wrong type. * Fix stride size.	2019-10-13 06:41:43 -04:00
Oleksandr Pryimak	80977182c5	Use bundled gtest (#4900 ) * Suggest to use gtest bundled with dmlc * Use dmlc bundled gtest in all CI scripts * Make clang-tidy to use dmlc embedded gtest	2019-10-09 16:26:19 -07:00
Jiaming Yuan	095de3bf5f	Export c++ headers in CMake installation. (#4897 ) * Move get transpose into cc. * Clean up headers in host device vector, remove thrust dependency. * Move span and host device vector into public. * Install c++ headers. * Short notes for c and c++. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-10-06 23:53:09 -04:00
Jiaming Yuan	d30e63a0a5	Support feature names/types for cudf. (#4902 ) * Implement most of the pandas procedure for cudf except for type conversion. * Requires an array of interfaces in metainfo.	2019-09-29 15:07:51 -04:00
Rong Ou	562bb0ae31	remove device shards (#4867 )	2019-09-25 13:15:46 +08:00
Jiaming Yuan	0b89cd1dfa	Support gamma in GPU_Hist. (#4874 ) * Just prevent building the tree instead of using an explicit pruner.	2019-09-24 10:16:08 +08:00
Jiaming Yuan	57106a3459	Fix parsing empty json object. (#4868 ) * Fix parsing empty json object. * Better error message.	2019-09-18 03:31:46 -04:00
Jiaming Yuan	5374f52531	Complete cudf support. (#4850 ) * Handles missing value. * Accept all floating point and integer types. * Move to cudf 9.0 API. * Remove requirement on `null_count`. * Arbitrary column types support.	2019-09-16 23:52:00 -04:00
Rong Ou	125bcec62e	Move ellpack page construction into DMatrix (#4833 )	2019-09-16 23:50:55 -04:00
Jiaming Yuan	a5f232feb8	Fix calling GPU predictor (#4836 ) * Fix calling GPU predictor	2019-09-05 19:09:38 -04:00
Jiaming Yuan	c0fbeff0ab	Restrict access to `cfg_` in gbm. (#4801 ) * Restrict access to `cfg_` in gbm. * Verify having correct updaters. * Remove `grow_global_histmaker` This updater is the same as `grow_histmaker`. The former is not in our document so we just remove it.	2019-09-02 00:43:19 -04:00
Rong Ou	733ed24dd9	further cleanup of single process multi-GPU code (#4810 ) * use subspan in gpu predictor instead of copying * Revise `HostDeviceVector`	2019-08-30 05:27:23 -04:00
Rong Ou	38ab79f889	Make HostDeviceVector single gpu only (#4773 ) * Make HostDeviceVector single gpu only	2019-08-26 09:51:13 +12:00
Jiaming Yuan	9700776597	Cudf support. (#4745 ) * Initial support for cudf integration. * Add two C APIs for consuming data and metainfo. * Add CopyFrom for SimpleCSRSource as a generic function to consume the data. * Add FromDeviceColumnar for consuming device data. * Add new MetaInfo::SetInfo for consuming label, weight etc.	2019-08-19 16:51:40 +12:00
Jiaming Yuan	ab357dd41c	Remove plugin, cuda related code in automake & autoconf files (#4789 ) * Build plugin example with CMake. * Remove plugin, cuda related code in automake & autoconf files. * Fix typo in GPU doc.	2019-08-18 16:54:34 -04:00
Xu Xiao	ef9af33a00	[HOTFIX] distributed training with hist method (#4716 ) * add parallel test for hist.EvalualiteSplit * update test_openmp.py * update test_openmp.py * update test_openmp.py * update test_openmp.py * update test_openmp.py * fix OMP schedule policy * fix clang-tidy * add logging: total_num_bins * fix * fix * test * replace guided OPENMP policy with static in updater_quantile_hist.cc	2019-08-13 11:27:29 -07:00
Rong Ou	c5b229632d	[BREAKING] prevent multi-gpu usage (#4749 ) * prevent multi-gpu usage * fix distributed test * combine gpu predictor tests * set upper bound on n_gpus	2019-08-13 09:11:35 +12:00
Rong Ou	19f9fd5de9	remove the qids_ field in MetaInfo (#4744 )	2019-08-08 10:01:59 +08:00
Bobby	3e2c472944	Fix model parameter recovery (#4738 )	2019-08-07 02:32:10 -04:00
Rong Ou	851b5b3808	Remove gpu_exact tree method (#4742 )	2019-08-07 11:43:20 +12:00
Jiaming Yuan	2a4df8e29f	Add Json integer, remove specialization. (#4739 )	2019-08-06 03:10:49 -04:00
Jiaming Yuan	9c469b3844	Move bitfield into common. (#4737 ) * Prepare for columnar format support.	2019-08-06 02:49:32 -04:00
Jiaming Yuan	4fe0d8203e	Specify version macro in CMake. (#4730 ) * Specify version macro in CMake. * Use `XGBOOST_DEFINITIONS` instead.	2019-08-04 06:04:04 -04:00
Rong Ou	6edddd7966	Refactor DMatrix to return batches of different page types (#4686 ) * Use explicit template parameter for specifying page type.	2019-08-03 15:10:34 -04:00
Jiaming Yuan	d2e1e4d5b4	A simple Json implementation for future use. (#4708 ) * A simple Json implementation for future use.	2019-07-29 21:17:27 -04:00
Jiaming Yuan	f0064c07ab	Refactor configuration [Part II]. (#4577 ) * Refactor configuration [Part II]. * General changes: Remove `Init` methods to avoid ambiguity. Remove `Configure(std::map<>)` to avoid redundant copying and prepare for parameter validation. (`std::vector` is returned from `InitAllowUnknown`). ** Add name to tree updaters for easier debugging. * Learner changes: Make `LearnerImpl` the only source of configuration. All configurations are stored and carried out by `LearnerImpl::Configure()`. Remove booster in C API. Originally kept for "compatibility reason", but did not state why. So here we just remove it. Add a `metric_names_` field in `LearnerImpl`. Remove `LazyInit`. Configuration will always be lazy. ** Run `Configure` before every iteration. * Predictor changes: Allocate both cpu and gpu predictor. Remove cpu_predictor from gpu_predictor. `GBTree` is now used to dispatch the predictor. ** Remove some GPU Predictor tests. * IO No IO changes. The binary model format stability is tested by comparing hashing value of save models between two commits	2019-07-20 08:34:56 -04:00
Matvey Turkov	61f764946f	fixed year to 2019 in conf.py, helpers.h and LICENSE (#4661 )	2019-07-15 12:29:12 -04:00
sriramch	7a388cbf8b	Modify caching allocator/vector and fix issues relating to inability to train large datasets (#4615 )	2019-07-09 18:33:27 +12:00
Jiaming Yuan	d9a47794a5	Fix CPU hist init for sparse dataset. (#4625 ) * Fix CPU hist init for sparse dataset. * Implement sparse histogram cut. * Allow empty features. * Fix windows build, don't use sparse in distributed environment. * Comments. * Smaller threshold. * Fix windows omp. * Fix msvc lambda capture. * Fix MSVC macro. * Fix MSVC initialization list. * Fix MSVC initialization list x2. * Preserve categorical feature behavior. * Rename matrix to sparse cuts. * Reuse UseGroup. * Check for categorical data when adding cut. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Sanity check. * Fix comments. * Fix comment.	2019-07-04 16:27:03 -07:00
Philip Hyunsu Cho	96bf91725b	Support ndcg- and map- (#4635 )	2019-07-03 22:51:48 -07:00
Jiaming Yuan	45876bf41b	Fix external memory for get column batches. (#4622 ) * Fix external memory for get column batches. This fixes two bugs: * Use PushCSC for get column batches. * Don't remove the created temporary directory before finishing test. * Check all pages.	2019-06-30 09:56:49 +08:00
Rong Ou	63ec95623d	fix gpu predictor when dmatrix is mismatched with model (#4613 )	2019-06-28 11:03:02 +12:00
Egor Smirnov	4d6590be3c	Optimize ‘hist’ for multi-core CPU (#4529 ) * Initial performance optimizations for xgboost * remove includes * revert float->double * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * Check existence of _mm_prefetch and __builtin_prefetch * Fix lint * optimizations for CPU * applying comments in review * add some comments, code refactoring * fixing issues in CI * adding runtime checks * remove 1 extra check * remove extra checks in BuildHist * remove checks * add debug info * added debug info * revert changes * added comments * Apply suggestions from code review Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * apply review comments * Remove unused function CreateNewNodes() * Add descriptive comment on node_idx variable in QuantileHistMaker::Builder::BuildHistsBatch()	2019-06-27 11:33:49 -07:00
Jiaming Yuan	8bdf15120a	Implement tree model dump with code generator. (#4602 ) * Implement tree model dump with a code generator. * Split up generators. * Implement graphviz generator. * Use pattern matching. * [Breaking] Return a Source in `to_graphviz` instead of Digraph in Python package. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-06-26 15:20:44 +08:00
Rong Ou	6125521caf	fix compiler warning (#4588 )	2019-06-21 04:06:26 +08:00
Rory Mitchell	221e163185	Refactor out row partitioning logic from gpu_hist, introduce caching device vectors (#4554 )	2019-06-20 18:24:09 +12:00
Jiaming Yuan	ae05948e32	Feature interaction for GPU Hist. (#4534 ) * GPU hist Interaction Constraints. * Duplicate related parameters. * Add tests for CPU interaction constraint. * Add better error reporting. * Thorough tests.	2019-06-19 18:11:02 +08:00
sriramch	6757654337	Optimizations for quantisation on device (#4572 ) * - do not create device vectors for the entire sparse page while computing histograms... - while creating the compressed histogram indices, the row vector is created for the entire sparse page batch. this is needless as we only process chunks at a time based on a slice of the total gpu memory - this pr will allocate only as much as required to store the appropriate row indices and the entries * - do not dereference row_ptrs once the device_vector has been created to elide host copies of those counts - instead, grab the entry counts directly from the sparsepage	2019-06-19 10:50:25 +12:00
sriramch	90f683b25b	Set the appropriate device before freeing device memory... (#4566 ) * - set the appropriate device before freeing device memory... - pr #4532 added a global memory tracker/logger to keep track of number of (de)allocations and peak memory usage on a per device basis. - this pr adds the appropriate check to make sure that the (de)allocation counts and memory usages makes sense for the device. since verbosity is typically increased on debug/non-retail builds. * - pre-create cub allocators and reuse them - create them once and not resize them dynamically. we need to ensure that these allocators are created and destroyed exactly once so that the appropriate device id's are set	2019-06-18 14:58:05 +12:00
Jiaming Yuan	c5719cc457	Offload some configurations into GBM. (#4553 ) This is part 1 of refactoring configuration. * Move tree heuristic configurations. * Split up declarations and definitions for GBTree. * Implement UseGPU in gbm.	2019-06-14 09:18:51 +08:00
sriramch	a2042b685a	- training with external memory - part 2 of 2 (#4526 ) * - training with external memory - part 2 of 2 - when external memory support is enabled, building of histogram indices are done incrementally for every sparse page - the entire set of input data is divided across multiple gpu's and the relative row positions within each device is tracked when building the compressed histogram buffer - this was tested using a mortgage dataset containing ~ 670m rows before 4xt4's could be saturated	2019-06-12 09:52:56 +12:00

174 Commits