xgboost

Author	SHA1	Message	Date
Jiaming Yuan	7d52c0b8c2	Requires setting leaf stat when expanding tree. (#5501 ) * Fix GPU Hist feature importance.	2020-04-10 12:27:03 +08:00
Jiaming Yuan	6671b42dd4	Use ellpack for prediction only when sparsepage doesn't exist. (#5504 )	2020-04-10 12:15:46 +08:00
Jiaming Yuan	0012f2ef93	Upgrade clang-tidy on CI. (#5469 ) * Correct all clang-tidy errors. * Upgrade clang-tidy to 10 on CI. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-04-05 04:42:29 +08:00
Jiaming Yuan	459b175dc6	Split up test helpers header. (#5455 )	2020-04-03 10:36:53 +08:00
Jiaming Yuan	4942da64ae	Refactor tests with data generator. (#5439 )	2020-03-27 06:44:44 +08:00
Jiaming Yuan	ab7a46a1a4	Check whether current updater can modify a tree. (#5406 ) * Check whether current updater can modify a tree. * Fix tree model JSON IO for pruned trees.	2020-03-14 09:24:08 +08:00
Rory Mitchell	b745b7acce	Fix memory usage of device sketching (#5407 )	2020-03-14 13:43:24 +13:00
Rory Mitchell	3ad4333b0e	Partial rewrite EllpackPage (#5352 )	2020-03-11 10:15:53 +13:00
Rory Mitchell	a38e7bd19c	Sketching from adapters (#5365 ) * Sketching from adapters * Add weights test	2020-03-07 21:07:58 +13:00
Jiaming Yuan	8d06878bf9	Deterministic GPU histogram. (#5361 ) * Use pre-rounding based method to obtain reproducible floating point summation. * GPU Hist for regression and classification are bit-by-bit reproducible. * Add doc. * Switch to thrust reduce for `node_sum_gradient`.	2020-03-04 15:13:28 +08:00
Egor Smirnov	1b97eaf7a7	Optimized ApplySplit, BuildHist and UpdatePredictCache functions on CPU (#5244 ) * Split up sparse and dense build hist kernels. * Add `PartitionBuilder`.	2020-02-29 16:11:42 +08:00
sriramch	b81f8cbbc0	Move segment sorter to common (#5378 ) - move segment sorter to common - this is the first of a handful of pr's that splits the larger pr #5326 - it moves this facility to common (from ranking objective class), so that it can be used for metric computation - it also wraps all the bald device pointers into span.	2020-02-29 15:42:07 +08:00
Jiaming Yuan	e0509b3307	Fix pruner. (#5335 ) * Honor the tree depth. * Prevent pruning pruned node.	2020-02-25 08:32:46 +08:00
Rory Mitchell	24ad9dec0b	Testing hist_util (#5251 ) * Rank tests * Remove categorical split specialisation * Extend tests to multiple features, switch to WQSketch * Add tests for SparseCuts * Add external memory quantile tests, fix some existing tests	2020-02-14 14:36:43 +13:00
Jiaming Yuan	29eeea709a	Pass shared pointer instead of raw pointer to Learner. (#5302 ) Extracted from https://github.com/dmlc/xgboost/pull/5220 .	2020-02-11 14:16:38 +08:00
Rong Ou	e4b74c4d22	Gradient based sampling for GPU Hist (#5093 ) * Implement gradient based sampling for GPU Hist tree method. * Add samplers and handle compacted page in GPU Hist.	2020-02-04 10:31:27 +08:00
Egor Smirnov	c67163250e	Optimized BuildHist function (#5156 )	2020-01-29 23:32:57 -08:00
Jiaming Yuan	3eb1279bbf	Config for linear updaters. (#5222 )	2020-01-25 11:26:46 +08:00
Egor Smirnov	7b17e76c5b	Optimized EvaluateSplit function (#5138 ) * Add block based threading utilities.	2019-12-31 18:18:42 +08:00
Jiaming Yuan	04db125699	Quick fix for memory leak in CPU Hist. (#5153 ) Closes https://github.com/dmlc/xgboost/issues/3579 . * Don't use map.	2019-12-31 14:05:53 +08:00
Jiaming Yuan	ad4a1c732c	Small refinements for JSON model. (#5112 ) * Naming consistency. * Remove duplicated test.	2019-12-11 19:49:01 +08:00
Jiaming Yuan	208ab3b1ff	Model IO in JSON. (#5110 )	2019-12-11 11:20:40 +08:00
Jiaming Yuan	7ef5b78003	Implement JSON IO for updaters (#5094 ) * Implement JSON IO for updaters. * Remove parameters in split evaluator.	2019-12-07 00:24:00 +08:00
Jiaming Yuan	df9bdbbcb9	Fix parsing empty vector in parameter. (#5087 )	2019-12-05 11:42:01 +08:00
Rong Ou	0afcc55d98	Support multiple batches in gpu_hist (#5014 ) * Initial external memory training support for GPU Hist tree method.	2019-11-16 14:50:20 +08:00
Jiaming Yuan	97abcc7ee2	Extract interaction constraint from split evaluator. (#5034 ) * Extract interaction constraints from split evaluator. The reason for doing so is mostly for model IO, where num_feature and interaction_constraints are copied in split evaluator. Also interaction constraint by itself is a feature selector, acting like column sampler and it's inefficient to bury it deep in the evaluator chain. Lastly removing one another copied parameter is a win. * Enable inc for approx tree method. As now the implementation is split up from evaluator class, it's also enabled for approx method. * Removing obsoleted code in colmaker. They are never documented nor actually used in real world. Also there isn't a single test for those code blocks. * Unifying the types used for row and column. As the size of input dataset is marching to billion, incorrect use of int is subject to overflow, also signed integer overflow is undefined behaviour. This PR starts the procedure for unifying used index type to unsigned integers. There's optimization that can utilize this undefined behaviour, but after some testings I don't see the optimization is beneficial to XGBoost.	2019-11-14 20:11:41 +08:00
Philip Hyunsu Cho	f4e7b707c9	Revert #4529 (#5008 ) * Revert "Optimize ‘hist’ for multi-core CPU (#4529)" This reverts commit `4d6590be3c`. * Fix build	2019-11-12 09:35:03 -08:00
Jiaming Yuan	ac457c56a2	Use `UpdateAllowUnknown' for non-model related parameter. (#4961 ) * Use `UpdateAllowUnknown' for non-model related parameter. Model parameter can not pack an additional boolean value due to binary IO format. This commit deals only with non-model related parameter configuration. * Add tidy command line arg for use-dmlc-gtest.	2019-10-23 05:50:12 -04:00
Rong Ou	5b1715d97c	Write ELLPACK pages to disk (#4879 ) * add ellpack source * add batch param * extract function to parse cache info * construct ellpack info separately * push batch to ellpack page * write ellpack page. * make sparse page source reusable	2019-10-22 23:44:32 -04:00
Jiaming Yuan	ae536756ae	Add Model and Configurable interface. (#4945 ) * Apply Configurable to objective functions. * Apply Model to Learner and Regtree, gbm. * Add Load/SaveConfig to objs. * Refactor obj tests to use smart pointer. * Dummy methods for Save/Load Model.	2019-10-18 01:56:02 -04:00
Jiaming Yuan	b61d534472	Span: use `size_t' for index_type, add `front' and `back'. (#4935 ) * Use `size_t' for index_type. Add `front' and `back'. * Remove a batch of `static_cast'.	2019-10-14 09:13:33 -04:00
Jiaming Yuan	095de3bf5f	Export c++ headers in CMake installation. (#4897 ) * Move get transpose into cc. * Clean up headers in host device vector, remove thrust dependency. * Move span and host device vector into public. * Install c++ headers. * Short notes for c and c++. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-10-06 23:53:09 -04:00
Rong Ou	562bb0ae31	remove device shards (#4867 )	2019-09-25 13:15:46 +08:00
Jiaming Yuan	0b89cd1dfa	Support gamma in GPU_Hist. (#4874 ) * Just prevent building the tree instead of using an explicit pruner.	2019-09-24 10:16:08 +08:00
Rong Ou	125bcec62e	Move ellpack page construction into DMatrix (#4833 )	2019-09-16 23:50:55 -04:00
Rong Ou	733ed24dd9	further cleanup of single process multi-GPU code (#4810 ) * use subspan in gpu predictor instead of copying * Revise `HostDeviceVector`	2019-08-30 05:27:23 -04:00
Rong Ou	38ab79f889	Make HostDeviceVector single gpu only (#4773 ) * Make HostDeviceVector single gpu only	2019-08-26 09:51:13 +12:00
Jiaming Yuan	9700776597	Cudf support. (#4745 ) * Initial support for cudf integration. * Add two C APIs for consuming data and metainfo. * Add CopyFrom for SimpleCSRSource as a generic function to consume the data. * Add FromDeviceColumnar for consuming device data. * Add new MetaInfo::SetInfo for consuming label, weight etc.	2019-08-19 16:51:40 +12:00
Xu Xiao	ef9af33a00	[HOTFIX] distributed training with hist method (#4716 ) * add parallel test for hist.EvalualiteSplit * update test_openmp.py * update test_openmp.py * update test_openmp.py * update test_openmp.py * update test_openmp.py * fix OMP schedule policy * fix clang-tidy * add logging: total_num_bins * fix * fix * test * replace guided OPENMP policy with static in updater_quantile_hist.cc	2019-08-13 11:27:29 -07:00
Rong Ou	c5b229632d	[BREAKING] prevent multi-gpu usage (#4749 ) * prevent multi-gpu usage * fix distributed test * combine gpu predictor tests * set upper bound on n_gpus	2019-08-13 09:11:35 +12:00
Rong Ou	851b5b3808	Remove gpu_exact tree method (#4742 )	2019-08-07 11:43:20 +12:00
Jiaming Yuan	9c469b3844	Move bitfield into common. (#4737 ) * Prepare for columnar format support.	2019-08-06 02:49:32 -04:00
Rong Ou	6edddd7966	Refactor DMatrix to return batches of different page types (#4686 ) * Use explicit template parameter for specifying page type.	2019-08-03 15:10:34 -04:00
Jiaming Yuan	f0064c07ab	Refactor configuration [Part II]. (#4577 ) * Refactor configuration [Part II]. * General changes: Remove `Init` methods to avoid ambiguity. Remove `Configure(std::map<>)` to avoid redundant copying and prepare for parameter validation. (`std::vector` is returned from `InitAllowUnknown`). ** Add name to tree updaters for easier debugging. * Learner changes: Make `LearnerImpl` the only source of configuration. All configurations are stored and carried out by `LearnerImpl::Configure()`. Remove booster in C API. Originally kept for "compatibility reason", but did not state why. So here we just remove it. Add a `metric_names_` field in `LearnerImpl`. Remove `LazyInit`. Configuration will always be lazy. ** Run `Configure` before every iteration. * Predictor changes: Allocate both cpu and gpu predictor. Remove cpu_predictor from gpu_predictor. `GBTree` is now used to dispatch the predictor. ** Remove some GPU Predictor tests. * IO No IO changes. The binary model format stability is tested by comparing hashing value of save models between two commits	2019-07-20 08:34:56 -04:00
sriramch	7a388cbf8b	Modify caching allocator/vector and fix issues relating to inability to train large datasets (#4615 )	2019-07-09 18:33:27 +12:00
Jiaming Yuan	d9a47794a5	Fix CPU hist init for sparse dataset. (#4625 ) * Fix CPU hist init for sparse dataset. * Implement sparse histogram cut. * Allow empty features. * Fix windows build, don't use sparse in distributed environment. * Comments. * Smaller threshold. * Fix windows omp. * Fix msvc lambda capture. * Fix MSVC macro. * Fix MSVC initialization list. * Fix MSVC initialization list x2. * Preserve categorical feature behavior. * Rename matrix to sparse cuts. * Reuse UseGroup. * Check for categorical data when adding cut. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Sanity check. * Fix comments. * Fix comment.	2019-07-04 16:27:03 -07:00
Egor Smirnov	4d6590be3c	Optimize ‘hist’ for multi-core CPU (#4529 ) * Initial performance optimizations for xgboost * remove includes * revert float->double * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * Check existence of _mm_prefetch and __builtin_prefetch * Fix lint * optimizations for CPU * applying comments in review * add some comments, code refactoring * fixing issues in CI * adding runtime checks * remove 1 extra check * remove extra checks in BuildHist * remove checks * add debug info * added debug info * revert changes * added comments * Apply suggestions from code review Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * apply review comments * Remove unused function CreateNewNodes() * Add descriptive comment on node_idx variable in QuantileHistMaker::Builder::BuildHistsBatch()	2019-06-27 11:33:49 -07:00
Jiaming Yuan	8bdf15120a	Implement tree model dump with code generator. (#4602 ) * Implement tree model dump with a code generator. * Split up generators. * Implement graphviz generator. * Use pattern matching. * [Breaking] Return a Source in `to_graphviz` instead of Digraph in Python package. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-06-26 15:20:44 +08:00
Rong Ou	6125521caf	fix compiler warning (#4588 )	2019-06-21 04:06:26 +08:00
Rory Mitchell	221e163185	Refactor out row partitioning logic from gpu_hist, introduce caching device vectors (#4554 )	2019-06-20 18:24:09 +12:00

99 Commits