xgboost

Author	SHA1	Message	Date
Rory Mitchell	90cce38236	Remove single_precision_histogram for gpu_hist (#7828 )	2022-05-03 14:53:19 +02:00
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812 ) Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on residue of labels and predictions after construction. For l1 error, the optimal value is the median (50 percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00
Jiaming Yuan	996cc705af	Small cleanup to hist tree method. (#7735 ) * Remove special optimization using number of bins. * Remove 1-based index for column sampling. * Remove data layout. * Unify update prediction cache.	2022-03-20 03:44:55 +08:00
Jiaming Yuan	0d0abe1845	Support optimal partitioning for GPU hist. (#7652 ) * Implement `MaxCategory` in quantile. * Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function. * Extract an evaluator from GPU Hist to store the needed states. * Added some CUDA stream/event utilities. * Update document with references. * Fixed a bug in approx evaluator where the number of data points is less than the number of categories.	2022-02-15 03:03:12 +08:00
Jiaming Yuan	001503186c	Rewrite approx (#7214 ) This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing. The rewrite has many benefits: - Support for both `max_leaves` and `max_depth`. - Support for `grow_policy`. - Support for mono constraint. - Support for feature weights. - Support for easier bin configuration (`max_bin`). - Support for categorical data. - Faster performance for most of the datasets. (many times faster) - Support for prediction cache. - Significantly better performance for external memory. - Unites the code base between approx and hist.	2022-01-10 21:15:05 +08:00
Jiaming Yuan	6ede12412c	Update dmlc-core and use data iter for GPU sampling tests. (#7398 ) * Update dmlc-core. * New parquet parser in dmlc-core. * Use data iter for GPU sampling tests.	2021-11-06 05:12:49 +08:00
Jiaming Yuan	b06040b6d0	Implement a general array view. (#7365 ) * Replace existing matrix and vector view. This is to prepare for handling higher dimension data and prediction when we support multi-target models.	2021-11-05 04:16:11 +08:00
Jiaming Yuan	4100827971	Pass information about objective to tree methods. (#7385 ) * Define the `ObjInfo` and pass it down to every tree updater.	2021-11-04 01:52:44 +08:00
Jiaming Yuan	7a1d67f9cb	[breaking] Use integer atomic for GPU histogram. (#7180 ) On GPU we use rounding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to fixed point number with exponent aligned with rounding factor. [breaking] Drop non-deterministic histogram. Use fixed point for shared memory. This PR is to improve the performance of GPU Hist. Co-authored-by: Andy Adinets <aadinets@nvidia.com>	2021-08-28 05:17:05 +08:00
Jiaming Yuan	bd1f3a38f0	Rewrite sparse dmatrix using callbacks. (#7092 ) - Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves. - Remove use of threaded iterator and IO queue. - Remove `page_size`. - Make sure the number of pages in memory is bounded. - Make sure the cache can not be violated. - Provide an interface for internal algorithms to process data asynchronously.	2021-07-16 12:33:31 +08:00
ShvetsKS	57c732655e	Merge lossguide and depthwise strategies for CPU hist (#7007 ) * fix java/scala test: max depth is also valid parameter for lossguide Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>	2021-06-03 01:49:43 +08:00
Jiaming Yuan	556a83022d	Implement unified update prediction cache for (gpu_)hist. (#6860 ) * Implement utilities for linalg. * Unify the update prediction cache functions. * Implement update prediction cache for multi-class gpu hist.	2021-04-17 00:29:34 +08:00
Jiaming Yuan	4f75f514ce	Fix GPU RF (#6755 ) * Fix sampling.	2021-03-17 06:23:35 +08:00
Jiaming Yuan	444131a2e6	Add categorical data support to GPU Hist. (#6164 )	2020-09-29 11:27:25 +08:00
Jiaming Yuan	2fcc4f2886	Unify evaluation functions. (#6037 )	2020-08-26 14:23:27 +08:00
Jiaming Yuan	a144daf034	Limit tree depth for GPU hist. (#6045 )	2020-08-22 19:34:52 +08:00
Jiaming Yuan	4d99c58a5f	Feature weights (#5962 )	2020-08-18 19:55:41 +08:00
Rory Mitchell	b9649e7b8e	Refactor gpu_hist split evaluation (#5610 ) * Refactor * Rewrite evaluate splits * Add more tests	2020-04-30 08:58:12 +12:00
Andy Adinets	73142041b9	For histograms, opting into maximum shared memory available per block. (#5491 )	2020-04-21 14:56:42 +12:00
Rory Mitchell	ca4e05660e	Purge device_helpers.cuh (#5534 ) * Simplifications with caching_device_vector * Purge device helpers	2020-04-15 21:51:56 +12:00
Jiaming Yuan	6671b42dd4	Use ellpack for prediction only when sparsepage doesn't exist. (#5504 )	2020-04-10 12:15:46 +08:00
Jiaming Yuan	0012f2ef93	Upgrade clang-tidy on CI. (#5469 ) * Correct all clang-tidy errors. * Upgrade clang-tidy to 10 on CI. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-04-05 04:42:29 +08:00
Jiaming Yuan	459b175dc6	Split up test helpers header. (#5455 )	2020-04-03 10:36:53 +08:00
Jiaming Yuan	4942da64ae	Refactor tests with data generator. (#5439 )	2020-03-27 06:44:44 +08:00
Rory Mitchell	b745b7acce	Fix memory usage of device sketching (#5407 )	2020-03-14 13:43:24 +13:00
Rory Mitchell	3ad4333b0e	Partial rewrite EllpackPage (#5352 )	2020-03-11 10:15:53 +13:00
Rory Mitchell	a38e7bd19c	Sketching from adapters (#5365 ) * Sketching from adapters * Add weights test	2020-03-07 21:07:58 +13:00
Jiaming Yuan	8d06878bf9	Deterministic GPU histogram. (#5361 ) * Use pre-rounding based method to obtain reproducible floating point summation. * GPU Hist for regression and classification are bit-by-bit reproducible. * Add doc. * Switch to thrust reduce for `node_sum_gradient`.	2020-03-04 15:13:28 +08:00
Rory Mitchell	24ad9dec0b	Testing hist_util (#5251 ) * Rank tests * Remove categorical split specialisation * Extend tests to multiple features, switch to WQSketch * Add tests for SparseCuts * Add external memory quantile tests, fix some existing tests	2020-02-14 14:36:43 +13:00
Jiaming Yuan	29eeea709a	Pass shared pointer instead of raw pointer to Learner. (#5302 ) Extracted from https://github.com/dmlc/xgboost/pull/5220 .	2020-02-11 14:16:38 +08:00
Rong Ou	e4b74c4d22	Gradient based sampling for GPU Hist (#5093 ) * Implement gradient based sampling for GPU Hist tree method. * Add samplers and handle compacted page in GPU Hist.	2020-02-04 10:31:27 +08:00
Jiaming Yuan	ad4a1c732c	Small refinements for JSON model. (#5112 ) * Naming consistency. * Remove duplicated test.	2019-12-11 19:49:01 +08:00
Jiaming Yuan	7ef5b78003	Implement JSON IO for updaters (#5094 ) * Implement JSON IO for updaters. * Remove parameters in split evaluator.	2019-12-07 00:24:00 +08:00
Rong Ou	0afcc55d98	Support multiple batches in gpu_hist (#5014 ) * Initial external memory training support for GPU Hist tree method.	2019-11-16 14:50:20 +08:00
Jiaming Yuan	97abcc7ee2	Extract interaction constraint from split evaluator. (#5034 ) * Extract interaction constraints from split evaluator. The reason for doing so is mostly for model IO, where num_feature and interaction_constraints are copied in split evaluator. Also interaction constraint by itself is a feature selector, acting like column sampler and it's inefficient to bury it deep in the evaluator chain. Lastly removing one another copied parameter is a win. * Enable inc for approx tree method. As now the implementation is split up from evaluator class, it's also enabled for approx method. * Removing obsoleted code in colmaker. They are never documented nor actually used in real world. Also there isn't a single test for those code blocks. * Unifying the types used for row and column. As the size of input dataset is marching to billion, incorrect use of int is subject to overflow, also signed integer overflow is undefined behaviour. This PR starts the procedure for unifying used index type to unsigned integers. There's optimization that can utilize this undefined behaviour, but after some testings I don't see the optimization is beneficial to XGBoost.	2019-11-14 20:11:41 +08:00
Rong Ou	5b1715d97c	Write ELLPACK pages to disk (#4879 ) * add ellpack source * add batch param * extract function to parse cache info * construct ellpack info separately * push batch to ellpack page * write ellpack page. * make sparse page source reusable	2019-10-22 23:44:32 -04:00
Rong Ou	562bb0ae31	remove device shards (#4867 )	2019-09-25 13:15:46 +08:00
Jiaming Yuan	0b89cd1dfa	Support gamma in GPU_Hist. (#4874 ) * Just prevent building the tree instead of using an explicit pruner.	2019-09-24 10:16:08 +08:00
Rong Ou	125bcec62e	Move ellpack page construction into DMatrix (#4833 )	2019-09-16 23:50:55 -04:00
Rong Ou	733ed24dd9	further cleanup of single process multi-GPU code (#4810 ) * use subspan in gpu predictor instead of copying * Revise `HostDeviceVector`	2019-08-30 05:27:23 -04:00
Rong Ou	38ab79f889	Make HostDeviceVector single gpu only (#4773 ) * Make HostDeviceVector single gpu only	2019-08-26 09:51:13 +12:00
Rong Ou	c5b229632d	[BREAKING] prevent multi-gpu usage (#4749 ) * prevent multi-gpu usage * fix distributed test * combine gpu predictor tests * set upper bound on n_gpus	2019-08-13 09:11:35 +12:00
Rong Ou	6edddd7966	Refactor DMatrix to return batches of different page types (#4686 ) * Use explicit template parameter for specifying page type.	2019-08-03 15:10:34 -04:00
Jiaming Yuan	f0064c07ab	Refactor configuration [Part II]. (#4577 ) * Refactor configuration [Part II]. * General changes: Remove `Init` methods to avoid ambiguity. Remove `Configure(std::map<>)` to avoid redundant copying and prepare for parameter validation. (`std::vector` is returned from `InitAllowUnknown`). ** Add name to tree updaters for easier debugging. * Learner changes: Make `LearnerImpl` the only source of configuration. All configurations are stored and carried out by `LearnerImpl::Configure()`. Remove booster in C API. Originally kept for "compatibility reason", but did not state why. So here we just remove it. Add a `metric_names_` field in `LearnerImpl`. Remove `LazyInit`. Configuration will always be lazy. ** Run `Configure` before every iteration. * Predictor changes: Allocate both cpu and gpu predictor. Remove cpu_predictor from gpu_predictor. `GBTree` is now used to dispatch the predictor. ** Remove some GPU Predictor tests. * IO No IO changes. The binary model format stability is tested by comparing hashing value of save models between two commits	2019-07-20 08:34:56 -04:00
Jiaming Yuan	d9a47794a5	Fix CPU hist init for sparse dataset. (#4625 ) * Fix CPU hist init for sparse dataset. * Implement sparse histogram cut. * Allow empty features. * Fix windows build, don't use sparse in distributed environment. * Comments. * Smaller threshold. * Fix windows omp. * Fix msvc lambda capture. * Fix MSVC macro. * Fix MSVC initialization list. * Fix MSVC initialization list x2. * Preserve categorical feature behavior. * Rename matrix to sparse cuts. * Reuse UseGroup. * Check for categorical data when adding cut. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * Sanity check. * Fix comments. * Fix comment.	2019-07-04 16:27:03 -07:00
Rong Ou	6125521caf	fix compiler warning (#4588 )	2019-06-21 04:06:26 +08:00
Rory Mitchell	221e163185	Refactor out row partitioning logic from gpu_hist, introduce caching device vectors (#4554 )	2019-06-20 18:24:09 +12:00
Jiaming Yuan	ae05948e32	Feature interaction for GPU Hist. (#4534 ) * GPU hist Interaction Constraints. * Duplicate related parameters. * Add tests for CPU interaction constraint. * Add better error reporting. * Thorough tests.	2019-06-19 18:11:02 +08:00
sriramch	6757654337	Optimizations for quantisation on device (#4572 ) * - do not create device vectors for the entire sparse page while computing histograms... - while creating the compressed histogram indices, the row vector is created for the entire sparse page batch. this is needless as we only process chunks at a time based on a slice of the total gpu memory - this pr will allocate only as much as required to store the appropriate row indices and the entries * - do not dereference row_ptrs once the device_vector has been created to elide host copies of those counts - instead, grab the entry counts directly from the sparsepage	2019-06-19 10:50:25 +12:00
sriramch	a2042b685a	- training with external memory - part 2 of 2 (#4526 ) * - training with external memory - part 2 of 2 - when external memory support is enabled, building of histogram indices are done incrementally for every sparse page - the entire set of input data is divided across multiple gpu's and the relative row positions within each device is tracked when building the compressed histogram buffer - this was tested using a mortgage dataset containing ~ 670m rows before 4xt4's could be saturated	2019-06-12 09:52:56 +12:00

81 Commits