xgboost

Author	SHA1	Message	Date
Rong Ou	602484e19f	Remove some unused functions as reported by cppcheck (#4743)	2019-08-07 02:42:33 -04:00
Rong Ou	6edddd7966	Refactor DMatrix to return batches of different page types (#4686) * Use explicit template parameter for specifying page type.	2019-08-03 15:10:34 -04:00
Jiaming Yuan	f0064c07ab	Refactor configuration [Part II]. (#4577) * Refactor configuration [Part II]. * General changes: Remove `Init` methods to avoid ambiguity. Remove `Configure(std::map<>)` to avoid redundant copying and prepare for parameter validation. (`std::vector` is returned from `InitAllowUnknown`). ** Add name to tree updaters for easier debugging. * Learner changes: Make `LearnerImpl` the only source of configuration. All configurations are stored and carried out by `LearnerImpl::Configure()`. Remove booster in C API. Originally kept for "compatibility reason", but did not state why. So here we just remove it. Add a `metric_names_` field in `LearnerImpl`. Remove `LazyInit`. Configuration will always be lazy. ** Run `Configure` before every iteration. * Predictor changes: Allocate both cpu and gpu predictor. Remove cpu_predictor from gpu_predictor. `GBTree` is now used to dispatch the predictor. ** Remove some GPU Predictor tests. * IO No IO changes. The binary model format stability is tested by comparing hash values of saved models between two commits.	2019-07-20 08:34:56 -04:00
Jiaming Yuan	7b9043cf71	Fix clang-tidy warnings. (#4149) * Upgrade gtest for clang-tidy. * Use CMake to install GTest instead of mv. * Don't enforce clang-tidy to return 0 due to errors in thrust. * Add a small test for tidy itself. * Reformat.	2019-03-13 02:25:51 +08:00
Jiaming Yuan	017c97b8ce	Clean up training code. (#3825) * Remove GHistRow, GHistEntry, GHistIndexRow. * Remove kSimpleStats. * Remove CheckInfo, SetLeafVec in GradStats and in SKStats. * Clean up the GradStats. * Cleanup calcgain. * Move LossChangeMissing out of common. * Remove [] operator from GHistIndexBlock.	2019-02-07 14:22:13 +08:00
Nan Zhu	ae3bb9c2d5	Distributed Fast Histogram Algorithm (#4011) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * allow hist algo * more changes * temp * update * remove hist sync * update rabit * change hist size * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * fix lint issue * broadcast subsampled feature correctly * revert some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * update docs	2019-02-05 05:12:53 -08:00
Rory Mitchell	1fc37e4749	Require leaf statistics when expanding tree (#4015) * Cache left and right gradient sums * Require leaf statistics when expanding tree	2019-01-17 21:12:20 -08:00
Rory Mitchell	f75a21af25	Reduce tree expand boilerplate code (#4008)	2018-12-20 15:52:28 +13:00
Rory Mitchell	3d81c48d3f	Remove leaf vector, add tree serialisation test, fix Windows tests (#3989)	2018-12-13 10:28:38 +13:00
Jiaming Yuan	19ee0a3579	Refactor fast-hist, add tests for some updaters. (#3836) Add unittest for prune. Add unittest for refresh. Refactor fast_hist. * Remove fast_hist_param. * Rename to quantile_hist. Add unittests for QuantileHist. * Refactor QuantileHist into .h and .cc file. * Remove sync.h. * Remove MGPU_mock test. Rename fast hist method to quantile hist.	2018-11-07 21:15:07 +13:00
trivialfis	4b892c2b30	Remove obsoleted QuantileHistMaker. (#3761) Fix #3755.	2018-10-06 11:22:15 -07:00
Rory Mitchell	70d208d68c	Dmatrix refactor stage 2 (#3395) * DMatrix refactor 2 * Remove buffered rowset usage where possible * Transition to c++11 style iterators for row access * Transition column iterators to C++ 11	2018-10-01 01:29:03 +13:00
Nan Zhu	79d854c695	[jvm-packages] fix errors in example (#3719) * add back train method but mark as deprecated * fix scalastyle error * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * instrumentation * use log console * better measurement * fix errors in example * update histmaker	2018-09-22 16:39:38 -07:00
Andy Adinets	72cd1517d6	Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446) * Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. - added distributions to HostDeviceVector - using HostDeviceVector for labels, weights and base margins in MetaInfo - using HostDeviceVector for offset and data in SparsePage - other necessary refactoring * Added const version of HostDeviceVector API calls. - const versions added to calls that can trigger data transfers, e.g. DevicePointer() - updated the code that uses HostDeviceVector - objective functions now accept const HostDeviceVector<bst_float>& for predictions * Updated src/linear/updater_gpu_coordinate.cu. * Added read-only state for HostDeviceVector sync. - this means no copies are performed if both host and devices access the HostDeviceVector read-only * Fixed linter and test errors. - updated the lz4 plugin - added ConstDeviceSpan to HostDeviceVector - using device % dh::NVisibleDevices() for the physical device number, e.g. in calls to cudaSetDevice() * Fixed explicit template instantiation errors for HostDeviceVector. - replaced HostDeviceVector<unsigned int> with HostDeviceVector<int> * Fixed HostDeviceVector tests that require multiple GPUs. - added a mock set device handler; when set, it is called instead of cudaSetDevice()	2018-08-30 14:28:47 +12:00
trivialfis	2c502784ff	Span class. (#3548) * Add basic Span class based on ISO C++20. * Use Span<Entry const> instead of Inst in SparsePage. * Add DeviceSpan in HostDeviceVector, use it in regression obj.	2018-08-14 17:58:11 +12:00
Rory Mitchell	645996b12f	Remove accidental SparsePage copies (#3583)	2018-08-12 17:49:38 -07:00
Philip Hyunsu Cho	7fefd6865d	Fix #3402: wrong fid crashes distributed algorithm (#3535) * Fix #3402: wrong fid crashes distributed algorithm The bug was introduced by the recent DMatrix refactor (#3301). It was partially fixed by #3408 but the example in #3402 was still failing. The example in #3402 will succeed after this fix is applied. * Explicitly specify "this" to prevent compile error * Add regression test * Add distributed test to Travis matrix * Install kubernetes Python package as dependency of dmlc tracker * Add Python dependencies * Add compile step * Reduce size of regression test case * Further reduce size of test	2018-08-04 19:20:04 -07:00
Rory Mitchell	a725272e19	Correct mistake from dmatrix refactor (#3408)	2018-07-24 15:03:36 +12:00
Rory Mitchell	a96039141a	Dmatrix refactor stage 1 (#3301) * Use sparse page as singular CSR matrix representation * Simplify dmatrix methods * Reduce statefulness of batch iterators * BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.	2018-06-07 10:25:58 +12:00
Rory Mitchell	ccf80703ef	Clang-tidy static analysis (#3222) * Clang-tidy static analysis * Modernise checks * Google coding standard checks * Identifier renaming according to Google style	2018-04-19 18:57:13 +12:00
Andrew V. Adinetz	d5992dd881	Replaced std::vector-based interfaces with HostDeviceVector-based interfaces. (#3116) * Replaced std::vector-based interfaces with HostDeviceVector-based interfaces. - replacement was performed in the learner, boosters, predictors, updaters, and objective functions - only interfaces used in training were replaced; interfaces like PredictInstance() still use std::vector - refactoring necessary for replacement of interfaces was also performed, such as using HostDeviceVector in prediction cache * HostDeviceVector-based interfaces for custom objective function example plugin.	2018-02-28 13:00:04 +13:00
Rory Mitchell	e6a9063344	Integer gradient summation for GPU histogram algorithm. (#2681)	2017-09-08 15:07:29 +12:00
Tianqi Chen	d581a3d0e7	[UPDATE] Update rabit and threadlocal (#2114) * [UPDATE] Update rabit and threadlocal * minor fix to make build system happy * upgrade requirement to g++4.8 * upgrade dmlc-core * update travis	2017-03-16 18:48:37 -07:00
Tianqi Chen	fd19b7a188	Automatically remove nan from input data when it is sparse. (#2062) * [DATALoad] Automatically remove Nan when load from sparse matrix * add log	2017-02-25 08:59:17 -08:00
Simon DENEL	7078c41dad	Changing omp_get_num_threads to omp_get_max_threads (#1831) * Updating dmlc-core * Changing omp_get_num_threads to omp_get_max_threads	2016-12-04 11:26:45 -08:00
AbdealiJK	6f16f0ef58	Use bst_float consistently throughout (#1824) * Fix various typos * Add override to functions that are overridden gcc gives warnings about functions that are being overridden but not being marked as overridden. This fixes it. * Use bst_float consistently Use bst_float for all the variables that involve weight, leaf value, gradient, hessian, gain, loss_chg, predictions, base_margin, feature values. In some cases, when due to additions and so on the value can take a larger value, double is used. This ensures that type conversions are minimal and reduces loss of precision.	2016-11-30 10:02:10 -08:00
tqchen	ecb3a271be	[PYTHON-DIST] Distributed xgboost python training API.	2016-02-29 16:54:13 -08:00
tqchen	413f119c7e	Update dmlc-core	2016-02-10 13:11:21 -08:00
tqchen	63c4ad7617	[APPROX] Make global proposal default, add group ptr solution	2016-02-10 11:19:10 -08:00
tqchen	ce4d59ed69	[TREE] Enable global proposal for faster speed	2016-02-10 11:19:10 -08:00
tqchen	a500fbc9b0	[TREE] switch to two pass	2016-02-10 11:17:17 -08:00
tqchen	523afcbcd2	[TREE] Cleanup some functions, add utility function for two pass	2016-02-10 11:17:17 -08:00
tqchen	52227a8920	[TREE] Refactor histmaker	2016-02-10 11:17:17 -08:00
tqchen	d75e3ed05d	[LIBXGBOOST] pass demo running.	2016-01-16 10:24:01 -08:00
tqchen	e4567bbc47	[REFACTOR] Add alias, allow missing variables, init gbm interface	2016-01-16 10:24:01 -08:00
tqchen	d4677b6561	[TREE] finish move of updater	2016-01-16 10:24:01 -08:00
tqchen	4adc4cf0b9	[TREE] Move the files to target refactor location	2016-01-16 10:24:01 -08:00

37 Commits