* Re-implement ROC-AUC.
  - Binary
  - MultiClass
  - LTR
* Add documentation.
This PR resolves a few issues:
- Define a value for ROC-AUC when the dataset is invalid, which can happen when
  the dataset is empty or contains only positive or only negative labels (see
  the sketch after this list).
- Define ROC-AUC for multi-class classification.
- Define a weighted average value for the distributed setting.
- A correct implementation for the learning-to-rank task. The previous
  implementation was just binary classification AUC averaged across query
  groups, which doesn't measure the ranking order that learning to rank
  optimizes.
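For the first point, here is a minimal sketch of binary ROC-AUC using the rank-sum (Mann-Whitney) formulation. The function name `BinaryAUC` is hypothetical, the 0.5 fallback for an invalid dataset is an assumed convention, and tied predictions are ignored for brevity; this illustrates the definition rather than XGBoost's actual implementation:
```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not XGBoost's implementation: binary ROC-AUC via
// the rank-sum formulation AUC = (R_pos - n_pos*(n_pos+1)/2) / (n_pos*n_neg),
// where R_pos is the sum of 1-based ranks of the positive examples when
// sorted by prediction (ties ignored for brevity).
double BinaryAUC(std::vector<float> const& predt, std::vector<float> const& label) {
  std::size_t n = predt.size();
  std::vector<std::size_t> idx(n);
  for (std::size_t i = 0; i < n; ++i) idx[i] = i;
  std::sort(idx.begin(), idx.end(),
            [&](std::size_t l, std::size_t r) { return predt[l] < predt[r]; });
  double n_pos = 0.0, rank_sum = 0.0;
  for (std::size_t r = 0; r < n; ++r) {
    if (label[idx[r]] > 0.5f) {                // positive example
      n_pos += 1.0;
      rank_sum += static_cast<double>(r + 1);  // 1-based rank
    }
  }
  double n_neg = static_cast<double>(n) - n_pos;
  if (n_pos == 0.0 || n_neg == 0.0) {
    return 0.5;  // assumed fallback for an invalid (empty or one-class) dataset
  }
  return (rank_sum - n_pos * (n_pos + 1.0) / 2.0) / (n_pos * n_neg);
}
```
The multi-class definition can then be built on top of this with one-vs-rest averaging, and the distributed definition as a weighted average of per-worker values.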
* Extract interaction constraints from split evaluator.
The reason for doing so is mostly model IO: num_feature and interaction_constraints are duplicated inside the split evaluator. Also, an interaction constraint is by itself a feature selector, acting like the column sampler, so it's inefficient to bury it deep in the evaluator chain. Lastly, removing another duplicated parameter is a win.
* Enable interaction constraints for the approx tree method.
Now that the implementation is split out from the evaluator class, it's also enabled for the approx method.
* Remove obsolete code in colmaker.
It was never documented nor actually used in the real world, and there isn't a single test for those code blocks.
* Unify the types used for row and column indices.
As input datasets march toward a billion rows, incorrect use of `int` is subject to overflow, and signed integer overflow is undefined behaviour. This PR starts the process of unifying the index types on unsigned integers. There are optimizations that can exploit this undefined behaviour, but after some testing I don't see them being beneficial to XGBoost.
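As an illustration of the target index types, here is a minimal sketch; the typedef names follow the spirit of the change but are assumptions, not quotes from the codebase:
```cpp
#include <cstddef>
#include <cstdint>

// Assumed typedefs illustrating the direction of the change:
using bst_row_t = std::size_t;        // row index; must count past 2^31 rows
using bst_feature_t = std::uint32_t;  // column/feature index

int main() {
  // A 32-bit signed int tops out at 2,147,483,647 and overflowing it is
  // undefined behaviour; a 64-bit unsigned index covers billions of rows,
  // and unsigned overflow is defined to wrap rather than being undefined.
  bst_row_t n_rows = 3000000000ULL;   // would overflow a 32-bit int
  bst_feature_t n_cols = 4096U;
  return (n_rows > 0 && n_cols > 0) ? 0 : 1;
}
```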
* Move GetTranspose into the .cc file.
* Clean up headers in HostDeviceVector; remove the Thrust dependency.
* Move Span and HostDeviceVector into the public interface.
* Install C++ headers.
* Short notes for C and C++.
Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Combine thread launches into a single launch per tree for the gpu_hist algorithm.
* Address deprecation warning
* Add manual column sampler constructor
* Turn off omp dynamic to get a guaranteed number of threads
* Enable OpenMP in CUDA code
* Optimisations for gpu_hist.
* Use streams to overlap operations.
* ColumnSampler now uses HostDeviceVector to prevent repeatedly copying feature vectors to the device.
* Improved multi-node multi-GPU random forests.
- removed rabit::Broadcast() from each invocation of column sampling
- instead, syncing the PRNG seed when a ColumnSampler() object is constructed (see the sketch after this list)
- this makes non-trivial column sampling significantly faster in the distributed case
- refactored distributed GPU tests
- added distributed random forests tests
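A sketch of the seed-syncing idea, assuming a rabit-style Broadcast API; the class layout and names are illustrative rather than the exact XGBoost code:
```cpp
#include <cstdint>
#include <random>

#include <rabit/rabit.h>

// Illustrative only: broadcast the PRNG seed once at construction so every
// worker draws identical column subsets, instead of broadcasting each
// sampled feature set.
class ColumnSampler {
 public:
  ColumnSampler() {
    std::uint32_t seed = std::random_device{}();
    // Rank 0's seed wins; afterwards all workers share one PRNG state.
    rabit::Broadcast(&seed, sizeof(seed), 0);
    rng_.seed(seed);
  }

 private:
  std::mt19937 rng_;  // every worker samples columns from the same stream
};
```
This turns the per-sampling communication into a single broadcast per sampler, which is what makes non-trivial column sampling significantly faster in the distributed case.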
**Symptom** Apple Clang's implementation of `std::shuffle` doesn't work
correctly when it is run with the random bit generator used by the R package:
```cpp
// Engine backed by R's uniform RNG; per the diagnosis below, result_type is
// a 64-bit integer type here.
CustomGlobalRandomEngine::result_type
CustomGlobalRandomEngine::operator()() {
  return static_cast<result_type>(
      std::floor(unif_rand() * CustomGlobalRandomEngine::max()));
}
```
Minimal reproduction of the failure (compile using Apple Clang 10.0):
```cpp
#include <algorithm>  // std::shuffle
#include <numeric>    // std::iota
#include <vector>
// common::GlobalRandom() comes from XGBoost's src/common/random.h.

std::vector<int> feature_set(100);
std::iota(feature_set.begin(), feature_set.end(), 0);
// initialize with 0, 1, 2, 3, ..., 99
std::shuffle(feature_set.begin(), feature_set.end(), common::GlobalRandom());
// This returns 0, 1, 2, ..., 99 -- the content didn't get shuffled at all!
```
Note that this bug is platform-dependent; it does not appear when GCC or
upstream LLVM Clang is used.
**Diagnosis** Apple Clang's `std::shuffle` expects 32-bit integer
inputs, whereas `CustomGlobalRandomEngine::operator()` produces 64-bit
integers.
**Fix** Have `CustomGlobalRandomEngine::operator()` produce 32-bit integers.
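A sketch of the fix, assuming `result_type` is simply narrowed to a 32-bit unsigned integer (the actual patch may differ in its details):
```cpp
#include <cmath>
#include <cstdint>

extern "C" double unif_rand();  // R's uniform RNG

// Sketch of the fix: produce 32-bit values so that Apple Clang's
// std::shuffle consumes the engine's output range correctly.
class CustomGlobalRandomEngine {
 public:
  using result_type = std::uint32_t;  // was a 64-bit type before the fix

  static constexpr result_type min() { return 0U; }
  static constexpr result_type max() { return 4294967295U; }

  result_type operator()() {
    return static_cast<result_type>(std::floor(unif_rand() * max()));
  }
};
```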
Closes #3523.