xgboost

Author	SHA1	Message	Date
Jiaming Yuan	d062a9e009	Define pair generation strategies for LTR. (#8984 )	2023-03-30 12:00:35 +08:00
Jiaming Yuan	5891f752c8	Rework the MAP metric. (#8931 ) - The new implementation is more strict as only binary labels are accepted. The previous implementation converts values greater than 1 to 1. - Deterministic GPU. (no atomic add). - Fix top-k handling. - Precise definition of MAP. (There are other variants on how to handle top-k). - Refactor GPU ranking tests.	2023-03-22 17:45:20 +08:00
Jiaming Yuan	72e8331eab	Reimplement the NDCG metric. (#8906 ) - Add support for non-exp gain. - Cache the DMatrix object to avoid re-calculating the IDCG. - Make GPU implementation deterministic. (no atomic add)	2023-03-15 03:26:17 +08:00
Jiaming Yuan	46dfcc7d22	Define a new ranking parameter. (#8887 )	2023-03-09 17:46:24 +08:00
Jiaming Yuan	282b1729da	Specify the number of threads for parallel sort. (#8735 ) * Specify the number of threads for parallel sort. - Pass context object into argsort. - Replace macros with inline functions.	2023-02-16 00:20:19 +08:00
Jiaming Yuan	81b2ee1153	Pass DMatrix into metric for caching. (#8790 )	2023-02-13 22:15:05 +08:00
Jiaming Yuan	5f76edd296	Extract make metric name from ranking metric. (#8768 ) - Extract the metric parsing routine from ranking. - Add a test. - Accept null for string view.	2023-02-09 18:30:21 +08:00
Jiaming Yuan	9f598efc3e	Rename context in Metric. (#8686 )	2023-01-17 01:10:13 +08:00
Jiaming Yuan	3e26107a9c	Rename and extract `Context`. (#8528 ) * Rename `GenericParameter` to `Context`. * Rename header file to reflect the change. * Rename all references.	2022-12-07 04:58:54 +08:00
Rong Ou	668b8a0ea4	[Breaking] Switch from rabit to the collective communicator (#8257 ) * Switch from rabit to the collective communicator * fix size_t specialization * really fix size_t * try again * add include * more include * fix lint errors * remove rabit includes * fix pylint error * return dict from communicator context * fix communicator shutdown * fix dask test * reset communicator mocklist * fix distributed tests * do not save device communicator * fix jvm gpu tests * add python test for federated communicator * Update gputreeshap submodule Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-05 14:39:01 -08:00
Jiaming Yuan	1a33b50a0d	Fix compiler warnings. (#7974 ) - Remove unused parameters. There are still many warnings that are not yet addressed. Currently, the warnings in dmlc-core dominate the error log. - Remove `distributed` parameter from metric. - Fixes some warnings about signed comparison.	2022-06-06 22:56:25 +08:00
Jiaming Yuan	81210420c6	Remove `omp_get_max_threads` (#7608 ) This is the one last PR for removing omp global variable. * Add context object to the `DMatrix`. This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 . * Require context to be available at the construction time of booster. * Add `n_threads` support for R csc DMatrix constructor. * Remove `omp_get_max_threads` in R glue code. * Remove threading utilities that rely on omp global variable.	2022-01-28 16:09:22 +08:00
Jiaming Yuan	5b1161bb64	Convert labels into tensor. (#7456 ) * Add a new ctor to tensor for `initializer_list`. * Change labels from host device vector to tensor. * Rename the field from `labels_` to `labels` since it's a public member.	2021-12-17 00:58:35 +08:00
Jiaming Yuan	0f7a9b42f1	Use double precision in metric calculation. (#7364 )	2021-11-02 12:00:32 +08:00
Jiaming Yuan	d4349426d8	Re-implement PR-AUC. (#7297 ) * Support binary/multi-class classification, ranking. * Add documents. * Handle missing data.	2021-10-26 13:07:50 +08:00
Jiaming Yuan	fd61c61071	Avoid omp reduction in rank metric. (#7349 )	2021-10-22 14:13:34 +08:00
Andrew Ziem	3e7e426b36	Fix spelling in documents (#6948 ) * Update roxygen2 doc. Co-authored-by: fis <jm.yuan@outlook.com>	2021-05-11 20:44:36 +08:00
Jiaming Yuan	bcc0277338	Re-implement ROC-AUC. (#6747 ) * Re-implement ROC-AUC. * Binary * MultiClass * LTR * Add documents. This PR resolves a few issues: - Define a value when the dataset is invalid, which can happen if there's an empty dataset, or when the dataset contains only positive or negative values. - Define ROC-AUC for multi-class classification. - Define weighted average value for distributed setting. - A correct implementation for learning to rank task. Previous implementation is just binary classification with averaging across groups, which doesn't measure ordered learning to rank.	2021-03-20 16:52:40 +08:00
Louis Desreumaux	9b530e5697	Improve OpenMP exception handling (#6680 )	2021-02-25 13:56:16 +08:00
Igor Moura	5e1e972aea	Clean up warnings (#6325 )	2020-10-30 23:50:29 +08:00
Philip Hyunsu Cho	1d22a9be1c	Revert "Reorder includes. (#5749 )" (#5771 ) This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.	2020-06-09 10:29:28 -07:00
Jiaming Yuan	d3a0efbf16	Reorder includes. (#5749 ) * Reorder includes. * R.	2020-06-03 17:30:47 +12:00
Jiaming Yuan	ccd30e4491	Fix non-openmp build. (#5566 ) * Add test to Jenkins. * Fix threading utils tests. * Require thread library.	2020-04-20 12:16:38 +08:00
sriramch	d2231fc840	Ranking metric acceleration on the gpu (#5398 )	2020-03-22 19:38:48 +13:00
sriramch	1ba6706167	- create a gpu metrics (internal) registry (#5387 ) * - create a gpu metrics (internal) registry - the objective is to separate the cpu and gpu implementations such that they evolve independently. to that end, this approach will: - preserve the same metrics configuration (from the end user perspective) - internally delegate the responsibility to the gpu metrics builder when there is a valid device present - decouple the gpu metrics builder from the cpu ones to prevent misuse - move away from including the cuda file from within the cc file and segregate the code via ifdef's	2020-03-07 15:31:35 +13:00
sriramch	5dc8e894c9	Fixes and changes to the ranking metrics computed on cpu (#5380 ) * - fixes and changes to the ranking metrics computed on cpu - auc/aucpr ranking metric accelerated on cpu - fixes to the auc/aucpr metrics	2020-03-03 15:56:36 +13:00
sriramch	2abe69d774	- ndcg ltr implementation on gpu (#5004 ) * - ndcg ltr implementation on gpu - this is a follow-up to the pairwise ltr implementation	2019-11-13 11:21:04 +13:00
Jiaming Yuan	095de3bf5f	Export c++ headers in CMake installation. (#4897 ) * Move get transpose into cc. * Clean up headers in host device vector, remove thrust dependency. * Move span and host device vector into public. * Install c++ headers. * Short notes for c and c++. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-10-06 23:53:09 -04:00
TinkleG	2aed0ae230	Fix auc error in distributed mode (#4798 ) Need more work for a complete fix. See #4663 .	2019-09-01 02:54:14 -04:00
Xu Xiao	cd1526d3b1	fix auc error in distributed mode caused by unbalanced dataset (#4645 )	2019-07-08 16:01:52 +08:00
Philip Hyunsu Cho	96bf91725b	Support ndcg- and map- (#4635 )	2019-07-03 22:51:48 -07:00
Xin Yin	8d1098a983	In AUC and AUCPR metrics, detect whether weights are per-instance or per-group (#4216 ) * In AUC and AUCPR metrics, detect whether weights are per-instance or per-group * Fix C++ style check * Add a test for weighted AUC	2019-05-04 00:53:04 -07:00
Philip Hyunsu Cho	9252b686ae	Make AUCPR work with multiple query groups (#4436 ) * Make AUCPR work with multiple query groups * Check AUCPR <= 1.0 in distributed setting	2019-05-03 10:34:44 -07:00
Jiaming Yuan	48dddfd635	Porting elementwise metrics to GPU. (#3952 ) * Port elementwise metrics to GPU. * All elementwise metrics are converted to static polymorphic. * Create a reducer for metrics reduction. * Remove const of Metric::Eval to accommodate CubMemory.	2018-12-01 18:46:45 +13:00
Jiaming Yuan	19ee0a3579	Refactor fast-hist, add tests for some updaters. (#3836 ) Add unittest for prune. Add unittest for refresh. Refactor fast_hist. * Remove fast_hist_param. * Rename to quantile_hist. Add unittests for QuantileHist. * Refactor QuantileHist into .h and .cc file. * Remove sync.h. * Remove MGPU_mock test. Rename fast hist method to quantile hist.	2018-11-07 21:15:07 +13:00
Andy Adinets	72cd1517d6	Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446 ) * Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. - added distributions to HostDeviceVector - using HostDeviceVector for labels, weights and base margins in MetaInfo - using HostDeviceVector for offset and data in SparsePage - other necessary refactoring * Added const version of HostDeviceVector API calls. - const versions added to calls that can trigger data transfers, e.g. DevicePointer() - updated the code that uses HostDeviceVector - objective functions now accept const HostDeviceVector<bst_float>& for predictions * Updated src/linear/updater_gpu_coordinate.cu. * Added read-only state for HostDeviceVector sync. - this means no copies are performed if both host and devices access the HostDeviceVector read-only * Fixed linter and test errors. - updated the lz4 plugin - added ConstDeviceSpan to HostDeviceVector - using device % dh::NVisibleDevices() for the physical device number, e.g. in calls to cudaSetDevice() * Fixed explicit template instantiation errors for HostDeviceVector. - replaced HostDeviceVector<unsigned int> with HostDeviceVector<int> * Fixed HostDeviceVector tests that require multiple GPUs. - added a mock set device handler; when set, it is called instead of cudaSetDevice()	2018-08-30 14:28:47 +12:00
Rory Mitchell	ccf80703ef	Clang-tidy static analysis (#3222 ) * Clang-tidy static analysis * Modernise checks * Google coding standard checks * Identifier renaming according to Google style	2018-04-19 18:57:13 +12:00
Arjan van der Velde	04221a7469	rank_metric: add AUC-PR (#3172 ) * rank_metric: add AUC-PR Implementation of the AUC-PR calculation for weighted data, proposed by Keilwagen, Grosse and Grau (https://doi.org/10.1371/journal.pone.0092209) * rank_metric: fix lint warnings * Implement tests for AUC-PR and fix implementation * add aucpr to documentation for other languages	2018-03-23 10:43:47 -04:00
Scott Lundberg	d878c36c84	Add SHAP interaction effects, fix minor bug, and add cox loss (#3043 ) * Add interaction effects and cox loss * Minimize whitespace changes * Cox loss now no longer needs a pre-sorted dataset. * Address code review comments * Remove mem check, rename to pred_interactions, include bias * Make lint happy * More lint fixes * Fix cox loss indexing * Fix main effects and tests * Fix lint * Use half interaction values on the off-diagonals * Fix lint again	2018-02-07 20:38:01 -06:00
Philip Cho	14fba01b5a	Improve multi-threaded performance (#2104 ) * Add UpdatePredictionCache() option to updaters Some updaters (e.g. fast_hist) have enough information to quickly compute prediction cache for the training data. Each updater may override the UpdatePredictionCache() method to update the prediction cache. Note: this trick does not apply to validation data. * Respond to code review * Disable some debug messages by default * Document UpdatePredictionCache() interface * Remove base_margin logic from UpdatePredictionCache() implementation * Do not take pointer to cfg, as reference may get stale * Improve multi-threaded performance * Use columnwise accessor to accelerate ApplySplit() step, with support for a compressed representation * Parallel sort for evaluation step * Inline BuildHist() function * Cache gradient pairs when building histograms in BuildHist() * Add missing #if macro * Respond to code review * Use wrapper to enable parallel sort on Linux * Fix C++ compatibility issues * MSVC doesn't support unsigned in OpenMP loops * gcc 4.6 doesn't support using keyword * Fix lint issues * Respond to code review * Fix bug in ApplySplitSparseData() * Attempting to read beyond the end of a sparse column * Mishandling the case where an entire range of rows have missing values * Fix training continuation bug Disable UpdatePredictionCache() in the first iteration. This way, we can accommodate the scenario where we build off of an existing (nonempty) ensemble. * Add regression test for fast_hist * Respond to code review * Add back old version of ApplySplitSparseData	2017-03-25 10:35:01 -07:00
Tianqi Chen	d581a3d0e7	[UPDATE] Update rabit and threadlocal (#2114 ) * [UPDATE] Update rabit and threadlocal * minor fix to make build system happy * upgrade requirement to g++4.8 * upgrade dmlc-core * update travis	2017-03-16 18:48:37 -07:00
AbdealiJK	5912e051b1	rank_metric.cc: Use GetWeight in EvalAMS The GetWeight is a wrapper which sets the correct weight if the weights vector is not provided. Hence accessing the default weights vector is not recommended.	2016-12-04 11:25:57 -08:00
AbdealiJK	6f16f0ef58	Use bst_float consistently throughout (#1824 ) * Fix various typos * Add override to functions that are overridden gcc gives warnings about functions that are being overridden but not being marked as overridden. This fixes it. * Use bst_float consistently Use bst_float for all the variables that involve weight, leaf value, gradient, hessian, gain, loss_chg, predictions, base_margin, feature values. In some cases, when due to additions and so on the value can take a larger value, double is used. This ensures that type conversions are minimal and reduces loss of precision.	2016-11-30 10:02:10 -08:00
Liam Huang	001d8c4023	correct CalcDCG in rank_metric.cc and rank_obj.cc (#1642 ) * correct CalcDCG in rank_metric.cc DCG uses log base-2, however `std::log` returns log base-e. * correct CalcDCG in rank_obj.cc DCG uses log base-2, however `std::log` returns log base-e. * use std::log2 instead of std::log to make it more elegant	2016-10-18 10:23:41 -07:00
phoenixbai	915ac0b8fe	the fix of missing value assignment for name_ variable in EvalRankList method (#1558 )	2016-09-26 08:57:17 -05:00
Vadim Khotilovich	75f401481f	no exception throwing within omp parallel; set nthread in Learner (#1421 )	2016-07-29 10:08:03 -07:00
tqchen	d75e3ed05d	[LIBXGBOOST] pass demo running.	2016-01-16 10:24:01 -08:00
tqchen	c8ccb61b9e	[TREE] Enable updater registry	2016-01-16 10:24:01 -08:00
tqchen	b4d0bb5a6d	[METRIC] all metric move finished	2016-01-16 10:24:01 -08:00

49 Commits