31 Commits

Author SHA1 Message Date
Igor Rukhovich
19a2c54265
Prediction by indices (subsample < 1) (#6683)
* Another implementation of predicting by indices

* Fixed omp parallel_for variable type

* Removed SparsePageView from Updater
2021-03-16 15:08:20 +13:00
ShvetsKS
7f4d3a91b9
Multiclass prediction caching for CPU Hist (#6550)
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-01-13 04:42:07 +08:00
Sergio Gavilán
b181a88f9f
Reduced some C++ compiler warnings (#6197)
* Removed some warnings

* Rebase with master

* Solved C++ Google Tests errors made by refactoring in order to remove warnings

* Undo renaming path -> path_

* Fix style check

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-29 12:36:00 -07:00
Philip Hyunsu Cho
9e955fb9b0
[R] Check warnings explicitly for model compatibility tests (#6114)
* [R] Check warnings explicitly for model compatibility tests

* Address reviewer's feedback
2020-09-15 10:49:48 -07:00
Jiaming Yuan
2fcc4f2886
Unify evaluation functions. (#6037) 2020-08-26 14:23:27 +08:00
Philip Hyunsu Cho
ace7fd328b
[R] Add a compatibility layer to load Booster object from an old RDS file (#5940)
* [R] Add a compatibility layer to load Booster from an old RDS
* Modify QuantileHistMaker::LoadConfig() to be backward compatible with 1.1.x
* Add a big warning about compatibility in QuantileHistMaker::LoadConfig()
* Add testing suite
* Discourage use of saveRDS() in CRAN doc
2020-07-26 00:06:49 -07:00
Philip Hyunsu Cho
1d22a9be1c
Revert "Reorder includes. (#5749)" (#5771)
This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.
2020-06-09 10:29:28 -07:00
Jiaming Yuan
d3a0efbf16
Reorder includes. (#5749)
* Reorder includes.

* R.
2020-06-03 17:30:47 +12:00
ShvetsKS
cd3d14ad0e
Add float32 histogram (#5624)
* new single_precision_histogram param was added.

Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
Co-authored-by: fis <jm.yuan@outlook.com>
2020-06-03 11:24:53 +08:00
ShvetsKS
dd01e4ba8d
Distributed optimizations for 'hist' method with CPUs (#5557)
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-05-20 06:03:03 +03:00
ShvetsKS
a2d86b8e4b
Optimizations for RNG in InitData kernel (#5522)
* optimizations for subsampling in InitData

* optimizations for subsampling in InitData

Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-04-16 18:24:32 +03:00
Jiaming Yuan
0012f2ef93
Upgrade clang-tidy on CI. (#5469)
* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-05 04:42:29 +08:00
ShvetsKS
27a8e36fc3
Reducing memory consumption for 'hist' method on CPU (#5334) 2020-03-28 14:45:52 +13:00
Egor Smirnov
1b97eaf7a7
Optimized ApplySplit, BuildHist and UpdatePredictCache functions on CPU (#5244)
* Split up sparse and dense build hist kernels.
* Add `PartitionBuilder`.
2020-02-29 16:11:42 +08:00
Egor Smirnov
c67163250e
Optimized BuildHist function (#5156) 2020-01-29 23:32:57 -08:00
Egor Smirnov
7b17e76c5b Optimized EvaluateSplut function (#5138)
* Add block based threading utilities.
2019-12-31 18:18:42 +08:00
Jiaming Yuan
04db125699
Quick fix for memory leak in CPU Hist. (#5153)
Closes https://github.com/dmlc/xgboost/issues/3579 .

* Don't use map.
2019-12-31 14:05:53 +08:00
Jiaming Yuan
7ef5b78003
Implement JSON IO for updaters (#5094)
* Implement JSON IO for updaters.

* Remove parameters in split evaluator.
2019-12-07 00:24:00 +08:00
Jiaming Yuan
97abcc7ee2
Extract interaction constraint from split evaluator. (#5034)
*  Extract interaction constraints from split evaluator.

The reason for doing so is mostly for model IO, where num_feature and interaction_constraints are copied in split evaluator. Also interaction constraint by itself is a feature selector, acting like column sampler and it's inefficient to bury it deep in the evaluator chain. Lastly removing one another copied parameter is a win.

*  Enable inc for approx tree method.

As now the implementation is spited up from evaluator class, it's also enabled for approx method.

*  Removing obsoleted code in colmaker.

They are never documented nor actually used in real world. Also there isn't a single test for those code blocks.

*  Unifying the types used for row and column.

As the size of input dataset is marching to billion, incorrect use of int is subject to overflow, also singed integer overflow is undefined behaviour. This PR starts the procedure for unifying used index type to unsigned integers. There's optimization that can utilize this undefined behaviour, but after some testings I don't see the optimization is beneficial to XGBoost.
2019-11-14 20:11:41 +08:00
Philip Hyunsu Cho
f4e7b707c9
Revert #4529 (#5008)
* Revert " Optimize ‘hist’ for multi-core CPU (#4529)"

This reverts commit 4d6590be3c9a043d44d9e4fe0a456a9f8179ec72.

* Fix build
2019-11-12 09:35:03 -08:00
Jiaming Yuan
f0064c07ab
Refactor configuration [Part II]. (#4577)
* Refactor configuration [Part II].

* General changes:
** Remove `Init` methods to avoid ambiguity.
** Remove `Configure(std::map<>)` to avoid redundant copying and prepare for
   parameter validation. (`std::vector` is returned from `InitAllowUnknown`).
** Add name to tree updaters for easier debugging.

* Learner changes:
** Make `LearnerImpl` the only source of configuration.

    All configurations are stored and carried out by `LearnerImpl::Configure()`.

** Remove booster in C API.

    Originally kept for "compatibility reason", but did not state why.  So here
    we just remove it.

** Add a `metric_names_` field in `LearnerImpl`.
** Remove `LazyInit`.  Configuration will always be lazy.
** Run `Configure` before every iteration.

* Predictor changes:
** Allocate both cpu and gpu predictor.
** Remove cpu_predictor from gpu_predictor.

    `GBTree` is now used to dispatch the predictor.

** Remove some GPU Predictor tests.

* IO

No IO changes.  The binary model format stability is tested by comparing
hashing value of save models between two commits
2019-07-20 08:34:56 -04:00
Jiaming Yuan
d9a47794a5 Fix CPU hist init for sparse dataset. (#4625)
* Fix CPU hist init for sparse dataset.

* Implement sparse histogram cut.
* Allow empty features.

* Fix windows build, don't use sparse in distributed environment.

* Comments.

* Smaller threshold.

* Fix windows omp.

* Fix msvc lambda capture.

* Fix MSVC macro.

* Fix MSVC initialization list.

* Fix MSVC initialization list x2.

* Preserve categorical feature behavior.

* Rename matrix to sparse cuts.
* Reuse UseGroup.
* Check for categorical data when adding cut.

Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>

* Sanity check.

* Fix comments.

* Fix comment.
2019-07-04 16:27:03 -07:00
Egor Smirnov
4d6590be3c Optimize ‘hist’ for multi-core CPU (#4529)
* Initial performance optimizations for xgboost

* remove includes

* revert float->double

* fix for CI

* fix for CI

* fix for CI

* fix for CI

* fix for CI

* fix for CI

* fix for CI

* fix for CI

* fix for CI

* fix for CI

* Check existence of _mm_prefetch and __builtin_prefetch

* Fix lint

* optimizations for CPU

* appling comments in review

* add some comments, code refactoring

* fixing issues in CI

* adding runtime checks

* remove 1 extra check

* remove extra checks in BuildHist

* remove checks

* add debug info

* added debug info

* revert changes

* added comments

* Apply suggestions from code review

Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>

* apply review comments

* Remove unused function CreateNewNodes()

* Add descriptive comment on node_idx variable in QuantileHistMaker::Builder::BuildHistsBatch()
2019-06-27 11:33:49 -07:00
Egor Smirnov
711397d645 Optimizations of pre-processing for 'hist' tree method (#4310)
* oprimizations for pre-processing

* code cleaning

* code cleaning

* code cleaning after review

* Apply suggestions from code review

Co-Authored-By: SmirnovEgorRu <egor.smirnov@intel.com>
2019-04-16 17:36:19 -07:00
Jiaming Yuan
09bd9e68cf
Use Monitor in quantile hist. (#4273) 2019-03-20 09:26:22 +08:00
Nan Zhu
1dac5e2410
more correct way to build node stats in distributed fast hist (#4140)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* more changes

* temp

* update

* udpate rabit

* change the histogram

* update kfactor

* sync per node stats

* temp

* update

* final

* code clean

* update rabit

* more cleanup

* fix errors

* fix failed tests

* enforce c++11

* broadcast subsampled feature correctly

* init col

* temp

* col sampling

* fix histmastrix init

* fix col sampling

* remove cout

* fix out of bound access

* fix core dump

remove core dump file

* update

* add fid

* update

* revert some changes

* temp

* temp

* pass all tests

* bring back some tests

* recover some changes

* fix lint issue

* enable monotone and interaction constraints

* don't specify default for monotone and interactions

* recover column init part

* more recovery

* fix core dumps

* code clean

* revert some changes

* fix test compilation issue

* fix lint issue

* resolve compilation issue

* fix issues of lint caused by rebase

* fix stylistic changes and change variable names

* modularize depth width

* address the comments

* fix failed tests

* wrap perf timers with class

* temp

* pass all lossguide

* pass tests

* add comments

* more changes

* use separate flow for single and tests

* add test for lossguide hist

* remove duplications

* syncing stats for only once

* recover more changes

* recover more changes

* fix root-stats

* simplify code

* remove outdated comments
2019-02-18 13:45:30 -08:00
Jiaming Yuan
2e618af743
Fix cpplint. (#4157)
* Add comment after #endif.
* Add missing headers.
2019-02-18 00:16:29 +08:00
Nan Zhu
c18a3660fa
Separate Depthwidth and Lossguide growing policy in fast histogram (#4102)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* init

* more changes

* temp

* update

* udpate rabit

* change the histogram

* update kfactor

* sync per node stats

* temp

* update

* final

* code clean

* update rabit

* more cleanup

* fix errors

* fix failed tests

* enforce c++11

* broadcast subsampled feature correctly

* init col

* temp

* col sampling

* fix histmastrix init

* fix col sampling

* remove cout

* fix out of bound access

* fix core dump

remove core dump file

* disbale test temporarily

* update

* add fid

* print perf data

* update

* revert some changes

* temp

* temp

* pass all tests

* bring back some tests

* recover some changes

* fix lint issue

* enable monotone and interaction constraints

* don't specify default for monotone and interactions

* recover column init part

* more recovery

* fix core dumps

* code clean

* revert some changes

* fix test compilation issue

* fix lint issue

* resolve compilation issue

* fix issues of lint caused by rebase

* fix stylistic changes and change variable names

* use regtree internal function

* modularize depth width

* address the comments

* fix failed tests

* wrap perf timers with class

* fix lint

* fix num_leaves count

* fix indention

* Update src/tree/updater_quantile_hist.cc

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.h

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.cc

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.cc

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.cc

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.h

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* merge

* fix compilation
2019-02-13 12:56:19 -08:00
Jiaming Yuan
017c97b8ce
Clean up training code. (#3825)
* Remove GHistRow, GHistEntry, GHistIndexRow.
* Remove kSimpleStats.
* Remove CheckInfo, SetLeafVec in GradStats and in SKStats.
* Clean up the GradStats.
* Cleanup calcgain.
* Move LossChangeMissing out of common.
* Remove [] operator from GHistIndexBlock.
2019-02-07 14:22:13 +08:00
Nan Zhu
ae3bb9c2d5
Distributed Fast Histogram Algorithm (#4011)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* init

* allow hist algo

* more changes

* temp

* update

* remove hist sync

* udpate rabit

* change hist size

* change the histogram

* update kfactor

* sync per node stats

* temp

* update

* final

* code clean

* update rabit

* more cleanup

* fix errors

* fix failed tests

* enforce c++11

* fix lint issue

* broadcast subsampled feature correctly

* revert some changes

* fix lint issue

* enable monotone and interaction constraints

* don't specify default for monotone and interactions

* update docs
2019-02-05 05:12:53 -08:00
Jiaming Yuan
19ee0a3579
Refactor fast-hist, add tests for some updaters. (#3836)
Add unittest for prune.

Add unittest for refresh.

Refactor fast_hist.

* Remove fast_hist_param.
* Rename to quantile_hist.

Add unittests for QuantileHist.

* Refactor QuantileHist into .h and .cc file.
* Remove sync.h.
* Remove MGPU_mock test.

Rename fast hist method to quantile hist.
2018-11-07 21:15:07 +13:00