
Matthew Jones 92b7577c62 [REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)

* Initial commit to support multi-node multi-gpu xgboost using dask

* Fixed NCCL initialization by not ignoring the opg parameter.

- it now crashes on NCCL initialization, but at least we're attempting it properly

* At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers

* Synchronizing in a couple of more places.

- now the workers don't go down, but just hang
- no more "wild" values of gradients
- probably needs syncing in more places

* Added another missing max-allreduce operation inside BuildHistLeftRight

* Removed unnecessary collective operations.

* Simplified rabit::Allreduce() sync of gradient sums.

* Removed unnecessary rabit syncs around ncclAllReduce.

- this improves performance _significantly_ (7x faster for overall training,
  20x faster for xgboost proper)

* pulling in latest xgboost

* removing changes to updater_quantile_hist.cc

* changing use_nccl_opg initialization, removing unnecessary if statements

* added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId

* placing struct definition in guard to avoid duplicate code errors

* addressing linting errors

* removing

* removing additional arguments to AllReducer initialization

* removing distributed flag

* making comm init symmetric

* removing distributed flag

* changing ncclCommInit to support multiple modalities

* fix indenting

* updating ncclCommInitRank block with necessary group calls

* fix indenting

* adding print statement, and updating accessor in vector

* improving print statement to end-line

* generalizing nccl_rank construction using rabit

* assume device_ordinals is the same for every node

* test, assume device_ordinals is identical for all nodes

* test, assume device_ordinals is unique for all nodes

* changing names of offset variable to be more descriptive, editing indenting

* wrapping ncclUniqueId GetUniqueId() and aesthetic changes

* adding synchronization, and tests for distributed

* adding  to tests

* fixing broken #endif

* fixing initialization of gpu histograms, correcting errors in tests

* adding to contributors list

* adding distributed tests to jenkins

* fixing bad path in distributed test

* debugging

* adding kubernetes for distributed tests

* adding proper import for OrderedDict

* adding urllib3==1.22 to address ordered_dict import error

* added sleep to allow workers to save their models for comparison

* adding name to GPU contributors under docs

2019-03-02 10:03:22 +13:00

.github

Enable auto-locking of issues closed long ago (#3821)

2018-10-23 19:21:58 -07:00

amalgamation

Refactor fast-hist, add tests for some updaters. (#3836)

2018-11-07 21:15:07 +13:00

cmake

Fix cpplint. (#4157)

2019-02-18 00:16:29 +08:00

cub @ b20808b1b0

Update cub submodule again (fixes GPU build) (#2599)

2017-08-13 22:14:40 +12:00

demo

Fix typo in demo (#4156)

2019-02-18 18:42:41 +08:00

dmlc-core @ ac983092ee

Update dmlc-core submodule (#3907)

2018-11-15 18:50:49 -08:00

doc

[REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)

2019-03-02 10:03:22 +13:00

include/xgboost

Add PushCSC for SparsePage. (#4193)

2019-03-02 01:58:08 +08:00

jvm-packages

[jvm-packages] Fix early stop with xgboost4j-spark (#4176)

2019-03-01 13:02:57 -08:00

make

Not use -msse2 on power or arm arch. close #2446 (#2475)

2017-07-06 20:06:55 -04:00

plugin

Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446)

2018-08-30 14:28:47 +12:00

python-package

Added trees_to_df() method for Booster class (#4153)

2019-02-26 13:28:24 -08:00

R-package

Fix cpplint. (#4157)

2019-02-18 00:16:29 +08:00

rabit @ 1cc34f01db

Upgrade rabit. (#4159)

2019-02-18 22:16:58 +08:00

src

[REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)

2019-03-02 10:03:22 +13:00

tests

[REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)

2019-03-02 10:03:22 +13:00

.clang-tidy

Perform clang-tidy on both cpp and cuda source. (#4034)

2019-02-05 16:07:43 +08:00

.editorconfig

Added configuration for python into .editorconfig (#3494)

2018-07-23 00:24:10 -07:00

.gitignore

Performance optimizations for Intel CPUs (#3957)

2019-01-08 21:08:13 -08:00

.gitmodules

Upgrading to NCCL2 (#3404)

2018-07-10 00:42:15 -07:00

.travis.yml

Fix broken R test: Install Homebrew GCC (#4142)

2019-02-15 07:23:05 -08:00

appveyor.yml

Fix test_gpu_coordinate. (#3974)

2019-02-19 14:09:10 -08:00

build.sh

Suggest git submodule update instead of delete + reclone (#3214)

2018-05-09 14:39:17 -07:00

CITATION

simplify software citation (#2912)

2017-12-01 02:58:13 -08:00

CMakeLists.txt

Perform clang-tidy on both cpp and cuda source. (#4034)

2019-02-05 16:07:43 +08:00

CONTRIBUTORS.md

[REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)

2019-03-02 10:03:22 +13:00

Jenkinsfile

Perform clang-tidy on both cpp and cuda source. (#4034)

2019-02-05 16:07:43 +08:00

Jenkinsfile-restricted

Disable retries in Jenkins CI, since we're now using On-Demand instances instead of Spot (#3948)

2018-11-28 14:57:09 -08:00

LICENSE

Include full text of Apache 2.0 license (#3698)

2018-09-12 20:46:55 -07:00

Makefile

Fix CRAN check warnings/notes (#3988)

2018-12-12 08:23:20 -06:00

NEWS.md

Document GPU objectives in NEWS. (#3865)

2018-11-05 14:46:45 +13:00

README.md

Add Jenkins status badge (#4090)

2019-01-30 14:03:18 -08:00

README.md

eXtreme Gradient Boosting

Community | Documentation | Resources | Contributors | Release Notes

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.
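For readers new to the Gradient Boosting framework mentioned above, the idea can be sketched in a few lines of plain Python. This is an illustration only, not XGBoost's implementation or API: it assumes a single numeric feature, squared-error loss, and depth-1 trees (stumps). The function names (`fit_stump`, `fit_boosted`, `predict`) and the toy data are hypothetical, chosen for this sketch.

```python
# Minimal gradient boosting sketch (squared-error loss, depth-1 trees).
# Illustration only: this is NOT XGBoost's implementation or API.

def fit_stump(xs, residuals):
    """Pick the threshold on a 1-D feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must put samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all x identical): fall back to a constant stump.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit an additive model: each round fits a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Toy data (made up for this sketch): a step function of one feature.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_boosted(xs, ys)
print(predict(model, 2), predict(model, 5))
```

Each round shrinks the remaining residual by the learning rate, so the predictions approach the targets as rounds accumulate. XGBoost itself adds regularization, second-order gradient information, and parallel and distributed tree construction on top of this basic loop.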

License

Contribute to XGBoost

XGBoost has been developed and used by a group of active community members. Your help is very valuable to make the package better for everyone. Check out the Community Page.

Reference

Tianqi Chen and Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. In 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016.
XGBoost originates from a research project at the University of Washington.

Languages

C++ 45.5%

Python 20.3%

Cuda 15.2%

R 6.8%

Scala 6.4%

Other 5.6%