xgboost

Files

Matthew Jones 92b7577c62 [REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)

* Initial commit to support multi-node multi-gpu xgboost using dask

* Fixed NCCL initialization by not ignoring the opg parameter.

- it now crashes on NCCL initialization, but at least we're attempting it properly

* At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers

* Synchronizing in a couple more places.

- now the workers don't go down, but just hang
- no more "wild" values of gradients
- probably needs syncing in more places

* Added another missing max-allreduce operation inside BuildHistLeftRight

* Removed unnecessary collective operations.

* Simplified rabit::Allreduce() sync of gradient sums.

* Removed unnecessary rabit syncs around ncclAllReduce.

- this improves performance _significantly_ (7x faster for overall training,
  20x faster for xgboost proper)

* pulling in latest xgboost

* removing changes to updater_quantile_hist.cc

* changing use_nccl_opg initialization, removing unnecessary if statements

* added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId

* placing struct definition in guard to avoid duplicate code errors

* addressing linting errors

* removing

* removing additional arguments to AllReducer initialization

* removing distributed flag

* making comm init symmetric

* removing distributed flag

* changing ncclCommInit to support multiple modalities

* fix indenting

* updating ncclCommInitRank block with necessary group calls

* fix indenting

* adding print statement, and updating accessor in vector

* improving print statement to end-line

* generalizing nccl_rank construction using rabit

* assume device_ordinals is the same for every node

* test, assume device_ordinals is identical for all nodes

* test, assume device_ordinals is unique for all nodes

* changing names of offset variable to be more descriptive, editing indenting

* wrapping ncclUniqueId GetUniqueId() and aesthetic changes

* adding synchronization, and tests for distributed

* adding to tests

* fixing broken #endif

* fixing initialization of gpu histograms, correcting errors in tests

* adding to contributors list

* adding distributed tests to jenkins

* fixing bad path in distributed test

* debugging

* adding kubernetes for distributed tests

* adding proper import for OrderedDict

* adding urllib3==1.22 to address ordered_dict import error

* added sleep to allow workers to save their models for comparison

* adding name to GPU contributors under docs

2019-03-02 10:03:22 +13:00

_static

Doc modernization (#3474)

2018-07-19 14:22:16 -07:00

gpu

[REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)

2019-03-02 10:03:22 +13:00

jvm

[jvm-packages] Fix early stop with xgboost4j-spark (#4176)

2019-03-01 13:02:57 -08:00

python

Make `HistCutMatrix::Init' be aware of groups. (#4115)

2019-02-16 04:39:41 +08:00

R-package

fix typos (#4027)

2018-12-28 00:36:47 +08:00

tutorials

Fix typo in Feature Interaction Constraints tutorial (#3975)

2018-12-06 19:38:40 -08:00

.gitignore

[DOC] Update R doc

2016-01-16 11:52:33 -08:00

build.rst

Correct typo

2018-11-04 05:22:53 -08:00

cli.rst

Doc modernization (#3474)

2018-07-19 14:22:16 -07:00

conf.py

Fix broken doc build due to Matplotlib 3.0 release (#3764)

2018-10-07 13:34:37 -07:00

contribute.rst

Perform clang-tidy on both cpp and cuda source. (#4034)

2019-02-05 16:07:43 +08:00

Doxyfile

[TRAVIS] cleanup travis script

2016-01-16 10:25:12 -08:00

faq.rst

Update faq.rst (#3521)

2018-07-28 10:34:14 -07:00

get_started.rst

replace nround with nrounds to match actual parameter (#3592)

2018-08-15 11:13:53 -07:00

index.rst

Doc modernization (#3474)

2018-07-19 14:22:16 -07:00

julia.rst

Doc modernization (#3474)

2018-07-19 14:22:16 -07:00

Makefile

enable basic sphinx doc

2015-08-01 11:27:13 -07:00

parameter.rst

Distributed Fast Histogram Algorithm (#4011)

2019-02-05 05:12:53 -08:00

README

Fix doc build (#3126)

2018-02-21 16:57:30 -08:00

requirements.txt

Revert #3677 and #3674 (#3678)

2018-09-06 20:43:17 -07:00

sphinx_util.py

Doc modernization (#3474)

2018-07-19 14:22:16 -07:00

README

The documentation of xgboost is generated with recommonmark and sphinx.

You can build it locally by typing "make html" in this folder.
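
If you prefer to drive the build from a script, the step above can be sketched in Python as follows. This is only an illustration, not part of the xgboost tooling: the helper names docs_build_command and build_docs are hypothetical, and the sketch assumes that make, sphinx and recommonmark are already installed and that it is pointed at the folder containing the Makefile.

```python
import shutil
import subprocess
from pathlib import Path


def docs_build_command(builder="html"):
    """Return the command for a Makefile doc target, e.g. ['make', 'html']."""
    return ["make", builder]


def build_docs(doc_dir=".", builder="html", dry_run=False):
    """Run `make <builder>` inside doc_dir.

    With dry_run=True nothing is executed; the command is only returned.
    """
    cmd = docs_build_command(builder)
    if dry_run:
        return cmd
    # Fail early with a readable message instead of an opaque OS error.
    if shutil.which("make") is None:
        raise RuntimeError("`make` was not found on PATH")
    if not (Path(doc_dir) / "Makefile").is_file():
        raise FileNotFoundError(f"no Makefile in {doc_dir!r}")
    # check=True raises CalledProcessError if the build fails.
    subprocess.run(cmd, cwd=doc_dir, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: show the command that would be executed in this folder.
    print(" ".join(build_docs(dry_run=True)))
```

Calling build_docs() without dry_run from this folder is equivalent to typing "make html" by hand.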

Check out https://recommonmark.readthedocs.org for a guide on how to write Markdown with the extensions used in this doc, such as math formulas and tables of contents.