- Rewrite GPU demos. The notebook is converted to a script to avoid committing additional PNG plots.
- Add GPU demos to the Sphinx gallery.
- Add RMM demos to the Sphinx gallery.
- Add a test for spawning threads with different device ordinals.
* Handle the new `device` parameter in dask and demos.
- Check that no device ordinal is specified in the dask interface.
- Update demos.
- Update dask doc.
- Update the condition for using `QuantileDMatrix` (QDM).
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter (see the sketch after this list).
- The `predictor` parameter is removed.
- Fall back to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
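A minimal sketch of the idea behind `DeviceOrd`, assuming illustrative member names; the real struct may differ:

```cpp
#include <cstdint>

// Sketch: a device descriptor pairing a device type with an ordinal,
// replacing the single integer `gpu_id`.  All names here are assumptions.
struct DeviceOrd {
  enum Type : std::int16_t { kCPU = 0, kCUDA = 1 };
  Type device{kCPU};
  std::int32_t ordinal{-1};  // -1: no specific device ordinal requested.

  static DeviceOrd CPU() { return {kCPU, -1}; }
  static DeviceOrd CUDA(std::int32_t ord) { return {kCUDA, ord}; }
  bool IsCUDA() const { return device == kCUDA; }
};
```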
* [CI] Drop CUDA 10.1; Require 11.0
* Change NCCL version
* Use CUDA 10.1 for clang-tidy, for now
* Remove JDK 11 and 12
* Fix NCCL version
* Don't require 11.0 just yet, until clang-tidy is fixed
* Skip MultiClassesSerializationTest.GpuHist
* Initial commit to support multi-node, multi-GPU XGBoost using dask
* Fixed NCCL initialization by not ignoring the opg parameter.
- it now crashes on NCCL initialization, but at least we're attempting it properly
* At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers
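A hedged sketch of that root-node reduction using the rabit API; the buffer layout and function name are illustrative:

```cpp
#include <rabit/rabit.h>

// Sketch: sum the local gradient/hessian pair across all workers so every
// node evaluates the root split from the same global sum_gradient.
void AllreduceRootGradientSum(double* grad_sum, double* hess_sum) {
  double buf[2] = {*grad_sum, *hess_sum};
  rabit::Allreduce<rabit::op::Sum>(buf, 2);  // in-place reduction across workers
  *grad_sum = buf[0];
  *hess_sum = buf[1];
}
```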
* Synchronizing in a couple more places.
- now the workers don't go down, but just hang
- no more "wild" values of gradients
- probably needs syncing in more places
* Added another missing max-allreduce operation inside BuildHistLeftRight
* Removed unnecessary collective operations.
* Simplified rabit::Allreduce() sync of gradient sums.
* Removed unnecessary rabit syncs around ncclAllReduce.
- this improves performance _significantly_ (7x faster for overall training,
20x faster for xgboost proper)
* pulling in latest xgboost
* removing changes to updater_quantile_hist.cc
* changing use_nccl_opg initialization, removing unnecessary if statements
* added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId
* placing the struct definition in an include guard to avoid redefinition errors
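The guarded definition roughly looks like the following; the layout matches NCCL's public header, but treat this as a sketch rather than a verbatim copy:

```cpp
// Opaque blob used to bootstrap an NCCL communicator.  Guarding the
// definition avoids a redefinition error when nccl.h is also included.
#ifndef NCCL_UNIQUE_ID_BYTES
#define NCCL_UNIQUE_ID_BYTES 128
typedef struct ncclUniqueId {
  char internal[NCCL_UNIQUE_ID_BYTES];
} ncclUniqueId;
#endif
```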
* addressing linting errors
* removing
* removing additional arguments to `AllReducer` initialization
* removing distributed flag
* making comm init symmetric
* removing distributed flag
* changing ncclCommInit to support multiple initialization modes
* fix indenting
* updating ncclCommInitRank block with necessary group calls
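A sketch of what the group calls buy us: when one process initializes a communicator per local device, each `ncclCommInitRank` must run inside an `ncclGroupStart()`/`ncclGroupEnd()` pair or the calls deadlock waiting on one another. `rank_offset` (this node's first global rank) is an assumed variable:

```cpp
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

// Sketch: one communicator per local device, initialized in one NCCL group.
// `id` must be the same ncclUniqueId on every participating rank.
void InitNcclComms(std::vector<ncclComm_t>* comms,
                   std::vector<int> const& device_ordinals,
                   ncclUniqueId id, int nranks, int rank_offset) {
  comms->resize(device_ordinals.size());
  ncclGroupStart();
  for (std::size_t i = 0; i < device_ordinals.size(); ++i) {
    cudaSetDevice(device_ordinals[i]);
    ncclCommInitRank(&(*comms)[i], nranks, id, rank_offset + static_cast<int>(i));
  }
  ncclGroupEnd();
}
```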
* fix indenting
* adding a print statement and updating the vector accessor
* improving the print statement to end the line
* generalizing nccl_rank construction using rabit (see the sketch below)
* assume device_ordinals is the same for every node
* test, assume device_ordinals is identical for all nodes
* test, assume device_ordinals is unique for all nodes
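Under the assumption above (identical `device_ordinals` on every node), the global NCCL rank falls out of the rabit node rank; a sketch with illustrative names:

```cpp
#include <rabit/rabit.h>

// Sketch: lay out global ranks node-major, devices_per_node per node.
inline int NcclRank(int local_device_idx, int devices_per_node) {
  return rabit::GetRank() * devices_per_node + local_device_idx;
}
inline int NcclWorldSize(int devices_per_node) {
  return rabit::GetWorldSize() * devices_per_node;
}
```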
* renaming the offset variable to be more descriptive; fixing indentation
* wrapping ncclUniqueId GetUniqueId() and aesthetic changes
* adding synchronization, and tests for distributed
* adding to tests
* fixing broken #endif
* fixing initialization of gpu histograms, correcting errors in tests
* adding to contributors list
* adding distributed tests to jenkins
* fixing bad path in distributed test
* debugging
* adding kubernetes for distributed tests
* adding proper import for OrderedDict
* adding urllib3==1.22 to address ordered_dict import error
* added sleep to allow workers to save their models for comparison
* adding name to GPU contributors under docs
* Enable running objectives with 0 GPU.
* Enable 0 GPU for objectives.
* Add doc for GPU objectives.
* Fix some objectives that defaulted to running on all GPUs.
* Change doc build to reST exclusively
* Rewrite Intro doc in reST; create toctree
* Update parameter and contribute
* Convert tutorials to reST
* Convert Python tutorials to reST
* Convert CLI and Julia docs to reST
* Enable markdown for R vignettes
* Done migrating to reST
* Add guzzle_sphinx_theme to requirements
* Add breathe to requirements
* Fix search bar
* Add link to user forum
* Fix #2905
* Fix gpu_exact test failures
* Fix bug in GPU prediction where multiple calls to batch prediction can produce incorrect results
* Fix GPU documentation formatting