Fix dask API sphinx docstrings (#4507)

* Fix dask API sphinx docstrings

* Update GPU docs page
Rory Mitchell 2019-05-28 16:39:26 +12:00 committed by GitHub
parent 3f7e5d9c47
commit 972f693eaf
3 changed files with 24 additions and 5 deletions


@@ -67,11 +67,6 @@ The experimental parameter ``single_precision_histogram`` can be set to True to
The device ordinal can be selected using the ``gpu_id`` parameter, which defaults to 0.
Multiple GPUs can be used with the ``gpu_hist`` tree method by setting the ``n_gpus`` parameter, which defaults to 1. If this is set to -1, all available GPUs will be used. If ``gpu_id`` is specified as non-zero, the selected GPU devices will be from ``gpu_id`` to ``gpu_id+n_gpus``; note that ``gpu_id+n_gpus`` must be less than or equal to the number of GPUs available on your system. As with GPU vs. CPU, multi-GPU training will not always be faster than a single GPU due to PCI bus bandwidth that can limit performance.
.. note:: Enabling multi-GPU training
Default installation may not enable multi-GPU training. To use multiple GPUs, make sure to read :ref:`build_gpu_support`.
The GPU algorithms currently work with CLI, Python and R packages. See :doc:`/build` for details.
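For illustration only (this snippet is not part of the commit), selecting a particular device with the ``gpu_id`` parameter described above might look like the following, assuming a machine with at least two GPUs and a GPU-enabled build::

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(100, 10)
    y = np.random.randint(2, size=100)
    dtrain = xgb.DMatrix(X, label=y)

    # Train on the second GPU (device ordinal 1) with the GPU histogram algorithm.
    param = {'tree_method': 'gpu_hist', 'gpu_id': 1, 'objective': 'binary:logistic'}
    bst = xgb.train(param, dtrain, num_boost_round=10)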
@@ -82,6 +77,24 @@ The GPU algorithms currently work with CLI, Python and R packages. See :doc:`/bu
param['max_bin'] = 16
param['tree_method'] = 'gpu_hist'
Single Node Multi-GPU
=====================
Multiple GPUs can be used with the ``gpu_hist`` tree method by setting the ``n_gpus`` parameter, which defaults to 1. If this is set to -1, all available GPUs will be used. If ``gpu_id`` is specified as non-zero, the selected GPU devices will be from ``gpu_id`` to ``gpu_id+n_gpus``; note that ``gpu_id+n_gpus`` must be less than or equal to the number of GPUs available on your system. As with GPU vs. CPU, multi-GPU training will not always be faster than a single GPU due to PCI bus bandwidth that can limit performance.
.. note:: Enabling multi-GPU training
Default installation may not enable multi-GPU training. To use multiple GPUs, make sure to read :ref:`build_gpu_support`.
XGBoost supports multi-GPU training on a single machine by specifying the ``n_gpus`` parameter.
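As a hedged sketch (not part of this change) of how the multi-GPU parameters described in this section could be combined, assuming a machine with several GPUs and a multi-GPU-enabled build::

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(10000, 20)
    y = np.random.rand(10000)
    dtrain = xgb.DMatrix(X, label=y)

    # n_gpus=-1 uses every available GPU, starting from the device given by gpu_id.
    param = {'tree_method': 'gpu_hist', 'n_gpus': -1, 'gpu_id': 0, 'max_bin': 16}
    bst = xgb.train(param, dtrain, num_boost_round=20)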
Multi-node Multi-GPU Training
=============================
XGBoost supports fully distributed GPU training using `Dask
<https://dask.org/>`_. See Python documentation :ref:`dask_api` and worked examples `here
<https://github.com/dmlc/xgboost/tree/master/demo/dask>`_.
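As a rough sketch of how this Dask integration might be driven (the cluster address and the synthetic data below are placeholders; the ``xgboost.dask`` helpers follow the API documented in this change)::

    import dask.array as da
    import xgboost as xgb
    from dask.distributed import Client

    def train(X, y):
        # Executed once per worker: build a DMatrix from the partitions held locally.
        dtrain = xgb.dask.create_worker_dmatrix(X, y)
        return xgb.train({'tree_method': 'gpu_hist'}, dtrain, num_boost_round=50)

    client = Client('scheduler-address:8786')  # placeholder scheduler address
    X = da.random.random((100000, 20), chunks=(10000, 20))
    y = da.random.random(100000, chunks=10000)
    results = xgb.dask.run(client, train, X, y)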
Objective functions
===================
Most of the objective functions implemented in XGBoost can be run on GPU. The following table shows the current support status.
@@ -209,6 +222,7 @@ References
Contributors
=======
Many thanks to the following contributors (alphabetical order):
* Andrey Adinets
* Jiaming Yuan
* Jonathan C. McKinney


@@ -74,6 +74,8 @@ Callback API
.. autofunction:: xgboost.callback.early_stop
.. _dask_api:
Dask API
--------
.. automodule:: xgboost.dask
@@ -83,3 +85,4 @@ Dask API
.. autofunction:: xgboost.dask.create_worker_dmatrix
.. autofunction:: xgboost.dask.get_local_data


@@ -43,6 +43,7 @@ def _start_tracker(n_workers):
def get_local_data(data):
    """
    Unpacks a distributed data object to get the rows local to this worker

    :param data: A distributed dask data object
    :return: Local data partition e.g. numpy or pandas
    """
@@ -107,6 +108,7 @@ def run(client, func, *args):
    dask by default, unless the user overrides the nthread parameter.

    Note: Windows platforms are not officially supported. Contributions are welcome here.

    :param client: Dask client representing the cluster
    :param func: Python function to be executed by each worker. Typically contains xgboost
        training code.
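To illustrate the thread behaviour noted above, a small hypothetical variant of a worker function that overrides the dask-provided default by setting ``nthread`` explicitly (data handling as in the earlier sketch)::

    import xgboost as xgb

    def train(X, y):
        dtrain = xgb.dask.create_worker_dmatrix(X, y)
        # Without 'nthread', each worker would use the threads dask allocated to it.
        params = {'tree_method': 'hist', 'nthread': 2}
        return xgb.train(params, dtrain, num_boost_round=10)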