Note for DaskDMatrix. (#5144)

* Brief introduction to `DaskDMatrix`.

* Add xgboost.dask.train to API doc
Jiaming Yuan
2019-12-23 18:55:32 +08:00
committed by GitHub
parent c8bdb652c4
commit a4b929385e
4 changed files with 63 additions and 37 deletions


@@ -113,25 +113,28 @@ def _assert_client(client):
class DaskDMatrix:
# pylint: disable=missing-docstring, too-many-instance-attributes
'''DMatrix holding on references to Dask DataFrame or Dask Array.
'''DMatrix holding on references to Dask DataFrame or Dask Array. Constructing
a `DaskDMatrix` forces all lazy computation to be carried out. Wait for
the input data explicitly if you want to see actual computation of
constructing `DaskDMatrix`.
Parameters
----------
client: dask.distributed.Client
Parameters
----------
client: dask.distributed.Client
Specify the dask client used for training. Use default client
returned from dask if it's set to None.
data : dask.array.Array/dask.dataframe.DataFrame
data : dask.array.Array/dask.dataframe.DataFrame
data source of DMatrix.
label: dask.array.Array/dask.dataframe.DataFrame
label: dask.array.Array/dask.dataframe.DataFrame
label used for training.
missing : float, optional
Value in the input data (e.g. `numpy.ndarray`) which needs
to be present as a missing value. If None, defaults to np.nan.
weight : dask.array.Array/dask.dataframe.DataFrame
missing : float, optional
Value in the input data (e.g. `numpy.ndarray`) which needs
to be present as a missing value. If None, defaults to np.nan.
weight : dask.array.Array/dask.dataframe.DataFrame
Weight for each instance.
feature_names : list, optional
feature_names : list, optional
Set names for features.
feature_types : list, optional
feature_types : list, optional
Set types for features
'''
@@ -349,23 +352,23 @@ def train(client, params, dtrain, *args, evals=(), **kwargs):
Parameters
----------
client: dask.distributed.Client
Specify the dask client used for training. Use default client
returned from dask if it's set to None.
Other parameters are the same as `xgboost.train` except for `evals_result`,
which is returned as part of function return value instead of argument.
Specify the dask client used for training. Use default client
returned from dask if it's set to None.
\\*\\*kwargs:
Other parameters are the same as `xgboost.train` except for `evals_result`,
which is returned as part of function return value instead of argument.
Returns
-------
results: dict
A dictionary containing trained booster and evaluation history.
`history` field is the same as `eval_result` from `xgboost.train`.
A dictionary containing trained booster and evaluation history.
`history` field is the same as `eval_result` from `xgboost.train`.
.. code-block:: python
.. code-block:: python
{'booster': xgboost.Booster,
'history': {'train': {'logloss': ['0.48253', '0.35953']},
'eval': {'logloss': ['0.480385', '0.357756']}}}
{'booster': xgboost.Booster,
'history': {'train': {'logloss': ['0.48253', '0.35953']},
'eval': {'logloss': ['0.480385', '0.357756']}}}
'''
_assert_dask_support()
@@ -420,15 +423,15 @@ def train(client, params, dtrain, *args, evals=(), **kwargs):
def predict(client, model, data, *args):
'''Run prediction with a trained booster.
.. note::
.. note::
Only default prediction mode is supported right now.
Only default prediction mode is supported right now.
Parameters
----------
client: dask.distributed.Client
Specify the dask client used for training. Use default client
returned from dask if it's set to None.
Specify the dask client used for training. Use default client
returned from dask if it's set to None.
model: A Booster or a dictionary returned by `xgboost.dask.train`.
The trained model.
data: DaskDMatrix
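
The `train` docstring in this diff says the function returns a dictionary with a `booster` entry and a `history` entry. As a minimal sketch of how a caller might read that structure, the snippet below pulls the last recorded value of each metric out of a dictionary shaped like the docstring example. The helper name `final_metrics` is hypothetical and not part of xgboost, and the booster is replaced by a placeholder, so nothing here needs Dask or xgboost installed.

```python
def final_metrics(results):
    """Return the last recorded value of every metric, as floats.

    `results` is assumed to have the shape shown in the docstring:
    {'booster': ..., 'history': {dataset: {metric: [values...]}}}.
    """
    history = results["history"]
    return {
        # float() accepts the strings shown in the docstring example
        # and also plain numbers, in case the values are already floats.
        (dataset, metric): float(values[-1])
        for dataset, metrics in history.items()
        for metric, values in metrics.items()
    }


# Shape copied from the docstring example. The booster is a placeholder
# (None) because only the history is inspected here.
results = {
    "booster": None,
    "history": {
        "train": {"logloss": ["0.48253", "0.35953"]},
        "eval": {"logloss": ["0.480385", "0.357756"]},
    },
}

last = final_metrics(results)
for (dataset, metric), value in sorted(last.items()):
    print(f"{dataset} {metric}: {value}")
```

Because `evals_result` is returned rather than passed in as an argument, the caller does not have to pre-allocate a dictionary; the history is simply read from the return value.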


@@ -136,26 +136,26 @@ def to_graphviz(booster, fmap='', num_trees=0, rankdir=None,
Edge color when meets the node condition.
no_color : str, default '#FF0000'
Edge color when doesn't meet the node condition.
condition_node_params : dict (optional)
condition_node_params : dict, optional
Condition node configuration for graphviz. Example:
.. code-block:: python
{'shape': 'box',
'style': 'filled,rounded',
'fillcolor': '#78bceb'}
{'shape': 'box',
'style': 'filled,rounded',
'fillcolor': '#78bceb'}
leaf_node_params : dict (optional)
leaf_node_params : dict, optional
Leaf node configuration for graphviz. Example:
.. code-block:: python
{'shape': 'box',
'style': 'filled',
'fillcolor': '#e48038'}
{'shape': 'box',
'style': 'filled',
'fillcolor': '#e48038'}
kwargs : Other keywords passed to graphviz graph_attr, E.g.:
``graph [ {key} = {value} ]``
\\*\\*kwargs: dict, optional
Other keywords passed to graphviz graph_attr, e.g. ``graph [ {key} = {value} ]``
Returns
-------