[doc] [dask] Add example on early stopping with Dask (#6501)

Co-authored-by: fis <jm.yuan@outlook.com>
James Lamb 2020-12-15 14:23:23 +00:00 committed by GitHub
parent 8139849ab6
commit 1e2c3ade9e

@@ -273,6 +273,84 @@ actual computation will return a coroutine and hence require awaiting:
# Use `client.compute` instead of the `compute` method from dask collection
print(await client.compute(prediction))
*****************************
Evaluation and Early Stopping
*****************************
.. versionadded:: 1.3.0

The Dask interface accepts validation sets stored in distributed collections (Dask DataFrame or Dask Array). These can be used for evaluation and early stopping.

To enable evaluation, pass one or more validation sets as ``DaskDMatrix`` objects through the ``evals`` argument. To enable early stopping, also set ``early_stopping_rounds``.

.. code-block:: python

    import dask.array as da
    import xgboost as xgb

    # `client` is the dask.distributed.Client created earlier in this document.
    # Array shapes and chunk sizes are given as integers.
    num_rows = 1_000_000
    num_features = 100
    num_partitions = 10
    rows_per_chunk = num_rows // num_partitions

    # Training data
    data = da.random.random(
        size=(num_rows, num_features),
        chunks=(rows_per_chunk, num_features)
    )

    labels = da.random.random(
        size=(num_rows, 1),
        chunks=(rows_per_chunk, 1)
    )

    # Validation data
    X_eval = da.random.random(
        size=(num_rows, num_features),
        chunks=(rows_per_chunk, num_features)
    )

    y_eval = da.random.random(
        size=(num_rows, 1),
        chunks=(rows_per_chunk, 1)
    )

    dtrain = xgb.dask.DaskDMatrix(
        client=client,
        data=data,
        label=labels
    )

    dvalid = xgb.dask.DaskDMatrix(
        client=client,
        data=X_eval,
        label=y_eval
    )

    # Stop if the validation metric has not improved for 3 consecutive rounds
    result = xgb.dask.train(
        client=client,
        params={
            "objective": "reg:squarederror",
        },
        dtrain=dtrain,
        num_boost_round=10,
        evals=[(dvalid, "valid1")],
        early_stopping_rounds=3
    )

When validation sets are provided to ``xgb.dask.train()`` in this way, the dictionary returned by ``xgb.dask.train()`` contains a ``history`` entry with the evaluation metrics for each validation set, across all boosting rounds.

.. code-block:: python

    print(result["history"])
    # {'valid1': OrderedDict([('rmse', [0.28857, 0.28858, 0.288592, 0.288598])])}

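
As a rough illustration of how this history might be consumed, the sketch below walks the nested dictionary and reports, for each validation set and metric, the round with the lowest value. The helper name ``best_rounds`` is hypothetical and not part of XGBoost. The sample values are copied from the output above, and the code assumes a metric where lower is better (as with ``rmse``).

```python
from collections import OrderedDict


def best_rounds(history):
    """Map (eval_name, metric_name) to (round_index, value) of the lowest value.

    Hypothetical helper for illustration; assumes lower metric values are better.
    """
    summary = {}
    for eval_name, metrics in history.items():
        for metric_name, values in metrics.items():
            # Index of the smallest value; ties resolve to the earliest round
            best_index = min(range(len(values)), key=values.__getitem__)
            summary[(eval_name, metric_name)] = (best_index, values[best_index])
    return summary


# Same structure as result["history"], with the values shown above
history = {
    "valid1": OrderedDict([("rmse", [0.28857, 0.28858, 0.288592, 0.288598])])
}

summary = best_rounds(history)
for (eval_name, metric_name), (round_index, value) in summary.items():
    print(f"{eval_name} {metric_name}: best round {round_index}, value {value}")
```

Round indices here are zero-based, so index 0 refers to the first boosting round.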
If early stopping is enabled by also passing ``early_stopping_rounds``, you can check the best iteration in the returned booster.

.. code-block:: python

    booster = result["booster"]
    print(booster.best_iteration)
    # best_iteration is zero-based, so add 1 to keep the best round in the slice
    best_model = booster[: booster.best_iteration + 1]

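
To make the stopping rule concrete, here is a simplified pure-Python model of it, not XGBoost's actual implementation. The function ``simulate_early_stopping`` is a hypothetical name introduced for this sketch. It assumes a metric where lower is better and stops once ``early_stopping_rounds`` consecutive rounds have passed without a new best score.

```python
def simulate_early_stopping(scores, early_stopping_rounds):
    """Replay a sequence of validation scores under a simple early stopping rule.

    Returns (best_iteration, rounds_run). Illustrative only; assumes lower is better.
    """
    best_score = float("inf")
    best_iteration = 0
    for iteration, score in enumerate(scores):
        if score < best_score:
            # New best score: remember where it happened
            best_score = score
            best_iteration = iteration
        elif iteration - best_iteration >= early_stopping_rounds:
            # Too many rounds since the last improvement: stop training here
            return best_iteration, iteration + 1
    # The rule never triggered, so every round was run
    return best_iteration, len(scores)


# The rmse values from the evaluation history shown earlier in this section
rmse = [0.28857, 0.28858, 0.288592, 0.288598]
best_iteration, rounds_run = simulate_early_stopping(rmse, early_stopping_rounds=3)
print(best_iteration, rounds_run)
```

Under this model the first round (index 0) is the best one, and training stops after three further rounds without improvement. This is consistent with the history above holding four values even though ``num_boost_round`` was 10.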
*****************************************************************************
Why is the initialization of ``DaskDMatrix`` so slow and throws weird errors
*****************************************************************************