[doc] [dask] Add example on early stopping with Dask (#6501)
Co-authored-by: fis <jm.yuan@outlook.com>
This commit is contained in:
parent
8139849ab6
commit
1e2c3ade9e
@ -273,6 +273,84 @@ actual computation will return a coroutine and hence require awaiting:
|
|||||||
# Use `client.compute` instead of the `compute` method from dask collection
|
# Use `client.compute` instead of the `compute` method from dask collection
|
||||||
print(await client.compute(prediction))
|
print(await client.compute(prediction))
|
||||||
|
|
||||||
|
*****************************
|
||||||
|
Evaluation and Early Stopping
|
||||||
|
*****************************
|
||||||
|
|
||||||
|
.. versionadded:: 1.3.0
|
||||||
|
|
||||||
|
The Dask interface allows the use of validation sets that are stored in distributed collections (Dask DataFrame or Dask Array). These can be used for evaluation and early stopping.
|
||||||
|
|
||||||
|
To enable early stopping, pass one or more validation sets containing ``DaskDMatrix`` objects.
|
||||||
|
|
||||||
|
.. code-block:: python
|
||||||
|
|
||||||
|
import dask.array as da
|
||||||
|
import xgboost as xgb
|
||||||
|
|
||||||
|
num_rows = 1e6
|
||||||
|
num_features = 100
|
||||||
|
num_partitions = 10
|
||||||
|
rows_per_chunk = num_rows / num_partitions
|
||||||
|
|
||||||
|
data = da.random.random(
|
||||||
|
size=(num_rows, num_features),
|
||||||
|
chunks=(rows_per_chunk, num_features)
|
||||||
|
)
|
||||||
|
|
||||||
|
labels = da.random.random(
|
||||||
|
size=(num_rows, 1),
|
||||||
|
chunks=(rows_per_chunk, 1)
|
||||||
|
)
|
||||||
|
|
||||||
|
X_eval = da.random.random(
|
||||||
|
size=(num_rows, num_features),
|
||||||
|
chunks=(rows_per_chunk, num_features)
|
||||||
|
)
|
||||||
|
|
||||||
|
y_eval = da.random.random(
|
||||||
|
size=(num_rows, 1),
|
||||||
|
chunks=(rows_per_chunk, 1)
|
||||||
|
)
|
||||||
|
|
||||||
|
dtrain = xgb.dask.DaskDMatrix(
|
||||||
|
client=client,
|
||||||
|
data=data,
|
||||||
|
label=labels
|
||||||
|
)
|
||||||
|
|
||||||
|
dvalid = xgb.dask.DaskDMatrix(
|
||||||
|
client=client,
|
||||||
|
data=X_eval,
|
||||||
|
label=y_eval
|
||||||
|
)
|
||||||
|
|
||||||
|
result = xgb.dask.train(
|
||||||
|
client=client,
|
||||||
|
params={
|
||||||
|
"objective": "reg:squarederror",
|
||||||
|
},
|
||||||
|
dtrain=dtrain,
|
||||||
|
num_boost_round=10,
|
||||||
|
evals=[(dvalid, "valid1")],
|
||||||
|
early_stopping_rounds=3
|
||||||
|
)
|
||||||
|
|
||||||
|
When validation sets are provided to ``xgb.dask.train()`` in this way, the model object returned by ``xgb.dask.train()`` contains a history of evaluation metrics for each validation set, across all boosting rounds.
|
||||||
|
|
||||||
|
.. code-block:: python
|
||||||
|
|
||||||
|
print(result["history"])
|
||||||
|
# {'valid1': OrderedDict([('rmse', [0.28857, 0.28858, 0.288592, 0.288598])])}
|
||||||
|
|
||||||
|
If early stopping is enabled by also passing ``early_stopping_rounds``, you can check the best iteration in the returned booster.
|
||||||
|
|
||||||
|
.. code-block:: python
|
||||||
|
|
||||||
|
booster = result["booster"]
|
||||||
|
print(booster.best_iteration)
|
||||||
|
best_model = booster[: booster.best_iteration]
|
||||||
|
|
||||||
*****************************************************************************
|
*****************************************************************************
|
||||||
Why is the initialization of ``DaskDMatrix`` so slow and throws weird errors
|
Why is the initialization of ``DaskDMatrix`` so slow and throws weird errors
|
||||||
*****************************************************************************
|
*****************************************************************************
|
||||||
|
|||||||
Loading…
x
Reference in New Issue
Block a user