[dask][doc] Add small example for sklearn interface. (#6970)

Jiaming Yuan 2021-05-19 13:50:45 +08:00 committed by GitHub
parent 7e846bb965
commit 5cb51a191e
2 changed files with 36 additions and 4 deletions


@@ -115,8 +115,8 @@ See next section for details.
 Alternatively, XGBoost also implements the Scikit-Learn interface with
 ``DaskXGBClassifier``, ``DaskXGBRegressor``, ``DaskXGBRanker`` and 2 random forest
 variants. This wrapper is similar to the single node Scikit-Learn interface in xgboost,
-with dask collections as inputs and has an additional ``client`` attribute. See
-``xgboost/demo/dask`` for more examples.
+with dask collections as inputs and has an additional ``client`` attribute. See the
+following sections and ``xgboost/demo/dask`` for more examples.

 ******************
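
As a quick illustration of the wrapper shape described above (a minimal sketch, not part
of this commit, using synthetic dask arrays; ``DaskXGBRegressor`` is shown, and the other
estimators follow the same pattern):

.. code-block:: python

    from dask import array as da
    from distributed import Client, LocalCluster

    import xgboost as xgb

    if __name__ == "__main__":
        with LocalCluster() as cluster, Client(cluster) as client:
            # Synthetic dask collections stand in for real data.
            X = da.random.random((1000, 10), chunks=(100, 10))
            y = da.random.random(1000, chunks=100)

            reg = xgb.dask.DaskXGBRegressor(n_estimators=10, tree_method="hist")
            reg.client = client  # the additional ``client`` attribute
            reg.fit(X, y)
            pred = reg.predict(X)      # a lazy dask collection
            print(pred.compute()[:5])  # materialize on the client
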
@@ -191,6 +191,38 @@ Scikit-Learn wrapper object:

     booster = cls.get_booster()

+**********************
+Scikit-Learn interface
+**********************
+
+As mentioned previously, there is another interface that mimics the scikit-learn
+estimators with a higher level of abstraction. It is easier to use than the functional
+interface but comes with more constraints. Note that although it mimics scikit-learn
+estimators, this interface doesn't work with normal scikit-learn utilities like
+``GridSearchCV``, as scikit-learn doesn't understand distributed dask data collections.
+
+.. code-block:: python
+
+    from distributed import LocalCluster, Client
+
+    import xgboost as xgb
+
+    def main(client: Client) -> None:
+        X, y = load_data()  # placeholder: load your data as dask collections
+        clf = xgb.dask.DaskXGBClassifier(n_estimators=100, tree_method="hist")
+        clf.client = client  # assign the client
+        clf.fit(X, y, eval_set=[(X, y)])
+        proba = clf.predict_proba(X)  # a lazy dask collection
+
+    if __name__ == "__main__":
+        with LocalCluster() as cluster:
+            with Client(cluster) as client:
+                main(client)
+
 ***************************
 Working with other clusters
 ***************************
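
A follow-up sketch (also not part of this commit): ``predict_proba`` above returns a
lazy dask collection rather than an in-memory array, and the metric history recorded
through ``eval_set`` can be read back after training. ``load_data`` remains the
hypothetical placeholder from the example above.

.. code-block:: python

    def main(client: Client) -> None:
        X, y = load_data()  # hypothetical placeholder returning dask collections
        clf = xgb.dask.DaskXGBClassifier(n_estimators=100, tree_method="hist")
        clf.client = client
        clf.fit(X, y, eval_set=[(X, y)])

        proba = clf.predict_proba(X).compute()  # materialize the lazy result

        # History for each element of ``eval_set``, e.g. validation_0 -> logloss.
        for data_name, metrics in clf.evals_result().items():
            for metric_name, values in metrics.items():
                print(data_name, metric_name, values[-1])
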


@@ -1,6 +1,6 @@
-#########################
+#############################
 Random Forests(TM) in XGBoost
-#########################
+#############################

 XGBoost is normally used to train gradient-boosted decision trees and other gradient
 boosted models. Random Forests use the same model representation and inference, as
 gradient-boosted decision trees, but a different training algorithm.
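
For context on that closing paragraph, a hedged single-node sketch of the "different
training algorithm": XGBoost grows a random forest as many parallel trees in a single
boosting round, exposed for instance through the ``XGBRFClassifier`` wrapper (the
synthetic data below is illustrative only).

.. code-block:: python

    import xgboost as xgb
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # One boosting round of many parallel trees with row/column
    # subsampling, i.e. a random forest rather than boosting.
    rf = xgb.XGBRFClassifier(n_estimators=100, subsample=0.8, colsample_bynode=0.8)
    rf.fit(X, y)
    print(rf.score(X, y))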