[dask] Fix missing value for scikit-learn interface. (#5435)

2020-03-20 22:56:01 +08:00
parent 4b7e2b7bff
commit cd7d6f7d59
3 changed files with 77 additions and 12 deletions
--- a/doc/tutorials/dask.rst
+++ b/doc/tutorials/dask.rst
@@ -131,8 +131,14 @@ Basic functionalities including training and generating predictions for regressi
 classification are implemented.  But there are still some other limitations we haven't
 addressed yet.

- Label encoding for Scikit-Learn classifier.
- Ranking
+- Label encoding for the Scikit-Learn classifier may not be supported, meaning that users
+  need to encode their training labels into discrete values first.
+- Ranking is not supported right now.
+- Empty workers are not well supported by the classifier.  If training hangs for the
+  classifier with a warning about an empty DMatrix, please consider balancing your data
+  first.  The regressor works fine with an empty DMatrix.
 - Callback functions are not tested.
- To use cross validation one needs to explicitly train different models instead of using
-  a functional API like ``xgboost.cv``.
+- Only ``GridSearchCV`` from ``scikit-learn`` is supported for the dask interface, meaning
+  that we can distribute data among workers but have to train one model at a time.  If you
+  want to scale up grid search with model parallelism via ``dask-ml``, please consider
+  using the normal ``scikit-learn`` interface like ``xgboost.XGBRegressor`` for now.
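
The label-encoding limitation in the list above means class labels have to be turned into integer codes before they reach the dask classifier. The following is a minimal sketch of one way to do that with NumPy. It is not part of this commit: the helper names `encode_labels` and `decode_labels` are hypothetical, and the sketch assumes the labels fit in memory on the client before being distributed.

```python
import numpy as np


def encode_labels(labels):
    """Map arbitrary class labels to integer codes 0..n_classes-1.

    Hypothetical helper: returns the sorted unique classes and the
    integer code assigned to each input label.
    """
    classes, encoded = np.unique(np.asarray(labels), return_inverse=True)
    return classes, encoded


def decode_labels(classes, encoded):
    """Map integer codes (e.g. predicted classes) back to the original labels."""
    return classes[np.asarray(encoded)]


raw_labels = ["cat", "dog", "cat", "bird"]
classes, y = encode_labels(raw_labels)

# `y` now holds discrete integer values and could be converted to a dask
# collection and passed as the training labels; `classes` is kept so that
# integer predictions can be translated back afterwards.
restored = decode_labels(classes, y)
```

Keeping `classes` around is the important design point: the model only ever sees the integer codes, so the mapping has to be stored by the caller to interpret predictions.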