Run training with empty DMatrix. (#4990)

This makes GPU Hist robust in distributed environment as some workers might not be associated with any data in either training or evaluation.

* Disable rabit mock test for now: See #5012.
* Disable dask-cudf test at prediction for now: See #5003
* Launch dask job for all workers despite they might not have any data.
* Check 0 rows in elementwise evaluation metrics. Using AUC and AUC-PR still throws an error. See #4663 for a robust fix.
* Add tests for edge cases.
* Add `LaunchKernel` wrapper handling zero sized grid.
* Move some parts of allreducer into a cu file.
* Don't validate feature names when the booster is empty.
* Sync number of columns in DMatrix. As num_feature is required to be the same across all workers in data split mode.
* Filtering in dask interface now by default syncs all booster that's not empty, instead of using rank 0.
* Fix Jenkins' GPU tests.
* Install dask-cuda from source in Jenkins' test. Now all tests are actually running.
* Restore GPU Hist tree synchronization test.
* Check UUID of running devices. The check is only performed on CUDA version >= 10.x, as 9.x doesn't have UUID field.
* Fix CMake policy and project variables. Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13.
* Fix copying data to CPU
* Fix race condition in cpu predictor.
* Fix duplicated DMatrix construction.
* Don't download extra nccl in CI script.
2019-11-06 16:13:13 +08:00
parent 807a244517
commit 7663de956c
44 changed files with 603 additions and 272 deletions
--- a/src/gbm/gbtree.h
+++ b/src/gbm/gbtree.h
@@ -246,6 +246,14 @@ class GBTree : public GradientBooster {
  std::unique_ptr<Predictor> const& GetPredictor(HostDeviceVector<float> const* out_pred = nullptr,
                                                 DMatrix* f_dmat = nullptr) const {
    CHECK(configured_);
+    auto on_device = f_dmat && (*(f_dmat->GetBatches<SparsePage>().begin())).data.DeviceCanRead();
+#if defined(XGBOOST_USE_CUDA)
+    // Use GPU Predictor if data is already on device.
+    if (!specified_predictor_ && on_device) {
+      CHECK(gpu_predictor_);
+      return gpu_predictor_;
+    }
+#endif  // defined(XGBOOST_USE_CUDA)
    // GPU_Hist by default has prediction cache calculated from quantile values, so GPU
    // Predictor is not used for training dataset.  But when XGBoost performs continue
    // training with an existing model, the prediction cache is not available and number
@@ -256,7 +264,7 @@ class GBTree : public GradientBooster {
        (model_.param.num_trees != 0) &&
        // FIXME(trivialfis): Implement a better method for testing whether data is on
        // device after DMatrix refactoring is done.
-        (f_dmat && !((*(f_dmat->GetBatches<SparsePage>().begin())).data.DeviceCanRead()))) {
+        !on_device) {
      return cpu_predictor_;
    }
    if (tparam_.predictor == "cpu_predictor") {
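
The two hunks above hoist the device check into a single `on_device` flag and reuse it in both branches. Below is a minimal standalone sketch of that decision order, using plain booleans instead of XGBoost types. `PredictorChoice`, `SelectionState`, `OnDevice` and `ChoosePredictor` are hypothetical names introduced only for illustration, and `earlier_conditions` stands in for the conjuncts that precede the second hunk and are not shown in this diff.

```cpp
#include <cassert>

// Hypothetical labels for the outcome of the selection; not XGBoost types.
enum class PredictorChoice { kGPU, kCPU, kConfigured };

// Hypothetical plain-bool model of the state that GetPredictor reads.
struct SelectionState {
  bool has_dmatrix;            // f_dmat != nullptr
  bool first_batch_on_device;  // DeviceCanRead() on the first SparsePage batch
  bool specified_predictor;    // specified_predictor_
  bool cuda_build;             // XGBOOST_USE_CUDA is defined
  bool has_trees;              // model_.param.num_trees != 0
  bool earlier_conditions;     // assumption: conjuncts not visible in the hunk
};

// Mirrors `f_dmat && ...DeviceCanRead()`: a missing DMatrix is never on device.
inline bool OnDevice(SelectionState const& s) {
  return s.has_dmatrix && s.first_batch_on_device;
}

inline PredictorChoice ChoosePredictor(SelectionState const& s) {
  bool const on_device = OnDevice(s);
  // First hunk: prefer the GPU predictor when data already lives on the device
  // and the user did not pick a predictor explicitly.
  if (s.cuda_build && !s.specified_predictor && on_device) {
    return PredictorChoice::kGPU;
  }
  // Second hunk: the CPU fallback now tests the shared `!on_device` flag.
  if (s.earlier_conditions && s.has_trees && !on_device) {
    return PredictorChoice::kCPU;
  }
  // Otherwise fall through to the predictor named in the training parameters.
  return PredictorChoice::kConfigured;
}
```

One consequence visible in the sketch: when no DMatrix is passed, `on_device` is false, so the `!on_device` term of the CPU branch holds, whereas the removed expression `f_dmat && !(...)` evaluated to false in that case.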