Improve multi-threaded performance (#2104)

* Add UpdatePredictionCache() option to updaters

  Some updaters (e.g. fast_hist) have enough information to quickly compute
  the prediction cache for the training data. Each updater may override the
  UpdatePredictionCache() method to update the prediction cache. Note: this
  trick does not apply to validation data.

* Respond to code review
  * Disable some debug messages by default
  * Document UpdatePredictionCache() interface
  * Remove base_margin logic from UpdatePredictionCache() implementation
  * Do not take pointer to cfg, as reference may get stale

* Improve multi-threaded performance
  * Use columnwise accessor to accelerate ApplySplit() step, with support for a compressed representation
  * Parallel sort for evaluation step
  * Inline BuildHist() function
  * Cache gradient pairs when building histograms in BuildHist()

* Add missing #if macro

* Respond to code review
  * Use wrapper to enable parallel sort on Linux

* Fix C++ compatibility issues
  * MSVC doesn't support unsigned in OpenMP loops
  * gcc 4.6 doesn't support using keyword

* Fix lint issues

* Respond to code review

* Fix bug in ApplySplitSparseData()
  * Attempting to read beyond the end of a sparse column
  * Mishandling the case where an entire range of rows have missing values

* Fix training continuation bug

  Disable UpdatePredictionCache() in the first iteration. This way, we can
  accommodate the scenario where we build off of an existing (nonempty) ensemble.

* Add regression test for fast_hist

* Respond to code review
  * Add back old version of ApplySplitSparseData
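
To illustrate the UpdatePredictionCache() idea from the message above, here is a minimal sketch of how an updater might override it. It is not the fast_hist implementation: `DMatrix` is an empty stand-in, `TreeUpdater` is reduced to the one method, and `LeafTrackingUpdater`, `leaf_of_row_` and `leaf_value_` are hypothetical names assumed for illustration. The sketch assumes the updater remembers which leaf each training row landed in, so it can add the leaf value to the cached prediction without traversing the tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for the real xgboost types; only the shape of the interface
// shown in the tree_updater.h hunk is kept.
typedef float bst_float;
struct DMatrix {};

class TreeUpdater {
 public:
  virtual ~TreeUpdater() {}
  // Default behaviour, as in the diff: report that the cache was not updated.
  virtual bool UpdatePredictionCache(const DMatrix* data,
                                     std::vector<bst_float>* out_preds) const {
    (void)data;
    (void)out_preds;
    return false;
  }
};

// Hypothetical updater that recorded the leaf index of every training row
// while it built the last tree.
class LeafTrackingUpdater : public TreeUpdater {
 public:
  LeafTrackingUpdater(const DMatrix* train,
                      const std::vector<int>& leaf_of_row,
                      const std::vector<bst_float>& leaf_value)
      : train_(train), leaf_of_row_(leaf_of_row), leaf_value_(leaf_value) {}

  bool UpdatePredictionCache(const DMatrix* data,
                             std::vector<bst_float>* out_preds) const override {
    assert(out_preds != nullptr);
    // The shortcut only applies to the matrix this updater was trained on;
    // for anything else (e.g. validation data) decline and leave the cache alone.
    if (data != train_ || out_preds->size() != leaf_of_row_.size()) {
      return false;
    }
    // Add the new tree's leaf value to each row's cached prediction.
    for (std::size_t i = 0; i < leaf_of_row_.size(); ++i) {
      (*out_preds)[i] += leaf_value_[leaf_of_row_[i]];
    }
    return true;
  }

 private:
  const DMatrix* train_;                // matrix seen during training
  std::vector<int> leaf_of_row_;        // leaf index per training row
  std::vector<bst_float> leaf_value_;   // output value per leaf
};
```

A caller would try the shortcut first and fall back to an ordinary prediction pass whenever the method returns false, which is what keeps the default implementation safe for updaters that do not override it.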
2017-03-25 10:35:01 -07:00
parent 332aea26a3
commit 14fba01b5a
14 changed files with 719 additions and 171 deletions
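
The parallel sort wrapper named in the commit message can be exercised with a sketch like the following. The macro is copied from the base.h hunk of this commit; `RankByGain` is a hypothetical helper invented for illustration and does not appear in the changed files. Note that, as written in the diff, the parallel branch is selected only for GCC 4.8 and later 4.x releases; every other compiler takes the `std::sort` fallback.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Wrapper macro as it appears in the base.h hunk of this commit.
#if defined(__GNUC__) && __GNUC__ == 4 && __GNUC_MINOR__ >= 8
#include <parallel/algorithm>
#define XGBOOST_PARALLEL_SORT(X, Y, Z) __gnu_parallel::sort((X), (Y), (Z))
#else
#define XGBOOST_PARALLEL_SORT(X, Y, Z) std::sort((X), (Y), (Z))
#endif

// Hypothetical helper (not from the commit): returns the indices of `gain`
// ordered from the largest gain to the smallest.
std::vector<std::size_t> RankByGain(const std::vector<float>& gain) {
  std::vector<std::size_t> order(gain.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    order[i] = i;
  }
  // Comparator is bound to a name first so the macro receives a single,
  // comma-free third argument.
  auto by_gain_desc = [&gain](std::size_t a, std::size_t b) {
    return gain[a] > gain[b];
  };
  XGBOOST_PARALLEL_SORT(order.begin(), order.end(), by_gain_desc);
  return order;
}
```

Because call sites only see the macro, they stay identical whichever branch is compiled in. When the `__gnu_parallel` branch is active, the build generally needs OpenMP enabled (for example `-fopenmp` with GCC).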
--- a/include/xgboost/base.h
+++ b/include/xgboost/base.h
@@ -48,6 +48,15 @@
 #define XGBOOST_ALIGNAS(X)
 #endif

+#if defined(__GNUC__) && __GNUC__ == 4 && __GNUC_MINOR__ >= 8
+#include <parallel/algorithm>
+#define XGBOOST_PARALLEL_SORT(X, Y, Z) __gnu_parallel::sort((X), (Y), (Z))
+#define XGBOOST_PARALLEL_STABLE_SORT(X, Y, Z) __gnu_parallel::stable_sort((X), (Y), (Z))
+#else
+#define XGBOOST_PARALLEL_SORT(X, Y, Z) std::sort((X), (Y), (Z))
+#define XGBOOST_PARALLEL_STABLE_SORT(X, Y, Z) std::stable_sort((X), (Y), (Z))
+#endif
+
 /*! \brief namespace of xgboo st*/
 namespace xgboost {
 /*!
--- a/include/xgboost/tree_updater.h
+++ b/include/xgboost/tree_updater.h
@@ -45,14 +45,20 @@ class TreeUpdater {
  virtual void Update(const std::vector<bst_gpair>& gpair,
                      DMatrix* data,
                      const std::vector<RegTree*>& trees) = 0;
+
  /*!
-   * \brief this is simply a function for optimizing performance
-   * this function asks the updater to return the leaf position of each instance in the previous performed update.
-   * if it is cached in the updater, if it is not available, return nullptr
-   * \return array of leaf position of each instance in the last updated tree
+   * \brief determines whether updater has enough knowledge about a given dataset
+   *        to quickly update the prediction cache for its training data, and
+   *        performs the update if possible.
+   * \param data: data matrix
+   * \param out_preds: prediction cache to be updated
+   * \return boolean indicating whether updater has capability to update
+   *         the prediction cache. If true, the prediction cache will have been
+   *         updated by the time this function returns.
   */
-  virtual const int* GetLeafPosition() const {
-    return nullptr;
+  virtual bool UpdatePredictionCache(const DMatrix* data,
+                                     std::vector<bst_float>* out_preds) const {
+    return false;
  }
  /*!
   * \brief Create a tree updater given name