Enforce correct data shape. (#5191)

* Fix syncing DMatrix columns.
* Notes for tree method.
* Enable feature validation for all interfaces except for JVM.
* Better tests for boosting from predictions.
* Disable validation on JVM.
2020-01-13 15:48:17 +08:00
parent 8cbcc53ccb
commit 7b65698187
14 changed files with 108 additions and 60 deletions
--- a/doc/parameter.rst
+++ b/doc/parameter.rst
@@ -112,18 +112,24 @@ Parameters for Tree Booster

  - The tree construction algorithm used in XGBoost. See description in the `reference paper <http://arxiv.org/abs/1603.02754>`_.
  - XGBoost supports  ``approx``, ``hist`` and ``gpu_hist`` for distributed training.  Experimental support for external memory is available for ``approx`` and ``gpu_hist``.
-  - Choices: ``auto``, ``exact``, ``approx``, ``hist``, ``gpu_hist``
+
+  - Choices: ``auto``, ``exact``, ``approx``, ``hist``, ``gpu_hist``.  Each is a
+    combination of commonly used updaters.  For other updaters like ``refresh``, set the
+    parameter ``updater`` directly.

    - ``auto``: Use heuristic to choose the fastest method.

-      - For small to medium dataset, exact greedy (``exact``) will be used.
-      - For very large dataset, approximate algorithm (``approx``) will be chosen.
-      - Because old behavior is always use exact greedy in single machine,
-        user will get a message when approximate algorithm is chosen to notify this choice.
+      - For small datasets, exact greedy (``exact``) will be used.
+      - For larger datasets, the approximate algorithm (``approx``) will be chosen.  It's
+        recommended to try ``hist`` and ``gpu_hist`` for higher performance with large
+        datasets.
+        ``gpu_hist`` has support for ``external memory``.

-    - ``exact``: Exact greedy algorithm.
+      - Because the old behavior was to always use exact greedy on a single machine, the user
+        will get a message when the approximate algorithm is chosen, as notice of this choice.
+    - ``exact``: Exact greedy algorithm.  Enumerates all split candidates.
    - ``approx``: Approximate greedy algorithm using quantile sketch and gradient histogram.
-    - ``hist``: Fast histogram optimized approximate greedy algorithm. It uses some performance improvements such as bins caching.
+    - ``hist``: Faster histogram optimized approximate greedy algorithm.
    - ``gpu_hist``: GPU implementation of ``hist`` algorithm.

 * ``sketch_eps`` [default=0.03]
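
The difference between ``exact`` and the histogram-style methods described in the hunk above can be illustrated with a toy split search. This is a minimal sketch in plain Python, not XGBoost's implementation: the helper names (`split_gain`, `best_split`, `exact_candidates`, `binned_candidates`) are hypothetical, the gain is the structure-score gain from the reference paper with the constant factor and the complexity penalty dropped, and evenly spaced cut points stand in for the quantile sketch that ``approx`` uses.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Gain of a split from summed gradients (g) and hessians (h) on each side."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return score(g_left, h_left) + score(g_right, h_right) - parent


def best_split(values, grads, hess, candidates):
    """Return (gain, threshold) for the best split `x < threshold` among candidates."""
    total_g, total_h = sum(grads), sum(hess)
    best = (float("-inf"), None)
    for t in candidates:
        g_left = sum(g for x, g in zip(values, grads) if x < t)
        h_left = sum(h for x, h in zip(values, hess) if x < t)
        g_right, h_right = total_g - g_left, total_h - h_left
        if h_left == 0 or h_right == 0:
            continue  # skip splits that leave one side empty
        gain = split_gain(g_left, h_left, g_right, h_right)
        if gain > best[0]:
            best = (gain, t)
    return best


def exact_candidates(values):
    """Exact greedy: every midpoint between consecutive distinct values."""
    xs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def binned_candidates(values, n_bins):
    """Histogram-style: a fixed number of evenly spaced cut points (an assumption)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


# Toy data: squared-error gradients around a constant prediction of 0.5.
values = [1, 2, 3, 4, 10, 11, 12, 13]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
grads = [0.5 - y for y in labels]
hess = [1.0] * len(values)

exact = best_split(values, grads, hess, exact_candidates(values))
binned = best_split(values, grads, hess, binned_candidates(values, n_bins=4))
print("exact :", len(exact_candidates(values)), "candidates, best", exact)
print("binned:", len(binned_candidates(values, 4)), "candidates, best", binned)
```

On this toy data both searches pick the same threshold, but the binned search evaluates fewer candidates. The number of exact candidates grows with the number of distinct feature values, while the number of binned candidates is fixed by `n_bins`.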
--- a/doc/tutorials/param_tuning.rst
+++ b/doc/tutorials/param_tuning.rst
@@ -38,6 +38,11 @@ There are in general two ways that you can control overfitting in XGBoost:
  - This includes ``subsample`` and ``colsample_bytree``.
  - You can also reduce stepsize ``eta``. Remember to increase ``num_round`` when you do so.

+***************************
+Faster training performance
+***************************
+Set the ``tree_method`` parameter to ``hist`` or ``gpu_hist`` for faster computation.
+
 *************************
 Handle Imbalanced Dataset
 *************************
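
As a usage note for the "Faster training performance" section added in the hunk above, here is a minimal sketch of selecting a histogram-based tree method through the parameter dict. The helper name `fast_training_params` and the objective and ``eta`` values are illustrative assumptions, not part of this commit. The training call is left as a comment because it needs xgboost installed and real data.

```python
def fast_training_params(use_gpu=False, eta=0.3):
    """Build a parameter dict that selects a histogram-based tree method."""
    return {
        "objective": "reg:squarederror",  # illustrative choice of objective
        "eta": eta,
        # ``gpu_hist`` is the GPU implementation of ``hist``.
        "tree_method": "gpu_hist" if use_gpu else "hist",
    }


params = fast_training_params()
print(params["tree_method"])

# With xgboost installed, the dict is passed to the training API as usual:
#   import xgboost as xgb
#   dtrain = xgb.DMatrix(X, label=y)   # X, y: your own feature matrix and labels
#   booster = xgb.train(params, dtrain, num_boost_round=100)
```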