Enforce correct data shape. (#5191)

* Fix syncing DMatrix columns.
* notes for tree method.
* Enable feature validation for all interfaces except for jvm.
* Better tests for boosting from predictions.
* Disable validation on JVM.
This commit is contained in:
Jiaming Yuan
2020-01-13 15:48:17 +08:00
committed by GitHub
parent 8cbcc53ccb
commit 7b65698187
14 changed files with 108 additions and 60 deletions

View File

@@ -112,18 +112,24 @@ Parameters for Tree Booster
- The tree construction algorithm used in XGBoost. See description in the `reference paper <http://arxiv.org/abs/1603.02754>`_.
- XGBoost supports ``approx``, ``hist`` and ``gpu_hist`` for distributed training. Experimental support for external memory is available for ``approx`` and ``gpu_hist``.
- Choices: ``auto``, ``exact``, ``approx``, ``hist``, ``gpu_hist``
- Choices: ``auto``, ``exact``, ``approx``, ``hist``, ``gpu_hist``, this is a
combination of commonly used updaters. For other updaters like ``refresh``, set the
parameter ``updater`` directly.
- ``auto``: Use heuristic to choose the fastest method.
- For small to medium dataset, exact greedy (``exact``) will be used.
- For very large dataset, approximate algorithm (``approx``) will be chosen.
- Because old behavior is always use exact greedy in single machine,
user will get a message when approximate algorithm is chosen to notify this choice.
- For small dataset, exact greedy (``exact``) will be used.
- For larger dataset, approximate algorithm (``approx``) will be chosen. It's
recommended to try ``hist`` and ``gpu_hist`` for higher performance with large
dataset.
(``gpu_hist``)has support for ``external memory``.
- ``exact``: Exact greedy algorithm.
- Because old behavior is always use exact greedy in single machine, user will get a
message when approximate algorithm is chosen to notify this choice.
- ``exact``: Exact greedy algorithm. Enumerates all split candidates.
- ``approx``: Approximate greedy algorithm using quantile sketch and gradient histogram.
- ``hist``: Fast histogram optimized approximate greedy algorithm. It uses some performance improvements such as bins caching.
- ``hist``: Faster histogram optimized approximate greedy algorithm.
- ``gpu_hist``: GPU implementation of ``hist`` algorithm.
* ``sketch_eps`` [default=0.03]

View File

@@ -38,6 +38,11 @@ There are in general two ways that you can control overfitting in XGBoost:
- This includes ``subsample`` and ``colsample_bytree``.
- You can also reduce stepsize ``eta``. Remember to increase ``num_round`` when you do so.
***************************
Faster training performance
***************************
There's a parameter called ``tree_method``, set it to ``hist`` or ``gpu_hist`` for faster computation.
*************************
Handle Imbalanced Dataset
*************************