[Breaking] Remove rabit support for custom reductions and grow_local_histmaker updater (#7992)
@@ -151,15 +151,6 @@ Parameters for Tree Booster
   - ``hist``: Faster histogram optimized approximate greedy algorithm.
   - ``gpu_hist``: GPU implementation of ``hist`` algorithm.
 
-* ``sketch_eps`` [default=0.03]
-
-  - Only used for ``updater=grow_local_histmaker``.
-  - This roughly translates into ``O(1 / sketch_eps)`` number of bins.
-    Compared to directly select number of bins, this comes with theoretical guarantee with sketch accuracy.
-  - Usually user does not have to tune this.
-    But consider setting to a lower number for more accurate enumeration of split candidates.
-  - range: (0, 1)
-
 * ``scale_pos_weight`` [default=1]
 
   - Control the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: ``sum(negative instances) / sum(positive instances)``. See :doc:`Parameters Tuning </tutorials/param_tuning>` for more discussion. Also, see Higgs Kaggle competition demo for examples: `R <https://github.com/dmlc/xgboost/blob/master/demo/kaggle-higgs/higgs-train.R>`_, `py1 <https://github.com/dmlc/xgboost/blob/master/demo/kaggle-higgs/higgs-numpy.py>`_, `py2 <https://github.com/dmlc/xgboost/blob/master/demo/kaggle-higgs/higgs-cv.py>`_, `py3 <https://github.com/dmlc/xgboost/blob/master/demo/guide-python/cross_validation.py>`_.
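The ``sum(negative instances) / sum(positive instances)`` heuristic mentioned for ``scale_pos_weight`` can be computed directly from the label vector. A minimal NumPy sketch with toy labels (illustrative only, not part of the commit):

```python
import numpy as np

# Toy unbalanced binary labels; in practice this is the training target.
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# scale_pos_weight ~ sum(negative instances) / sum(positive instances)
scale_pos_weight = float((y == 0).sum()) / float((y == 1).sum())
print(scale_pos_weight)  # 8 negatives / 2 positives -> 4.0
```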
@@ -170,7 +161,6 @@ Parameters for Tree Booster
   - ``grow_colmaker``: non-distributed column-based construction of trees.
   - ``grow_histmaker``: distributed tree construction with row-based data splitting based on global proposal of histogram counting.
-  - ``grow_local_histmaker``: based on local histogram counting.
   - ``grow_quantile_histmaker``: Grow tree using quantized histogram.
   - ``grow_gpu_hist``: Grow tree with GPU.
   - ``sync``: synchronizes trees in all distributed nodes.
@@ -5,7 +5,7 @@ Tree Methods
 For training boosted tree models, there are 2 parameters used for choosing algorithms,
 namely ``updater`` and ``tree_method``. XGBoost has 4 builtin tree methods, namely
 ``exact``, ``approx``, ``hist`` and ``gpu_hist``. Along with these tree methods, there
-are also some free standing updaters including ``grow_local_histmaker``, ``refresh``,
+are also some free standing updaters including ``refresh``,
 ``prune`` and ``sync``. The parameter ``updater`` is more primitive than ``tree_method``
 as the latter is just a pre-configuration of the former. The difference is mostly due to
 historical reasons that each updater requires some specific configurations and might has
@@ -37,27 +37,18 @@ approximated training algorithms. These algorithms build a gradient histogram f
 node and iterate through the histogram instead of real dataset. Here we introduce the
 implementations in XGBoost below.
 
-1. ``grow_local_histmaker`` updater: An approximation tree method described in `reference
-   paper <http://arxiv.org/abs/1603.02754>`_. This updater is rarely used in practice so
-   it's still an updater rather than tree method. During split finding, it first runs a
-   weighted GK sketching for data points belong to current node to find split candidates,
-   using hessian as weights. The histogram is built upon this per-node sketch. It's
-   faster than ``exact`` in some applications, but still slow in computation.
+1. ``approx`` tree method: An approximation tree method described in `reference paper
+   <http://arxiv.org/abs/1603.02754>`_. It runs sketching before building each tree
+   using all the rows (rows belonging to the root). Hessian is used as weights during
+   sketch. The algorithm can be accessed by setting ``tree_method`` to ``approx``.
 
-2. ``approx`` tree method: An approximation tree method described in `reference paper
-   <http://arxiv.org/abs/1603.02754>`_. Different from ``grow_local_histmaker``, it runs
-   sketching before building each tree using all the rows (rows belonging to the root)
-   instead of per-node dataset. Similar to ``grow_local_histmaker`` updater, hessian is
-   used as weights during sketch. The algorithm can be accessed by setting
-   ``tree_method`` to ``approx``.
-
-3. ``hist`` tree method: An approximation tree method used in LightGBM with slight
+2. ``hist`` tree method: An approximation tree method used in LightGBM with slight
    differences in implementation. It runs sketching before training using only user
   provided weights instead of hessian. The subsequent per-node histogram is built upon
   this global sketch. This is the fastest algorithm as it runs sketching only once. The
   algorithm can be accessed by setting ``tree_method`` to ``hist``.
 
-4. ``gpu_hist`` tree method: The ``gpu_hist`` tree method is a GPU implementation of
+3. ``gpu_hist`` tree method: The ``gpu_hist`` tree method is a GPU implementation of
   ``hist``, with additional support for gradient based sampling. The algorithm can be
   accessed by setting ``tree_method`` to ``gpu_hist``.
 
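The "global sketch" idea that makes ``hist`` the fastest of these methods can be illustrated with plain NumPy: bin boundaries are computed once from approximate quantiles of the whole column, and every tree node afterwards iterates over bin indices instead of raw feature values. A simplified sketch of the quantization step only, not XGBoost's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)  # one feature column

# One global sketch: interior quantiles define the bin boundaries.
n_bins = 16
cuts = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])

# Quantize once; split finding later iterates over bin indices
# (a histogram of gradients per bin), never the raw values again.
binned = np.searchsorted(cuts, x)
```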
@@ -102,19 +93,32 @@ Other Updaters
 Removed Updaters
 ****************
 
-2 Updaters were removed during development due to maintainability. We describe them here
-solely for the interest of documentation. First one is distributed colmaker, which was a
-distributed version of exact tree method. It required specialization for column based
-splitting strategy and a different prediction procedure. As the exact tree method is slow
-by itself and scaling is even less efficient, we removed it entirely. Second one is
-``skmaker``. Per-node weighted sketching employed by ``grow_local_histmaker`` is slow,
-the ``skmaker`` was unmaintained and seems to be a workaround trying to eliminate the
-histogram creation step and uses sketching values directly during split evaluation. It
-was never tested and contained some unknown bugs, we decided to remove it and focus our
-resources on more promising algorithms instead. For accuracy, most of the time
-``approx``, ``hist`` and ``gpu_hist`` are enough with some parameters tuning, so removing
-them don't have any real practical impact.
+3 Updaters were removed during development due to maintainability. We describe them here
+solely for the interest of documentation.
+
+1. Distributed colmaker, which was a distributed version of exact tree method. It
+   required specialization for column based splitting strategy and a different prediction
+   procedure. As the exact tree method is slow by itself and scaling is even less
+   efficient, we removed it entirely.
+
+2. ``skmaker``. Per-node weighted sketching employed by ``grow_local_histmaker`` is slow,
+   the ``skmaker`` was unmaintained and seems to be a workaround trying to eliminate the
+   histogram creation step and uses sketching values directly during split evaluation. It
+   was never tested and contained some unknown bugs, we decided to remove it and focus our
+   resources on more promising algorithms instead. For accuracy, most of the time
+   ``approx``, ``hist`` and ``gpu_hist`` are enough with some parameters tuning, so
+   removing them don't have any real practical impact.
+
+3. ``grow_local_histmaker`` updater: An approximation tree method described in `reference
+   paper <http://arxiv.org/abs/1603.02754>`_. This updater was rarely used in practice so
+   it was still an updater rather than tree method. During split finding, it first runs a
+   weighted GK sketching for data points belong to current node to find split candidates,
+   using hessian as weights. The histogram is built upon this per-node sketch. It was
+   faster than ``exact`` in some applications, but still slow in computation. It was
+   removed because it depended on Rabit's customized reduction function that handles all
+   the data structure that can be serialized/deserialized into fixed size buffer, which is
+   not directly supported by NCCL or federated learning gRPC, making it hard to refactor
+   into a common allreducer interface.
 
 **************
 Feature Matrix
 
||||