diff --git a/doc/parameter.rst b/doc/parameter.rst
index 0ce60916e..a7d8203b0 100644
--- a/doc/parameter.rst
+++ b/doc/parameter.rst
@@ -391,6 +391,8 @@ Specify the learning task and the corresponding learning objective. The objectiv
   - If ``base_margin`` is supplied, ``base_score`` will not be added.
   - For sufficient number of iterations, changing this value will not have too much effect.
 
+  See :doc:`/tutorials/intercept` for more info.
+
 * ``eval_metric`` [default according to objective]
 
   - Evaluation metrics for validation data, a default metric will be assigned according to
     objective (rmse for regression, and logloss for classification, `mean average precision`
     for ``rank:map``, etc.)
diff --git a/doc/tutorials/custom_metric_obj.rst b/doc/tutorials/custom_metric_obj.rst
index f5c08bf59..76ee1b3de 100644
--- a/doc/tutorials/custom_metric_obj.rst
+++ b/doc/tutorials/custom_metric_obj.rst
@@ -271,7 +271,8 @@ available in XGBoost:
 We use ``multi:softmax`` to illustrate the differences of transformed prediction. With
 ``softprob`` the output prediction array has shape ``(n_samples, n_classes)`` while for
 ``softmax`` it's ``(n_samples, )``. A demo for multi-class objective function is also
-available at :ref:`sphx_glr_python_examples_custom_softmax.py`.
+available at :ref:`sphx_glr_python_examples_custom_softmax.py`. See also
+:doc:`/tutorials/intercept` for more explanation.
 
 
 **********************
diff --git a/doc/tutorials/index.rst b/doc/tutorials/index.rst
index 5d090ce65..c82abf43f 100644
--- a/doc/tutorials/index.rst
+++ b/doc/tutorials/index.rst
@@ -30,4 +30,5 @@ See `Awesome XGBoost <https://github.com/dmlc/xgboost/tree/master/demo>`_ for mo
   input_format
   param_tuning
   custom_metric_obj
+  intercept
   privacy_preserving
\ No newline at end of file
diff --git a/doc/tutorials/intercept.rst b/doc/tutorials/intercept.rst
new file mode 100644
index 000000000..8452918e1
--- /dev/null
+++ b/doc/tutorials/intercept.rst
@@ -0,0 +1,104 @@
+#########
+Intercept
+#########
+
+.. versionadded:: 2.0.0
+
+Since 2.0.0, XGBoost can estimate the model intercept (named ``base_score``)
+automatically from the training targets. The behavior can be controlled by setting
+``base_score`` to a constant value. The following snippet disables the automatic
+estimation:
+
+.. code-block:: python
+
+    import xgboost as xgb
+
+    reg = xgb.XGBRegressor()
+    # Providing a constant value disables the automatic estimation.
+    reg.set_params(base_score=0.5)
+
+Note that 0.5 here represents the value after applying the inverse link function; see
+the end of this document for details.
+
+In addition to ``base_score``, users can provide a global bias via the data field
+``base_margin``, which is a vector or a matrix depending on the task. For multi-output
+and multi-class models, ``base_margin`` is a matrix with shape ``(n_samples, n_targets)``
+or ``(n_samples, n_classes)``.
+
+.. code-block:: python
+
+    import xgboost as xgb
+    from sklearn.datasets import make_regression
+
+    X, y = make_regression()
+
+    reg = xgb.XGBRegressor()
+    reg.fit(X, y)
+    # Request the raw (margin) prediction from the first model.
+    m = reg.predict(X, output_margin=True)
+
+    reg_1 = xgb.XGBRegressor()
+    # Feed the prediction into the next model as a per-sample bias.
+    reg_1.fit(X, y, base_margin=m)
+    reg_1.predict(X, base_margin=m)
+
+
+``base_margin`` specifies the bias for each sample and can be used for stacking an
+XGBoost model on top of other models; see
+:ref:`sphx_glr_python_examples_boost_from_prediction.py` for a worked example. When
+``base_margin`` is specified, it automatically overrides the ``base_score`` parameter.
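+
+The same shape rule applies to classification. Below is a minimal sketch (the synthetic
+dataset and the model parameters are only for illustration) showing that the per-sample
+bias for a multi-class model matches the ``(n_samples, n_classes)`` shape of the raw
+prediction:
+
+.. code-block:: python
+
+    import xgboost as xgb
+    from sklearn.datasets import make_classification
+
+    X, y = make_classification(n_samples=128, n_classes=3, n_informative=8)
+
+    clf = xgb.XGBClassifier(n_estimators=4)
+    clf.fit(X, y)
+    # The raw (margin) prediction has one column per class.
+    m = clf.predict(X, output_margin=True)
+    assert m.shape == (X.shape[0], 3)
+
+    clf_1 = xgb.XGBClassifier(n_estimators=4)
+    # Use the previous model's margins as the per-sample bias.
+    clf_1.fit(X, y, base_margin=m)
+    clf_1.predict(X, base_margin=m)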
+
+If you are stacking XGBoost models, the usage is relatively straightforward: the
+previous model supplies the raw prediction and the new model uses it as its bias. For
+more customized inputs, users need to take extra care with the link function. Let
+:math:`F` be the model and :math:`g` be the link function. Since ``base_score`` is
+overridden when a sample-specific ``base_margin`` is available, we will omit it here:
+
+.. math::
+
+   g(E[y_i]) = F(x_i)
+
+When a base margin :math:`b` is provided, it's added to the raw model output :math:`F`:
+
+.. math::
+
+   g(E[y_i]) = F(x_i) + b_i
+
+and the output of the final model is:
+
+.. math::
+
+   g^{-1}(F(x_i) + b_i)
+
+Using the gamma deviance objective ``reg:gamma``, which has a log link function, as an
+example:
+
+.. math::
+
+   \ln{(E[y_i])} = F(x_i) + b_i \\
+   E[y_i] = \exp{(F(x_i) + b_i)}
+
+As a result, if you are feeding outputs from models like a GLM with a corresponding
+objective function, make sure the outputs are not yet transformed by the inverse link.
+
+The ``base_score`` (intercept) can be accessed through
+:py:meth:`~xgboost.Booster.save_config` after estimation. Unlike ``base_margin``, the
+returned value represents the value after applying the inverse link. Taking logistic
+regression with the logit link function as an example, given a ``base_score`` of 0.5,
+:math:`g(intercept) = logit(0.5) = 0` is added to the raw model output:
+
+.. math::
+
+   E[y_i] = g^{-1}(F(x_i) + g(intercept))
+
+and 0.5 corresponds to :math:`g^{-1}(0) = 0.5`. This is more intuitive if you remove the
+model and consider only the intercept, which is estimated before the model is fitted:
+
+.. math::
+
+   E[y] = g^{-1}(g(intercept)) \\
+   E[y] = intercept
+
+For some objectives, like MAE, there are closed-form solutions, while for others the
+intercept is estimated with a single Newton step.
\ No newline at end of file
diff --git a/python-package/xgboost/core.py b/python-package/xgboost/core.py
index 2e72dbbd2..097fb0935 100644
--- a/python-package/xgboost/core.py
+++ b/python-package/xgboost/core.py
@@ -785,7 +785,7 @@ class DMatrix:  # pylint: disable=too-many-instance-attributes,too-many-public-m
         so it doesn't make sense to assign weights to individual data points.
     base_margin :
-        Base margin used for boosting from existing model.
+        Global bias for each instance. See :doc:`/tutorials/intercept` for details.
     missing :
         Value in the input data which needs to be present as a missing value. If
         None, defaults to np.nan.
diff --git a/python-package/xgboost/sklearn.py b/python-package/xgboost/sklearn.py
index ded37881f..ea8d8d041 100644
--- a/python-package/xgboost/sklearn.py
+++ b/python-package/xgboost/sklearn.py
@@ -1006,7 +1006,7 @@ class XGBModel(XGBModelBase):
         sample_weight :
             instance weights
         base_margin :
-            global bias for each instance.
+            Global bias for each instance. See :doc:`/tutorials/intercept` for details.
         eval_set :
             A list of (X, y) tuple pairs to use as validation sets, for which
             metrics will be computed.
@@ -1146,7 +1146,7 @@ class XGBModel(XGBModelBase):
             When this is True, validate that the Booster's and data's feature_names are
             identical. Otherwise, it is assumed that the feature_names are the same.
         base_margin :
-            Margin added to prediction.
+            Global bias for each instance. See :doc:`/tutorials/intercept` for details.
         iteration_range :
             Specifies which layer of trees are used in prediction. For example, if a
             random forest is trained with 100 rounds. Specifying ``iteration_range=(10,
@@ -1599,7 +1599,7 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
             When this is True, validate that the Booster's and data's feature_names are
             identical. Otherwise, it is assumed that the feature_names are the same.
         base_margin :
-            Margin added to prediction.
+            Global bias for each instance. See :doc:`/tutorials/intercept` for details.
         iteration_range :
             Specifies which layer of trees are used in prediction. For example, if a
             random forest is trained with 100 rounds. Specifying `iteration_range=(10,
@@ -1942,7 +1942,7 @@ class XGBRanker(XGBModel, XGBRankerMixIn):
             weights to individual data points.
         base_margin :
-            Global bias for each instance.
+            Global bias for each instance. See :doc:`/tutorials/intercept` for details.
         eval_set :
             A list of (X, y) tuple pairs to use as validation sets, for which
             metrics will be computed.