[R] update serialization advice for new xgboost class (#10794)

2024-09-01 20:46:11 +02:00 · 2024-09-01 20:46:11 +02:00 · 15b72571f3
commit 15b72571f3
parent 4f88ada219
2 changed files with 30 additions and 8 deletions
--- a/R-package/R/utils.R
+++ b/R-package/R/utils.R
@@ -427,7 +427,8 @@ NULL
 #' its own serializers with better compatibility guarantees, which allow loading
 #' said models in other language bindings of XGBoost.
 #'
-#' Note that an `xgb.Booster` object, outside of its core components, might also keep:
+#' Note that an `xgb.Booster` object (**as produced by [xgb.train()]**, see rest of the doc
+#' for objects produced by [xgboost()]), outside of its core components, might also keep:
 #' - Additional model configuration (accessible through [xgb.config()]), which includes
 #'   model fitting parameters like `max_depth` and runtime parameters like `nthread`.
 #'   These are not necessarily useful for prediction/importance/plotting.
@@ -450,6 +451,16 @@ NULL
 #' not used for prediction / importance / plotting / etc.
 #' These R attributes are only preserved when using R's serializers.
 #'
+#' In addition to the regular `xgb.Booster` objects produced by [xgb.train()], the
+#' function [xgboost()] produces a different subclass `xgboost`, which keeps other
+#' additional metadata as R attributes such as class names in classification problems,
+#' and which has a dedicated `predict` method that uses different defaults. XGBoost's
+#' own serializers can work with this `xgboost` class, but as they do not keep R
+#' attributes, the resulting object, when deserialized, is downcast to the regular
+#' `xgb.Booster` class (i.e. it loses the metadata, and the resulting object will use
+#' `predict.xgb.Booster` instead of `predict.xgboost`) - for these `xgboost` objects,
+#' `saveRDS` might thus be a better option if the extra functionalities are needed.
+#'
 #' Note that XGBoost models in R starting from version `2.1.0` and onwards, and
 #' XGBoost models before version `2.1.0`, have a very different R object structure and
 #' are incompatible with each other. Hence, models that were saved with R serializers
@@ -474,9 +485,9 @@ NULL
 #' as part of another R object.
 #'
 #' Use [saveRDS()] if you require the R-specific attributes that a booster might have, such
-#' as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
-#' control as it relies on R's serialization format (see e.g. the details section in
-#' [serialize] and [save()] from base R).
+#' as evaluation logs or the model class `xgboost` instead of `xgb.Booster`, but note that
+#' future compatibility of such objects is outside XGBoost's control as it relies on R's
+#' serialization format (see e.g. the details section in [serialize] and [save()] from base R).
 #'
 #' For more details and explanation about model persistence and archival, consult the page
 #' \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
--- a/R-package/man/a-compatibility-note-for-saveRDS-save.Rd
+++ b/R-package/man/a-compatibility-note-for-saveRDS-save.Rd
@@ -9,7 +9,8 @@ When it comes to serializing XGBoost models, it's possible to use R serializers
 its own serializers with better compatibility guarantees, which allow loading
 said models in other language bindings of XGBoost.

-Note that an \code{xgb.Booster} object, outside of its core components, might also keep:
+Note that an \code{xgb.Booster} object (\strong{as produced by \code{\link[=xgb.train]{xgb.train()}}}, see rest of the doc
+for objects produced by \code{\link[=xgboost]{xgboost()}}), outside of its core components, might also keep:
 \itemize{
 \item Additional model configuration (accessible through \code{\link[=xgb.config]{xgb.config()}}), which includes
 model fitting parameters like \code{max_depth} and runtime parameters like \code{nthread}.
@@ -34,6 +35,16 @@ the model was fit, or saving the R call that produced the model, but are otherwi
 not used for prediction / importance / plotting / etc.
 These R attributes are only preserved when using R's serializers.

+In addition to the regular \code{xgb.Booster} objects produced by \code{\link[=xgb.train]{xgb.train()}}, the
+function \code{\link[=xgboost]{xgboost()}} produces a different subclass \code{xgboost}, which keeps other
+additional metadata as R attributes such as class names in classification problems,
+and which has a dedicated \code{predict} method that uses different defaults. XGBoost's
+own serializers can work with this \code{xgboost} class, but as they do not keep R
+attributes, the resulting object, when deserialized, is downcast to the regular
+\code{xgb.Booster} class (i.e. it loses the metadata, and the resulting object will use
+\code{predict.xgb.Booster} instead of \code{predict.xgboost}) - for these \code{xgboost} objects,
+\code{saveRDS} might thus be a better option if the extra functionalities are needed.
+
 Note that XGBoost models in R starting from version \verb{2.1.0} and onwards, and
 XGBoost models before version \verb{2.1.0}, have a very different R object structure and
 are incompatible with each other. Hence, models that were saved with R serializers
@@ -58,9 +69,9 @@ The \code{\link[=xgb.save.raw]{xgb.save.raw()}} function is useful if you would
 as part of another R object.

 Use \code{\link[=saveRDS]{saveRDS()}} if you require the R-specific attributes that a booster might have, such
-as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
-control as it relies on R's serialization format (see e.g. the details section in
-\link{serialize} and \code{\link[=save]{save()}} from base R).
+as evaluation logs or the model class \code{xgboost} instead of \code{xgb.Booster}, but note that
+future compatibility of such objects is outside XGBoost's control as it relies on R's
+serialization format (see e.g. the details section in \link{serialize} and \code{\link[=save]{save()}} from base R).

 For more details and explanation about model persistence and archival, consult the page
 \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.