From 15b72571f3be5b668875f9913b343a5545802a7c Mon Sep 17 00:00:00 2001
From: david-cortes
Date: Sun, 1 Sep 2024 20:46:11 +0200
Subject: [PATCH] [R] update serialization advise for new xgboost class (#10794)

---
 R-package/R/utils.R                           | 19 +++++++++++++++----
 .../a-compatibility-note-for-saveRDS-save.Rd  | 19 +++++++++++++++----
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/R-package/R/utils.R b/R-package/R/utils.R
index 46b05c43a..a2ea8f89f 100644
--- a/R-package/R/utils.R
+++ b/R-package/R/utils.R
@@ -427,7 +427,8 @@ NULL
 #' its own serializers with better compatibility guarantees, which allow loading
 #' said models in other language bindings of XGBoost.
 #'
-#' Note that an `xgb.Booster` object, outside of its core components, might also keep:
+#' Note that an `xgb.Booster` object (**as produced by [xgb.train()]**, see rest of the doc
+#' for objects produced by [xgboost()]), outside of its core components, might also keep:
 #' - Additional model configuration (accessible through [xgb.config()]), which includes
 #' model fitting parameters like `max_depth` and runtime parameters like `nthread`.
 #' These are not necessarily useful for prediction/importance/plotting.
@@ -450,6 +451,16 @@ NULL
 #' not used for prediction / importance / plotting / etc.
 #' These R attributes are only preserved when using R's serializers.
 #'
+#' In addition to the regular `xgb.Booster` objects producted by [xgb.train()], the
+#' function [xgboost()] produces a different subclass `xgboost`, which keeps other
+#' additional metadata as R attributes such as class names in classification problems,
+#' and which has a dedicated `predict` method that uses different defaults. XGBoost's
+#' own serializers can work with this `xgboost` class, but as they do not keep R
+#' attributes, the resulting object, when deserialized, is downcasted to the regular
+#' `xgb.Booster` class (i.e. it loses the metadata, and the resulting object will use
+#' `predict.xgb.Booster` instead of `predict.xgboost`) - for these `xgboost` objects,
+#' `saveRDS` might thus be a better option if the extra functionalities are needed.
+#'
 #' Note that XGBoost models in R starting from version `2.1.0` and onwards, and
 #' XGBoost models before version `2.1.0`; have a very different R object structure and
 #' are incompatible with each other. Hence, models that were saved with R serializers
@@ -474,9 +485,9 @@ NULL
 #' as part of another R object.
 #'
 #' Use [saveRDS()] if you require the R-specific attributes that a booster might have, such
-#' as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
-#' control as it relies on R's serialization format (see e.g. the details section in
-#' [serialize] and [save()] from base R).
+#' as evaluation logs or the model class `xgboost` instead of `xgb.Booster`, but note that
+#' future compatibility of such objects is outside XGBoost's control as it relies on R's
+#' serialization format (see e.g. the details section in [serialize] and [save()] from base R).
 #'
 #' For more details and explanation about model persistence and archival, consult the page
 #' \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
diff --git a/R-package/man/a-compatibility-note-for-saveRDS-save.Rd b/R-package/man/a-compatibility-note-for-saveRDS-save.Rd
index 6d4446f78..af90ddded 100644
--- a/R-package/man/a-compatibility-note-for-saveRDS-save.Rd
+++ b/R-package/man/a-compatibility-note-for-saveRDS-save.Rd
@@ -9,7 +9,8 @@ When it comes to serializing XGBoost models, it's possible to use R serializers
 its own serializers with better compatibility guarantees, which allow loading
 said models in other language bindings of XGBoost.
 
-Note that an \code{xgb.Booster} object, outside of its core components, might also keep:
+Note that an \code{xgb.Booster} object (\strong{as produced by \code{\link[=xgb.train]{xgb.train()}}}, see rest of the doc
+for objects produced by \code{\link[=xgboost]{xgboost()}}), outside of its core components, might also keep:
 \itemize{
 \item Additional model configuration (accessible through \code{\link[=xgb.config]{xgb.config()}}), which includes
 model fitting parameters like \code{max_depth} and runtime parameters like \code{nthread}.
@@ -34,6 +35,16 @@ the model was fit, or saving the R call that produced the model, but are otherwi
 not used for prediction / importance / plotting / etc.
 These R attributes are only preserved when using R's serializers.
 
+In addition to the regular \code{xgb.Booster} objects producted by \code{\link[=xgb.train]{xgb.train()}}, the
+function \code{\link[=xgboost]{xgboost()}} produces a different subclass \code{xgboost}, which keeps other
+additional metadata as R attributes such as class names in classification problems,
+and which has a dedicated \code{predict} method that uses different defaults. XGBoost's
+own serializers can work with this \code{xgboost} class, but as they do not keep R
+attributes, the resulting object, when deserialized, is downcasted to the regular
+\code{xgb.Booster} class (i.e. it loses the metadata, and the resulting object will use
+\code{predict.xgb.Booster} instead of \code{predict.xgboost}) - for these \code{xgboost} objects,
+\code{saveRDS} might thus be a better option if the extra functionalities are needed.
+
 Note that XGBoost models in R starting from version \verb{2.1.0} and onwards, and
 XGBoost models before version \verb{2.1.0}; have a very different R object structure and
 are incompatible with each other. Hence, models that were saved with R serializers
@@ -58,9 +69,9 @@ The \code{\link[=xgb.save.raw]{xgb.save.raw()}} function is useful if you would
 as part of another R object.
 
 Use \code{\link[=saveRDS]{saveRDS()}} if you require the R-specific attributes that a booster might have, such
-as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
-control as it relies on R's serialization format (see e.g. the details section in
-\link{serialize} and \code{\link[=save]{save()}} from base R).
+as evaluation logs or the model class \code{xgboost} instead of \code{xgb.Booster}, but note that
+future compatibility of such objects is outside XGBoost's control as it relies on R's
+serialization format (see e.g. the details section in \link{serialize} and \code{\link[=save]{save()}} from base R).
 
 For more details and explanation about model persistence and archival, consult the page
 \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.