% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils.R
\name{a-compatibility-note-for-saveRDS-save}
\alias{a-compatibility-note-for-saveRDS-save}
\title{Model Serialization and Compatibility}
\description{
XGBoost R models can be serialized with R's own serializers, such as
\code{\link[=save]{save()}} or \code{\link[=saveRDS]{saveRDS()}}, but XGBoost also provides
its own serializers, which offer better compatibility guarantees and allow loading
the saved models in other language bindings of XGBoost.

Note that an \code{xgb.Booster} object, besides its core components, might also keep:
\itemize{
\item Additional model configuration (accessible through \code{\link[=xgb.config]{xgb.config()}}), which includes
model fitting parameters like \code{max_depth} and runtime parameters like \code{nthread}.
These are not necessarily useful for prediction/importance/plotting.
\item Additional R-specific attributes, e.g. results of callbacks such as evaluation logs,
which are kept as a \code{data.table} object accessible through
\code{attributes(model)$evaluation_log} if present.
}

The first group (configuration) does not have the same compatibility guarantees as
the model itself, which includes the attributes that are set and accessed through
\code{\link[=xgb.attributes]{xgb.attributes()}}. That is, such configuration might be lost after loading the
booster in a different XGBoost version, regardless of the serializer that was used.
The configuration is saved when using \code{\link[=saveRDS]{saveRDS()}}, but will be discarded if loaded into an
incompatible XGBoost version. It is not saved when using XGBoost's
serializers from its public interface, including \code{\link[=xgb.save]{xgb.save()}} and \code{\link[=xgb.save.raw]{xgb.save.raw()}}.

The second group (R attributes) is not part of the standard XGBoost model structure,
and thus is not saved when using XGBoost's own serializers. These attributes are
only used for informational purposes, such as keeping track of evaluation metrics as
the model was fit, or saving the R call that produced the model, but are otherwise
not used for prediction / importance / plotting / etc.
These R attributes are only preserved when using R's serializers.

Note that XGBoost models in R from version \verb{2.1.0} onwards and
XGBoost models from versions before \verb{2.1.0} have very different R object structures and
are incompatible with each other. Hence, models that were saved with R serializers
like \code{\link[=saveRDS]{saveRDS()}} or \code{\link[=save]{save()}} before version \verb{2.1.0} will not work with later
\code{xgboost} versions and vice versa. Be aware that the structure of R model objects
could in theory change again in the future, so XGBoost's serializers
should be preferred for long-term storage.

Furthermore, note that using the package \code{qs} for serialization will require
version 0.26 or higher of that package, and will have the same compatibility
restrictions as R serializers.
}
\details{
Use \code{\link[=xgb.save]{xgb.save()}} to save the XGBoost model as a stand-alone file. You may opt into
the JSON format by giving the file name a \code{.json} extension. To read the model back, use
\code{\link[=xgb.load]{xgb.load()}}.

Use \code{\link[=xgb.save.raw]{xgb.save.raw()}} to save the XGBoost model as a sequence (vector) of raw bytes
in a future-proof manner. Future releases of XGBoost will be able to read the raw bytes and
reconstruct the corresponding model. To read the model back, use \code{\link[=xgb.load.raw]{xgb.load.raw()}}.
The \code{\link[=xgb.save.raw]{xgb.save.raw()}} function is useful if you would like to persist the XGBoost model
as part of another R object.

Use \code{\link[=saveRDS]{saveRDS()}} if you require the R-specific attributes that a booster might have, such
as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
control, as it relies on R's serialization format (see e.g. the details section in
\link{serialize} and \code{\link[=save]{save()}} from base R).

For more details about model persistence and archival, consult the page
\url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
}
\examples{
data(agaricus.train, package = "xgboost")

bst <- xgb.train(
  params = list(
    max_depth = 2,
    eta = 1,
    nthread = 2,
    objective = "binary:logistic"
  ),
  data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
  nrounds = 2
)

# Save as a stand-alone file; load it with xgb.load()
fname <- file.path(tempdir(), "xgb_model.ubj")
xgb.save(bst, fname)
bst2 <- xgb.load(fname)

# Save as a stand-alone file (JSON); load it with xgb.load()
fname <- file.path(tempdir(), "xgb_model.json")
xgb.save(bst, fname)
bst2 <- xgb.load(fname)

# Save as a raw byte vector; load it with xgb.load.raw()
xgb_bytes <- xgb.save.raw(bst)
bst2 <- xgb.load.raw(xgb_bytes)

# Persist XGBoost model as part of another R object
obj <- list(xgb_model_bytes = xgb.save.raw(bst), description = "My first XGBoost model")
# Persist the R object. Here, saveRDS() is okay, since it doesn't persist
# xgb.Booster directly. What's being persisted is the future-proof byte representation
# as given by xgb.save.raw().
fname <- file.path(tempdir(), "my_object.Rds")
saveRDS(obj, fname)
# Read back the R object
obj2 <- readRDS(fname)
# Re-construct xgb.Booster object from the bytes
bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
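
# A minimal sketch contrasting a model attribute with an R attribute:
# an attribute set through xgb.attr() is part of the model and survives
# XGBoost's own serializer, while the evaluation log (an R attribute) is
# only kept by R's serializers. Assumption: this xgboost version names the
# xgb.train() argument for evaluation sets 'evals' (older versions used
# 'watchlist'); passing it makes training record an evaluation log.
# The attribute name "note" is purely illustrative.
library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
bst_log <- xgb.train(
  params = list(max_depth = 2, eta = 1, nthread = 2, objective = "binary:logistic"),
  data = dtrain,
  nrounds = 2,
  evals = list(train = dtrain),
  verbose = 0
)

# Additional model configuration kept by the booster (not needed for prediction)
cfg <- xgb.config(bst_log)
# The evaluation log is an R attribute: a data.table with one row per round
print(attributes(bst_log)$evaluation_log)

# Set a model attribute, then round-trip through XGBoost's own serializer
xgb.attr(bst_log, "note") <- "trained on agaricus"
bst_xgb <- xgb.load.raw(xgb.save.raw(bst_log))
xgb.attr(bst_xgb, "note")                    # the model attribute is kept
is.null(attributes(bst_xgb)$evaluation_log)  # the R attribute is dropped

# Round-trip through R's serializer: the R attribute is preserved
fname_rds <- file.path(tempdir(), "xgb_booster.Rds")
saveRDS(bst_log, fname_rds)
bst_rds <- readRDS(fname_rds)
print(attributes(bst_rds)$evaluation_log)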
}