% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utils.R
\name{a-compatibility-note-for-saveRDS-save}
\alias{a-compatibility-note-for-saveRDS-save}
\title{Model Serialization and Compatibility}
\description{
XGBoost models can be serialized with R serializers such as \link{save} or \link{saveRDS},
but XGBoost also provides its own serializers with better compatibility guarantees, which
allow loading the models in other language bindings of XGBoost.

Note that an \code{xgb.Booster} object, outside of its core components, might also keep:\itemize{
\item Additional model configuration (accessible through \link{xgb.config}),
which includes model fitting parameters like \code{max_depth} and runtime parameters like \code{nthread}.
These are not necessarily useful for prediction/importance/plotting.
\item Additional R-specific attributes, e.g. results of callbacks such as evaluation logs,
which are kept as a \code{data.table} object, accessible through \code{attributes(model)$evaluation_log}
if present.
}

The first (configuration) does not have the same compatibility guarantees as the model
itself (which includes attributes that are set and accessed through \link{xgb.attributes}):
such configuration might be lost after loading the booster in a different XGBoost version,
regardless of the serializer that was used. It is saved when using \link{saveRDS}, but will be
discarded if loaded into an incompatible XGBoost version. It is not saved when using XGBoost's
serializers from its public interface, including \link{xgb.save} and \link{xgb.save.raw}.

The second (R attributes) are not part of the standard XGBoost model structure, and thus are
not saved when using XGBoost's own serializers. These attributes are only used for informational
purposes, such as keeping track of evaluation metrics as the model was fit, or saving the R
call that produced the model, but are otherwise not used for prediction / importance / plotting / etc.
These R attributes are only preserved when using R's serializers.

Note that XGBoost models in R from version \verb{2.1.0} onwards and XGBoost models
before version \verb{2.1.0} have a very different R object structure and are incompatible with
each other. Hence, models that were saved with R serializers like \code{saveRDS} or \code{save} before
version \verb{2.1.0} will not work with later \code{xgboost} versions, and vice versa. Be aware that
the structure of R model objects could in theory change again in the future, so XGBoost's serializers
should be preferred for long-term storage.

Furthermore, note that using the package \code{qs} for serialization will require version 0.26 or
higher of said package, and will have the same compatibility restrictions as R serializers.
}
\details{
Use \code{\link{xgb.save}} to save the XGBoost model as a stand-alone file. You may opt into
the JSON format by giving the file name a \code{.json} extension. To read the model back, use
\code{\link{xgb.load}}.

Use \code{\link{xgb.save.raw}} to save the XGBoost model as a sequence (vector) of raw bytes
in a future-proof manner. Future releases of XGBoost will be able to read the raw bytes and
re-construct the corresponding model. To read the model back, use \code{\link{xgb.load.raw}}.
The \code{\link{xgb.save.raw}} function is useful if you'd like to persist the XGBoost model
as part of another R object.

Use \link{saveRDS} if you require the R-specific attributes that a booster might have, such
as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
control, as it relies on R's serialization format (see e.g. the details section in
\link{serialize} and \link{save} from base R).

For more details and explanation about model persistence and archival, consult the page
\url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
}
\examples{
data(agaricus.train, package='xgboost')
bst <- xgb.train(data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
                 max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
                 objective = "binary:logistic")

# Save as a stand-alone file; load it with xgb.load()
fname <- file.path(tempdir(), "xgb_model.ubj")
xgb.save(bst, fname)
bst2 <- xgb.load(fname)

# Save as a stand-alone file (JSON); load it with xgb.load()
fname <- file.path(tempdir(), "xgb_model.json")
xgb.save(bst, fname)
bst2 <- xgb.load(fname)

# Save as a raw byte vector; load it with xgb.load.raw()
xgb_bytes <- xgb.save.raw(bst)
bst2 <- xgb.load.raw(xgb_bytes)
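
# A minimal sketch, not part of the original examples: attributes set through
# xgb.attributes() belong to the model itself, so they are expected to survive a
# round trip through XGBoost's own serializers. The attribute name my_note is
# hypothetical and only used for illustration.
xgb.attributes(bst) <- list(my_note = "trained on agaricus")
bst3 <- xgb.load.raw(xgb.save.raw(bst))
xgb.attr(bst3, "my_note")
# R-specific attributes such as the evaluation log are not restored this way,
# so this check is expected to report that no evaluation log is attached
is.null(attributes(bst3)$evaluation_log)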

# Persist XGBoost model as part of another R object
obj <- list(xgb_model_bytes = xgb.save.raw(bst), description = "My first XGBoost model")
# Persist the R object. Here, saveRDS() is okay, since it doesn't persist
# xgb.Booster directly. What's being persisted is the future-proof byte representation
# as given by xgb.save.raw().
fname <- file.path(tempdir(), "my_object.Rds")
saveRDS(obj, fname)
# Read back the R object
obj2 <- readRDS(fname)
# Re-construct xgb.Booster object from the bytes
bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
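
# A minimal sketch, not part of the original examples: saving the xgb.Booster
# directly with saveRDS() keeps the R-specific attributes, but the resulting file
# is only expected to load in a compatible version of the R package.
# The file name xgb_booster.Rds is hypothetical.
fname <- file.path(tempdir(), "xgb_booster.Rds")
saveRDS(bst, fname)
bst4 <- readRDS(fname)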
}