[R] On-demand serialization + standardization of attributes (#9924)

---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
david-cortes
2024-01-10 22:08:42 +01:00
committed by GitHub
parent 01c4711556
commit d3a8d284ab
64 changed files with 1773 additions and 1281 deletions


@@ -2,16 +2,44 @@
% Please edit documentation in R/utils.R
\name{a-compatibility-note-for-saveRDS-save}
\alias{a-compatibility-note-for-saveRDS-save}
\title{Do not use \code{\link[base]{saveRDS}} or \code{\link[base]{save}} for long-term archival of
models. Instead, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}}.}
\title{Model Serialization and Compatibility}
\description{
It is a common practice to use the built-in \code{\link[base]{saveRDS}} function (or
\code{\link[base]{save}}) to persist R objects to the disk. While it is possible to persist
\code{xgb.Booster} objects using \code{\link[base]{saveRDS}}, it is not advisable to do so if
the model is to be accessed in the future. If you train a model with the current version of
XGBoost and persist it with \code{\link[base]{saveRDS}}, the model is not guaranteed to be
accessible in later releases of XGBoost. To ensure that your model can be accessed in future
releases of XGBoost, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}} instead.
When it comes to serializing XGBoost models, it's possible to use R serializers such as
\link{save} or \link{saveRDS} to serialize an XGBoost R model, but XGBoost also provides
its own serializers with better compatibility guarantees, which allow loading
said models in other language bindings of XGBoost.
Note that an \code{xgb.Booster} object, outside of its core components, might also keep:\itemize{
\item Additional model configuration (accessible through \link{xgb.config}),
which includes model fitting parameters like \code{max_depth} and runtime parameters like \code{nthread}.
These are not necessarily useful for prediction/importance/plotting.
\item Additional R-specific attributes - e.g. results of callbacks, such as evaluation logs,
which are kept as a \code{data.table} object, accessible through \code{attributes(model)$evaluation_log}
if present.
}
The first one (configurations) does not have the same compatibility guarantees as
the model itself, including attributes that are set and accessed through \link{xgb.attributes} - that is, such configuration
might be lost after loading the booster in a different XGBoost version, regardless of the
serializer that was used. These are saved when using \link{saveRDS}, but will be discarded
if loaded into an incompatible XGBoost version. They are not saved when using XGBoost's
serializers from its public interface including \link{xgb.save} and \link{xgb.save.raw}.
The second ones (R attributes) are not part of the standard XGBoost model structure, and thus are
not saved when using XGBoost's own serializers. These attributes are only used for informational
purposes, such as keeping track of evaluation metrics as the model was fit, or saving the R
call that produced the model, but are otherwise not used for prediction / importance / plotting / etc.
These R attributes are only preserved when using R's serializers.
Note that XGBoost models in R from version \verb{2.1.0} onwards and XGBoost models
before version \verb{2.1.0} have very different R object structures and are incompatible with
each other. Hence, models that were saved with R serializers like \code{saveRDS} or \code{save} before
version \verb{2.1.0} will not work with later \code{xgboost} versions and vice versa. Be aware that
the structure of R model objects could in theory change again in the future, so XGBoost's serializers
should be preferred for long-term storage.
Furthermore, note that using the package \code{qs} for serialization will require version 0.26 or
higher of said package, and will have the same compatibility restrictions as R serializers.
}
\details{
Use \code{\link{xgb.save}} to save the XGBoost model as a stand-alone file. You may opt into
@@ -24,9 +52,10 @@ re-construct the corresponding model. To read the model back, use \code{\link{xg
The \code{\link{xgb.save.raw}} function is useful if you'd like to persist the XGBoost model
as part of another R object.
Note: Do not use \code{\link{xgb.serialize}} to store models long-term. It persists not only the
model but also internal configurations and parameters, and its format is not stable across
multiple XGBoost versions. Use \code{\link{xgb.serialize}} only for checkpointing.
Use \link{saveRDS} if you require the R-specific attributes that a booster might have, such
as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
control as it relies on R's serialization format (see e.g. the details section in
\link{serialize} and \link{save} from base R).
For more details and explanation about model persistence and archival, consult the page
\url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
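As a rough illustration of the distinction drawn above, the sketch below trains a small model, attaches one R attribute and one booster-level attribute, and checks what comes back from each serializer. The attribute names (`note`, `my_attr`) and file names are made up for the example, and the behavior shown is what the text above describes rather than a guarantee across versions.

```r
library(xgboost)

data(mtcars)
dm <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg, nthread = 1)
model <- xgb.train(data = dm, params = list(nthread = 1), nrounds = 3)

# R-specific attribute: lives only on the R object
attr(model, "note") <- "kept only by R serializers"
# Booster-level attribute: stored inside the XGBoost model itself
xgb.attr(model, "my_attr") <- "kept by XGBoost serializers"

f_ubj <- file.path(tempdir(), "model.ubj")
f_rds <- file.path(tempdir(), "model.rds")
xgb.save(model, f_ubj)   # XGBoost's own serializer
saveRDS(model, f_rds)    # R's serializer

from_ubj <- xgb.load(f_ubj)
from_rds <- readRDS(f_rds)

attr(from_ubj, "note")         # R attribute is not part of the UBJ file
attr(from_rds, "note")         # R attribute survives saveRDS / readRDS
xgb.attr(from_ubj, "my_attr")  # booster attribute travels with the model
```

For long-term storage, the `.ubj` file is the one the text recommends keeping; the `.rds` file is mainly useful when the R attributes themselves matter.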


@@ -4,17 +4,22 @@
\alias{cb.save.model}
\title{Callback closure for saving a model file.}
\usage{
cb.save.model(save_period = 0, save_name = "xgboost.model")
cb.save.model(save_period = 0, save_name = "xgboost.ubj")
}
\arguments{
\item{save_period}{save the model to disk after every
\code{save_period} iterations; 0 means save the model at the end.}
\item{save_name}{the name or path for the saved model file.
It can contain a \code{\link[base]{sprintf}} formatting specifier
to include the integer iteration number in the file name.
E.g., with \code{save_name} = 'xgboost_\%04d.model',
the file saved at iteration 50 would be named "xgboost_0050.model".}
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that the format of the model being saved is determined by the file
extension specified here (see \link{xgb.save} for details about how it works).
It can contain a \code{\link[base]{sprintf}} formatting specifier
to include the integer iteration number in the file name.
E.g., with \code{save_name} = 'xgboost_\%04d.ubj',
the file saved at iteration 50 would be named "xgboost_0050.ubj".
}\if{html}{\out{</div>}}}
}
\description{
Callback closure for saving a model file.
@@ -29,5 +34,7 @@ Callback function expects the following values to be set in its calling frame:
\code{end_iteration}.
}
\seealso{
\link{xgb.save}
\code{\link{callbacks}}
}
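To see how a `sprintf` specifier in `save_name` expands into file names, the following base R snippet formats a few iteration numbers with the pattern from the argument description. It only illustrates the naming; it does not call the callback itself.

```r
# Pattern as it could be passed through the save_name argument
save_name <- "xgboost_%04d.ubj"

# The iteration number is zero-padded to four digits
sprintf(save_name, 50L)
# [1] "xgboost_0050.ubj"

# sprintf is vectorized, so several iterations can be formatted at once
file_names <- sprintf(save_name, c(1L, 50L, 100L))
file_names
```

Because the extension selects the serialization format, changing the pattern to end in `.json` would switch the saved files to plain JSON.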


@@ -0,0 +1,50 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.Booster.R
\name{coef.xgb.Booster}
\alias{coef.xgb.Booster}
\title{Extract coefficients from linear booster}
\usage{
\method{coef}{xgb.Booster}(object, ...)
}
\arguments{
\item{object}{A fitted booster of 'gblinear' type.}
\item{...}{Not used.}
}
\value{
The extracted coefficients:\itemize{
\item If there's only one coefficient per column in the data, will be returned as a
vector, potentially containing the feature names if available, with the intercept
as first column.
\item If there's more than one coefficient per column in the data (e.g. when using
\code{objective="multi:softmax"}), will be returned as a matrix with dimensions equal
to \verb{[num_features, num_cols]}, with the intercepts as first row. Note that the column
(classes in multi-class classification) dimension will not be named.
}
The intercept returned here will include the 'base_score' parameter (unlike the 'bias'
or the last coefficient in the model dump, which doesn't have 'base_score' added to it),
hence one should get the same values from calling \code{predict(..., outputmargin = TRUE)} and
from performing a matrix multiplication with \code{model.matrix(~., ...)}.
Be aware that the coefficients are obtained by first converting them to strings and
back, so there will always be some very small loss of precision compared to the actual
coefficients as used by \link{predict.xgb.Booster}.
}
\description{
Extracts the coefficients from a 'gblinear' booster object,
as produced by \code{xgb.train} when using parameter \code{booster="gblinear"}.
Note: this function will error out if passing a booster model
which is not of "gblinear" type.
}
\examples{
library(xgboost)
data(mtcars)
y <- mtcars[, 1]
x <- as.matrix(mtcars[, -1])
dm <- xgb.DMatrix(data = x, label = y, nthread = 1)
params <- list(booster = "gblinear", nthread = 1)
model <- xgb.train(data = dm, params = params, nrounds = 2)
coef(model)
}
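The claim in the value section, that the returned coefficients reproduce the margin predictions, can be checked with a short sketch like the one below. It assumes a single-output regression on `mtcars` and compares with a tolerance, since the text notes a small loss of precision in the extracted coefficients.

```r
library(xgboost)

data(mtcars)
y <- mtcars$mpg
x <- as.matrix(mtcars[, -1])
dm <- xgb.DMatrix(data = x, label = y, nthread = 1)
model <- xgb.train(
  data = dm,
  params = list(booster = "gblinear", nthread = 1),
  nrounds = 2
)

coefs <- coef(model)  # intercept first, then one entry per feature

# model.matrix(~ .) prepends an "(Intercept)" column of ones,
# which lines up with the intercept-first layout of 'coefs'
design <- model.matrix(~ ., data = mtcars[, -1])
manual <- as.numeric(design \%*\% coefs)

margin <- as.numeric(predict(model, dm, outputmargin = TRUE))

# Compare with a tolerance rather than exact equality
all.equal(manual, margin, tolerance = 1e-4)
```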


@@ -1,24 +1,42 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.DMatrix.R
\name{getinfo}
% Please edit documentation in R/xgb.Booster.R, R/xgb.DMatrix.R
\name{getinfo.xgb.Booster}
\alias{getinfo.xgb.Booster}
\alias{setinfo.xgb.Booster}
\alias{getinfo}
\alias{getinfo.xgb.DMatrix}
\title{Get information of an xgb.DMatrix object}
\alias{setinfo}
\alias{setinfo.xgb.DMatrix}
\title{Get or set information of xgb.DMatrix and xgb.Booster objects}
\usage{
\method{getinfo}{xgb.Booster}(object, name)
\method{setinfo}{xgb.Booster}(object, name, info)
getinfo(object, name)
\method{getinfo}{xgb.DMatrix}(object, name)
setinfo(object, name, info)
\method{setinfo}{xgb.DMatrix}(object, name, info)
}
\arguments{
\item{object}{Object of class \code{xgb.DMatrix}}
\item{object}{Object of class \code{xgb.DMatrix} or \code{xgb.Booster}.}
\item{name}{the name of the information field to get (see details)}
\item{info}{the specific field of information to set}
}
\value{
For \code{getinfo}, will return the requested field. For \code{setinfo}, will always return value \code{TRUE}
if it succeeds.
}
\description{
Get information of an xgb.DMatrix object
Get or set information of xgb.DMatrix and xgb.Booster objects
}
\details{
The \code{name} field can be one of the following:
The \code{name} field can be one of the following for \code{xgb.DMatrix}:
\itemize{
\item \code{label}
@@ -33,8 +51,28 @@ The \code{name} field can be one of the following:
}
See the documentation for \link{xgb.DMatrix} for more information about these fields.
For \code{xgb.Booster}, can be one of the following:
\itemize{
\item \code{feature_type}
\item \code{feature_name}
}
Note that, while 'qid' cannot be retrieved, it's possible to get the equivalent 'group'
for a DMatrix that had 'qid' assigned.
\bold{Important}: when calling \code{setinfo}, the objects are modified in-place. See
\link{xgb.copy.Booster} for an idea of how this in-place assignment works.
See the documentation for \link{xgb.DMatrix} for possible fields that can be set
(which correspond to arguments in that function).
Note that the following fields are allowed in the construction of an \code{xgb.DMatrix}
but \bold{aren't} allowed here:\itemize{
\item data
\item missing
\item silent
\item nthread
}
}
\examples{
data(agaricus.train, package='xgboost')
@@ -45,4 +83,11 @@ setinfo(dtrain, 'label', 1-labels)
labels2 <- getinfo(dtrain, 'label')
stopifnot(all(labels2 == 1-labels))
data(agaricus.train, package='xgboost')
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
labels <- getinfo(dtrain, 'label')
setinfo(dtrain, 'label', 1-labels)
labels2 <- getinfo(dtrain, 'label')
stopifnot(all.equal(labels2, 1-labels))
}


@@ -2,7 +2,6 @@
% Please edit documentation in R/xgb.Booster.R
\name{predict.xgb.Booster}
\alias{predict.xgb.Booster}
\alias{predict.xgb.Booster.handle}
\title{Predict method for XGBoost model}
\usage{
\method{predict}{xgb.Booster}(
@@ -21,11 +20,9 @@
strict_shape = FALSE,
...
)
\method{predict}{xgb.Booster.handle}(object, ...)
}
\arguments{
\item{object}{Object of class \code{xgb.Booster} or \code{xgb.Booster.handle}.}
\item{object}{Object of class \code{xgb.Booster}.}
\item{newdata}{Takes \code{matrix}, \code{dgCMatrix}, \code{dgRMatrix}, \code{dsparseVector},
local data file, or \code{xgb.DMatrix}.


@@ -4,14 +4,15 @@
\alias{print.xgb.Booster}
\title{Print xgb.Booster}
\usage{
\method{print}{xgb.Booster}(x, verbose = FALSE, ...)
\method{print}{xgb.Booster}(x, ...)
}
\arguments{
\item{x}{An \code{xgb.Booster} object.}
\item{verbose}{Whether to print detailed data (e.g., attribute values).}
\item{...}{Not currently used.}
\item{...}{Not used.}
}
\value{
The same \code{x} object, returned invisibly
}
\description{
Print information about \code{xgb.Booster}.
@@ -33,6 +34,5 @@ bst <- xgboost(
attr(bst, "myattr") <- "memo"
print(bst)
print(bst, verbose = TRUE)
}


@@ -1,42 +0,0 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.DMatrix.R
\name{setinfo}
\alias{setinfo}
\alias{setinfo.xgb.DMatrix}
\title{Set information of an xgb.DMatrix object}
\usage{
setinfo(object, name, info)
\method{setinfo}{xgb.DMatrix}(object, name, info)
}
\arguments{
\item{object}{Object of class "xgb.DMatrix"}
\item{name}{the name of the field to get}
\item{info}{the specific field of information to set}
}
\description{
Set information of an xgb.DMatrix object
}
\details{
See the documentation for \link{xgb.DMatrix} for possible fields that can be set
(which correspond to arguments in that function).
Note that the following fields are allowed in the construction of an \code{xgb.DMatrix}
but \bold{aren't} allowed here:\itemize{
\item data
\item missing
\item silent
\item nthread
}
}
\examples{
data(agaricus.train, package='xgboost')
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
labels <- getinfo(dtrain, 'label')
setinfo(dtrain, 'label', 1-labels)
labels2 <- getinfo(dtrain, 'label')
stopifnot(all.equal(labels2, 1-labels))
}


@@ -0,0 +1,22 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.Booster.R
\name{variable.names.xgb.Booster}
\alias{variable.names.xgb.Booster}
\title{Get Feature Names from Booster}
\usage{
\method{variable.names}{xgb.Booster}(object, ...)
}
\arguments{
\item{object}{An \code{xgb.Booster} object.}
\item{...}{Not used.}
}
\description{
Returns the feature / variable / column names from a fitted
booster object, which are set automatically during the call to \link{xgb.train}
from the DMatrix names, or which can be set manually through \link{setinfo}.
If the object doesn't have feature names, will return \code{NULL}.
It is equivalent to calling \code{getinfo(object, "feature_name")}.
}
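A minimal sketch of the equivalence stated in the description, assuming the feature names come from the column names of the matrix used to build the DMatrix:

```r
library(xgboost)

data(mtcars)
x <- as.matrix(mtcars[, -1])
dm <- xgb.DMatrix(x, label = mtcars$mpg, nthread = 1)
model <- xgb.train(data = dm, params = list(nthread = 1), nrounds = 2)

# Feature names picked up from the DMatrix during training
variable.names(model)

# Same information through the generic getinfo accessor
identical(variable.names(model), getinfo(model, "feature_name"))
```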


@@ -1,61 +0,0 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.Booster.R
\name{xgb.Booster.complete}
\alias{xgb.Booster.complete}
\title{Restore missing parts of an incomplete xgb.Booster object}
\usage{
xgb.Booster.complete(object, saveraw = TRUE)
}
\arguments{
\item{object}{Object of class \code{xgb.Booster}.}
\item{saveraw}{A flag indicating whether to append \code{raw} Booster memory dump data
when it doesn't already exist.}
}
\value{
An object of \code{xgb.Booster} class.
}
\description{
It attempts to complete an \code{xgb.Booster} object by restoring either its missing
raw model memory dump (when it has no \code{raw} data but its \code{xgb.Booster.handle} is valid)
or its missing internal handle (when its \code{xgb.Booster.handle} is not valid
but it has a raw Booster memory dump).
}
\details{
While this method is primarily for internal use, it might be useful in some practical situations.
E.g., when an \code{xgb.Booster} model is saved as an R object and then is loaded as an R object,
its handle (pointer) to an internal xgboost model would be invalid. The majority of xgboost methods
should still work for such a model object since those methods would be using
\code{xgb.Booster.complete()} internally. However, one might find it to be more efficient to call the
\code{xgb.Booster.complete()} function explicitly once after loading a model as an R-object.
That would prevent further repeated implicit reconstruction of an internal booster model.
}
\examples{
data(agaricus.train, package = "xgboost")
bst <- xgboost(
data = agaricus.train$data,
label = agaricus.train$label,
max_depth = 2,
eta = 1,
nthread = 2,
nrounds = 2,
objective = "binary:logistic"
)
fname <- file.path(tempdir(), "xgb_model.Rds")
saveRDS(bst, fname)
# Warning: The resulting RDS file is only compatible with the current XGBoost version.
# Refer to the section titled "a-compatibility-note-for-saveRDS-save".
bst1 <- readRDS(fname)
# the handle is invalid:
print(bst1$handle)
bst1 <- xgb.Booster.complete(bst1)
# now the handle points to a valid internal booster model:
print(bst1$handle)
}


@@ -16,7 +16,7 @@ xgb.attributes(object)
xgb.attributes(object) <- value
}
\arguments{
\item{object}{Object of class \code{xgb.Booster} or \code{xgb.Booster.handle}.}
\item{object}{Object of class \code{xgb.Booster}. \bold{Will be modified in-place} when assigning to it.}
\item{name}{A non-empty character string specifying which attribute is to be accessed.}
@@ -51,15 +51,14 @@ Also, setting an attribute that has the same name as one of xgboost's parameters
change the value of that parameter for a model.
Use \code{\link[=xgb.parameters<-]{xgb.parameters<-()}} to set or change model parameters.
The attribute setters would usually work more efficiently for \code{xgb.Booster.handle}
than for \code{xgb.Booster}, since only just a handle (pointer) would need to be copied.
That would only matter if attributes need to be set many times.
Note, however, that when feeding a handle of an \code{xgb.Booster} object to the attribute setters,
the raw model cache of an \code{xgb.Booster} object would not be automatically updated,
and it would be the user's responsibility to call \code{\link[=xgb.serialize]{xgb.serialize()}} to update it.
The \verb{xgb.attributes<-} setter either updates the existing or adds one or several attributes,
but it doesn't delete the other existing attributes.
Important: since this modifies the booster's C object, semantics for assignment here
will differ from R's, as any object reference to the same booster will be modified
too, while assignment of R attributes through \verb{attributes(model)$<attr> <- <value>}
will follow the usual copy-on-write R semantics (see \link{xgb.copy.Booster} for an
example of these behaviors).
}
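The difference between the two kinds of assignment described above can be sketched as follows. The attribute names `c_level_note` and `r_level_note` are arbitrary and only serve the illustration.

```r
library(xgboost)

data(mtcars)
dm <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg, nthread = 1)
model <- xgb.train(data = dm, params = list(nthread = 1), nrounds = 2)

# Plain assignment: both names refer to the same underlying C object
other_ref <- model

# Booster attribute: set through one name, visible through the other
xgb.attr(other_ref, "c_level_note") <- "shared"
xgb.attr(model, "c_level_note")

# R attribute: usual copy-on-write semantics, so only 'other_ref' changes
attr(other_ref, "r_level_note") <- "not shared"
attr(model, "r_level_note")
```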
\examples{
data(agaricus.train, package = "xgboost")


@@ -10,13 +10,23 @@ xgb.config(object)
xgb.config(object) <- value
}
\arguments{
\item{object}{Object of class \code{xgb.Booster}.}
\item{object}{Object of class \code{xgb.Booster}. \bold{Will be modified in-place} when assigning to it.}
\item{value}{A JSON string.}
\item{value}{An R list.}
}
\value{
\code{xgb.config} will return the parameters as an R list.
}
\description{
Accessors for model parameters as JSON string
}
\details{
Note that assignment is performed in-place on the booster C object, which unlike assignment
of R attributes, doesn't follow typical copy-on-write semantics for assignment - i.e. all references
to the same booster will also get updated.
See \link{xgb.copy.Booster} for an example of this behavior.
}
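As a small sketch of the points above, the snippet below reads the configuration as an R list and then changes a parameter through a second reference to the same booster. Changing `nthread` is just an example of a runtime parameter; the exact layout of the returned list is not relied upon here.

```r
library(xgboost)

data(mtcars)
dm <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg, nthread = 1)
model <- xgb.train(data = dm, params = list(nthread = 1), nrounds = 2)

cfg <- xgb.config(model)
is.list(cfg)   # the configuration comes back as a (nested) R list
names(cfg)

# A second name for the same booster shares the C object
other_ref <- model
xgb.parameters(other_ref) <- list(nthread = 2)

# Both names report the same configuration after the in-place change
identical(xgb.config(model), xgb.config(other_ref))
```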
\examples{
data(agaricus.train, package = "xgboost")


@@ -0,0 +1,53 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.Booster.R
\name{xgb.copy.Booster}
\alias{xgb.copy.Booster}
\title{Deep-copies a Booster Object}
\usage{
xgb.copy.Booster(model)
}
\arguments{
\item{model}{An 'xgb.Booster' object.}
}
\value{
A deep copy of \code{model} - it will be identical in every way, but C-level
functions called on that copy will not affect the \code{model} variable.
}
\description{
Creates a deep copy of an 'xgb.Booster' object, such that the
C object pointer contained will be a different object, and hence functions
like \link{xgb.attr} will not affect the object from which it was copied.
}
\examples{
library(xgboost)
data(mtcars)
y <- mtcars$mpg
x <- mtcars[, -1]
dm <- xgb.DMatrix(x, label = y, nthread = 1)
model <- xgb.train(
data = dm,
params = list(nthread = 1),
nround = 3
)
# Set an arbitrary attribute kept at the C level
xgb.attr(model, "my_attr") <- 100
print(xgb.attr(model, "my_attr"))
# Just assigning to a new variable will not create
# a deep copy - C object pointer is shared, and in-place
# modifications will affect both objects
model_shallow_copy <- model
xgb.attr(model_shallow_copy, "my_attr") <- 333
# 'model' was also affected by this change:
print(xgb.attr(model, "my_attr"))
model_deep_copy <- xgb.copy.Booster(model)
xgb.attr(model_deep_copy, "my_attr") <- 444
# 'model' was NOT affected by this change
# (keeps previous value that was assigned before)
print(xgb.attr(model, "my_attr"))
# Verify that the new object was actually modified
print(xgb.attr(model_deep_copy, "my_attr"))
}


@@ -8,7 +8,8 @@ xgb.gblinear.history(model, class_index = NULL)
}
\arguments{
\item{model}{either an \code{xgb.Booster} or a result of \code{xgb.cv()}, trained
using the \code{cb.gblinear.history()} callback.}
using the \code{cb.gblinear.history()} callback, but \bold{not} a booster
loaded from \link{xgb.load} or \link{xgb.load.raw}.}
\item{class_index}{zero-based class index to extract the coefficients for only that
specific class in a multinomial multiclass model. When it is NULL, all the
@@ -27,3 +28,11 @@ A helper function to extract the matrix of linear coefficients' history
from a gblinear model created while using the \code{cb.gblinear.history()}
callback.
}
\details{
Note that this is an R-specific function that relies on R attributes that
are not saved when using xgboost's own serialization functions like \link{xgb.load}
or \link{xgb.load.raw}.
In order for a serialized model to be accepted by this function, one must use R
serializers such as \link{saveRDS}.
}


@@ -0,0 +1,22 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.Booster.R
\name{xgb.get.num.boosted.rounds}
\alias{xgb.get.num.boosted.rounds}
\title{Get number of boosting rounds in a fitted booster}
\usage{
xgb.get.num.boosted.rounds(model)
}
\arguments{
\item{model}{A fitted \code{xgb.Booster} model.}
}
\value{
The number of rounds saved in the model, as an integer.
}
\description{
Get number of boosting rounds in a fitted booster
}
\details{
Note that setting booster parameters related to training
continuation / updates through \link{xgb.parameters<-} will reset the
number of rounds to zero.
}
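A minimal usage sketch, assuming a model trained for three rounds on `mtcars`:

```r
library(xgboost)

data(mtcars)
dm <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg, nthread = 1)
model <- xgb.train(data = dm, params = list(nthread = 1), nrounds = 3)

# Number of boosting rounds stored in the fitted model
n_rounds <- xgb.get.num.boosted.rounds(model)
n_rounds
```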


@@ -0,0 +1,59 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.Booster.R
\name{xgb.is.same.Booster}
\alias{xgb.is.same.Booster}
\title{Check if two boosters share the same C object}
\usage{
xgb.is.same.Booster(obj1, obj2)
}
\arguments{
\item{obj1}{Booster model to compare with \code{obj2}.}
\item{obj2}{Booster model to compare with \code{obj1}.}
}
\value{
Either \code{TRUE} or \code{FALSE} according to whether the two boosters share
the underlying C object.
}
\description{
Checks whether two booster objects refer to the same underlying C object.
}
\details{
As booster objects (as returned by e.g. \link{xgb.train}) contain an R 'externalptr'
object, they don't follow typical copy-on-write semantics of other R objects - that is, if
one assigns a booster to a different variable and modifies that new variable through in-place
methods like \link{xgb.attr<-}, the modification will be applied to both the old and the new
variable, unlike typical R assignments which would only modify the latter.
This function allows checking whether two booster objects share the same 'externalptr',
regardless of the R attributes that they might have.
In order to duplicate a booster in such a way that the copy wouldn't share the same
'externalptr', one can use function \link{xgb.copy.Booster}.
}
\examples{
library(xgboost)
data(mtcars)
y <- mtcars$mpg
x <- as.matrix(mtcars[, -1])
model <- xgb.train(
params = list(nthread = 1),
data = xgb.DMatrix(x, label = y, nthread = 1),
nround = 3
)
model_shallow_copy <- model
xgb.is.same.Booster(model, model_shallow_copy) # same C object
model_deep_copy <- xgb.copy.Booster(model)
xgb.is.same.Booster(model, model_deep_copy) # different C objects
# In-place assignments modify all references,
# but not full/deep copies of the booster
xgb.attr(model_shallow_copy, "my_attr") <- 111
xgb.attr(model, "my_attr") # gets modified
xgb.attr(model_deep_copy, "my_attr") # doesn't get modified
}
\seealso{
\link{xgb.copy.Booster}
}


@@ -48,5 +48,5 @@ xgb.save(bst, fname)
bst <- xgb.load(fname)
}
\seealso{
\code{\link{xgb.save}}, \code{\link{xgb.Booster.complete}}.
\code{\link{xgb.save}}
}


@@ -4,12 +4,10 @@
\alias{xgb.load.raw}
\title{Load serialised xgboost model from R's raw vector}
\usage{
xgb.load.raw(buffer, as_booster = FALSE)
xgb.load.raw(buffer)
}
\arguments{
\item{buffer}{the buffer returned by xgb.save.raw}
\item{as_booster}{Return the loaded model as xgb.Booster instead of xgb.Booster.handle.}
}
\description{
User can generate raw memory buffer by calling xgb.save.raw


@@ -14,8 +14,11 @@ xgb.model.dt.tree(
)
}
\arguments{
\item{feature_names}{Character vector used to overwrite the feature names
of the model. The default (\code{NULL}) uses the original feature names.}
\item{feature_names}{Character vector of feature names. If the model already
contains feature names, those will be used when \code{feature_names=NULL} (default value).
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that, if the model already contains feature names, it's \\bold\{not\} possible to override them here.
}\if{html}{\out{</div>}}}
\item{model}{Object of class \code{xgb.Booster}.}
@@ -76,8 +79,6 @@ bst <- xgboost(
objective = "binary:logistic"
)
(dt <- xgb.model.dt.tree(colnames(agaricus.train$data), bst))
# This bst model already has feature_names stored with it, so those would be used when
# feature_names is not set:
(dt <- xgb.model.dt.tree(model = bst))


@@ -7,17 +7,27 @@
xgb.parameters(object) <- value
}
\arguments{
\item{object}{Object of class \code{xgb.Booster} or \code{xgb.Booster.handle}.}
\item{object}{Object of class \code{xgb.Booster}. \bold{Will be modified in-place}.}
\item{value}{A list (or an object coercible to a list) with the names of parameters to set
and the elements corresponding to parameter values.}
}
\value{
The same booster \code{object}, which gets modified in-place.
}
\description{
Only the setter for xgboost parameters is currently implemented.
}
\details{
Note that the setter would usually work more efficiently for \code{xgb.Booster.handle}
than for \code{xgb.Booster}, since only just a handle would need to be copied.
Just like \link{xgb.attr}, this function will make in-place modifications
on the booster object which do not follow typical R assignment semantics - that is,
all references to the same booster will also be updated, unlike assignment of R
attributes, which follows copy-on-write semantics.
See \link{xgb.copy.Booster} for an example of this behavior.
Be aware that setting parameters of a fitted booster related to training continuation / updates
will reset its number of rounds indicator to zero.
}
\examples{
data(agaricus.train, package = "xgboost")


@@ -7,15 +7,27 @@
xgb.save(model, fname)
}
\arguments{
\item{model}{model object of \code{xgb.Booster} class.}
\item{model}{Model object of \code{xgb.Booster} class.}
\item{fname}{name of the file to write.}
\item{fname}{Name of the file to write.
Note that the extension of this file name determines the serialization format to use:\itemize{
\item Extension ".ubj" will use the universal binary JSON format (recommended).
This format uses binary types for e.g. floating point numbers, thereby preventing any loss
of precision when converting to a human-readable JSON text or similar.
\item Extension ".json" will use plain JSON, which is a human-readable format.
\item Extension ".deprecated" will use a \bold{deprecated} binary format. This format will
not be able to save attributes introduced after v1 of XGBoost, such as the "best_iteration"
attribute that boosters might keep, nor feature names or user-specified attributes.
\item If the format is not specified by passing one of the file extensions above, will
default to UBJ.
}}
}
\description{
Save xgboost model to a file in binary format.
Save xgboost model to a file in binary or JSON format.
}
\details{
This methods allows to save a model in an xgboost-internal binary format which is universal
This method allows saving a model in an xgboost-internal binary or text format which is universal
among the various xgboost interfaces. In R, the saved model file could be read-in later
using either the \code{\link{xgb.load}} function or the \code{xgb_model} parameter
of \code{\link{xgb.train}}.
@@ -23,7 +35,7 @@ of \code{\link{xgb.train}}.
Note: a model can also be saved as an R-object (e.g., by using \code{\link[base]{saveRDS}}
or \code{\link[base]{save}}). However, it would then only be compatible with R, and
corresponding R-methods would need to be used to load it. Moreover, persisting the model with
\code{\link[base]{readRDS}} or \code{\link[base]{save}}) will cause compatibility problems in
\code{\link[base]{saveRDS}} or \code{\link[base]{save}} might cause compatibility problems in
future versions of XGBoost. Consult \code{\link{a-compatibility-note-for-saveRDS-save}} to learn
how to persist models in a future-proof way, i.e. to make the model accessible in future
releases of XGBoost.
@@ -51,5 +63,5 @@ xgb.save(bst, fname)
bst <- xgb.load(fname)
}
\seealso{
\code{\link{xgb.load}}, \code{\link{xgb.Booster.complete}}.
\code{\link{xgb.load}}
}
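The extension-based format selection described in the arguments can be sketched as follows. The file names are arbitrary, and the check on the first character simply relies on a JSON document opening with a brace.

```r
library(xgboost)

data(mtcars)
dm <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg, nthread = 1)
model <- xgb.train(data = dm, params = list(nthread = 1), nrounds = 3)

f_ubj <- file.path(tempdir(), "model.ubj")    # universal binary JSON
f_json <- file.path(tempdir(), "model.json")  # plain, human-readable JSON
xgb.save(model, f_ubj)
xgb.save(model, f_json)

# The JSON file is plain text and starts with an opening brace
first_char <- readChar(f_json, nchars = 1)
first_char

# Either file can be read back with xgb.load
from_ubj <- xgb.load(f_ubj)
from_json <- xgb.load(f_json)
all.equal(predict(model, dm), predict(from_ubj, dm))
all.equal(predict(model, dm), predict(from_json, dm))
```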


@@ -5,7 +5,7 @@
\title{Save xgboost model to R's raw vector,
user can call xgb.load.raw to load the model back from raw vector}
\usage{
xgb.save.raw(model, raw_format = "deprecated")
xgb.save.raw(model, raw_format = "ubj")
}
\arguments{
\item{model}{the model object.}
@@ -15,9 +15,7 @@ xgb.save.raw(model, raw_format = "deprecated")
\item \code{json}: Encode the booster into JSON text document.
\item \code{ubj}: Encode the booster into Universal Binary JSON.
\item \code{deprecated}: Encode the booster into old customized binary format.
}
Right now the default is \code{deprecated} but will be changed to \code{ubj} in upcoming release.}
}}
}
\description{
Save xgboost model from xgboost or xgb.train
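A short sketch of the round trip through a raw vector, assuming the new default `raw_format = "ubj"` and the single-argument `xgb.load.raw(buffer)` shown in this change. The `bundle` list is a made-up container that stands in for "another R object" holding the model bytes.

```r
library(xgboost)

data(mtcars)
dm <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg, nthread = 1)
model <- xgb.train(data = dm, params = list(nthread = 1), nrounds = 3)

# Encode the booster as Universal Binary JSON held in an R raw vector
raw_model <- xgb.save.raw(model, raw_format = "ubj")
is.raw(raw_model)

# The raw vector can be stored inside any other R object
bundle <- list(model_bytes = raw_model, trained_on = "mtcars")

# Rebuild a booster from the stored bytes
restored <- xgb.load.raw(bundle$model_bytes)
all.equal(predict(model, dm), predict(restored, dm))
```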


@@ -1,29 +0,0 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.serialize.R
\name{xgb.serialize}
\alias{xgb.serialize}
\title{Serialize the booster instance into R's raw vector. The serialization method differs
from \code{\link{xgb.save.raw}} as the latter one saves only the model but not
parameters. This serialization format is not stable across different xgboost versions.}
\usage{
xgb.serialize(booster)
}
\arguments{
\item{booster}{the booster instance}
}
\description{
Serialize the booster instance into R's raw vector. The serialization method differs
from \code{\link{xgb.save.raw}} as the latter one saves only the model but not
parameters. This serialization format is not stable across different xgboost versions.
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgb.train(data = xgb.DMatrix(train$data, label = train$label), max_depth = 2,
eta = 1, nthread = 2, nrounds = 2,objective = "binary:logistic")
raw <- xgb.serialize(bst)
bst <- xgb.unserialize(raw)
}


@@ -205,7 +205,12 @@ file with a previously saved model.}
\item{callbacks}{a list of callback functions to perform various task during boosting.
See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the
parameters' values. User can provide either existing or their own callback methods in order
to customize the training process.}
to customize the training process.
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that some callbacks might try to set an evaluation log - be aware that these evaluation logs
are kept as R attributes, and thus do not get saved when using non-R serializers like
\link{xgb.save} (but are kept when using R serializers like \link{saveRDS}).
}\if{html}{\out{</div>}}}
\item{...}{other parameters to pass to \code{params}.}
@@ -219,27 +224,7 @@ This parameter is only used when input is a dense matrix.}
\item{weight}{a vector indicating the weight for each row of the input.}
}
\value{
An object of class \code{xgb.Booster} with the following elements:
\itemize{
\item \code{handle} a handle (pointer) to the xgboost model in memory.
\item \code{raw} a cached memory dump of the xgboost model saved as R's \code{raw} type.
\item \code{niter} number of boosting iterations.
\item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
first column corresponding to iteration number and the rest corresponding to evaluation
metrics' values. It is created by the \code{\link{cb.evaluation.log}} callback.
\item \code{call} a function call.
\item \code{params} parameters that were passed to the xgboost library. Note that it does not
capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
\item \code{callbacks} callback functions that were either automatically assigned or
explicitly passed.
\item \code{best_iteration} iteration number with the best evaluation metric value
(only available with early stopping).
\item \code{best_score} the best evaluation metric value during early stopping.
(only available with early stopping).
\item \code{feature_names} names of the training dataset features
(only when column names were defined in training data).
\item \code{nfeatures} number of features in training data.
}
An object of class \code{xgb.Booster}.
}
\description{
\code{xgb.train} is an advanced interface for training an xgboost model.
@@ -285,6 +270,21 @@ and the \code{print_every_n} parameter is passed to it.
\item \code{cb.early.stop}: when \code{early_stopping_rounds} is set.
\item \code{cb.save.model}: when \code{save_period > 0} is set.
}
Note that objects of type \code{xgb.Booster} as returned by this function behave a bit differently
from typical R objects (it's an 'altrep' list class), and it makes a separation between
internal booster attributes (restricted to jsonifyable data), accessed through \link{xgb.attr}
and shared between interfaces through serialization functions like \link{xgb.save}; and
R-specific attributes, accessed through \link{attributes} and \link{attr}, which are otherwise
only used in the R interface, only kept when using R's serializers like \link{saveRDS}, and
not used in any way by functions like \link{predict.xgb.Booster}.
Be aware that one such R attribute that is automatically added is \code{params} - this attribute
is assigned from the \code{params} argument to this function, and is only meant to serve as a
reference for what went into the booster, but is not used in other methods that take a booster
object - so for example, changing the booster's configuration requires calling \verb{xgb.config<-}
or \verb{xgb.parameters<-}, while simply modifying \verb{attributes(model)$params$<...>} will have no
effect elsewhere.
}
\examples{
data(agaricus.train, package='xgboost')


@@ -1,21 +0,0 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.unserialize.R
\name{xgb.unserialize}
\alias{xgb.unserialize}
\title{Load the instance back from \code{\link{xgb.serialize}}}
\usage{
xgb.unserialize(buffer, handle = NULL)
}
\arguments{
\item{buffer}{the buffer containing booster instance saved by \code{\link{xgb.serialize}}}
\item{handle}{An \code{xgb.Booster.handle} object which will be overwritten with
the new deserialized object. Must be a null handle (e.g. when loading the model through
\code{readRDS}). If not provided, a new handle will be created.}
}
\value{
An \code{xgb.Booster.handle} object.
}
\description{
Load the instance back from \code{\link{xgb.serialize}}
}