[R] On-demand serialization + standardization of attributes (#9924)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
@@ -2,16 +2,44 @@
|
||||
% Please edit documentation in R/utils.R
|
||||
\name{a-compatibility-note-for-saveRDS-save}
|
||||
\alias{a-compatibility-note-for-saveRDS-save}
|
||||
\title{Do not use \code{\link[base]{saveRDS}} or \code{\link[base]{save}} for long-term archival of
|
||||
models. Instead, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}}.}
|
||||
\title{Model Serialization and Compatibility}
|
||||
\description{
|
||||
It is a common practice to use the built-in \code{\link[base]{saveRDS}} function (or
|
||||
\code{\link[base]{save}}) to persist R objects to the disk. While it is possible to persist
|
||||
\code{xgb.Booster} objects using \code{\link[base]{saveRDS}}, it is not advisable to do so if
|
||||
the model is to be accessed in the future. If you train a model with the current version of
|
||||
XGBoost and persist it with \code{\link[base]{saveRDS}}, the model is not guaranteed to be
|
||||
accessible in later releases of XGBoost. To ensure that your model can be accessed in future
|
||||
releases of XGBoost, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}} instead.
|
||||
When it comes to serializing XGBoost models, it's possible to use R serializers such as
|
||||
\link{save} or \link{saveRDS} to serialize an XGBoost R model, but XGBoost also provides
|
||||
its own serializers with better compatibility guarantees, which allow loading
|
||||
said models in other language bindings of XGBoost.
|
||||
|
||||
Note that an \code{xgb.Booster} object, outside of its core components, might also keep:\itemize{
|
||||
\item Additional model configuration (accessible through \link{xgb.config}),
|
||||
which includes model fitting parameters like \code{max_depth} and runtime parameters like \code{nthread}.
|
||||
These are not necessarily useful for prediction/importance/plotting.
|
||||
\item Additional R-specific attributes - e.g. results of callbacks, such as evaluation logs,
|
||||
which are kept as a \code{data.table} object, accessible through \code{attributes(model)$evaluation_log}
|
||||
if present.
|
||||
}
|
||||
|
||||
The first one (configurations) does not have the same compatibility guarantees as
|
||||
the model itself, including attributes that are set and accessed through \link{xgb.attributes} - that is, such configuration
|
||||
might be lost after loading the booster in a different XGBoost version, regardless of the
|
||||
serializer that was used. These are saved when using \link{saveRDS}, but will be discarded
|
||||
if loaded into an incompatible XGBoost version. They are not saved when using XGBoost's
|
||||
serializers from its public interface including \link{xgb.save} and \link{xgb.save.raw}.
|
||||
|
||||
The second ones (R attributes) are not part of the standard XGBoost model structure, and thus are
|
||||
not saved when using XGBoost's own serializers. These attributes are only used for informational
|
||||
purposes, such as keeping track of evaluation metrics as the model was fit, or saving the R
|
||||
call that produced the model, but are otherwise not used for prediction / importance / plotting / etc.
|
||||
These R attributes are only preserved when using R's serializers.
|
||||
|
||||
Note that XGBoost models in R starting from version \verb{2.1.0} and onwards, and XGBoost models
|
||||
before version \verb{2.1.0} have a very different R object structure and are incompatible with
|
||||
each other. Hence, models that were saved with R serializers like \code{saveRDS} or \code{save} before
|
||||
version \verb{2.1.0} will not work with later \code{xgboost} versions and vice versa. Be aware that
|
||||
the structure of R model objects could in theory change again in the future, so XGBoost's serializers
|
||||
should be preferred for long-term storage.
|
||||
|
||||
Furthermore, note that using the package \code{qs} for serialization will require version 0.26 or
|
||||
higher of said package, and will have the same compatibility restrictions as R serializers.
|
||||
}
|
||||
\details{
|
||||
Use \code{\link{xgb.save}} to save the XGBoost model as a stand-alone file. You may opt into
|
||||
@@ -24,9 +52,10 @@ re-construct the corresponding model. To read the model back, use \code{\link{xg
|
||||
The \code{\link{xgb.save.raw}} function is useful if you'd like to persist the XGBoost model
|
||||
as part of another R object.
|
||||
|
||||
Note: Do not use \code{\link{xgb.serialize}} to store models long-term. It persists not only the
|
||||
model but also internal configurations and parameters, and its format is not stable across
|
||||
multiple XGBoost versions. Use \code{\link{xgb.serialize}} only for checkpointing.
|
||||
Use \link{saveRDS} if you require the R-specific attributes that a booster might have, such
|
||||
as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
|
||||
control as it relies on R's serialization format (see e.g. the details section in
|
||||
\link{serialize} and \link{save} from base R).
|
||||
|
||||
For more details and explanation about model persistence and archival, consult the page
|
||||
\url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
|
||||
|
||||
@@ -4,17 +4,22 @@
|
||||
\alias{cb.save.model}
|
||||
\title{Callback closure for saving a model file.}
|
||||
\usage{
|
||||
cb.save.model(save_period = 0, save_name = "xgboost.model")
|
||||
cb.save.model(save_period = 0, save_name = "xgboost.ubj")
|
||||
}
|
||||
\arguments{
|
||||
\item{save_period}{save the model to disk after every
|
||||
\code{save_period} iterations; 0 means save the model at the end.}
|
||||
|
||||
\item{save_name}{the name or path for the saved model file.
|
||||
It can contain a \code{\link[base]{sprintf}} formatting specifier
|
||||
to include the integer iteration number in the file name.
|
||||
E.g., with \code{save_name} = 'xgboost_\%04d.model',
|
||||
the file saved at iteration 50 would be named "xgboost_0050.model".}
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that the format of the model being saved is determined by the file
|
||||
extension specified here (see \link{xgb.save} for details about how it works).
|
||||
|
||||
It can contain a \code{\link[base]{sprintf}} formatting specifier
|
||||
to include the integer iteration number in the file name.
|
||||
E.g., with \code{save_name} = 'xgboost_\%04d.ubj',
|
||||
the file saved at iteration 50 would be named "xgboost_0050.ubj".
|
||||
}\if{html}{\out{</div>}}}
|
||||
}
|
||||
\description{
|
||||
Callback closure for saving a model file.
|
||||
@@ -29,5 +34,7 @@ Callback function expects the following values to be set in its calling frame:
|
||||
\code{end_iteration}.
|
||||
}
|
||||
\seealso{
|
||||
\link{xgb.save}
|
||||
|
||||
\code{\link{callbacks}}
|
||||
}
|
||||
|
||||
50
R-package/man/coef.xgb.Booster.Rd
Normal file
@@ -0,0 +1,50 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.Booster.R
|
||||
\name{coef.xgb.Booster}
|
||||
\alias{coef.xgb.Booster}
|
||||
\title{Extract coefficients from linear booster}
|
||||
\usage{
|
||||
\method{coef}{xgb.Booster}(object, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{A fitted booster of 'gblinear' type.}
|
||||
|
||||
\item{...}{Not used.}
|
||||
}
|
||||
\value{
|
||||
The extracted coefficients:\itemize{
|
||||
\item If there's only one coefficient per column in the data, will be returned as a
|
||||
vector, potentially containing the feature names if available, with the intercept
|
||||
as first column.
|
||||
\item If there's more than one coefficient per column in the data (e.g. when using
|
||||
\code{objective="multi:softmax"}), will be returned as a matrix with dimensions equal
|
||||
to \verb{[num_features, num_cols]}, with the intercepts as first row. Note that the column
|
||||
(classes in multi-class classification) dimension will not be named.
|
||||
}
|
||||
|
||||
The intercept returned here will include the 'base_score' parameter (unlike the 'bias'
|
||||
or the last coefficient in the model dump, which doesn't have 'base_score' added to it),
|
||||
hence one should get the same values from calling \code{predict(..., outputmargin = TRUE)} and
|
||||
from performing a matrix multiplication with \code{model.matrix(~., ...)}.
|
||||
|
||||
Be aware that the coefficients are obtained by first converting them to strings and
|
||||
back, so there will always be some very small loss of precision compared to the actual
|
||||
coefficients as used by \link{predict.xgb.Booster}.
|
||||
}
|
||||
\description{
|
||||
Extracts the coefficients from a 'gblinear' booster object,
|
||||
as produced by \code{xgb.train} when using parameter \code{booster="gblinear"}.
|
||||
|
||||
Note: this function will error out if passing a booster model
|
||||
which is not of "gblinear" type.
|
||||
}
|
||||
\examples{
|
||||
library(xgboost)
|
||||
data(mtcars)
|
||||
y <- mtcars[, 1]
|
||||
x <- as.matrix(mtcars[, -1])
|
||||
dm <- xgb.DMatrix(data = x, label = y, nthread = 1)
|
||||
params <- list(booster = "gblinear", nthread = 1)
|
||||
model <- xgb.train(data = dm, params = params, nrounds = 2)
|
||||
coef(model)
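
# A hedged sketch, not part of the original example: the value section states
# that the intercept includes 'base_score', so a manual matrix multiplication
# should match predict(..., outputmargin = TRUE) up to a small precision loss.
# The names 'coefs', 'manual_margin' and 'model_margin' are illustrative.
coefs <- coef(model)
manual_margin <- as.numeric(cbind(1, x) \%*\% coefs)
model_margin <- predict(model, dm, outputmargin = TRUE)
# Largest absolute difference between the two (expected to be very small)
print(max(abs(manual_margin - model_margin)))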
|
||||
}
|
||||
@@ -1,24 +1,42 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.DMatrix.R
|
||||
\name{getinfo}
|
||||
% Please edit documentation in R/xgb.Booster.R, R/xgb.DMatrix.R
|
||||
\name{getinfo.xgb.Booster}
|
||||
\alias{getinfo.xgb.Booster}
|
||||
\alias{setinfo.xgb.Booster}
|
||||
\alias{getinfo}
|
||||
\alias{getinfo.xgb.DMatrix}
|
||||
\title{Get information of an xgb.DMatrix object}
|
||||
\alias{setinfo}
|
||||
\alias{setinfo.xgb.DMatrix}
|
||||
\title{Get or set information of xgb.DMatrix and xgb.Booster objects}
|
||||
\usage{
|
||||
\method{getinfo}{xgb.Booster}(object, name)
|
||||
|
||||
\method{setinfo}{xgb.Booster}(object, name, info)
|
||||
|
||||
getinfo(object, name)
|
||||
|
||||
\method{getinfo}{xgb.DMatrix}(object, name)
|
||||
|
||||
setinfo(object, name, info)
|
||||
|
||||
\method{setinfo}{xgb.DMatrix}(object, name, info)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{Object of class \code{xgb.DMatrix}}
|
||||
\item{object}{Object of class \code{xgb.DMatrix} or \code{xgb.Booster}.}
|
||||
|
||||
\item{name}{the name of the information field to get (see details)}
|
||||
|
||||
\item{info}{the specific field of information to set}
|
||||
}
|
||||
\value{
|
||||
For \code{getinfo}, will return the requested field. For \code{setinfo}, will always return value \code{TRUE}
|
||||
if it succeeds.
|
||||
}
|
||||
\description{
|
||||
Get information of an xgb.DMatrix object
|
||||
Get or set information of xgb.DMatrix and xgb.Booster objects
|
||||
}
|
||||
\details{
|
||||
The \code{name} field can be one of the following:
|
||||
The \code{name} field can be one of the following for \code{xgb.DMatrix}:
|
||||
|
||||
\itemize{
|
||||
\item \code{label}
|
||||
@@ -33,8 +51,28 @@ The \code{name} field can be one of the following:
|
||||
}
|
||||
See the documentation for \link{xgb.DMatrix} for more information about these fields.
|
||||
|
||||
For \code{xgb.Booster}, can be one of the following:
|
||||
\itemize{
|
||||
\item \code{feature_type}
|
||||
\item \code{feature_name}
|
||||
}
|
||||
|
||||
Note that, while 'qid' cannot be retrieved, it's possible to get the equivalent 'group'
|
||||
for a DMatrix that had 'qid' assigned.
|
||||
|
||||
\bold{Important}: when calling \code{setinfo}, the objects are modified in-place. See
|
||||
\link{xgb.copy.Booster} for an idea of how this in-place assignment works.
|
||||
|
||||
See the documentation for \link{xgb.DMatrix} for possible fields that can be set
|
||||
(which correspond to arguments in that function).
|
||||
|
||||
Note that the following fields are allowed in the construction of an \code{xgb.DMatrix}
|
||||
but \bold{aren't} allowed here:\itemize{
|
||||
\item data
|
||||
\item missing
|
||||
\item silent
|
||||
\item nthread
|
||||
}
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
@@ -45,4 +83,11 @@ setinfo(dtrain, 'label', 1-labels)
|
||||
|
||||
labels2 <- getinfo(dtrain, 'label')
|
||||
stopifnot(all(labels2 == 1-labels))
|
||||
data(agaricus.train, package='xgboost')
|
||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
|
||||
labels <- getinfo(dtrain, 'label')
|
||||
setinfo(dtrain, 'label', 1-labels)
|
||||
labels2 <- getinfo(dtrain, 'label')
|
||||
stopifnot(all.equal(labels2, 1-labels))
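
# A minimal sketch, not from the original examples: per the details section,
# 'feature_name' can also be queried on a fitted booster. The training
# parameters below are illustrative assumptions, not recommended settings.
bst <- xgb.train(
  params = list(objective = "binary:logistic", nthread = 2),
  data = dtrain,
  nrounds = 2
)
# Feature names are taken from the DMatrix column names during training
print(head(getinfo(bst, "feature_name")))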
|
||||
}
|
||||
|
||||
@@ -2,7 +2,6 @@
|
||||
% Please edit documentation in R/xgb.Booster.R
|
||||
\name{predict.xgb.Booster}
|
||||
\alias{predict.xgb.Booster}
|
||||
\alias{predict.xgb.Booster.handle}
|
||||
\title{Predict method for XGBoost model}
|
||||
\usage{
|
||||
\method{predict}{xgb.Booster}(
|
||||
@@ -21,11 +20,9 @@
|
||||
strict_shape = FALSE,
|
||||
...
|
||||
)
|
||||
|
||||
\method{predict}{xgb.Booster.handle}(object, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{Object of class \code{xgb.Booster} or \code{xgb.Booster.handle}.}
|
||||
\item{object}{Object of class \code{xgb.Booster}.}
|
||||
|
||||
\item{newdata}{Takes \code{matrix}, \code{dgCMatrix}, \code{dgRMatrix}, \code{dsparseVector},
|
||||
local data file, or \code{xgb.DMatrix}.
|
||||
|
||||
@@ -4,14 +4,15 @@
|
||||
\alias{print.xgb.Booster}
|
||||
\title{Print xgb.Booster}
|
||||
\usage{
|
||||
\method{print}{xgb.Booster}(x, verbose = FALSE, ...)
|
||||
\method{print}{xgb.Booster}(x, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{x}{An \code{xgb.Booster} object.}
|
||||
|
||||
\item{verbose}{Whether to print detailed data (e.g., attribute values).}
|
||||
|
||||
\item{...}{Not currently used.}
|
||||
\item{...}{Not used.}
|
||||
}
|
||||
\value{
|
||||
The same \code{x} object, returned invisibly
|
||||
}
|
||||
\description{
|
||||
Print information about \code{xgb.Booster}.
|
||||
@@ -33,6 +34,5 @@ bst <- xgboost(
|
||||
attr(bst, "myattr") <- "memo"
|
||||
|
||||
print(bst)
|
||||
print(bst, verbose = TRUE)
|
||||
|
||||
}
|
||||
|
||||
@@ -1,42 +0,0 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.DMatrix.R
|
||||
\name{setinfo}
|
||||
\alias{setinfo}
|
||||
\alias{setinfo.xgb.DMatrix}
|
||||
\title{Set information of an xgb.DMatrix object}
|
||||
\usage{
|
||||
setinfo(object, name, info)
|
||||
|
||||
\method{setinfo}{xgb.DMatrix}(object, name, info)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{Object of class "xgb.DMatrix"}
|
||||
|
||||
\item{name}{the name of the field to get}
|
||||
|
||||
\item{info}{the specific field of information to set}
|
||||
}
|
||||
\description{
|
||||
Set information of an xgb.DMatrix object
|
||||
}
|
||||
\details{
|
||||
See the documentation for \link{xgb.DMatrix} for possible fields that can be set
|
||||
(which correspond to arguments in that function).
|
||||
|
||||
Note that the following fields are allowed in the construction of an \code{xgb.DMatrix}
|
||||
but \bold{aren't} allowed here:\itemize{
|
||||
\item data
|
||||
\item missing
|
||||
\item silent
|
||||
\item nthread
|
||||
}
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
|
||||
labels <- getinfo(dtrain, 'label')
|
||||
setinfo(dtrain, 'label', 1-labels)
|
||||
labels2 <- getinfo(dtrain, 'label')
|
||||
stopifnot(all.equal(labels2, 1-labels))
|
||||
}
|
||||
22
R-package/man/variable.names.xgb.Booster.Rd
Normal file
@@ -0,0 +1,22 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.Booster.R
|
||||
\name{variable.names.xgb.Booster}
|
||||
\alias{variable.names.xgb.Booster}
|
||||
\title{Get Feature Names from Booster}
|
||||
\usage{
|
||||
\method{variable.names}{xgb.Booster}(object, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{An \code{xgb.Booster} object.}
|
||||
|
||||
\item{...}{Not used.}
|
||||
}
|
||||
\description{
|
||||
Returns the feature / variable / column names from a fitted
|
||||
booster object, which are set automatically during the call to \link{xgb.train}
|
||||
from the DMatrix names, or which can be set manually through \link{setinfo}.
|
||||
|
||||
If the object doesn't have feature names, will return \code{NULL}.
|
||||
|
||||
It is equivalent to calling \code{getinfo(object, "feature_name")}.
|
||||
}
|
||||
@@ -1,61 +0,0 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.Booster.R
|
||||
\name{xgb.Booster.complete}
|
||||
\alias{xgb.Booster.complete}
|
||||
\title{Restore missing parts of an incomplete xgb.Booster object}
|
||||
\usage{
|
||||
xgb.Booster.complete(object, saveraw = TRUE)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{Object of class \code{xgb.Booster}.}
|
||||
|
||||
\item{saveraw}{A flag indicating whether to append \code{raw} Booster memory dump data
|
||||
when it doesn't already exist.}
|
||||
}
|
||||
\value{
|
||||
An object of \code{xgb.Booster} class.
|
||||
}
|
||||
\description{
|
||||
It attempts to complete an \code{xgb.Booster} object by restoring either its missing
|
||||
raw model memory dump (when it has no \code{raw} data but its \code{xgb.Booster.handle} is valid)
|
||||
or its missing internal handle (when its \code{xgb.Booster.handle} is not valid
|
||||
but it has a raw Booster memory dump).
|
||||
}
|
||||
\details{
|
||||
While this method is primarily for internal use, it might be useful in some practical situations.
|
||||
|
||||
E.g., when an \code{xgb.Booster} model is saved as an R object and then is loaded as an R object,
|
||||
its handle (pointer) to an internal xgboost model would be invalid. The majority of xgboost methods
|
||||
should still work for such a model object since those methods would be using
|
||||
\code{xgb.Booster.complete()} internally. However, one might find it to be more efficient to call the
|
||||
\code{xgb.Booster.complete()} function explicitly once after loading a model as an R-object.
|
||||
That would prevent further repeated implicit reconstruction of an internal booster model.
|
||||
}
|
||||
\examples{
|
||||
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
bst <- xgboost(
|
||||
data = agaricus.train$data,
|
||||
label = agaricus.train$label,
|
||||
max_depth = 2,
|
||||
eta = 1,
|
||||
nthread = 2,
|
||||
nrounds = 2,
|
||||
objective = "binary:logistic"
|
||||
)
|
||||
|
||||
fname <- file.path(tempdir(), "xgb_model.Rds")
|
||||
saveRDS(bst, fname)
|
||||
|
||||
# Warning: The resulting RDS file is only compatible with the current XGBoost version.
|
||||
# Refer to the section titled "a-compatibility-note-for-saveRDS-save".
|
||||
bst1 <- readRDS(fname)
|
||||
# the handle is invalid:
|
||||
print(bst1$handle)
|
||||
|
||||
bst1 <- xgb.Booster.complete(bst1)
|
||||
# now the handle points to a valid internal booster model:
|
||||
print(bst1$handle)
|
||||
|
||||
}
|
||||
@@ -16,7 +16,7 @@ xgb.attributes(object)
|
||||
xgb.attributes(object) <- value
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{Object of class \code{xgb.Booster} or \code{xgb.Booster.handle}.}
|
||||
\item{object}{Object of class \code{xgb.Booster}. \bold{Will be modified in-place} when assigning to it.}
|
||||
|
||||
\item{name}{A non-empty character string specifying which attribute is to be accessed.}
|
||||
|
||||
@@ -51,15 +51,14 @@ Also, setting an attribute that has the same name as one of xgboost's parameters
|
||||
change the value of that parameter for a model.
|
||||
Use \code{\link[=xgb.parameters<-]{xgb.parameters<-()}} to set or change model parameters.
|
||||
|
||||
The attribute setters would usually work more efficiently for \code{xgb.Booster.handle}
|
||||
than for \code{xgb.Booster}, since only just a handle (pointer) would need to be copied.
|
||||
That would only matter if attributes need to be set many times.
|
||||
Note, however, that when feeding a handle of an \code{xgb.Booster} object to the attribute setters,
|
||||
the raw model cache of an \code{xgb.Booster} object would not be automatically updated,
|
||||
and it would be the user's responsibility to call \code{\link[=xgb.serialize]{xgb.serialize()}} to update it.
|
||||
|
||||
The \verb{xgb.attributes<-} setter either updates the existing or adds one or several attributes,
|
||||
but it doesn't delete the other existing attributes.
|
||||
|
||||
Important: since this modifies the booster's C object, semantics for assignment here
|
||||
will differ from R's, as any object reference to the same booster will be modified
|
||||
too, while assignment of R attributes through \verb{attributes(model)$<attr> <- <value>}
|
||||
will follow the usual copy-on-write R semantics (see \link{xgb.copy.Booster} for an
|
||||
example of these behaviors).
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
@@ -10,13 +10,23 @@ xgb.config(object)
|
||||
xgb.config(object) <- value
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{Object of class \code{xgb.Booster}.}
|
||||
\item{object}{Object of class \code{xgb.Booster}. \bold{Will be modified in-place} when assigning to it.}
|
||||
|
||||
\item{value}{A JSON string.}
|
||||
\item{value}{An R list.}
|
||||
}
|
||||
\value{
|
||||
\code{xgb.config} will return the parameters as an R list.
|
||||
}
|
||||
\description{
|
||||
Accessors for model parameters as JSON string
|
||||
}
|
||||
\details{
|
||||
Note that assignment is performed in-place on the booster C object, which unlike assignment
|
||||
of R attributes, doesn't follow typical copy-on-write semantics for assignment - i.e. all references
|
||||
to the same booster will also get updated.
|
||||
|
||||
See \link{xgb.copy.Booster} for an example of this behavior.
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
|
||||
53
R-package/man/xgb.copy.Booster.Rd
Normal file
@@ -0,0 +1,53 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.Booster.R
|
||||
\name{xgb.copy.Booster}
|
||||
\alias{xgb.copy.Booster}
|
||||
\title{Deep-copies a Booster Object}
|
||||
\usage{
|
||||
xgb.copy.Booster(model)
|
||||
}
|
||||
\arguments{
|
||||
\item{model}{An 'xgb.Booster' object.}
|
||||
}
|
||||
\value{
|
||||
A deep copy of \code{model} - it will be identical in every way, but C-level
|
||||
functions called on that copy will not affect the \code{model} variable.
|
||||
}
|
||||
\description{
|
||||
Creates a deep copy of an 'xgb.Booster' object, such that the
|
||||
C object pointer contained will be a different object, and hence functions
|
||||
like \link{xgb.attr} will not affect the object from which it was copied.
|
||||
}
|
||||
\examples{
|
||||
library(xgboost)
|
||||
data(mtcars)
|
||||
y <- mtcars$mpg
|
||||
x <- mtcars[, -1]
|
||||
dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
||||
model <- xgb.train(
|
||||
data = dm,
|
||||
params = list(nthread = 1),
|
||||
nround = 3
|
||||
)
|
||||
|
||||
# Set an arbitrary attribute kept at the C level
|
||||
xgb.attr(model, "my_attr") <- 100
|
||||
print(xgb.attr(model, "my_attr"))
|
||||
|
||||
# Just assigning to a new variable will not create
|
||||
# a deep copy - C object pointer is shared, and in-place
|
||||
# modifications will affect both objects
|
||||
model_shallow_copy <- model
|
||||
xgb.attr(model_shallow_copy, "my_attr") <- 333
|
||||
# 'model' was also affected by this change:
|
||||
print(xgb.attr(model, "my_attr"))
|
||||
|
||||
model_deep_copy <- xgb.copy.Booster(model)
|
||||
xgb.attr(model_deep_copy, "my_attr") <- 444
|
||||
# 'model' was NOT affected by this change
|
||||
# (keeps previous value that was assigned before)
|
||||
print(xgb.attr(model, "my_attr"))
|
||||
|
||||
# Verify that the new object was actually modified
|
||||
print(xgb.attr(model_deep_copy, "my_attr"))
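
# A hedged sketch, not part of the original example: R-level attributes,
# unlike the C-level ones set through xgb.attr, are documented to follow the
# usual copy-on-write semantics, so setting one on the shallow copy is not
# expected to alter 'model'. The attribute name "my_r_attr" is arbitrary.
attr(model_shallow_copy, "my_r_attr") <- "memo"
print(attr(model_shallow_copy, "my_r_attr"))
print(attr(model, "my_r_attr"))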
|
||||
}
|
||||
@@ -8,7 +8,8 @@ xgb.gblinear.history(model, class_index = NULL)
|
||||
}
|
||||
\arguments{
|
||||
\item{model}{either an \code{xgb.Booster} or a result of \code{xgb.cv()}, trained
|
||||
using the \code{cb.gblinear.history()} callback.}
|
||||
using the \code{cb.gblinear.history()} callback, but \bold{not} a booster
|
||||
loaded from \link{xgb.load} or \link{xgb.load.raw}.}
|
||||
|
||||
\item{class_index}{zero-based class index to extract the coefficients for only that
|
||||
specific class in a multinomial multiclass model. When it is NULL, all the
|
||||
@@ -27,3 +28,11 @@ A helper function to extract the matrix of linear coefficients' history
|
||||
from a gblinear model created while using the \code{cb.gblinear.history()}
|
||||
callback.
|
||||
}
|
||||
\details{
|
||||
Note that this is an R-specific function that relies on R attributes that
|
||||
are not saved when using xgboost's own serialization functions like \link{xgb.load}
|
||||
or \link{xgb.load.raw}.
|
||||
|
||||
In order for a serialized model to be accepted by this function, one must use R
|
||||
serializers such as \link{saveRDS}.
|
||||
}
|
||||
|
||||
22
R-package/man/xgb.get.num.boosted.rounds.Rd
Normal file
@@ -0,0 +1,22 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.Booster.R
|
||||
\name{xgb.get.num.boosted.rounds}
|
||||
\alias{xgb.get.num.boosted.rounds}
|
||||
\title{Get number of boosting rounds in a fitted booster}
|
||||
\usage{
|
||||
xgb.get.num.boosted.rounds(model)
|
||||
}
|
||||
\arguments{
|
||||
\item{model}{A fitted \code{xgb.Booster} model.}
|
||||
}
|
||||
\value{
|
||||
The number of rounds saved in the model, as an integer.
|
||||
}
|
||||
\description{
|
||||
Get number of boosting rounds in a fitted booster
|
||||
}
|
||||
\details{
|
||||
Note that setting booster parameters related to training
|
||||
continuation / updates through \link{xgb.parameters<-} will reset the
|
||||
number of rounds to zero.
|
||||
}
|
||||
59
R-package/man/xgb.is.same.Booster.Rd
Normal file
@@ -0,0 +1,59 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/xgb.Booster.R
|
||||
\name{xgb.is.same.Booster}
|
||||
\alias{xgb.is.same.Booster}
|
||||
\title{Check if two boosters share the same C object}
|
||||
\usage{
|
||||
xgb.is.same.Booster(obj1, obj2)
|
||||
}
|
||||
\arguments{
|
||||
\item{obj1}{Booster model to compare with \code{obj2}.}
|
||||
|
||||
\item{obj2}{Booster model to compare with \code{obj1}.}
|
||||
}
|
||||
\value{
|
||||
Either \code{TRUE} or \code{FALSE} according to whether the two boosters share
|
||||
the underlying C object.
|
||||
}
|
||||
\description{
|
||||
Checks whether two booster objects refer to the same underlying C object.
|
||||
}
|
||||
\details{
|
||||
As booster objects (as returned by e.g. \link{xgb.train}) contain an R 'externalptr'
|
||||
object, they don't follow typical copy-on-write semantics of other R objects - that is, if
|
||||
one assigns a booster to a different variable and modifies that new variable through in-place
|
||||
methods like \link{xgb.attr<-}, the modification will be applied to both the old and the new
|
||||
variable, unlike typical R assignments which would only modify the latter.
|
||||
|
||||
This function allows checking whether two booster objects share the same 'externalptr',
|
||||
regardless of the R attributes that they might have.
|
||||
|
||||
In order to duplicate a booster in such a way that the copy wouldn't share the same
|
||||
'externalptr', one can use function \link{xgb.copy.Booster}.
|
||||
}
|
||||
\examples{
|
||||
library(xgboost)
|
||||
data(mtcars)
|
||||
y <- mtcars$mpg
|
||||
x <- as.matrix(mtcars[, -1])
|
||||
model <- xgb.train(
|
||||
params = list(nthread = 1),
|
||||
data = xgb.DMatrix(x, label = y, nthread = 1),
|
||||
nround = 3
|
||||
)
|
||||
|
||||
model_shallow_copy <- model
|
||||
xgb.is.same.Booster(model, model_shallow_copy) # same C object
|
||||
|
||||
model_deep_copy <- xgb.copy.Booster(model)
|
||||
xgb.is.same.Booster(model, model_deep_copy) # different C objects
|
||||
|
||||
# In-place assignments modify all references,
|
||||
# but not full/deep copies of the booster
|
||||
xgb.attr(model_shallow_copy, "my_attr") <- 111
|
||||
xgb.attr(model, "my_attr") # gets modified
|
||||
xgb.attr(model_deep_copy, "my_attr") # doesn't get modified
|
||||
}
|
||||
\seealso{
|
||||
\link{xgb.copy.Booster}
|
||||
}
|
||||
@@ -48,5 +48,5 @@ xgb.save(bst, fname)
|
||||
bst <- xgb.load(fname)
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{xgb.save}}, \code{\link{xgb.Booster.complete}}.
|
||||
\code{\link{xgb.save}}
|
||||
}
|
||||
|
||||
@@ -4,12 +4,10 @@
|
||||
\alias{xgb.load.raw}
|
||||
\title{Load serialised xgboost model from R's raw vector}
|
||||
\usage{
|
||||
xgb.load.raw(buffer, as_booster = FALSE)
|
||||
xgb.load.raw(buffer)
|
||||
}
|
||||
\arguments{
|
||||
\item{buffer}{the buffer returned by xgb.save.raw}
|
||||
|
||||
\item{as_booster}{Return the loaded model as xgb.Booster instead of xgb.Booster.handle.}
|
||||
}
|
||||
\description{
|
||||
User can generate raw memory buffer by calling xgb.save.raw
|
||||
|
||||
@@ -14,8 +14,11 @@ xgb.model.dt.tree(
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{feature_names}{Character vector used to overwrite the feature names
|
||||
of the model. The default (\code{NULL}) uses the original feature names.}
|
||||
\item{feature_names}{Character vector of feature names. If the model already
|
||||
contains feature names, those will be used when \code{feature_names=NULL} (default value).
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that, if the model already contains feature names, it's \\bold\{not\} possible to override them here.
|
||||
}\if{html}{\out{</div>}}}
|
||||
|
||||
\item{model}{Object of class \code{xgb.Booster}.}
|
||||
|
||||
@@ -76,8 +79,6 @@ bst <- xgboost(
|
||||
objective = "binary:logistic"
|
||||
)
|
||||
|
||||
(dt <- xgb.model.dt.tree(colnames(agaricus.train$data), bst))
|
||||
|
||||
# This bst model already has feature_names stored with it, so those would be used when
|
||||
# feature_names is not set:
|
||||
(dt <- xgb.model.dt.tree(model = bst))
|
||||
|
||||
@@ -7,17 +7,27 @@
|
||||
xgb.parameters(object) <- value
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{Object of class \code{xgb.Booster} or \code{xgb.Booster.handle}.}
|
||||
\item{object}{Object of class \code{xgb.Booster}. \bold{Will be modified in-place}.}
|
||||
|
||||
\item{value}{A list (or an object coercible to a list) with the names of parameters to set
|
||||
and the elements corresponding to parameter values.}
|
||||
}
|
||||
\value{
|
||||
The same booster \code{object}, which gets modified in-place.
|
||||
}
|
||||
\description{
|
||||
Only the setter for xgboost parameters is currently implemented.
|
||||
}
|
||||
\details{
|
||||
Note that the setter would usually work more efficiently for \code{xgb.Booster.handle}
|
||||
than for \code{xgb.Booster}, since only just a handle would need to be copied.
|
||||
Just like \link{xgb.attr}, this function will make in-place modifications
|
||||
on the booster object which do not follow typical R assignment semantics - that is,
|
||||
all references to the same booster will also be updated, unlike assignment of R
|
||||
attributes which follow copy-on-write semantics.
|
||||
|
||||
See \link{xgb.copy.Booster} for an example of this behavior.
|
||||
|
||||
Be aware that setting parameters of a fitted booster related to training continuation / updates
|
||||
will reset its number of rounds indicator to zero.
}
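The difference between in-place modification and copy-on-write assignment described in the details above can be illustrated without xgboost at all. The following is a minimal base-R sketch, assuming an environment as a stand-in for the booster's internal state; `make_fake_booster` and `fake_parameters<-` are hypothetical names invented for this illustration and are not part of the xgboost API.

```r
# Stand-in for a booster: an environment has reference semantics, which is
# assumed here to mimic the shared internal state of a booster object.
make_fake_booster <- function() {
  state <- new.env()
  state$max_depth <- 6
  structure(list(state = state), note = "original")
}

# Hypothetical in-place setter, loosely mimicking `xgb.parameters<-`.
`fake_parameters<-` <- function(object, value) {
  for (name in names(value)) {
    assign(name, value[[name]], envir = object$state)
  }
  object
}

a <- make_fake_booster()
b <- a  # 'b' refers to the same underlying state as 'a'

fake_parameters(b) <- list(max_depth = 2)  # modifies the shared state
attr(b, "note") <- "modified"              # ordinary R attribute assignment

a$state$max_depth  # also changed, because the state is shared
attr(a, "note")    # unchanged, because R attributes are copy-on-write
```

Under these assumptions, the parameter change made through `b` is visible through `a`, while the attribute change is not.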
\examples{
data(agaricus.train, package = "xgboost")
@@ -7,15 +7,27 @@
xgb.save(model, fname)
}
\arguments{
\item{model}{model object of \code{xgb.Booster} class.}
\item{model}{Model object of \code{xgb.Booster} class.}
\item{fname}{name of the file to write.}
\item{fname}{Name of the file to write.
Note that the extension of this file name determines the serialization format to use:\itemize{
\item Extension ".ubj" will use the universal binary JSON format (recommended).
This format uses binary types for e.g. floating point numbers, thereby preventing any loss
of precision when converting to a human-readable JSON text or similar.
\item Extension ".json" will use plain JSON, which is a human-readable format.
\item Extension ".deprecated" will use a \bold{deprecated} binary format. This format will
not be able to save attributes introduced after v1 of XGBoost, such as the "best_iteration"
attribute that boosters might keep, nor feature names or user-specified attributes.
\item If the format is not specified by passing one of the file extensions above, will
default to UBJ.
}}
}
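The extension rule documented for \code{fname} can be pictured as a small lookup. The sketch below uses only base R and the bundled \code{tools} package; `guess_model_format` is a hypothetical helper written for illustration, and it is an assumption that it mirrors the rule as stated in the argument description, not a copy of what xgboost does internally.

```r
# Hypothetical helper: maps a file name to the serialization format that the
# 'fname' documentation describes. Not part of the xgboost API.
guess_model_format <- function(fname) {
  known <- c("ubj", "json", "deprecated")
  ext <- tolower(tools::file_ext(fname))
  # Any other (or missing) extension falls back to UBJ, per the documentation.
  if (ext %in% known) ext else "ubj"
}

guess_model_format("model.ubj")
guess_model_format("model.json")
guess_model_format("model.deprecated")
guess_model_format("model")  # no extension
```

In practice the file name is passed straight to `xgb.save(model, fname)`, so choosing the extension is all that is needed to select the format.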
\description{
Save xgboost model to a file in binary format.
Save xgboost model to a file in binary or JSON format.
}
\details{
This methods allows to save a model in an xgboost-internal binary format which is universal
This method allows saving a model in an xgboost-internal binary or text format which is universal
among the various xgboost interfaces. In R, the saved model file could be read-in later
using either the \code{\link{xgb.load}} function or the \code{xgb_model} parameter
of \code{\link{xgb.train}}.
@@ -23,7 +35,7 @@ of \code{\link{xgb.train}}.
Note: a model can also be saved as an R-object (e.g., by using \code{\link[base]{saveRDS}}
or \code{\link[base]{save}}). However, it would then only be compatible with R, and
corresponding R-methods would need to be used to load it. Moreover, persisting the model with
\code{\link[base]{readRDS}} or \code{\link[base]{save}}) will cause compatibility problems in
\code{\link[base]{saveRDS}} or \code{\link[base]{save}} might cause compatibility problems in
future versions of XGBoost. Consult \code{\link{a-compatibility-note-for-saveRDS-save}} to learn
how to persist models in a future-proof way, i.e. to make the model accessible in future
releases of XGBoost.
@@ -51,5 +63,5 @@ xgb.save(bst, fname)
bst <- xgb.load(fname)
}
\seealso{
\code{\link{xgb.load}}, \code{\link{xgb.Booster.complete}}.
\code{\link{xgb.load}}
}
@@ -5,7 +5,7 @@
\title{Save xgboost model to R's raw vector,
user can call xgb.load.raw to load the model back from raw vector}
\usage{
xgb.save.raw(model, raw_format = "deprecated")
xgb.save.raw(model, raw_format = "ubj")
}
\arguments{
\item{model}{the model object.}
@@ -15,9 +15,7 @@ xgb.save.raw(model, raw_format = "deprecated")
\item \code{json}: Encode the booster into JSON text document.
\item \code{ubj}: Encode the booster into Universal Binary JSON.
\item \code{deprecated}: Encode the booster into old customized binary format.
}
Right now the default is \code{deprecated} but will be changed to \code{ubj} in upcoming release.}
}}
}
\description{
Save xgboost model from xgboost or xgb.train
@@ -1,29 +0,0 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.serialize.R
\name{xgb.serialize}
\alias{xgb.serialize}
\title{Serialize the booster instance into R's raw vector. The serialization method differs
from \code{\link{xgb.save.raw}} as the latter one saves only the model but not
parameters. This serialization format is not stable across different xgboost versions.}
\usage{
xgb.serialize(booster)
}
\arguments{
\item{booster}{the booster instance}
}
\description{
Serialize the booster instance into R's raw vector. The serialization method differs
from \code{\link{xgb.save.raw}} as the latter one saves only the model but not
parameters. This serialization format is not stable across different xgboost versions.
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgb.train(data = xgb.DMatrix(train$data, label = train$label), max_depth = 2,
eta = 1, nthread = 2, nrounds = 2,objective = "binary:logistic")
raw <- xgb.serialize(bst)
bst <- xgb.unserialize(raw)
}
@@ -205,7 +205,12 @@ file with a previously saved model.}
\item{callbacks}{a list of callback functions to perform various task during boosting.
See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the
parameters' values. User can provide either existing or their own callback methods in order
to customize the training process.}
to customize the training process.
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that some callbacks might try to set an evaluation log - be aware that these evaluation logs
are kept as R attributes, and thus do not get saved when using non-R serializers like
\link{xgb.save} (but are kept when using R serializers like \link{saveRDS}).
}\if{html}{\out{</div>}}}
\item{...}{other parameters to pass to \code{params}.}
@@ -219,27 +224,7 @@ This parameter is only used when input is a dense matrix.}
\item{weight}{a vector indicating the weight for each row of the input.}
}
\value{
An object of class \code{xgb.Booster} with the following elements:
\itemize{
\item \code{handle} a handle (pointer) to the xgboost model in memory.
\item \code{raw} a cached memory dump of the xgboost model saved as R's \code{raw} type.
\item \code{niter} number of boosting iterations.
\item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
first column corresponding to iteration number and the rest corresponding to evaluation
metrics' values. It is created by the \code{\link{cb.evaluation.log}} callback.
\item \code{call} a function call.
\item \code{params} parameters that were passed to the xgboost library. Note that it does not
capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
\item \code{callbacks} callback functions that were either automatically assigned or
explicitly passed.
\item \code{best_iteration} iteration number with the best evaluation metric value
(only available with early stopping).
\item \code{best_score} the best evaluation metric value during early stopping.
(only available with early stopping).
\item \code{feature_names} names of the training dataset features
(only when column names were defined in training data).
\item \code{nfeatures} number of features in training data.
}
An object of class \code{xgb.Booster}.
}
\description{
\code{xgb.train} is an advanced interface for training an xgboost model.
@@ -285,6 +270,21 @@ and the \code{print_every_n} parameter is passed to it.
\item \code{cb.early.stop}: when \code{early_stopping_rounds} is set.
\item \code{cb.save.model}: when \code{save_period > 0} is set.
}
Note that objects of type \code{xgb.Booster} as returned by this function behave a bit differently
from typical R objects (it's an 'altrep' list class), and it makes a separation between
internal booster attributes (restricted to jsonifyable data), accessed through \link{xgb.attr}
and shared between interfaces through serialization functions like \link{xgb.save}; and
R-specific attributes, accessed through \link{attributes} and \link{attr}, which are otherwise
only used in the R interface, only kept when using R's serializers like \link{saveRDS}, and
not used in any way by functions like \link{predict.xgb.Booster}.
Be aware that one such R attribute that is automatically added is \code{params} - this attribute
is assigned from the \code{params} argument to this function, and is only meant to serve as a
reference for what went into the booster, but is not used in other methods that take a booster
object - so for example, changing the booster's configuration requires calling \verb{xgb.config<-}
or 'xgb.parameters<-', while simply modifying \verb{attributes(model)$params$<...>} will have no
effect elsewhere.
}
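The two properties of R-specific attributes described above (they are only a reference copy, and they survive R's own serializers) can be sketched with a plain list standing in for a booster. This is a base-R illustration only: `fake_model` and its `config` element are hypothetical stand-ins, assumed here to play the role of the booster's internal configuration, and no xgboost function is called.

```r
# A plain list standing in for a booster: 'config' plays the role of the
# internal configuration, 'params' is an ordinary R attribute kept alongside.
fake_model <- structure(
  list(config = list(max_depth = 2)),
  params = list(max_depth = 2)
)

# Editing the R attribute changes only the reference copy ...
attr(fake_model, "params")$max_depth <- 10
fake_model$config$max_depth  # ... the stand-in configuration is untouched

# R serializers such as saveRDS keep R attributes across a round trip.
path <- tempfile(fileext = ".rds")
saveRDS(fake_model, path)
restored <- readRDS(path)
attr(restored, "params")$max_depth
```

With a real booster, the configuration itself would be changed through `xgb.config<-` or `xgb.parameters<-`, as the details state.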
\examples{
data(agaricus.train, package='xgboost')
@@ -1,21 +0,0 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.unserialize.R
\name{xgb.unserialize}
\alias{xgb.unserialize}
\title{Load the instance back from \code{\link{xgb.serialize}}}
\usage{
xgb.unserialize(buffer, handle = NULL)
}
\arguments{
\item{buffer}{the buffer containing booster instance saved by \code{\link{xgb.serialize}}}
\item{handle}{An \code{xgb.Booster.handle} object which will be overwritten with
the new deserialized object. Must be a null handle (e.g. when loading the model through
\code{readRDS}). If not provided, a new handle will be created.}
}
\value{
An \code{xgb.Booster.handle} object.
}
\description{
Load the instance back from \code{\link{xgb.serialize}}
}