[R] Add missing DMatrix functions (#9929)

* `XGDMatrixGetQuantileCut`
* `XGDMatrixNumNonMissing`
* `XGDMatrixGetDataAsCSR`

---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
david-cortes
2024-01-03 10:29:21 +01:00
committed by GitHub
parent 49247458f9
commit 3c004a4145
14 changed files with 438 additions and 9 deletions

@@ -0,0 +1,19 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.DMatrix.R
\name{xgb.get.DMatrix.data}
\alias{xgb.get.DMatrix.data}
\title{Get DMatrix Data}
\usage{
xgb.get.DMatrix.data(dmat)
}
\arguments{
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \link{xgb.DMatrix}.}
}
\value{
The data held in the DMatrix, as a sparse CSR matrix (class \code{dgRMatrix}
from package \code{Matrix}). If the DMatrix has feature names, they are set as the
column names of the output.
}
\description{
Get DMatrix Data
}

@@ -0,0 +1,17 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.DMatrix.R
\name{xgb.get.DMatrix.num.non.missing}
\alias{xgb.get.DMatrix.num.non.missing}
\title{Get Number of Non-Missing Entries in DMatrix}
\usage{
xgb.get.DMatrix.num.non.missing(dmat)
}
\arguments{
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \link{xgb.DMatrix}.}
}
\value{
The number of non-missing entries in the DMatrix.
}
\description{
Get Number of Non-Missing Entries in DMatrix
}

@@ -0,0 +1,58 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.DMatrix.R
\name{xgb.get.DMatrix.qcut}
\alias{xgb.get.DMatrix.qcut}
\title{Get Quantile Cuts from DMatrix}
\usage{
xgb.get.DMatrix.qcut(dmat, output = c("list", "arrays"))
}
\arguments{
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \link{xgb.DMatrix}.}
\item{output}{Output format for the quantile cuts. Possible options are:\itemize{
\item \code{"list"} returns a list with one entry per column, where
each entry is a numeric vector holding the cuts for that column. The list is named if
\code{dmat} has column names assigned to it.
\item \code{"arrays"} returns a list with entries \code{indptr} (base-0 indexing) and
\code{data}. Here, the cuts for column 'i' (counting from 1) are obtained by slicing
\code{data} from entries \code{indptr[i]+1} to \code{indptr[i+1]}.
}}
}
\value{
The quantile cuts, in the format specified by parameter \code{output}.
}
\description{
Get the quantile cuts (a.k.a. borders) from an \code{xgb.DMatrix}
that has been quantized for the histogram method (\code{tree_method="hist"}).
These cuts are used to assign observations to bins: they are ordered boundaries
that determine the assignment condition \verb{border_low < x < border_high}.
As such, the first and last bins extend beyond the range of the data, so that
all of the observations are included.
If a given column has 'n' bins, then there will be 'n+1' cuts / borders for that column,
which will be output in sorted order from lowest to highest.
Different columns can have different numbers of bins according to their range.
}
\examples{
library(xgboost)
data(mtcars)
y <- mtcars$mpg
x <- as.matrix(mtcars[, -1])
dm <- xgb.DMatrix(x, label = y, nthread = 1)
# The DMatrix is not quantized right away; it is quantized once a hist model is trained
model <- xgb.train(
  data = dm,
  params = list(
    tree_method = "hist",
    max_bin = 8,
    nthread = 1
  ),
  nrounds = 3
)
# Now the quantile cuts can be retrieved
xgb.get.DMatrix.qcut(dm)
}
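
The slicing rule documented for `output = "arrays"` can be illustrated with a small sketch. This is not part of the commit: `cuts_for_column` is a hypothetical helper, the `toy` values are made up for illustration, and the sketch assumes every column has at least one cut.

```r
# Hypothetical helper (not part of xgboost): extract the cuts for column `i`
# (counting from 1) from a list shaped like the "arrays" output.
cuts_for_column <- function(arrays, i) {
  start <- arrays$indptr[i] + 1  # indptr holds base-0 offsets into `data`
  end <- arrays$indptr[i + 1]
  arrays$data[start:end]         # assumes start <= end (at least one cut)
}

# Made-up values for illustration only: two columns with 3 and 2 cuts.
toy <- list(
  indptr = c(0, 3, 5),
  data = c(0.5, 1.5, 2.5, 10, 20)
)

cuts_col1 <- cuts_for_column(toy, 1)
cuts_col2 <- cuts_for_column(toy, 2)

# Rebuild a per-column list from the flat arrays.
n_cols <- length(toy$indptr) - 1
per_column <- lapply(seq_len(n_cols), function(i) cuts_for_column(toy, i))
```

With a quantized DMatrix, the result of `xgb.get.DMatrix.qcut(dm, output = "arrays")` could be passed in place of `toy`; the flat layout avoids building one R vector per column when only a few columns are needed.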