xgboost/java/doc/xgboost4j.md
2015-06-10 20:09:49 -07:00

xgboost4j: Java wrapper for xgboost

This page introduces xgboost4j, the Java wrapper for xgboost. It covers building xgboost4j, the data interface, setting parameters, training a model, and prediction.

Build xgboost4j

  • Build native library
    First make sure a JDK is installed and JAVA_HOME is set properly, then simply run ./create_wrap.sh.

  • Package xgboost4j
    To package xgboost4j, run mvn package in the xgboost4j folder, or open this Maven project in an IDE (Eclipse/NetBeans) and build it there.

Data Interface

Like the xgboost Python module, xgboost4j uses DMatrix to handle data. Libsvm text format files, sparse matrices in CSR/CSC format, and dense matrices are supported.

  • To import DMatrix :
import org.dmlc.xgboost4j.DMatrix;
  • To load a libsvm text format file :
DMatrix dmat = new DMatrix("train.svm.txt");
  • Loading a sparse matrix in CSR/CSC format takes a little more work :
    suppose a sparse matrix :
    1 0 2 0
    4 0 0 3
    3 1 2 0

    for CSR format

long[] rowHeaders = new long[] {0,2,4,7};
float[] data = new float[] {1f,2f,4f,3f,3f,1f,2f};
int[] colIndex = new int[] {0,2,0,3,0,1,2};
DMatrix dmat = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR);

for CSC format

long[] colHeaders = new long[] {0,3,4,6,7};
float[] data = new float[] {1f,4f,3f,1f,2f,2f,3f};
int[] rowIndex = new int[] {0,1,2,2,0,2,1};
DMatrix dmat = new DMatrix(colHeaders, rowIndex, data, DMatrix.SparseType.CSC);
  • To load a 3*2 dense matrix, pass the values in row-major order :
    suppose a matrix :
    1 2
    3 4
    5 6
float[] data = new float[] {1f,2f,3f,4f,5f,6f};
int nrow = 3;
int ncol = 2;
float missing = 0.0f;
DMatrix dmat = new DMatrix(data, nrow, ncol, missing);
  • To set weight :
float[] weights = new float[] {1f,2f,1f};
dmat.setWeight(weights);
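
The CSR and CSC arrays above are easy to get wrong when written by hand. The following is a minimal sketch, in plain Java with no xgboost4j dependency, of how they could be derived from a dense 2D array. The class `SparseArrays` and its methods `toCsr`, `toCsc` and `flatten` are hypothetical names introduced for illustration only; they are not part of xgboost4j. The sketch assumes a rectangular matrix and treats every zero as an absent entry, as the example above does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; none of these names are part of xgboost4j.
class SparseArrays {
    /** The three arrays of a compressed sparse matrix. */
    static final class Compressed {
        final long[] headers; // row offsets (CSR) or column offsets (CSC)
        final int[] index;    // column indices (CSR) or row indices (CSC)
        final float[] data;   // non-zero values

        Compressed(long[] headers, int[] index, float[] data) {
            this.headers = headers;
            this.index = index;
            this.data = data;
        }
    }

    /** Scans row by row, recording each non-zero value and its column. */
    static Compressed toCsr(float[][] dense) {
        long[] headers = new long[dense.length + 1];
        List<Integer> index = new ArrayList<>();
        List<Float> data = new ArrayList<>();
        for (int r = 0; r < dense.length; r++) {
            for (int c = 0; c < dense[r].length; c++) {
                if (dense[r][c] != 0f) {
                    index.add(c);
                    data.add(dense[r][c]);
                }
            }
            headers[r + 1] = data.size(); // offset where the next row starts
        }
        int[] idx = new int[index.size()];
        float[] val = new float[data.size()];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = index.get(i);
            val[i] = data.get(i);
        }
        return new Compressed(headers, idx, val);
    }

    /** The CSC form of a matrix is the CSR form of its transpose. */
    static Compressed toCsc(float[][] dense) {
        int ncol = dense.length == 0 ? 0 : dense[0].length;
        float[][] transposed = new float[ncol][dense.length];
        for (int r = 0; r < dense.length; r++) {
            for (int c = 0; c < ncol; c++) {
                transposed[c][r] = dense[r][c];
            }
        }
        return toCsr(transposed);
    }

    /** Flattens a dense matrix row by row (row-major order). */
    static float[] flatten(float[][] dense) {
        int ncol = dense.length == 0 ? 0 : dense[0].length;
        float[] flat = new float[dense.length * ncol];
        for (int r = 0; r < dense.length; r++) {
            System.arraycopy(dense[r], 0, flat, r * ncol, ncol);
        }
        return flat;
    }

    public static void main(String[] args) {
        float[][] m = {{1, 0, 2, 0}, {4, 0, 0, 3}, {3, 1, 2, 0}};
        Compressed csr = toCsr(m);
        System.out.println("rowHeaders = " + Arrays.toString(csr.headers));
        System.out.println("colIndex   = " + Arrays.toString(csr.index));
        System.out.println("data       = " + Arrays.toString(csr.data));
    }
}
```

The `headers`, `index` and `data` fields correspond to the first three constructor arguments in the CSR and CSC examples above, and the output of `flatten` matches the layout of the dense `data` array.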

Setting Parameters

  • A util class Params in xgboost4j is used to handle parameters.
  • To import Params :
import org.dmlc.xgboost4j.util.Params;
  • To set parameters :
Params params = new Params() {
  {
    put("eta", 1.0);
    put("max_depth", 2);
    put("silent", 1);
    put("objective", "binary:logistic");
    put("eval_metric", "logloss");
  }
};
  • Multiple values with the same param key are handled naturally in Params, e.g. :
Params params = new Params() {
  {
    put("eta", 1.0);
    put("max_depth", 2);
    put("silent", 1);
    put("objective", "binary:logistic");
    put("eval_metric", "logloss");
    put("eval_metric", "error");
  }
};
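
To see why repeated keys need special handling, note that a plain `java.util.HashMap` would overwrite the first `eval_metric` with the second. The sketch below shows one way a container can keep both values by storing an ordered list of key/value pairs. `MultiParams` and `getAll` are hypothetical names used for illustration; this is not a description of how xgboost4j's `Params` class is implemented.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;

// Hypothetical stand-in for a parameter container; not xgboost4j's Params.
class MultiParams implements Iterable<Entry<String, Object>> {
    private final List<Entry<String, Object>> entries = new ArrayList<>();

    /** Appends a pair; earlier pairs with the same key are kept. */
    public void put(String key, Object value) {
        entries.add(new SimpleEntry<String, Object>(key, value));
    }

    /** Returns every value stored under the key, in insertion order. */
    public List<Object> getAll(String key) {
        List<Object> values = new ArrayList<>();
        for (Entry<String, Object> e : entries) {
            if (e.getKey().equals(key)) {
                values.add(e.getValue());
            }
        }
        return values;
    }

    @Override
    public Iterator<Entry<String, Object>> iterator() {
        return entries.iterator();
    }

    public static void main(String[] args) {
        MultiParams params = new MultiParams();
        params.put("objective", "binary:logistic");
        params.put("eval_metric", "logloss");
        params.put("eval_metric", "error");
        // Iteration visits all three pairs, including both eval_metric values.
        for (Entry<String, Object> e : params) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```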

Training Model

With parameters and data, you are able to train a booster model.

  • Import Trainer and Booster :
import org.dmlc.xgboost4j.Booster;
import org.dmlc.xgboost4j.util.Trainer;
import org.dmlc.xgboost4j.util.WatchList;
  • Training
DMatrix trainMat = new DMatrix("train.svm.txt");
DMatrix validMat = new DMatrix("valid.svm.txt");
//specify a watchList to monitor performance on each data set
WatchList watchs = new WatchList();
watchs.put("train", trainMat);
watchs.put("valid", validMat);
int round = 2;
Booster booster = Trainer.train(params, trainMat, round, watchs, null, null);
  • Saving the model : after training, you can save the model and dump it out.
booster.saveModel("model.bin");
  • Dump Model and Feature Map
booster.dumpModel("modelInfo.txt", false);
//dump with featureMap
booster.dumpModel("modelInfo.txt", "featureMap.txt", false);
  • Load a model
Params param = new Params() {
  {
    put("silent", 1);
    put("nthread", 6);
  }
};
Booster booster = new Booster(param, "model.bin");

Prediction

After training or loading a model, you can use it to predict on other data. The prediction result is a two-dimensional float array of shape (nsample, nclass); for leaf prediction, the shape is (nsample, nclass*ntrees).

DMatrix dtest = new DMatrix("test.svm.txt");
//predict
float[][] predicts = booster.predict(dtest);
//predict leaf
float[][] leafPredicts = booster.predict(dtest, 0, true);
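
As an illustration of working with the (nsample, nclass) result, the sketch below computes the `error` and `logloss` metrics from a prediction array. It assumes a `binary:logistic` objective, so that each row holds a single probability in column 0, and it assumes the true labels are available as a `float[]` of 0/1 values obtained separately. `PredictionMetrics`, `errorRate` and `logLoss` are hypothetical names, not part of xgboost4j, and the values xgboost itself reports may differ in small details such as clipping.

```java
// Hypothetical helper; none of these names are part of xgboost4j.
class PredictionMetrics {
    /** Fraction of rows whose thresholded probability disagrees with the label. */
    static double errorRate(float[][] predicts, float[] labels, float threshold) {
        int wrong = 0;
        for (int i = 0; i < predicts.length; i++) {
            float predicted = predicts[i][0] > threshold ? 1f : 0f;
            if (predicted != labels[i]) {
                wrong++;
            }
        }
        return (double) wrong / predicts.length;
    }

    /** Mean negative log-likelihood of the labels under the predicted probabilities. */
    static double logLoss(float[][] predicts, float[] labels) {
        final double eps = 1e-15; // clip to avoid log(0)
        double sum = 0.0;
        for (int i = 0; i < predicts.length; i++) {
            double p = Math.min(Math.max(predicts[i][0], eps), 1.0 - eps);
            sum += labels[i] == 1f ? -Math.log(p) : -Math.log(1.0 - p);
        }
        return sum / predicts.length;
    }

    public static void main(String[] args) {
        // Made-up probabilities and labels, standing in for booster.predict(dtest).
        float[][] predicts = {{0.9f}, {0.2f}, {0.6f}, {0.4f}};
        float[] labels = {1f, 0f, 0f, 1f};
        System.out.println("error   = " + errorRate(predicts, labels, 0.5f));
        System.out.println("logloss = " + logLoss(predicts, labels));
    }
}
```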