Backport doc fixes that are compatible with 0.72 release

* Clarify behavior of LIBSVM in XGBoost4J-Spark (#3524)
* Fix typo in faq.rst (#3521)
* Fix typo in parameter.rst, gblinear section (#3518)
* Clarify supported OSes for XGBoost4J published JARs (#3547)
* Update broken links (#3565)
* Grammar fixes and typos (#3568)
* Bring XGBoost4J Intro up-to-date (#3574)
This commit is contained in:
Nan Zhu
2018-07-28 17:34:39 -07:00
committed by Philip Cho
parent e19dded9a3
commit 4334b9cc91
6 changed files with 77 additions and 80 deletions


@@ -56,6 +56,13 @@ For sbt, please add the repository and dependency in build.sbt as following:
"ml.dmlc" % "xgboost4j" % "latest_source_version_num"
If you want to use XGBoost4J-Spark, replace ``xgboost4j`` with ``xgboost4j-spark``.
.. note:: Spark 2.0 Required
After integrating with the Dataframe/Dataset APIs of Spark 2.0, XGBoost4J-Spark can only be compiled against Spark 2.x. You can build XGBoost4J-Spark as a component of XGBoost4J by running ``mvn package``, and you can specify the Spark version with ``mvn -Dspark.version=2.0.0 package``. (To continue working with Spark 1.x, users should update pom.xml by modifying properties such as ``spark.version``, ``scala.version``, and ``scala.binary.version``. Users also need to change the implementation by replacing ``SparkSession`` with ``SQLContext`` and the type of API parameters from ``Dataset[_]`` to ``DataFrame``.)
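To make the sbt setup above concrete, a minimal ``build.sbt`` fragment might look like the following sketch. Note that ``latest_version_num`` is the same placeholder used above, not a real version string:

```scala
// Minimal build.sbt sketch; "latest_version_num" is a placeholder --
// replace it with an actual published XGBoost4J version.
libraryDependencies ++= Seq(
  "ml.dmlc" % "xgboost4j" % "latest_version_num",
  "ml.dmlc" % "xgboost4j-spark" % "latest_version_num"
)
```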
Installation from maven repo
============================
@@ -76,9 +83,11 @@ Access release version
"ml.dmlc" % "xgboost4j" % "latest_version_num"
This will check out the latest stable version from Maven Central.
For the latest release version number, please check `here <https://github.com/dmlc/xgboost/releases>`_.
if you want to use XGBoost4J-Spark, you just need to replace ``xgboost4j`` with ``xgboost4j-spark``.
If you want to use XGBoost4J-Spark, replace ``xgboost4j`` with ``xgboost4j-spark``.
Access SNAPSHOT version
-----------------------
@@ -117,9 +126,9 @@ Then add dependency as following:
For the latest release version number, please check `here <https://github.com/CodingCat/xgboost/tree/maven-repo/ml/dmlc/xgboost4j>`_.
if you want to use XGBoost4J-Spark, you just need to replace ``xgboost4j`` with ``xgboost4j-spark``.
.. note:: Windows not supported by published JARs
After integrating with Dataframe/Dataset APIs of Spark 2.0, XGBoost4J-Spark only supports compile with Spark 2.x. You can build XGBoost4J-Spark as a component of XGBoost4J by running ``mvn package``, and you can specify the version of spark with ``mvn -Dspark.version=2.0.0 package``. (To continue working with Spark 1.x, the users are supposed to update pom.xml by modifying the properties like ``spark.version``, ``scala.version``, and ``scala.binary.version``. Users also need to change the implementation by replacing ``SparkSession`` with ``SQLContext`` and the type of API parameters from ``Dataset[_]`` to ``Dataframe``)
The published JARs from Maven Central and GitHub currently only support Linux and macOS. Windows users should consider building XGBoost4J / XGBoost4J-Spark from source. Alternatively, check out pre-built JARs from `criteo-forks/xgboost-jars <https://github.com/criteo-forks/xgboost-jars>`_.
Enabling OpenMP for Mac OS
--------------------------
@@ -136,8 +145,9 @@ Contents
********
.. toctree::
:maxdepth: 2
Java Overview Tutorial <java_intro>
java_intro
Code Examples <https://github.com/dmlc/xgboost/tree/master/jvm-packages/xgboost4j-example>
XGBoost4J Java API <http://dmlc.ml/docs/javadocs/index.html>
XGBoost4J Scala API <http://dmlc.ml/docs/scaladocs/xgboost4j/index.html>


@@ -1,28 +1,28 @@
##################
XGBoost4J Java API
##################
##############################
Getting Started with XGBoost4J
##############################
This tutorial introduces the Java API for XGBoost.
**************
Data Interface
**************
Like the XGBoost python module, XGBoost4J uses ``DMatrix`` to handle data,
libsvm txt format file, sparse matrix in CSR/CSC format, and dense matrix is
Like the XGBoost python module, XGBoost4J uses DMatrix to handle data.
LIBSVM text format files, sparse matrices in CSR/CSC format, and dense matrices are
supported.
* The first step is to import ``DMatrix``:
* The first step is to import DMatrix:
.. code-block:: java
import org.dmlc.xgboost4j.DMatrix;
import ml.dmlc.xgboost4j.java.DMatrix;
* Use ``DMatrix`` constructor to load data from a libsvm text format file:
* Use DMatrix constructor to load data from a libsvm text format file:
.. code-block:: java
DMatrix dmat = new DMatrix("train.svm.txt");
* Pass arrays to ``DMatrix`` constructor to load from sparse matrix.
* Pass arrays to DMatrix constructor to load from a sparse matrix.
Suppose we have a sparse matrix
@@ -39,7 +39,8 @@ supported.
long[] rowHeaders = new long[] {0,2,4,7};
float[] data = new float[] {1f,2f,4f,3f,3f,1f,2f};
int[] colIndex = new int[] {0,2,0,3,0,1,2};
DMatrix dmat = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR);
int numColumn = 4;
DMatrix dmat = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR, numColumn);
... or in `Compressed Sparse Column (CSC) <https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_(CSC_or_CCS)>`_ format:
@@ -48,7 +49,8 @@ supported.
long[] colHeaders = new long[] {0,3,4,6,7};
float[] data = new float[] {1f,4f,3f,1f,2f,2f,3f};
int[] rowIndex = new int[] {0,1,2,2,0,2,1};
DMatrix dmat = new DMatrix(colHeaders, rowIndex, data, DMatrix.SparseType.CSC);
int numRow = 3;
DMatrix dmat = new DMatrix(colHeaders, rowIndex, data, DMatrix.SparseType.CSC, numRow);
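To illustrate what the CSR arrays above encode, here is a small self-contained Java sketch (no XGBoost4J dependency) that reads an entry back out of the CSR representation. The helper name ``csrGet`` is made up for illustration:

```java
// Self-contained sketch of the CSR layout used above (helper name is illustrative).
// rowHeaders[i]..rowHeaders[i+1] delimit the nonzeros of row i inside data/colIndex.
public class CsrDemo {
    static float csrGet(long[] rowHeaders, int[] colIndex, float[] data, int row, int col) {
        for (long j = rowHeaders[row]; j < rowHeaders[row + 1]; j++) {
            if (colIndex[(int) j] == col) return data[(int) j];
        }
        return 0f; // entry not stored => implicit zero
    }

    public static void main(String[] args) {
        // Same arrays as in the tutorial: a 3x4 sparse matrix in CSR format.
        long[] rowHeaders = new long[] {0, 2, 4, 7};
        float[] data = new float[] {1f, 2f, 4f, 3f, 3f, 1f, 2f};
        int[] colIndex = new int[] {0, 2, 0, 3, 0, 1, 2};
        System.out.println(csrGet(rowHeaders, colIndex, data, 1, 3)); // stored nonzero
        System.out.println(csrGet(rowHeaders, colIndex, data, 0, 1)); // implicit zero
    }
}
```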
* You may also load your data from a dense matrix. Let's assume we have a matrix of form
@@ -66,7 +68,7 @@ supported.
int nrow = 3;
int ncol = 2;
float missing = 0.0f;
DMatrix dmat = new Matrix(data, nrow, ncol, missing);
DMatrix dmat = new DMatrix(data, nrow, ncol, missing);
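For the dense constructor, the ``data`` array is the matrix flattened in row-major order. A self-contained sketch of that layout (the sample matrix values here are illustrative, independent of XGBoost4J):

```java
// Row-major flattening: element (r, c) of an nrow x ncol matrix
// lives at index r * ncol + c of the flat array.
public class DenseLayout {
    static float at(float[] data, int ncol, int r, int c) {
        return data[r * ncol + c];
    }

    public static void main(String[] args) {
        int nrow = 3, ncol = 2;
        // Matrix [[1, 2], [3, 4], [5, 6]] flattened row by row.
        float[] data = new float[] {1f, 2f, 3f, 4f, 5f, 6f};
        System.out.println(at(data, ncol, 2, 1)); // bottom-right element
    }
}
```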
* To set weight:
@@ -78,47 +80,31 @@ supported.
******************
Setting Parameters
******************
* In XGBoost4J any ``Iterable<Entry<String, Object>>`` object could be used as parameters.
To set parameters, specify them as a Map:
* To set parameters, for non-multiple value params, you can simply use entrySet of an Map:
.. code-block:: java
Map<String, Object> paramMap = new HashMap<>() {
{
put("eta", 1.0);
put("max_depth", 2);
put("silent", 1);
put("objective", "binary:logistic");
put("eval_metric", "logloss");
}
};
Iterable<Entry<String, Object>> params = paramMap.entrySet();
* for the situation that multiple values with same param key, List<Entry<String, Object>> would be a good choice, e.g. :
.. code-block:: java
List<Entry<String, Object>> params = new ArrayList<Entry<String, Object>>() {
{
add(new SimpleEntry<String, Object>("eta", 1.0));
add(new SimpleEntry<String, Object>("max_depth", 2.0));
add(new SimpleEntry<String, Object>("silent", 1));
add(new SimpleEntry<String, Object>("objective", "binary:logistic"));
}
};
Map<String, Object> params = new HashMap<String, Object>() {
{
put("eta", 1.0);
put("max_depth", 2);
put("silent", 1);
put("objective", "binary:logistic");
put("eval_metric", "logloss");
}
};
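Since the parameters are just a ``Map<String, Object>``, any standard map works; the double-brace initialization above is only a stylistic shortcut (it creates an anonymous ``HashMap`` subclass). A plain, self-contained sketch of the same map with explicit puts, independent of XGBoost4J:

```java
import java.util.HashMap;
import java.util.Map;

public class ParamsDemo {
    public static void main(String[] args) {
        // Mixed value types (Double, Integer, String) are fine: values are Objects.
        Map<String, Object> params = new HashMap<>();
        params.put("eta", 1.0);
        params.put("max_depth", 2);
        params.put("silent", 1);
        params.put("objective", "binary:logistic");
        params.put("eval_metric", "logloss");

        for (Map.Entry<String, Object> e : params.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```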
**************
Training Model
**************
With parameters and data, you are able to train a booster model.
* Import ``Trainer`` and ``Booster``:
* Import Booster and XGBoost:
.. code-block:: java
import org.dmlc.xgboost4j.Booster;
import org.dmlc.xgboost4j.util.Trainer;
import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.XGBoost;
* Training
@@ -126,13 +112,15 @@ With parameters and data, you are able to train a booster model.
DMatrix trainMat = new DMatrix("train.svm.txt");
DMatrix validMat = new DMatrix("valid.svm.txt");
//specify a watchList to see the performance
//any Iterable<Entry<String, DMatrix>> object could be used as watchList
List<Entry<String, DMatrix>> watchs = new ArrayList<>();
watchs.add(new SimpleEntry<>("train", trainMat));
watchs.add(new SimpleEntry<>("test", testMat));
int round = 2;
Booster booster = Trainer.train(params, trainMat, round, watchs, null, null);
// Specify a watch list to see model accuracy on data sets
Map<String, DMatrix> watches = new HashMap<String, DMatrix>() {
{
put("train", trainMat);
put("test", testMat);
}
};
int nround = 2;
Booster booster = XGBoost.train(trainMat, params, nround, watches, null, null);
* Saving model
@@ -142,25 +130,20 @@ With parameters and data, you are able to train a booster model.
booster.saveModel("model.bin");
* Dump Model and Feature Map
* Generating model dump with feature map
.. code-block:: java
booster.dumpModel("modelInfo.txt", false)
//dump with featureMap
booster.dumpModel("modelInfo.txt", "featureMap.txt", false)
// dump without feature map
String[] model_dump = booster.getModelDump(null, false);
// dump with feature map
String[] model_dump_with_feature_map = booster.getModelDump("featureMap.txt", false);
* Load a model
.. code-block:: java
Params param = new Params() {
{
put("silent", 1);
put("nthread", 6);
}
};
Booster booster = new Booster(param, "model.bin");
Booster booster = XGBoost.loadModel("model.bin");
**********
Prediction
@@ -170,8 +153,8 @@ After training and loading a model, you can use it to make prediction for other
.. code-block:: java
DMatrix dtest = new DMatrix("test.svm.txt");
//predict
// predict
float[][] predicts = booster.predict(dtest);
//predict leaf
float[][] leafPredicts = booster.predict(dtest, 0, true);
// predict leaf
float[][] leafPredicts = booster.predictLeaf(dtest, 0);
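``predict`` returns one float array per input row; for a binary objective such as ``binary:logistic`` each entry is a probability, which you typically threshold yourself. A self-contained sketch of that post-processing step (the cutoff of 0.5 and the sample scores are illustrative, standing in for real ``booster.predict(dtest)`` output):

```java
public class ThresholdDemo {
    // Convert per-row probability outputs into 0/1 labels at a cutoff.
    static int[] toLabels(float[][] predicts, float threshold) {
        int[] labels = new int[predicts.length];
        for (int i = 0; i < predicts.length; i++) {
            labels[i] = predicts[i][0] >= threshold ? 1 : 0;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Illustrative scores standing in for booster.predict(dtest) output.
        float[][] predicts = new float[][] {{0.9f}, {0.2f}, {0.6f}};
        for (int label : toLabels(predicts, 0.5f)) {
            System.out.println(label);
        }
    }
}
```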