Backport doc fixes that are compatible with 0.72 release

* Clarify behavior of LIBSVM in XGBoost4J-Spark (#3524)
* Fix typo in faq.rst (#3521)
* Fix typo in parameter.rst, gblinear section (#3518)
* Clarify supported OSes for XGBoost4J published JARs (#3547)
* Update broken links (#3565)
* Grammar fixes and typos (#3568)
* Bring XGBoost4J Intro up-to-date (#3574)
This commit is contained in:
Nan Zhu
2018-07-28 17:34:39 -07:00
committed by Philip Cho
parent e19dded9a3
commit 4334b9cc91
6 changed files with 77 additions and 80 deletions


@@ -56,6 +56,13 @@ For sbt, please add the repository and dependency in build.sbt as following:
"ml.dmlc" % "xgboost4j" % "latest_source_version_num"
If you want to use XGBoost4J-Spark, replace ``xgboost4j`` with ``xgboost4j-spark``.
.. note:: Spark 2.0 Required
After integrating with the Dataframe/Dataset APIs of Spark 2.0, XGBoost4J-Spark can only be compiled against Spark 2.x. You can build XGBoost4J-Spark as a component of XGBoost4J by running ``mvn package``, and you can specify the Spark version with ``mvn -Dspark.version=2.0.0 package``. (To continue working with Spark 1.x, users should update pom.xml by modifying properties such as ``spark.version``, ``scala.version``, and ``scala.binary.version``. Users also need to change the implementation by replacing ``SparkSession`` with ``SQLContext`` and the type of API parameters from ``Dataset[_]`` to ``DataFrame``.)
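To make the sbt setup above concrete, a minimal ``build.sbt`` fragment might look like the following sketch. Note that ``latest_version_num`` is the same placeholder used above, not a real version string:

```scala
// Minimal build.sbt sketch; "latest_version_num" is a placeholder --
// replace it with an actual published XGBoost4J version.
libraryDependencies ++= Seq(
  "ml.dmlc" % "xgboost4j" % "latest_version_num",
  "ml.dmlc" % "xgboost4j-spark" % "latest_version_num"
)
```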
Installation from maven repo
============================
@@ -76,9 +83,11 @@ Access release version
"ml.dmlc" % "xgboost4j" % "latest_version_num"
This will check out the latest stable version from Maven Central.
For the latest release version number, please check `here <https://github.com/dmlc/xgboost/releases>`_.
if you want to use XGBoost4J-Spark, you just need to replace ``xgboost4j`` with ``xgboost4j-spark``.
If you want to use XGBoost4J-Spark, replace ``xgboost4j`` with ``xgboost4j-spark``.
Access SNAPSHOT version
-----------------------
@@ -117,9 +126,9 @@ Then add dependency as following:
For the latest release version number, please check `here <https://github.com/CodingCat/xgboost/tree/maven-repo/ml/dmlc/xgboost4j>`_.
if you want to use XGBoost4J-Spark, you just need to replace ``xgboost4j`` with ``xgboost4j-spark``.
.. note:: Windows not supported by published JARs
After integrating with Dataframe/Dataset APIs of Spark 2.0, XGBoost4J-Spark only supports compile with Spark 2.x. You can build XGBoost4J-Spark as a component of XGBoost4J by running ``mvn package``, and you can specify the version of spark with ``mvn -Dspark.version=2.0.0 package``. (To continue working with Spark 1.x, the users are supposed to update pom.xml by modifying the properties like ``spark.version``, ``scala.version``, and ``scala.binary.version``. Users also need to change the implementation by replacing ``SparkSession`` with ``SQLContext`` and the type of API parameters from ``Dataset[_]`` to ``Dataframe``)
The published JARs from Maven Central and GitHub currently only support Linux and macOS. Windows users should consider building XGBoost4J / XGBoost4J-Spark from source. Alternatively, check out pre-built JARs from `criteo-forks/xgboost-jars <https://github.com/criteo-forks/xgboost-jars>`_.
Enabling OpenMP for Mac OS
--------------------------
@@ -136,8 +145,9 @@ Contents
********
.. toctree::
:maxdepth: 2
Java Overview Tutorial <java_intro>
java_intro
Code Examples <https://github.com/dmlc/xgboost/tree/master/jvm-packages/xgboost4j-example>
XGBoost4J Java API <http://dmlc.ml/docs/javadocs/index.html>
XGBoost4J Scala API <http://dmlc.ml/docs/scaladocs/xgboost4j/index.html>


@@ -1,28 +1,28 @@
##################
XGBoost4J Java API
##################
##############################
Getting Started with XGBoost4J
##############################
This tutorial introduces the Java API for XGBoost.
**************
Data Interface
**************
Like the XGBoost python module, XGBoost4J uses ``DMatrix`` to handle data,
libsvm txt format file, sparse matrix in CSR/CSC format, and dense matrix is
Like the XGBoost python module, XGBoost4J uses DMatrix to handle data.
LIBSVM text format files, sparse matrices in CSR/CSC format, and dense matrices are
supported.
* The first step is to import ``DMatrix``:
* The first step is to import DMatrix:
.. code-block:: java
import org.dmlc.xgboost4j.DMatrix;
import ml.dmlc.xgboost4j.java.DMatrix;
* Use ``DMatrix`` constructor to load data from a libsvm text format file:
* Use DMatrix constructor to load data from a libsvm text format file:
.. code-block:: java
DMatrix dmat = new DMatrix("train.svm.txt");
* Pass arrays to ``DMatrix`` constructor to load from sparse matrix.
* Pass arrays to DMatrix constructor to load from a sparse matrix.
Suppose we have a sparse matrix
@@ -39,7 +39,8 @@ supported.
long[] rowHeaders = new long[] {0,2,4,7};
float[] data = new float[] {1f,2f,4f,3f,3f,1f,2f};
int[] colIndex = new int[] {0,2,0,3,0,1,2};
DMatrix dmat = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR);
int numColumn = 4;
DMatrix dmat = new DMatrix(rowHeaders, colIndex, data, DMatrix.SparseType.CSR, numColumn);
... or in `Compressed Sparse Column (CSC) <https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_(CSC_or_CCS)>`_ format:
@@ -48,7 +49,8 @@ supported.
long[] colHeaders = new long[] {0,3,4,6,7};
float[] data = new float[] {1f,4f,3f,1f,2f,2f,3f};
int[] rowIndex = new int[] {0,1,2,2,0,2,1};
DMatrix dmat = new DMatrix(colHeaders, rowIndex, data, DMatrix.SparseType.CSC);
int numRow = 3;
DMatrix dmat = new DMatrix(colHeaders, rowIndex, data, DMatrix.SparseType.CSC, numRow);
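To illustrate what the CSR arrays above encode, here is a small self-contained Java sketch (no XGBoost4J dependency) that reads an entry back out of the CSR representation. The helper name ``csrGet`` is made up for illustration:

```java
// Self-contained sketch of the CSR layout used above (helper name is illustrative).
// rowHeaders[i]..rowHeaders[i+1] delimit the nonzeros of row i inside data/colIndex.
public class CsrDemo {
    static float csrGet(long[] rowHeaders, int[] colIndex, float[] data, int row, int col) {
        for (long j = rowHeaders[row]; j < rowHeaders[row + 1]; j++) {
            if (colIndex[(int) j] == col) return data[(int) j];
        }
        return 0f; // entry not stored => implicit zero
    }

    public static void main(String[] args) {
        // Same arrays as in the tutorial: a 3x4 sparse matrix in CSR format.
        long[] rowHeaders = new long[] {0, 2, 4, 7};
        float[] data = new float[] {1f, 2f, 4f, 3f, 3f, 1f, 2f};
        int[] colIndex = new int[] {0, 2, 0, 3, 0, 1, 2};
        System.out.println(csrGet(rowHeaders, colIndex, data, 1, 3)); // stored nonzero
        System.out.println(csrGet(rowHeaders, colIndex, data, 0, 1)); // implicit zero
    }
}
```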
* You may also load your data from a dense matrix. Let's assume we have a matrix of form
@@ -66,7 +68,7 @@ supported.
int nrow = 3;
int ncol = 2;
float missing = 0.0f;
DMatrix dmat = new Matrix(data, nrow, ncol, missing);
DMatrix dmat = new DMatrix(data, nrow, ncol, missing);
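For the dense constructor, the ``data`` array is the matrix flattened in row-major order. A self-contained sketch of that layout (the sample matrix values here are illustrative, independent of XGBoost4J):

```java
// Row-major flattening: element (r, c) of an nrow x ncol matrix
// lives at index r * ncol + c of the flat array.
public class DenseLayout {
    static float at(float[] data, int ncol, int r, int c) {
        return data[r * ncol + c];
    }

    public static void main(String[] args) {
        int nrow = 3, ncol = 2;
        // Matrix [[1, 2], [3, 4], [5, 6]] flattened row by row.
        float[] data = new float[] {1f, 2f, 3f, 4f, 5f, 6f};
        System.out.println(at(data, ncol, 2, 1)); // bottom-right element
    }
}
```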
* To set weight:
@@ -78,47 +80,31 @@ supported.
******************
Setting Parameters
******************
* In XGBoost4J any ``Iterable<Entry<String, Object>>`` object could be used as parameters.
To set parameters, specify them as a Map:
* To set parameters, for non-multiple value params, you can simply use entrySet of an Map:
.. code-block:: java
Map<String, Object> paramMap = new HashMap<>() {
{
put("eta", 1.0);
put("max_depth", 2);
put("silent", 1);
put("objective", "binary:logistic");
put("eval_metric", "logloss");
}
};
Iterable<Entry<String, Object>> params = paramMap.entrySet();
* for the situation that multiple values with same param key, List<Entry<String, Object>> would be a good choice, e.g. :
.. code-block:: java
List<Entry<String, Object>> params = new ArrayList<Entry<String, Object>>() {
{
add(new SimpleEntry<String, Object>("eta", 1.0));
add(new SimpleEntry<String, Object>("max_depth", 2.0));
add(new SimpleEntry<String, Object>("silent", 1));
add(new SimpleEntry<String, Object>("objective", "binary:logistic"));
}
};
Map<String, Object> params = new HashMap<String, Object>() {
{
put("eta", 1.0);
put("max_depth", 2);
put("silent", 1);
put("objective", "binary:logistic");
put("eval_metric", "logloss");
}
};
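Since the parameters are just a ``Map<String, Object>``, any standard map works; the double-brace initialization above is only a stylistic shortcut (it creates an anonymous ``HashMap`` subclass). A plain, self-contained sketch of the same map with explicit puts, independent of XGBoost4J:

```java
import java.util.HashMap;
import java.util.Map;

public class ParamsDemo {
    public static void main(String[] args) {
        // Mixed value types (Double, Integer, String) are fine: values are Objects.
        Map<String, Object> params = new HashMap<>();
        params.put("eta", 1.0);
        params.put("max_depth", 2);
        params.put("silent", 1);
        params.put("objective", "binary:logistic");
        params.put("eval_metric", "logloss");

        for (Map.Entry<String, Object> e : params.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```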
**************
Training Model
**************
With parameters and data, you are able to train a booster model.
* Import ``Trainer`` and ``Booster``:
* Import Booster and XGBoost:
.. code-block:: java
import org.dmlc.xgboost4j.Booster;
import org.dmlc.xgboost4j.util.Trainer;
import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.XGBoost;
* Training
@@ -126,13 +112,15 @@ With parameters and data, you are able to train a booster model.
DMatrix trainMat = new DMatrix("train.svm.txt");
DMatrix validMat = new DMatrix("valid.svm.txt");
//specify a watchList to see the performance
//any Iterable<Entry<String, DMatrix>> object could be used as watchList
List<Entry<String, DMatrix>> watchs = new ArrayList<>();
watchs.add(new SimpleEntry<>("train", trainMat));
watchs.add(new SimpleEntry<>("test", testMat));
int round = 2;
Booster booster = Trainer.train(params, trainMat, round, watchs, null, null);
// Specify a watch list to see model accuracy on data sets
Map<String, DMatrix> watches = new HashMap<String, DMatrix>() {
{
put("train", trainMat);
put("test", testMat);
}
};
int nround = 2;
Booster booster = XGBoost.train(trainMat, params, nround, watches, null, null);
* Saving model
@@ -142,25 +130,20 @@ With parameters and data, you are able to train a booster model.
booster.saveModel("model.bin");
* Dump Model and Feature Map
* Generating model dump with feature map
.. code-block:: java
booster.dumpModel("modelInfo.txt", false)
//dump with featureMap
booster.dumpModel("modelInfo.txt", "featureMap.txt", false)
// dump without feature map
String[] model_dump = booster.getModelDump(null, false);
// dump with feature map
String[] model_dump_with_feature_map = booster.getModelDump("featureMap.txt", false);
* Load a model
.. code-block:: java
Params param = new Params() {
{
put("silent", 1);
put("nthread", 6);
}
};
Booster booster = new Booster(param, "model.bin");
Booster booster = XGBoost.loadModel("model.bin");
**********
Prediction
@@ -170,8 +153,8 @@ After training and loading a model, you can use it to make prediction for other
.. code-block:: java
DMatrix dtest = new DMatrix("test.svm.txt");
//predict
// predict
float[][] predicts = booster.predict(dtest);
//predict leaf
float[][] leafPredicts = booster.predict(dtest, 0, true);
// predict leaf
float[][] leafPredicts = booster.predictLeaf(dtest, 0);
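``predict`` returns one float array per input row; for a binary objective such as ``binary:logistic`` each entry is a probability, which you typically threshold yourself. A self-contained sketch of that post-processing step (the cutoff of 0.5 and the sample scores are illustrative, standing in for real ``booster.predict(dtest)`` output):

```java
public class ThresholdDemo {
    // Convert per-row probability outputs into 0/1 labels at a cutoff.
    static int[] toLabels(float[][] predicts, float threshold) {
        int[] labels = new int[predicts.length];
        for (int i = 0; i < predicts.length; i++) {
            labels[i] = predicts[i][0] >= threshold ? 1 : 0;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Illustrative scores standing in for booster.predict(dtest) output.
        float[][] predicts = new float[][] {{0.9f}, {0.2f}, {0.6f}};
        for (int label : toLabels(predicts, 0.5f)) {
            System.out.println(label);
        }
    }
}
```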