[DOC] reorg docs

2016-02-25 14:08:30 -08:00
parent 02e98e5d45
commit 6b02317ea8
5 changed files with 73 additions and 62 deletions
--- a/doc/index.md
+++ b/doc/index.md
@@ -7,13 +7,22 @@ for large scale tree boosting.

 This document is hosted at http://xgboost.readthedocs.org/. You can also browse most of the documents in github directly.

-User Guide
----------
-* [Installation Guide](build.md)
-* [Introduction to Boosted Trees](model.md)
+
+Package Documents
+-----------------
+This section contains language-specific package guides.
+
+* [XGBoost Command Line Usage Walkthrough](../demo/binary_classification/README.md)
 * [Python Package Document](python/index.md)
 * [R Package Document](R-package/index.md)
 * [XGBoost.jl Julia Package](https://github.com/dmlc/XGBoost.jl)
+
+User Guides
+-----------
+This section contains user guides that apply across languages.
+
+* [Installation Guide](build.md)
+* [Introduction to Boosted Trees](model.md)
 * [Distributed Training](../demo/distributed-training)
 * [Frequently Asked Questions](faq.md)
 * [External Memory Version](external_memory.md)
@@ -22,28 +31,24 @@ User Guide
 * [Text input format](input_format.md)
 * [Notes on Parameter Tuning](param_tuning.md)

-Developer Guide
---------------
-* [Contributor Guide](dev-guide/contribute.md)

 Tutorials
 ---------
-Tutorials are self contained materials that teaches you how to achieve a complete data science task with xgboost, these
-are great resources to learn xgboost by real examples. If you think you have something that belongs to here, send a pull request.
-* [Binary classification using XGBoost Command Line](../demo/binary_classification/) (CLI)
-  - This tutorial introduces the basic usage of CLI version of xgboost
-* [Introduction of XGBoost in Python](python/python_intro.md) (python)
-  - This tutorial introduces the python package of xgboost
+This section contains official tutorials for the XGBoost package.
+See [Awesome XGBoost](https://github.com/dmlc/xgboost/tree/master/demo) for links to more resources.
+
 * [Introduction to XGBoost in R](R-package/xgboostPresentation.md) (R package)
  - This is a general presentation about xgboost in R.
 * [Discover your data with XGBoost in R](R-package/discoverYourData.md) (R package)
 - This tutorial explains feature analysis in xgboost.
+* [Introduction of XGBoost in Python](python/python_intro.md) (python)
+  - This tutorial introduces the python package of xgboost
 * [Understanding XGBoost Model on Otto Dataset](../demo/kaggle-otto/understandingXGBoostModel.Rmd) (R package)
 - This tutorial teaches you how to use xgboost to compete in the Kaggle Otto challenge.

-Resources
---------
-See [awesome xgboost page](https://github.com/dmlc/xgboost/tree/master/demo) for links to other resources.
+Developer Guide
+---------------
+* [Contributor Guide](dev-guide/contribute.md)


 Indices and tables
--- a/doc/input_format.md
+++ b/doc/input_format.md
@@ -12,9 +12,14 @@ train.txt
 1 0:0.01 1:0.3
 0 0:0.2 1:0.3
 ```
-Each line represent a single instance, and in the first line '1' is the instance label,'101' and '102' are feature indices, '1.2' and '0.03' are feature values. In the binary classification case, '1' is used to indicate positive samples, and '0' is used to indicate negative samples. We also support probability values in [0,1] as label, to indicate the probability of the instance being positive.
+Each line represents a single instance. In the first line, '1' is the instance label, '101' and '102' are feature indices, and '1.2' and '0.03' are feature values. In the binary classification case, '1' indicates positive samples and '0' indicates negative samples. Probability values in [0,1] are also supported as labels, to indicate the probability of the instance
+being positive.

-## Group Input Format
+Additional Information
+----------------------
+Note: this additional information applies only to the single-machine version of the package.
+
+### Group Input Format
 As XGBoost supports [ranking tasks](../demo/rank), it also supports a group input format. In ranking tasks, instances are typically organized into groups; for example, when learning to rank web pages, the page instances are grouped by query. Besides the instance file described above, XGBoost needs a file containing the group information. For example, if the instance file is the "train.txt" shown above,
 and the group file is as below:

@@ -26,7 +31,7 @@ train.txt.group
 This means that the data set contains 5 instances: the first two are in one group and the other three are in another. Each number in the group file gives the number of instances in the corresponding group, in the order the instances appear in the instance file.
 During configuration, you do not have to specify the path of the group file. If the instance file is named "xxx", XGBoost checks whether a file named "xxx.group" exists in the same directory and decides accordingly whether to read the data in group input format.

-## Instance Weight File
+### Instance Weight File
 XGBoost supports assigning each instance a weight to differentiate the importance of instances. For example, suppose we provide the following instance weight file for the "train.txt" file in the example:

 train.txt.weight
@@ -40,7 +45,7 @@ train.txt.weight
 It means that XGBoost will put more emphasis on the first and fourth instances, that is, the positive instances, during training.
 The configuration is similar to configuring the group information. If the instance file is named "xxx", XGBoost checks whether a file named "xxx.weight" exists in the same directory and, if so, uses the weights while training models. Weights are included in an "xxx.buffer" file that XGBoost creates automatically. If you want to update the weights, you need to delete the "xxx.buffer" file before launching XGBoost.

-## Initial Margin file
+### Initial Margin file
 XGBoost supports providing each instance an initial margin prediction. For example, if we have an initial prediction from logistic regression for the "train.txt" file, we can create the following file:

 train.txt.base_margin