Group CLI demo into subdirectory. (#6258)

The CLI is not the most developed interface. Grouping the CLI demos into their own subdirectory helps new users avoid them, since most use cases go through a language binding.
Jiaming Yuan
2020-10-29 05:40:44 +08:00
committed by GitHub
parent 6383757dca
commit dfac5f89e9
32 changed files with 146 additions and 100 deletions

View File

@@ -0,0 +1,27 @@
Distributed XGBoost Training
============================
This is a tutorial on distributed XGBoost training.
Currently XGBoost supports distributed training via the CLI program together with a configuration file.
There are also plans to bring distributed training to the Python and other language bindings; please open an issue
if you are interested in contributing.
Build XGBoost with Distributed Filesystem Support
-------------------------------------------------
To use distributed XGBoost, you only need to turn on the CMake options for building
with distributed filesystem support (HDFS, S3, or Azure).
```
cmake <path/to/xgboost> -DUSE_HDFS=ON -DUSE_S3=ON -DUSE_AZURE=ON
```
Step by Step Tutorial on AWS
----------------------------
Check out [this tutorial](https://xgboost.readthedocs.org/en/latest/tutorials/aws_yarn.html) for running distributed XGBoost on AWS.
Model Analysis
--------------
XGBoost models are exchangeable across all bindings and platforms.
This means you can use Python or R to analyze the learnt model and make predictions.
For example, you can use [plot_model.ipynb](plot_model.ipynb) to visualize the learnt model.
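Because the model format is shared, a model trained from the CLI can be loaded directly into the Python binding for prediction. A minimal sketch, assuming the trained model has been downloaded locally as `0002.model` and `test.libsvm` is a hypothetical local test file:
```
import xgboost as xgb

# Load a model produced by the CLI (or any other binding).
bst = xgb.Booster(model_file='0002.model')

# Score new data through the Python binding.
dtest = xgb.DMatrix('test.libsvm')
preds = bst.predict(dtest)
print(preds[:10])
```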

View File

@@ -0,0 +1,27 @@
# General parameters; see the comment above each setting
# choose the booster, can be gbtree or gblinear
booster = gbtree
# choose logistic regression loss function for binary classification
objective = binary:logistic
# Tree Booster Parameters
# step size shrinkage
eta = 1.0
# minimum loss reduction required to make a further partition
gamma = 1.0
# minimum sum of instance weight (hessian) needed in a child
min_child_weight = 1
# maximum depth of a tree
max_depth = 3
# Task Parameters
# the number of boosting rounds
num_round = 2
# 0 means do not save any model except the final round model
save_period = 0
# The path of training data
data = "s3://mybucket/xgb-demo/train"
# The path of validation data, used to monitor the training process; here [test] sets the name of the validation set
eval[test] = "s3://mybucket/xgb-demo/test"
# evaluate on the training data as well each round
eval_train = 1
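For comparison, the same parameters can be expressed through the Python binding. A sketch, where `train.libsvm` stands in for a hypothetical local copy of the training data:
```
import xgboost as xgb

# Parameters mirroring the CLI configuration above.
params = {
    'booster': 'gbtree',
    'objective': 'binary:logistic',
    'eta': 1.0,
    'gamma': 1.0,
    'min_child_weight': 1,
    'max_depth': 3,
}

dtrain = xgb.DMatrix('train.libsvm')
# num_boost_round matches num_round; passing dtrain in evals matches eval_train = 1.
bst = xgb.train(params, dtrain, num_boost_round=2, evals=[(dtrain, 'train')])
bst.save_model('0002.model')
```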

View File

@@ -0,0 +1,107 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# XGBoost Model Analysis\n",
"\n",
"This notebook can be used to load and analysis model learnt from all xgboost bindings, including distributed training. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import sys\n",
"import os\n",
"%matplotlib inline "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Please change the ```pkg_path``` and ```model_file``` to be correct path"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"pkg_path = '../../python-package/'\n",
"model_file = 's3://my-bucket/xgb-demo/model/0002.model'\n",
"sys.path.insert(0, pkg_path)\n",
"import xgboost as xgb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Plot the Feature Importance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# plot the first two trees.\n",
"bst = xgb.Booster(model_file=model_file)\n",
"xgb.plot_importance(bst)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Plot the First Tree"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"tree_id = 0\n",
"xgb.to_graphviz(bst, tree_id)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -0,0 +1,11 @@
#!/bin/bash
# This is an example script for running distributed XGBoost on AWS.
# Change the following line to configure your S3 bucket.
export BUCKET=mybucket
# submit the job to YARN
../../../dmlc-core/tracker/dmlc-submit --cluster=yarn --num-workers=2 --worker-cores=2 \
    ../../../xgboost mushroom.aws.conf nthread=2 \
    data=s3://${BUCKET}/xgb-demo/train \
    eval[test]=s3://${BUCKET}/xgb-demo/test \
    model_dir=s3://${BUCKET}/xgb-demo/model
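After the job finishes, the saved model can be copied from S3 and analyzed locally, as in `plot_model.ipynb`. A sketch, assuming the AWS CLI is installed and configured:
```
import subprocess

import xgboost as xgb

# Copy the final-round model from S3 (bucket name as configured above).
subprocess.run(
    ['aws', 's3', 'cp', 's3://mybucket/xgb-demo/model/0002.model', '0002.model'],
    check=True)

# Load the model and print a text dump of the first tree.
bst = xgb.Booster(model_file='0002.model')
print(bst.get_dump()[0])
```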