Group CLI demos into a subdirectory. (#6258)
The CLI is not the most developed interface. Moving these demos into a dedicated directory helps new users avoid it, since most use cases go through a language binding.
demo/CLI/distributed-training/README.md (new file, 27 lines)
@@ -0,0 +1,27 @@
Distributed XGBoost Training
============================
This is a tutorial on distributed XGBoost training.
Currently xgboost supports distributed training via its CLI program with a configuration file.
There are also plans to bring distributed training to the Python and other language bindings; please open an issue
if you are interested in contributing.

Build XGBoost with Distributed Filesystem Support
-------------------------------------------------
To use distributed xgboost, you only need to turn on the options for
distributed filesystems (HDFS, S3, or Azure) in cmake.

```
cmake <path/to/xgboost> -DUSE_HDFS=ON -DUSE_S3=ON -DUSE_AZURE=ON
```

Step by Step Tutorial on AWS
----------------------------
Check out [this tutorial](https://xgboost.readthedocs.org/en/latest/tutorials/aws_yarn.html) for running distributed xgboost.

Model Analysis
--------------
XGBoost models are exchangeable across all bindings and platforms.
This means you can use Python or R to analyze the learnt model and do prediction.
For example, you can use [plot_model.ipynb](plot_model.ipynb) to visualize the learnt model.
demo/CLI/distributed-training/mushroom.aws.conf (new file, 27 lines)
@@ -0,0 +1,27 @@
# General Parameters, see comment for each definition
# choose the booster, can be gbtree or gblinear
booster = gbtree
# choose logistic regression loss function for binary classification
objective = binary:logistic

# Tree Booster Parameters
# step size shrinkage
eta = 1.0
# minimum loss reduction required to make a further partition
gamma = 1.0
# minimum sum of instance weight (hessian) needed in a child
min_child_weight = 1
# maximum depth of a tree
max_depth = 3

# Task Parameters
# the number of rounds of boosting
num_round = 2
# 0 means do not save any model except the final round model
save_period = 0
# the path of the training data
data = "s3://mybucket/xgb-demo/train"
# the path of the validation data, used to monitor training progress; here [test] sets the name of the validation set
eval[test] = "s3://mybucket/xgb-demo/test"
# evaluate on training data as well each round
eval_train = 1
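The conf file above uses a simple `key = value` syntax with `#` comments, the same format the CLI program reads. As a rough illustration of how that format maps to a parameter dictionary (a hypothetical helper for explanation only, not xgboost's actual parser):

```python
# Hypothetical helper: parse the simple "key = value" conf format
# used by the xgboost CLI. Comments start with '#'; values may be quoted.
def parse_conf(text):
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip().strip('"')
    return params

conf_text = """
# choose the booster
booster = gbtree
eta = 1.0
data = "s3://mybucket/xgb-demo/train"
"""
print(parse_conf(conf_text))
# {'booster': 'gbtree', 'eta': '1.0', 'data': 's3://mybucket/xgb-demo/train'}
```

Note the sketch does not handle `#` inside quoted values; the real CLI parser is more careful.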
demo/CLI/distributed-training/plot_model.ipynb (new file, 107 lines)
@@ -0,0 +1,107 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# XGBoost Model Analysis\n",
    "\n",
    "This notebook can be used to load and analyze models learnt from all xgboost bindings, including distributed training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Please change ```pkg_path``` and ```model_file``` to the correct paths"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "pkg_path = '../../python-package/'\n",
    "model_file = 's3://my-bucket/xgb-demo/model/0002.model'\n",
    "sys.path.insert(0, pkg_path)\n",
    "import xgboost as xgb"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Plot the Feature Importance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# load the model and plot the feature importance\n",
    "bst = xgb.Booster(model_file=model_file)\n",
    "xgb.plot_importance(bst)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Plot the First Tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "tree_id = 0\n",
    "xgb.to_graphviz(bst, tree_id)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
demo/CLI/distributed-training/run_aws.sh (new file, 11 lines)
@@ -0,0 +1,11 @@
#!/bin/bash
# This is an example script to run distributed xgboost on AWS.
# Change the following line for configuration
export BUCKET=mybucket

# submit the job to YARN
../../../dmlc-core/tracker/dmlc-submit --cluster=yarn --num-workers=2 --worker-cores=2 \
    ../../../xgboost mushroom.aws.conf nthread=2 \
    data=s3://${BUCKET}/xgb-demo/train \
    eval[test]=s3://${BUCKET}/xgb-demo/test \
    model_dir=s3://${BUCKET}/xgb-demo/model
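In the script above, `key=value` pairs such as `nthread=2` and `data=...` are passed on the command line after the conf file; the CLI lets these override values from the conf, so one conf file can serve several jobs. A minimal sketch of that last-wins precedence (a hypothetical helper, not the actual CLI code):

```python
# Hypothetical sketch of the CLI's parameter precedence:
# start from the conf-file values, then let command-line
# key=value pairs override them.
def merge_params(conf_params, cli_args):
    params = dict(conf_params)
    for arg in cli_args:
        key, _, value = arg.partition("=")
        params[key] = value  # command-line value wins over the conf
    return params

conf_params = {"nthread": "1", "eta": "1.0"}
cli_args = ["nthread=2", "data=s3://mybucket/xgb-demo/train"]
print(merge_params(conf_params, cli_args))
# {'nthread': '2', 'eta': '1.0', 'data': 's3://mybucket/xgb-demo/train'}
```

This is why `run_aws.sh` can keep bucket-specific paths out of `mushroom.aws.conf` and supply them per run.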