Group CLI demo into subdirectory. (#6258)

The CLI is not the most developed interface. Grouping the CLI demos into their own subdirectory helps new users avoid them, since most use cases go through a language binding.
Jiaming Yuan
2020-10-29 05:40:44 +08:00
committed by GitHub
parent 6383757dca
commit dfac5f89e9
32 changed files with 146 additions and 100 deletions

View File

@@ -0,0 +1,27 @@
Distributed XGBoost Training
============================
This is a tutorial on distributed XGBoost training.
Currently XGBoost supports distributed training via the CLI program together with a configuration file.
There are also plans to bring distributed training to the Python and other language bindings; please open an issue
if you are interested in contributing.
Build XGBoost with Distributed Filesystem Support
-------------------------------------------------
To use distributed XGBoost, you only need to turn on the CMake options for building
with distributed filesystem support (HDFS, S3, or Azure).
```
cmake <path/to/xgboost> -DUSE_HDFS=ON -DUSE_S3=ON -DUSE_AZURE=ON
```
Step by Step Tutorial on AWS
----------------------------
Check out [this tutorial](https://xgboost.readthedocs.org/en/latest/tutorials/aws_yarn.html) for running distributed XGBoost on AWS.
Model Analysis
--------------
XGBoost models are exchangeable across all bindings and platforms.
This means you can use Python or R to analyze the learnt model and make predictions.
For example, you can use [plot_model.ipynb](plot_model.ipynb) to visualize the learnt model.
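Because the model format is shared, a model trained from the CLI can be loaded directly into the Python binding for prediction. A minimal sketch, assuming the trained model has been downloaded locally as `0002.model` and `test.libsvm` is a hypothetical local test file:
```
import xgboost as xgb

# Load a model produced by the CLI (or any other binding).
bst = xgb.Booster(model_file='0002.model')

# Score new data through the Python binding.
dtest = xgb.DMatrix('test.libsvm')
preds = bst.predict(dtest)
print(preds[:10])
```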

View File

@@ -0,0 +1,27 @@
# General parameters; see the comment above each setting
# choose the booster, can be gbtree or gblinear
booster = gbtree
# choose logistic regression loss function for binary classification
objective = binary:logistic
# Tree Booster Parameters
# step size shrinkage
eta = 1.0
# minimum loss reduction required to make a further partition
gamma = 1.0
# minimum sum of instance weight (hessian) needed in a child
min_child_weight = 1
# maximum depth of a tree
max_depth = 3
# Task Parameters
# the number of boosting rounds
num_round = 2
# 0 means do not save any model except the final round model
save_period = 0
# The path of training data
data = "s3://mybucket/xgb-demo/train"
# The path of validation data, used to monitor the training process; here [test] sets the name of the validation set
eval[test] = "s3://mybucket/xgb-demo/test"
# evaluate on the training data as well each round
eval_train = 1
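For comparison, the same parameters can be expressed through the Python binding. A sketch, where `train.libsvm` stands in for a hypothetical local copy of the training data:
```
import xgboost as xgb

# Parameters mirroring the CLI configuration above.
params = {
    'booster': 'gbtree',
    'objective': 'binary:logistic',
    'eta': 1.0,
    'gamma': 1.0,
    'min_child_weight': 1,
    'max_depth': 3,
}

dtrain = xgb.DMatrix('train.libsvm')
# num_boost_round matches num_round; passing dtrain in evals matches eval_train = 1.
bst = xgb.train(params, dtrain, num_boost_round=2, evals=[(dtrain, 'train')])
bst.save_model('0002.model')
```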

View File

@@ -0,0 +1,107 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# XGBoost Model Analysis\n",
"\n",
"This notebook can be used to load and analysis model learnt from all xgboost bindings, including distributed training. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import sys\n",
"import os\n",
"%matplotlib inline "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Please change the ```pkg_path``` and ```model_file``` to be correct path"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"pkg_path = '../../python-package/'\n",
"model_file = 's3://my-bucket/xgb-demo/model/0002.model'\n",
"sys.path.insert(0, pkg_path)\n",
"import xgboost as xgb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Plot the Feature Importance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# plot the first two trees.\n",
"bst = xgb.Booster(model_file=model_file)\n",
"xgb.plot_importance(bst)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Plot the First Tree"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"tree_id = 0\n",
"xgb.to_graphviz(bst, tree_id)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -0,0 +1,11 @@
#!/bin/bash
# This is an example script for running distributed XGBoost on AWS.
# Change the following line to configure your S3 bucket.
export BUCKET=mybucket
# submit the job to YARN
../../../dmlc-core/tracker/dmlc-submit --cluster=yarn --num-workers=2 --worker-cores=2 \
    ../../../xgboost mushroom.aws.conf nthread=2 \
    data=s3://${BUCKET}/xgb-demo/train \
    eval[test]=s3://${BUCKET}/xgb-demo/test \
    model_dir=s3://${BUCKET}/xgb-demo/model
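After the job finishes, the saved model can be copied from S3 and analyzed locally, as in `plot_model.ipynb`. A sketch, assuming the AWS CLI is installed and configured:
```
import subprocess

import xgboost as xgb

# Copy the final-round model from S3 (bucket name as configured above).
subprocess.run(
    ['aws', 's3', 'cp', 's3://mybucket/xgb-demo/model/0002.model', '0002.model'],
    check=True)

# Load the model and print a text dump of the first tree.
bst = xgb.Booster(model_file='0002.model')
print(bst.get_dump()[0])
```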