Add tutorial for distributed training and batch prediction with Kubernetes (#4621)

* provide the readme

* update for format

* reformat

* reformat -2

* update again

* update format

* update w.r.t yinlou's comments

* Add kubernetes tutorial to Table of Contents

* Style edit
Mingjie Tang 2019-07-14 23:27:27 -07:00 committed by Philip Hyunsu Cho
parent 3e339d9557
commit beb7b295a8
2 changed files with 37 additions and 0 deletions


@@ -11,6 +11,7 @@ See `Awesome XGBoost <https://github.com/dmlc/xgboost/tree/master/demo>`_ for mo
   model
   Distributed XGBoost with AWS YARN <aws_yarn>
+  kubernetes
   Distributed XGBoost with XGBoost4J-Spark <https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html>
   dart
   monotonic


@@ -0,0 +1,36 @@
###################################
Distributed XGBoost with Kubernetes
###################################
The Kubeflow community provides the `XGBoost Operator <https://github.com/kubeflow/xgboost-operator>`_ to support distributed XGBoost training and batch prediction in a Kubernetes cluster. It offers an easy and efficient way to train XGBoost models and run batch prediction in a distributed fashion.
**********
How to use
**********
To run an XGBoost job in a Kubernetes cluster, carry out the following steps:

1. Install XGBoost Operator in Kubernetes.

   a. XGBoost Operator manages XGBoost jobs, including job scheduling, monitoring, and recovery of pods and services. Follow the `installation guide <https://github.com/kubeflow/xgboost-operator#installing-xgboost-operator>`_ to install XGBoost Operator.

2. Write application code to interface with the XGBoost Operator.

   a. You will need to provide a few scripts to interface with the XGBoost Operator. Refer to the `Iris classification example <https://github.com/kubeflow/xgboost-operator/tree/master/config/samples/xgboost-dist>`_.
   b. Data reader/writer: you need to implement a reader and writer for your data source. For example, if your data is stored in a Hive table, you have to write your own code to read from and write to the Hive table based on the worker ID.
   c. Model persistence: in the Iris example, the model is stored in OSS storage. If you want to store your model in Amazon S3, Google NFS, or another storage system, you will need to provide a model reader and writer that meet the requirements of that storage system.

3. Configure the XGBoost job using a YAML file.

   a. The YAML file configures the computation resources and environment for your XGBoost job, e.g. the number of workers and masters. A `YAML template <https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/xgboostjob_v1alpha1_iris_train.yaml>`_ is provided for reference.

4. Submit the XGBoost job to the Kubernetes cluster.

   a. Use the `kubectl command <https://github.com/kubeflow/xgboost-operator#creating-a-xgboost-trainingprediction-job>`_ to submit an XGBoost job, and then monitor the job status.
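
The data reader in step 2 is left to the application. As a rough illustration only, the sketch below shows one way a worker could select its share of rows from its worker ID using round-robin partitioning. It assumes each worker can learn its zero-based ID and the total number of workers from environment variables named ``RANK`` and ``WORLD_SIZE``; those variable names and the helper functions ``worker_identity`` and ``shard_rows`` are assumptions made for this example, so check what your deployment actually provides. Reading from Hive is left out; any iterable of rows stands in for the table.

```python
import os
from typing import Iterable, Iterator, Mapping, Optional, Tuple, TypeVar

T = TypeVar("T")


def worker_identity(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int]:
    """Return (rank, world_size) for the current worker.

    ASSUMPTION: the worker ID and worker count are exposed as the
    environment variables RANK and WORLD_SIZE. These names are
    illustrative; adapt them to your own job environment.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    return rank, world_size


def shard_rows(rows: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Yield only the rows assigned to this worker (round-robin by position)."""
    for index, row in enumerate(rows):
        # Row i belongs to worker i mod world_size, so every row is
        # handled by exactly one worker.
        if index % world_size == rank:
            yield row
```

For instance, with three workers, worker 1 would process rows 1, 4, and 7 of a ten-row source. Round-robin is only one possible choice; contiguous ranges or partition-aware reads may suit a real Hive table better.
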
****************
Work in progress
****************
- XGBoost model serving
- Distributed data reader/writer from/to HDFS, HBase, Hive, etc.
- Model persistence on Amazon S3, Google NFS, etc.