[doc] How to configure stage-level scheduling (#9727)
--------- Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
parent 2cfc90e8db
commit fa65cf6646
@@ -87,8 +87,8 @@ XGBoost PySpark GPU support

XGBoost PySpark fully supports GPU acceleration. Users are not only able to enable
efficient training but also utilize their GPUs for the whole PySpark pipeline including
ETL and inference. In the sections below, we will walk through an example of training on a
Spark standalone cluster with GPU support. To get started, first we need to install some
additional packages, then we can set the ``device`` parameter to ``cuda`` or ``gpu``.

Prepare the necessary packages
==============================
@@ -128,7 +128,8 @@ Write your PySpark application
==============================

The snippet below is a small example of training an xgboost model with PySpark. Notice that we are
using a list of feature names instead of a vector type as the input. The parameter ``"device=cuda"``
specifically indicates that the training will be performed on a GPU.

.. code-block:: python

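# NOTE: the body of this snippet lies outside the hunk shown here; the lines
# below are only a rough sketch of what such an application might look like.
# ``train_df``, ``feature_names`` and ``label_name`` are assumed placeholders,
# not names confirmed by this diff.
from xgboost.spark import SparkXGBRegressor

regressor = SparkXGBRegressor(
    features_col=feature_names,  # a list of feature column names, not a vector column
    label_col=label_name,
    num_workers=2,               # number of Spark tasks (one GPU each) used for training
    device="cuda",               # train on GPU; no ordinal, see the note below
)
model = regressor.fit(train_df)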
@@ -163,14 +164,29 @@ using a list of feature names and the additional parameter ``device``:
predict_df = model.transform(test_df)
predict_df.show()

Like other distributed interfaces, the ``device`` parameter doesn't support specifying ordinal as GPUs are managed by Spark instead of XGBoost (good: ``device=cuda``, bad: ``device=cuda:0``).

.. _stage-level-scheduling:

Submit the PySpark application
==============================

We assume that you have already configured the Spark standalone cluster with GPU support.
Otherwise, please refer to `spark standalone configuration with GPU support <https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#spark-standalone-cluster>`_.

Starting from XGBoost 2.0.1, stage-level scheduling is automatically enabled. Therefore,
if you are using a Spark standalone cluster of version 3.4.0 or higher, we strongly recommend
configuring ``"spark.task.resource.gpu.amount"`` as a fractional value. This will
enable running multiple tasks in parallel during the ETL phase. An example configuration
would be ``"spark.task.resource.gpu.amount=1/spark.executor.cores"``: with
``spark.executor.cores=12`` and one GPU per executor, that amounts to 1/12 (about 0.08),
letting up to 12 tasks share the executor's GPU during ETL, as in the submit command below.
However, if you are using an XGBoost version earlier than 2.0.1 or a Spark standalone cluster
version below 3.4.0, you still need to set ``"spark.task.resource.gpu.amount"`` equal to ``"spark.executor.resource.gpu.amount"``.

.. note::

As of now, the stage-level scheduling feature in XGBoost is limited to the Spark standalone cluster mode.
However, we have plans to expand its compatibility to YARN and Kubernetes once Spark 3.5.1 is officially released.

.. code-block:: bash

export PYSPARK_DRIVER_PYTHON=python
@@ -178,19 +194,21 @@ refer to `spark standalone configuration with GPU support <https://nvidia.github

spark-submit \
--master spark://<master-ip>:7077 \
--conf spark.executor.cores=12 \
--conf spark.task.cpus=1 \
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.task.resource.gpu.amount=0.08 \
--archives xgboost_env.tar.gz#environment \
xgboost_app.py

The above command submits the xgboost pyspark application with the Python environment created by pip or conda,
requesting 1 GPU and 12 CPU cores per executor. As a result, up to 12 tasks per executor can run
concurrently during the ETL phase.

Model Persistence
=================

Similar to standard PySpark ml estimators, one can persist and reuse the model with ``save``
and ``load`` methods:

.. code-block:: python

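# NOTE: the original snippet body is not shown in this diff; this is a hedged
# sketch only. ``model`` is the trained model from the example above, and
# ``model_path`` / ``test_df`` are assumed placeholders.
from xgboost.spark import SparkXGBRegressorModel

model.write().overwrite().save(model_path)               # persist the fitted model
loaded_model = SparkXGBRegressorModel.load(model_path)   # reload it later
loaded_model.transform(test_df).show()                   # reuse it for prediction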
@@ -230,8 +248,13 @@ Accelerate the whole pipeline for xgboost pyspark

With `RAPIDS Accelerator for Apache Spark <https://nvidia.github.io/spark-rapids/>`_, you
can leverage GPUs to accelerate the whole pipeline (ETL, Train, Transform) for xgboost
pyspark without the need for any code modifications. Likewise, you have the option to configure
the ``"spark.task.resource.gpu.amount"`` setting as a fractional value, enabling more
tasks to be executed in parallel during the ETL phase. Please refer to
:ref:`stage-level-scheduling` for more details.

An example submit command is shown below with additional spark configurations and dependencies:

.. code-block:: bash

@@ -240,8 +263,10 @@ additional spark configurations and dependencies:

spark-submit \
--master spark://<master-ip>:7077 \
--conf spark.executor.cores=12 \
--conf spark.task.cpus=1 \
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.task.resource.gpu.amount=0.08 \
--packages com.nvidia:rapids-4-spark_2.12:23.04.0 \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.sql.execution.arrow.maxRecordsPerBatch=1000000 \