Cosmetic fixes in faq.rst (#6161)

Zeno Gantner 2020-09-24 15:05:10 +02:00 committed by GitHub
parent 14afdb4d92
commit 5b05f88ba9

faq.rst

@@ -18,33 +18,33 @@ See :doc:`Introduction to Boosted Trees </tutorials/model>`.
 I have a big dataset
 ********************
 XGBoost is designed to be memory efficient. Usually it can handle problems as long as the data fit into your memory.
-(This usually means millions of instances)
+This usually means millions of instances.
 If you are running out of memory, checkout :doc:`external memory version </tutorials/external_memory>` or
 :doc:`distributed version </tutorials/aws_yarn>` of XGBoost.
 
 **************************************************
-Running XGBoost on Platform X (Hadoop/Yarn, Mesos)
+Running XGBoost on platform X (Hadoop/Yarn, Mesos)
 **************************************************
 The distributed version of XGBoost is designed to be portable to various environment.
 Distributed XGBoost can be ported to any platform that supports `rabit <https://github.com/dmlc/rabit>`_.
 You can directly run XGBoost on Yarn. In theory Mesos and other resource allocation engines can be easily supported as well.
 
-*****************************************************************
-Why not implement distributed XGBoost on top of X (Spark, Hadoop)
-*****************************************************************
+******************************************************************
+Why not implement distributed XGBoost on top of X (Spark, Hadoop)?
+******************************************************************
 The first fact we need to know is going distributed does not necessarily solve all the problems.
 Instead, it creates more problems such as more communication overhead and fault tolerance.
 The ultimate question will still come back to how to push the limit of each computation node
 and use less resources to complete the task (thus with less communication and chance of failure).
-To achieve these, we decide to reuse the optimizations in the single node XGBoost and build distributed version on top of it.
-The demand of communication in machine learning is rather simple, in the sense that we can depend on a limited set of API (in our case rabit).
+To achieve these, we decide to reuse the optimizations in the single node XGBoost and build the distributed version on top of it.
+The demand of communication in machine learning is rather simple, in the sense that we can depend on a limited set of APIs (in our case rabit).
 Such design allows us to reuse most of the code, while being portable to major platforms such as Hadoop/Yarn, MPI, SGE.
 Most importantly, it pushes the limit of the computation resources we can use.
 
-*****************************************
-How can I port the model to my own system
-*****************************************
+****************************************
+How can I port a model to my own system?
+****************************************
 The model and data format of XGBoost is exchangeable,
 which means the model trained by one language can be loaded in another.
 This means you can train the model using R, while running prediction using
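
The external-memory mode that the first entry in the hunk above points to can be illustrated with a minimal Python sketch; the libsvm file name is an assumption, and the ``#dtrain.cache`` suffix follows the cache syntax described in the external memory tutorial linked in the FAQ.

.. code-block:: python

    import xgboost as xgb

    # Hypothetical libsvm training file; appending "#dtrain.cache" asks
    # XGBoost to page the data through an on-disk cache instead of
    # keeping the whole dataset in memory.
    dtrain = xgb.DMatrix('train.libsvm#dtrain.cache')

    bst = xgb.train({'objective': 'binary:logistic'}, dtrain,
                    num_boost_round=100)
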
@@ -52,15 +52,15 @@ Java or C++, which are more common in production systems.
 You can also train the model using distributed versions,
 and load them in from Python to do some interactive analysis.
 
-*************************
-Do you support LambdaMART
-*************************
+**************************
+Do you support LambdaMART?
+**************************
 Yes, XGBoost implements LambdaMART. Checkout the objective section in :doc:`parameters </parameter>`.
 
-******************************
-How to deal with Missing Value
-******************************
-XGBoost supports missing value by default.
+*******************************
+How to deal with missing values
+*******************************
+XGBoost supports missing values by default.
 In tree algorithms, branch directions for missing values are learned during training.
 Note that the gblinear booster treats missing values as zeros.
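
To make the portability answer concrete: a minimal Python sketch, assuming the standard save/load calls and an illustrative file name. The saved file is in XGBoost's exchangeable model format, so the R, Java, or C++ bindings could load the same artifact.

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    # Train on toy data and save to the exchangeable model format.
    X = np.random.rand(100, 4)
    y = np.random.randint(2, size=100)
    dtrain = xgb.DMatrix(X, label=y)
    bst = xgb.train({'objective': 'binary:logistic'}, dtrain,
                    num_boost_round=10)
    bst.save_model('model.json')

    # Any other binding can load the same file; e.g. reloading in Python:
    bst2 = xgb.Booster(model_file='model.json')
    preds = bst2.predict(xgb.DMatrix(X))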
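
The LambdaMART answer maps to the learning-to-rank objectives in the parameters doc; here is a sketch with assumed toy data, using ``rank:pairwise`` and per-query group sizes:

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    # Toy learning-to-rank setup: 10 documents in two query groups of 5.
    X = np.random.rand(10, 4)
    y = np.random.randint(3, size=10)  # graded relevance labels
    dtrain = xgb.DMatrix(X, label=y)
    dtrain.set_group([5, 5])           # number of documents per query

    params = {'objective': 'rank:pairwise', 'eval_metric': 'ndcg'}
    bst = xgb.train(params, dtrain, num_boost_round=10)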
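
And the missing-value answer in code form, assuming NumPy input: ``np.nan`` entries are treated as missing by default, and the ``missing=`` argument is how a different sentinel would be declared.

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    X = np.array([[1.0, np.nan],
                  [0.5, 3.0],
                  [np.nan, 2.0],
                  [0.7, 1.5]])
    y = np.array([1, 0, 1, 0])

    # Tree boosters learn a default branch direction for the np.nan
    # entries during training; gblinear would treat them as zeros.
    dtrain = xgb.DMatrix(X, label=y, missing=np.nan)
    bst = xgb.train({'objective': 'binary:logistic'}, dtrain,
                    num_boost_round=5)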