Doc modernization (#3474)

* Change doc build to reST exclusively

* Rewrite Intro doc in reST; create toctree

* Update parameter and contribute

* Convert tutorials to reST

* Convert Python tutorials to reST

* Convert CLI and Julia docs to reST

* Enable markdown for R vignettes

* Done migrating to reST

* Add guzzle_sphinx_theme to requirements

* Add breathe to requirements

* Fix search bar

* Add link to user forum
Philip Hyunsu Cho
2018-07-19 14:22:16 -07:00
committed by GitHub
parent c004cea788
commit 05b089405d
57 changed files with 2833 additions and 3957 deletions

############################################
Using XGBoost External Memory Version (beta)
############################################

Using the external memory version is almost the same as using the in-memory version.
The only difference is the filename format.
The external memory version takes filenames of the following form:

.. code-block:: none

  filename#cacheprefix

The ``filename`` is the normal path to the libsvm file you want to load, and ``cacheprefix`` is a
path to a cache file that XGBoost will use for the external memory cache.
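
To make the format concrete, here is a small sketch in plain Python that builds and splits such a filename. The helper names are hypothetical (they are not part of the XGBoost API), and treating the first ``#`` as the separator is an assumption about how the string is parsed.

.. code-block:: python

  def make_external_memory_uri(filename, cacheprefix):
      """Join a libsvm path and a cache prefix as ``filename#cacheprefix``.

      Hypothetical helper for illustration; not part of XGBoost.
      """
      if "#" in filename:
          raise ValueError("filename must not already contain '#'")
      return f"{filename}#{cacheprefix}"


  def split_external_memory_uri(uri):
      """Split ``filename#cacheprefix`` into its two parts.

      Assumption: the separator is the first '#'. Without a '#', the
      data is loaded in memory, so the cache prefix is None.
      """
      filename, sep, cacheprefix = uri.partition("#")
      return filename, (cacheprefix if sep else None)


  uri = make_external_memory_uri("../data/agaricus.txt.train", "dtrain.cache")
  print(uri)  # ../data/agaricus.txt.train#dtrain.cache
  print(split_external_memory_uri(uri))  # ('../data/agaricus.txt.train', 'dtrain.cache')
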

The following code was extracted from `demo/guide-python/external_memory.py <https://github.com/dmlc/xgboost/blob/master/demo/guide-python/external_memory.py>`_:

.. code-block:: python

  import xgboost as xgb

  # '#dtrain.cache' after the libsvm path enables the external memory cache
  dtrain = xgb.DMatrix('../data/agaricus.txt.train#dtrain.cache')

The additional ``#dtrain.cache`` suffix following the libsvm file path is the name of the cache file.
For the CLI version, simply add the cache suffix to the data path, e.g. ``"../data/agaricus.txt.train#dtrain.cache"``.

****************
Performance Note
****************

* The parameter ``nthread`` should be set to the number of **physical** cores

  - Most modern CPUs use hyperthreading, which means a 4-core CPU may carry 8 threads
  - In such a case, set ``nthread`` to 4 for maximum performance

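
A rough way to pick ``nthread`` is sketched below. It assumes each physical core exposes two hardware threads, which is typical with hyperthreading but not universal; ``os.cpu_count()`` reports logical CPUs. If the third-party ``psutil`` package is installed, ``psutil.cpu_count(logical=False)`` returns the physical core count directly (it may return ``None`` when the count cannot be determined). The helper name is hypothetical.

.. code-block:: python

  import os


  def physical_core_nthread(logical_cores=None, threads_per_core=2):
      """Estimate ``nthread`` as the number of physical cores.

      Hypothetical helper. Assumption: each physical core runs
      ``threads_per_core`` hardware threads (2 with typical hyperthreading).
      """
      if logical_cores is None:
          # os.cpu_count() counts logical CPUs and may return None
          logical_cores = os.cpu_count() or 1
      return max(1, logical_cores // threads_per_core)


  print(physical_core_nthread(8))  # 4

The result can then be used as the ``nthread`` entry of the training parameters.
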
*******************
Distributed Version
*******************

The external memory mode works naturally with the distributed version; you can simply set the path like:

.. code-block:: none

  data = "hdfs://path-to-data/#dtrain.cache"

XGBoost will cache the data to a local location. When you run on YARN, the current folder is temporary,
so you can directly use ``dtrain.cache`` to cache to the current folder.

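
The sketch below illustrates why a relative cache prefix such as ``dtrain.cache`` ends up in the current folder: it is treated here as an ordinary relative path and joined to the working directory. The helper is hypothetical, and both the first-``#`` split and the resolution rule are assumptions made for illustration, not a description of XGBoost internals.

.. code-block:: python

  import os


  def local_cache_path(uri, workdir=None):
      """Return the local path a relative cache prefix would resolve to.

      Hypothetical helper. Assumptions: the cache prefix follows the first
      '#', and relative prefixes resolve against ``workdir`` (the current
      working directory by default).
      """
      _, sep, cacheprefix = uri.partition("#")
      if not sep or not cacheprefix:
          raise ValueError("URI has no cache prefix: " + uri)
      base = workdir if workdir is not None else os.getcwd()
      return os.path.join(base, cacheprefix)


  print(local_cache_path("hdfs://path-to-data/#dtrain.cache", workdir="/tmp/container"))
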
**********
Usage Note
**********

* This is an experimental version
* Currently only importing from libsvm format is supported

  - Contributions of ingestion from other common external memory data sources are welcome
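
Since only libsvm input is supported, other data has to be converted first. A minimal converter is sketched below; the function names are hypothetical, zero-valued features are omitted to keep the file sparse, and feature indices are 0-based by default (an assumption; pass ``zero_based=False`` for 1-based indices). If scikit-learn is available, ``sklearn.datasets.dump_svmlight_file`` is a ready-made alternative.

.. code-block:: python

  def _fmt(x):
      """Format a number compactly: integral values without '.0'."""
      x = float(x)
      return str(int(x)) if x.is_integer() else repr(x)


  def to_libsvm_lines(rows, labels, zero_based=True):
      """Yield one libsvm line per row: ``label index:value ...``.

      Hypothetical helper. Zero-valued features are skipped (sparse output);
      indices start at 0 unless ``zero_based=False``.
      """
      offset = 0 if zero_based else 1
      for label, row in zip(labels, rows):
          features = " ".join(
              f"{i + offset}:{_fmt(v)}" for i, v in enumerate(row) if v != 0
          )
          yield f"{_fmt(label)} {features}".rstrip()


  def write_libsvm(path, rows, labels, zero_based=True):
      """Write rows to ``path`` in libsvm format (hypothetical helper)."""
      with open(path, "w") as f:
          for line in to_libsvm_lines(rows, labels, zero_based):
              f.write(line + "\n")


  rows = [[0.0, 1.5, 0.0, 2.0], [3.0, 0.0, 0.0, 0.0]]
  labels = [1, 0]
  print(list(to_libsvm_lines(rows, labels)))  # ['1 1:1.5 3:2', '0 0:3']

The written file can then be loaded with a ``#cacheprefix`` suffix in the same way as any other libsvm file.
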