[doc] Some notes for external memory. (#5065)
@@ -4,24 +4,34 @@ Using XGBoost External Memory Version (beta)
There is no big difference between using external memory version and in-memory version.
The only difference is the filename format.

The external memory version takes in the following filename format:
The external memory version takes in the following `URI <https://en.wikipedia.org/wiki/Uniform_Resource_Identifier>`_ format:

.. code-block:: none

  filename#cacheprefix

The ``filename`` is the normal path to libsvm file you want to load in, and ``cacheprefix`` is a
path to a cache file that XGBoost will use for external memory cache.
The ``filename`` is the normal path to libsvm format file you want to load in, and
``cacheprefix`` is a path to a cache file that XGBoost will use for caching preprocessed
data in binary form.
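
As a rough illustration of the ``filename#cacheprefix`` convention described above, the sketch below splits such a string into its two parts. The helper name ``split_cache_uri`` is hypothetical and not part of XGBoost; the library parses the string internally, and its exact rules may differ from this simplification.

```python
from typing import Optional, Tuple


def split_cache_uri(uri: str) -> Tuple[str, Optional[str]]:
    """Split 'filename#cacheprefix' into (filename, cacheprefix).

    Hypothetical helper for illustration only; not XGBoost's own parser.
    The cache prefix is None when the string contains no '#'.
    """
    # partition() splits at the first '#'; sep is empty if '#' is absent.
    filename, sep, cacheprefix = uri.partition('#')
    if not sep:
        return filename, None
    return filename, cacheprefix


print(split_cache_uri('../data/agaricus.txt.train#dtrain.cache'))
```

A string without ``#`` yields ``None`` for the cache prefix, which corresponds to the ordinary in-memory case.
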

.. note:: External memory is also available with GPU algorithms (i.e. when ``tree_method`` is set to ``gpu_hist``)

The following code was extracted from `demo/guide-python/external_memory.py <https://github.com/dmlc/xgboost/blob/master/demo/guide-python/external_memory.py>`_:
To provide a simple example for illustration, the code below is extracted from
`demo/guide-python/external_memory.py <https://github.com/dmlc/xgboost/blob/master/demo/guide-python/external_memory.py>`_. If
you have a dataset stored in a file similar to ``agaricus.txt.train`` with libSVM format, the external memory support can be enabled by:

.. code-block:: python

  dtrain = DMatrix('../data/agaricus.txt.train#dtrain.cache')

XGBoost will first load ``agaricus.txt.train`` in, preprocess it, then write to a new file named
``dtrain.cache`` as an on-disk cache for storing preprocessed data in an internal binary format. For
more notes about text input formats, see :doc:`/tutorials/input_format`.

.. code-block:: python

  dtrain = xgb.DMatrix('../data/agaricus.txt.train#dtrain.cache')

You can find that there is an additional ``#dtrain.cache`` following the libsvm file; this is the name of the cache file.
For CLI version, simply add the cache suffix, e.g. ``"../data/agaricus.txt.train#dtrain.cache"``.
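
To put the one-line ``DMatrix`` call above in context, here is a minimal sketch of a training run that uses the cache suffix. The function names ``external_memory_uri`` and ``train_with_external_memory`` are hypothetical, and the parameter values are assumptions chosen for a small binary classification dataset such as the agaricus demo data. Only ``xgb.DMatrix`` and ``xgb.train`` are XGBoost calls. Other XGBoost releases may expect additional URI parameters, so check the documentation for the version you use.

```python
def external_memory_uri(data_path: str, cache_prefix: str) -> str:
    """Build 'filename#cacheprefix' from a data path and a cache prefix."""
    if '#' in data_path:
        raise ValueError("data_path already contains a '#' cache suffix")
    return f"{data_path}#{cache_prefix}"


def train_with_external_memory(data_path: str, cache_prefix: str,
                               num_round: int = 2):
    """Train a small model from a libsvm file using the on-disk cache."""
    # Imported lazily so the URI helper above works without XGBoost installed.
    import xgboost as xgb

    dtrain = xgb.DMatrix(external_memory_uri(data_path, cache_prefix))
    # Assumed parameter values; tune them for your own data.
    params = {'objective': 'binary:logistic', 'max_depth': 2, 'eta': 1}
    return xgb.train(params, dtrain, num_boost_round=num_round)


# Example call (requires XGBoost and the demo data file):
# booster = train_with_external_memory('../data/agaricus.txt.train',
#                                      'dtrain.cache')
```

Keeping the URI construction in its own function makes it easy to point the cache at a different location without touching the training code.
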
****************
@@ -47,7 +57,7 @@ so that you can directly use ``dtrain.cache`` to cache to current folder.
**********
Usage Note
**********
* This is a experimental version
* This is an experimental version
* Currently only importing from libsvm format is supported
* OSX is not tested.