xgboost/doc/external_memory.md
2015-04-19 01:00:37 -07:00

Using XGBoost External Memory Version (beta)

Using the external memory version is almost the same as using the in-memory version. The only difference is the filename format.

The external memory version takes the following filename format:

filename#cacheprefix

Here, filename is the normal path to the libsvm file you want to load, and cacheprefix is a path to a cache file that xgboost will use as its external memory cache.
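As a minimal sketch of the format described above, the two parts can be joined with a small helper. The function name `external_memory_uri` is hypothetical and is not part of xgboost; it only builds the string.

```python
def external_memory_uri(filename, cacheprefix):
    """Join a libsvm data path and a cache prefix as filename#cacheprefix.

    Hypothetical helper for illustration; xgboost itself only needs the
    resulting string.
    """
    if "#" in filename:
        # The '#' character separates the two parts, so a path that already
        # contains one would be ambiguous.
        raise ValueError("filename must not contain '#'")
    return f"{filename}#{cacheprefix}"


uri = external_memory_uri("../data/agaricus.txt.train", "dtrain.cache")
print(uri)
```

The resulting string can be passed wherever a data filename is expected.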

The following code is extracted from ../demo/guide-python/external_memory.py:

dtrain = xgb.DMatrix('../data/agaricus.txt.train#dtrain.cache')

Note the additional #dtrain.cache after the libsvm file path; this is the name of the cache file. For the CLI version, simply use "../data/agaricus.txt.train#dtrain.cache" as the filename.
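For context, a slightly fuller sketch of how such a DMatrix might be used for training is shown here. The wrapper function `train_with_external_memory`, its default arguments, and the booster parameter values are assumptions chosen for illustration; only the filename#cacheprefix argument comes from this document. Newer xgboost releases may expect additional options in the data path, so check the documentation for the installed version.

```python
def train_with_external_memory(data_uri, num_round=2, nthread=4):
    """Train a small booster from a 'filename#cacheprefix' data path.

    Hypothetical wrapper; the parameter values are illustrative only.
    """
    # Imported inside the function so that the module can be loaded
    # even where xgboost is not installed.
    import xgboost as xgb

    # The '#dtrain.cache'-style suffix asks xgboost to use an on-disk cache.
    dtrain = xgb.DMatrix(data_uri)

    params = {
        "max_depth": 2,                   # illustrative value
        "eta": 1,                         # illustrative value
        "objective": "binary:logistic",   # agaricus is a binary task
        "nthread": nthread,               # see the Performance Note
    }
    return xgb.train(params, dtrain, num_boost_round=num_round)


# Example call (requires xgboost and the demo data to be present):
# booster = train_with_external_memory("../data/agaricus.txt.train#dtrain.cache")
```

Apart from the data path, the training call is the same as for the in-memory version.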

Performance Note

  • Set the parameter nthread to the number of physical cores.
    • Most modern CPUs offer hyperthreading, so a 4-core CPU can expose 8 threads.
    • In that case, set nthread to 4 for maximum performance.
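One rough way to apply this advice is sketched below. The helper `pick_nthread` is hypothetical, and the default of two hardware threads per core is an assumption that matches the hyperthreading example above; it will not be correct on every machine.

```python
import os


def pick_nthread(logical_cores, threads_per_core=2):
    """Estimate the physical core count from the logical core count.

    Hypothetical helper; threads_per_core=2 assumes hyperthreading is on.
    """
    return max(1, logical_cores // threads_per_core)


# os.cpu_count() reports logical cores and may return None.
logical = os.cpu_count() or 1
params = {"nthread": pick_nthread(logical)}

print(pick_nthread(8))  # 4-core CPU with 8 threads -> 4
```

If hyperthreading is disabled, pass `threads_per_core=1` so that nthread equals the reported core count.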

Usage Note

  • This is an experimental version.
  • Currently, only importing from the libsvm format is supported.
    • Contributions that add ingestion from other common external memory data sources are welcome.