Using XGBoost External Memory Version
There is no big difference between using the external memory version and the in-memory version. The only difference is the filename format.
The external memory version takes the following filename format:
filename#cacheprefix
The filename is the normal path to the libsvm file you want to load, and cacheprefix is a
path to a cache file that xgboost will use for the external memory cache.
The following code was extracted from ../demo/guide-python/external_memory.py
import xgboost as xgb
dtrain = xgb.DMatrix('../data/agaricus.txt.train#dtrain.cache')
Note the additional #dtrain.cache following the libsvm file path; this is the name of the cache file.
For the CLI version, simply use "../data/agaricus.txt.train#dtrain.cache" as the filename.
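The filename convention above can be sketched as a small helper. This is only an illustration and not part of xgboost: the function name external_memory_path is hypothetical, and rejecting '#' inside either part is an assumption of this sketch rather than a documented rule. The commented-out call at the end assumes xgboost is installed.

```python
def external_memory_path(filename, cacheprefix):
    """Join a libsvm path and a cache prefix into 'filename#cacheprefix'."""
    if not filename or not cacheprefix:
        raise ValueError("both filename and cacheprefix must be non-empty")
    # Assumption of this sketch: reject '#' in either part so that the
    # resulting string has exactly one separator.
    if "#" in filename or "#" in cacheprefix:
        raise ValueError("'#' may only appear as the separator")
    return f"{filename}#{cacheprefix}"


path = external_memory_path("../data/agaricus.txt.train", "dtrain.cache")
print(path)  # ../data/agaricus.txt.train#dtrain.cache

# With xgboost installed, the string is used wherever a filename is expected:
# import xgboost as xgb
# dtrain = xgb.DMatrix(path)
```

The same string can be used as the filename for the CLI version.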
Performance Note
- The parameter nthread should be set to the number of real cores
  - Most modern CPUs offer hyperthreading, which means you can have a 4-core CPU with 8 threads
  - Set nthread to 4 for maximum performance in such a case
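The rule of thumb above can be sketched in Python. The helper name guess_nthread is hypothetical, and the default of two hardware threads per core is an assumption that matches the hyperthreading example in the list; os.cpu_count() reports logical CPUs, so check your actual hardware before relying on the result.

```python
import os


def guess_nthread(logical_cpus=None, threads_per_core=2):
    """Estimate the number of real cores from the logical CPU count.

    threads_per_core=2 is an assumption (typical hyperthreading);
    pass 1 for CPUs without hyperthreading.
    """
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    if logical_cpus is None:
        # os.cpu_count() counts logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    return max(1, logical_cpus // threads_per_core)


# The example from the list: a 4-core CPU that exposes 8 threads.
param = {"nthread": guess_nthread(logical_cpus=8)}
print(param)  # {'nthread': 4}
```

The resulting dictionary is meant to be merged into the parameters you already pass to xgboost for training.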
Usage Note:
- This is an experimental version
- If you would like to try and test it, please report results to https://github.com/dmlc/xgboost/issues/244
- Currently only importing from the libsvm format is supported
- Contributions of ingestion from other common external memory data sources are welcome