Documenting CSV loading into DMatrix (#3137)
* Support CSV file in DMatrix

  We'd just need to expose the CSV parser in dmlc-core to the Python wrapper

* Revert extra code; document existing CSV support

  CSV support is already there but undocumented

* Add notice about categorical features
Committed by GitHub
Parent: d5992dd881
Commit: 32ea70c1c9
@@ -25,7 +25,9 @@ Data Interface
--------------
The XGBoost python module is able to load data from:
- libsvm txt format file
- Numpy 2D array, and
- comma-separated values (CSV) file
- Numpy 2D array
- Scipy 2D sparse array, and
- xgboost binary buffer file.

The data is stored in a ```DMatrix``` object.
@@ -35,6 +37,16 @@ The data is stored in a ```DMatrix``` object.
dtrain = xgb.DMatrix('train.svm.txt')
dtest = xgb.DMatrix('test.svm.buffer')
```
* To load a CSV file into ```DMatrix```:
```python
# label_column specifies the index of the column containing the true label
dtrain = xgb.DMatrix('train.csv?format=csv&label_column=0')
dtest = xgb.DMatrix('test.csv?format=csv&label_column=0')
```
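The file name, the `format=csv` flag, and the `label_column` index all travel in a single URI-style string. As a minimal sketch, that string can be assembled with the standard library; `csv_uri` is a hypothetical helper name used only for illustration and is not part of the XGBoost API:

```python
from urllib.parse import urlencode


def csv_uri(path, label_column=0):
    """Build the 'path?format=csv&label_column=N' string shown above.

    Hypothetical helper for illustration; not part of XGBoost.
    """
    query = urlencode({"format": "csv", "label_column": label_column})
    return f"{path}?{query}"


train_uri = csv_uri("train.csv", label_column=0)
print(train_uri)  # train.csv?format=csv&label_column=0
```

The resulting string would then be passed to the constructor as `xgb.DMatrix(train_uri)`.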
(Note that XGBoost does not support categorical features; if your data contains
categorical features, load it as a numpy array first and then perform
[one-hot encoding](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html).)
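
The note above can be sketched as follows. This example assumes the categorical values are already stored as numbers in the array, and it uses plain NumPy in place of the scikit-learn `OneHotEncoder` linked above; `one_hot_encode_column` is a hypothetical helper name, and the sample values are made up for illustration:

```python
import numpy as np


def one_hot_encode_column(data, col):
    """Replace column `col` of a 2D array with one indicator column per category.

    Hypothetical helper for illustration; not part of XGBoost.
    """
    # categories: sorted unique values; codes: index of each row's category
    categories, codes = np.unique(data[:, col], return_inverse=True)
    indicators = np.eye(len(categories))[codes]
    rest = np.delete(data, col, axis=1)
    # Indicator columns are appended after the remaining feature columns
    return np.hstack([rest, indicators]), categories


# Column 0: label, column 1: numeric feature, column 2: categorical code
raw = np.array([[0.0, 1.5, 2.0],
                [1.0, 0.3, 0.0],
                [0.0, 2.2, 1.0],
                [1.0, 0.9, 2.0]])
labels = raw[:, 0]
features, categories = one_hot_encode_column(raw[:, 1:], col=1)

# With xgboost installed, the encoded array can then be wrapped:
# dtrain = xgb.DMatrix(features, label=labels)
```

Here `features` has one numeric column followed by three indicator columns, one for each distinct category value.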

* To load a numpy array into ```DMatrix```:
```python
data = np.random.rand(5, 10) # 5 entities, each contains 10 features