Fix data loading (#4862)

* Fix loading text data.
* Fix config regex.
* Try to explain the error better in exception.
* Update doc.
@@ -28,5 +28,3 @@ Examples
 * We are super excited to hear about your story, if you have blogposts,
   tutorials, or code solutions using XGBoost, please tell us and we will add
   a link in the example pages.
-
-
@@ -18,6 +18,8 @@ To verify your installation, run the following in Python:

   import xgboost as xgb

+.. _python_data_interface:
+
 Data Interface
 --------------
 The XGBoost python module is able to load data from:
@@ -50,7 +52,7 @@ The data is stored in a :py:class:`DMatrix <xgboost.DMatrix>` object.

 .. note:: Categorical features not supported

-  Note that XGBoost does not support categorical features; if your data contains
+  Note that XGBoost does not provide specialization for categorical features; if your data contains
   categorical features, load it as a NumPy array first and then perform
   `one-hot encoding <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html>`_.
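
As a rough illustration of the one-hot encoding step mentioned in the note above, the sketch below uses NumPy only. The helper name ``one_hot_encode_column`` and the sample array are hypothetical; they are not part of XGBoost or of this commit, and scikit-learn's ``OneHotEncoder`` (linked in the note) is the more complete tool.

```python
import numpy as np


def one_hot_encode_column(data, column):
    """Hypothetical helper: replace one categorical column with 0/1 indicator columns.

    Returns the encoded array (remaining columns first, indicators last)
    and the sorted category values the indicator columns correspond to.
    """
    # np.unique returns the sorted categories and, for each row, the index
    # of its category within that sorted list.
    categories, codes = np.unique(data[:, column], return_inverse=True)
    # Row i of the identity matrix is the indicator vector for category i.
    indicators = np.eye(len(categories))[codes]
    # Drop the original categorical column and append the indicators.
    rest = np.delete(data, column, axis=1)
    return np.hstack([rest, indicators]), categories


# Made-up sample: column 1 holds a categorical code (0.0 or 2.0).
sample = np.array([[1.0, 2.0],
                   [0.5, 0.0],
                   [3.0, 2.0]])
encoded, categories = one_hot_encode_column(sample, 1)
```

The resulting ``encoded`` array is purely numeric, so it could then be handed to a ``DMatrix`` like any other NumPy input.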
@@ -7,6 +7,9 @@ Basic Input Format
 ******************
 XGBoost currently supports two text formats for ingesting data: LibSVM and CSV. The rest of this document will describe the LibSVM format. (See `this Wikipedia article <https://en.wikipedia.org/wiki/Comma-separated_values>`_ for a description of the CSV format.)

+.. note::
+  * XGBoost does **not** understand file extensions, nor does it try to guess the file format. Instead, it uses a URI format to specify the input file type. For example, if you provide a `csv` file ``./data.train.csv`` as input, XGBoost will use the default LibSVM parser to digest it and generate a parser error. Instead, users need to provide a URI in the form of ``train.csv?format=csv``. For external memory input, the URI should be of a form similar to ``train.csv?format=csv#dtrain.cache``. See also :ref:`python_data_interface`.
+
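
To make the URI convention in the note concrete, here is a minimal sketch that assembles such strings. ``build_data_uri`` is a hypothetical helper written for this illustration, not an XGBoost function, and it assumes the only format values of interest are ``csv`` and ``libsvm``.

```python
def build_data_uri(path, fmt, cache_prefix=None):
    """Hypothetical helper: build '<path>?format=<fmt>[#<cache_prefix>]'."""
    # Assumption for this sketch: only the two text formats named in the
    # document are accepted.
    if fmt not in ("csv", "libsvm"):
        raise ValueError(f"unsupported format: {fmt!r}")
    uri = f"{path}?format={fmt}"
    # The '#...' suffix names the cache used for external memory input.
    if cache_prefix is not None:
        uri += f"#{cache_prefix}"
    return uri


train_uri = build_data_uri("train.csv", "csv")
cached_uri = build_data_uri("train.csv", "csv", cache_prefix="dtrain.cache")

# With XGBoost installed, the string would be passed where a file path is
# expected, for example: xgb.DMatrix(train_uri)
```

Passing the bare path ``train.csv`` instead would fall back to the default LibSVM parser, which is the failure mode the note describes.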
 For training or predicting, XGBoost takes an instance file with the format as below:

 .. code-block:: none