- Pandas DataFrame supports more dtypes than 'int64', 'float64' and 'bool', therefor added a bunch of extra dtypes for the data variable.
- From now on the label variable can be a Pandas DataFrame with the same dtypes as the data variable.
- If label is a Pandas DataFrame will be converted to float.
- If no feature_types is set, the data dtypes will be converted to 'int' or 'float'.
- The feature_names may contain every character except [, ] or <
Changed the name of eval_results to evals_result, so that the naming is the same in training.py and sklearn.py
Made the structure of evals_result the same as in training.py, the names of the keys are different:
In sklearn.py you cannot name your evals_result, but they are automatically called 'validation_0', 'validation_1' etc.
The dict evals_result will output something like: {'validation_0': {'logloss': ['0.674800', '0.657121']}, 'validation_1': {'logloss': ['0.63776', '0.58372']}}
In training.py you can name your multiple evals_result with a watchlist like: watchlist = [(dtest,'eval'), (dtrain,'train')]
The dict evals_result will output something like: {'train': {'logloss': ['0.68495', '0.67691']}, 'eval': {'logloss': ['0.684877', '0.676767']}}
You can access the evals_result using the evals_result() function.
Made changes to training.py to make sure all eval_metric information get passed to evals_result. Previous version lost and mislabeled data in evals_result when using more than one eval_metric.
Structure of eval_metric is now:
eval_metric[evals][eval_metric] = list of metrics
Example:
>>> dtrain = xgb.DMatrix('agaricus.txt.train', silent=True)
>>> dtest = xgb.DMatrix('agaricus.txt.test', silent=True)
>>> param = [('max_depth', 2), ('objective', 'binary:logistic'), ('bst:eta', 0.01), ('eval_metric', 'logloss'), ('eval_metric', 'error')]
>>> watchlist = [(dtest,'eval'), (dtrain,'train')]
>>> num_round = 3
>>> evals_result = {}
>>> bst = xgb.train(param, dtrain, num_round, watchlist, evals_result=evals_result)
>>> print(evals_result['eval']['logloss'])
>>> print(evals_result)
Prints:
['0.684877', '0.676767', '0.668817']
{'train': {'logloss': ['0.684954', '0.676917', '0.669036'], 'error': ['0.04652', '0.04652', '0.04652']}, 'eval': {'logloss': ['0.684877', '0.676767', '0.668817'], 'error': ['0.042831', '0.042831', '0.042831']}}
Currently `pip install xgboost` will raise traceback like this
```
Traceback (most recent call last):
File "<string>", line 20, in <module>
File "/tmp/pip-build-IAdqYE/xgboost/setup.py", line 20, in <module>
import xgboost
File "./xgboost/__init__.py", line 8, in <module>
from .core import DMatrix, Booster
File "./xgboost/core.py", line 12, in <module>
import numpy as np
ImportError: No module named numpy
```
We should avoid importing numpy in setup.py and let pip install numpy and scipy automatically.
That's what `install_requires` for.