* added new function to calculate other feature importances
* added capability to plot other feature importance measures
* changed plotting default to fscore
* added info on importance_type to boilerplate comment
* updated text of error statement
* added self module name to fix call
* added unit test for feature importances
* style fixes
This error message can be hard to understand when there are several fields, as shown in the example below. This improves the error message, letting the user know which fields were unexpected or missing.
import xgboost as xgb
import pandas as pd
train = pd.DataFrame({'a':[1], 'b':[2], 'c':[3], 'd':[4], 'f':[2], 'g':2, 'etc etc etc':[11]})
dtrain = xgb.DMatrix(train.drop('d', axis=1), train.d)
test = pd.DataFrame({'a':[1], 'b':[2], 'c':[1], 'd':[4], 'e':[2], 'f':[2], 'g':2, 'etc etc etc':[11]})
dtest = xgb.DMatrix(test)
modl = xgb.train({}, dtrain)
modl.predict(dtest)
# ValueError: feature_names mismatch: [u'a', u'b', u'c', u'etc etc etc', u'f', u'g'] [u'a', u'b', u'c', u'd', u'e', u'etc etc etc', u'f', u'g']
- allows feval to return a list of tuples (name, error/score value)
- changed behavior for multiple eval_metrics in conjunction with
early_stopping: Instead of raising an error, the last passed evel_metric
(or last entry in return value of feval) is used for early stopping
- allows list of eval_metrics in dict-typed params
- unittest for new features / behavior
documentation updated
- example for assigning a list to 'eval_metric'
- note about early stopping on last passed eval metric
- info msg for used eval metric added
- Pandas DataFrame supports more dtypes than 'int64', 'float64' and 'bool', therefor added a bunch of extra dtypes for the data variable.
- From now on the label variable can be a Pandas DataFrame with the same dtypes as the data variable.
- If label is a Pandas DataFrame will be converted to float.
- If no feature_types is set, the data dtypes will be converted to 'int' or 'float'.
- The feature_names may contain every character except [, ] or <
Currently `pip install xgboost` will raise traceback like this
```
Traceback (most recent call last):
File "<string>", line 20, in <module>
File "/tmp/pip-build-IAdqYE/xgboost/setup.py", line 20, in <module>
import xgboost
File "./xgboost/__init__.py", line 8, in <module>
from .core import DMatrix, Booster
File "./xgboost/core.py", line 12, in <module>
import numpy as np
ImportError: No module named numpy
```
We should avoid importing numpy in setup.py and let pip install numpy and scipy automatically.
That's what `install_requires` for.