- allows feval to return a list of tuples (name, error/score value); see the sketch after this list
- changed behavior for multiple eval_metrics in conjunction with
early_stopping: instead of raising an error, the last passed eval_metric
(or the last entry in the return value of feval) is used for early stopping
- allows a list of eval_metrics in dict-typed params
- unit tests for the new features / behavior
- documentation updated:
  - example for assigning a list to 'eval_metric'
  - note about early stopping on the last passed eval metric
- added an info message about the eval metric being used
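A minimal sketch of the new behavior (the custom metric function and its metric names are made up for illustration; the agaricus demo files are assumed to be in the working directory):
>>> import numpy as np
>>> import xgboost as xgb
>>> dtrain = xgb.DMatrix('agaricus.txt.train')
>>> dtest = xgb.DMatrix('agaricus.txt.test')
>>> # a list of eval_metrics in dict-typed params; with early stopping,
>>> # the last one ('error' here) would be used
>>> param = {'max_depth': 2, 'objective': 'binary:logistic',
...          'eval_metric': ['logloss', 'error']}
>>> bst = xgb.train(param, dtrain, 10, [(dtest, 'eval')],
...                 early_stopping_rounds=3)
>>> # feval may return a list of (name, value) tuples; with early stopping,
>>> # the last entry ('my-error-x2' here) would be used
>>> def custom_metrics(preds, dtrain):
...     labels = dtrain.get_label()
...     err = float(np.sum(labels != (preds > 0.5))) / len(labels)
...     return [('my-error', err), ('my-error-x2', 2 * err)]
>>> param2 = {'max_depth': 2, 'objective': 'binary:logistic'}
>>> bst2 = xgb.train(param2, dtrain, 10, [(dtest, 'eval')],
...                  feval=custom_metrics, early_stopping_rounds=3)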
- Pandas DataFrames support more dtypes than 'int64', 'float64' and 'bool', therefore a number of extra dtypes were added for the data variable.
- From now on the label variable can also be a Pandas DataFrame with the same dtypes as the data variable (see the sketch after this list).
- If label is a Pandas DataFrame, it will be converted to float.
- If no feature_types is set, the data dtypes will be converted to 'int' or 'float'.
- The feature_names may contain any character except '[', ']' or '<'.
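A minimal sketch with synthetic data, passing the label as a DataFrame per the change above:
>>> import pandas as pd
>>> import xgboost as xgb
>>> # several pandas dtypes in the data; column names avoid '[', ']' and '<'
>>> data = pd.DataFrame({'age': pd.Series([21, 35, 48], dtype='int32'),
...                      'weight': pd.Series([60.5, 72.0, 81.3], dtype='float32'),
...                      'smoker': pd.Series([True, False, True], dtype='bool')})
>>> # the label may itself be a DataFrame and is converted to float internally
>>> label = pd.DataFrame({'target': [0, 1, 1]})
>>> # feature_names come from the columns; with no feature_types given,
>>> # the dtypes are mapped to 'int' or 'float'
>>> dtrain = xgb.DMatrix(data, label=label)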
Changed the name of eval_results to evals_result, so that the naming is the same in training.py and sklearn.py.
Made the structure of evals_result in sklearn.py the same as in training.py; only the names of the keys differ:
In sklearn.py you cannot name your evaluation sets; they are automatically called 'validation_0', 'validation_1', etc.
The evals_result dict will then look something like: {'validation_0': {'logloss': ['0.674800', '0.657121']}, 'validation_1': {'logloss': ['0.63776', '0.58372']}}
In training.py you can name your evaluation sets via the watchlist, e.g. watchlist = [(dtest, 'eval'), (dtrain, 'train')]
The evals_result dict will then look something like: {'train': {'logloss': ['0.68495', '0.67691']}, 'eval': {'logloss': ['0.684877', '0.676767']}}
In sklearn.py you can access evals_result using the evals_result() method, as sketched below.
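A minimal sketch of the sklearn.py side (synthetic data; assuming the wrapper's fit() accepts eval_set and eval_metric):
>>> import numpy as np
>>> import xgboost as xgb
>>> X = np.random.rand(100, 5)
>>> y = (X[:, 0] > 0.5).astype(int)
>>> clf = xgb.XGBClassifier(n_estimators=3)
>>> # the two eval sets are named 'validation_0' and 'validation_1' automatically
>>> clf = clf.fit(X[:80], y[:80],
...               eval_set=[(X[:80], y[:80]), (X[80:], y[80:])],
...               eval_metric='logloss')
>>> results = clf.evals_result()
>>> # e.g. results['validation_1']['logloss'] is the list of per-round values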
Made changes to training.py to make sure all eval_metric information gets passed to evals_result. The previous version lost and mislabeled data in evals_result when using more than one eval_metric.
The structure of evals_result is now:
evals_result[eval_name][eval_metric] = list of metric values
Example:
>>> import xgboost as xgb
>>> dtrain = xgb.DMatrix('agaricus.txt.train', silent=True)
>>> dtest = xgb.DMatrix('agaricus.txt.test', silent=True)
>>> param = [('max_depth', 2), ('objective', 'binary:logistic'), ('bst:eta', 0.01), ('eval_metric', 'logloss'), ('eval_metric', 'error')]
>>> watchlist = [(dtest,'eval'), (dtrain,'train')]
>>> num_round = 3
>>> evals_result = {}
>>> bst = xgb.train(param, dtrain, num_round, watchlist, evals_result=evals_result)
>>> print(evals_result['eval']['logloss'])
>>> print(evals_result)
Prints:
['0.684877', '0.676767', '0.668817']
{'train': {'logloss': ['0.684954', '0.676917', '0.669036'], 'error': ['0.04652', '0.04652', '0.04652']}, 'eval': {'logloss': ['0.684877', '0.676767', '0.668817'], 'error': ['0.042831', '0.042831', '0.042831']}}