289 Commits

Author SHA1 Message Date
Pasha Stetsenko
ff2d4c99fa Update datatable usage (#4123) 2019-02-17 03:44:09 +08:00
Rong Ou
3be1b9ae30 reformat benchmark_tree.py to get rid of lint errors (#4126) 2019-02-13 18:54:56 +13:00
Philip Hyunsu Cho
99a290489c
Update Python docstring for ranking functions (#4121)
* Update Python docstring for ranking functions

* Fix formatting
2019-02-10 12:22:02 -08:00
Jiaming Yuan
1088dff42c Prevent training without setting up caches. (#4066)
* Prevent training without setting up caches.

* Add warning for internal functions.
* Check number of features.

* Address reviewer's comment.
2019-02-03 01:03:29 -08:00
tmitanitky
59f868bc60 enable xgb_model in scklearn XGBClassifier and test. (#4092)
* Enable xgb_model parameter in XGClassifier scikit-learn API

https://github.com/dmlc/xgboost/issues/3049

* add test_XGBClassifier_resume():

test for xgb_model parameter in XGBClassifier API.

* Update test_with_sklearn.py

* Fix lint
2019-01-31 11:29:19 -08:00
Jiaming Yuan
4fac9874e0
Check booster for dart in feature importance. (#4073)
* Check booster for dart in feature importance.
2019-01-22 16:03:54 +08:00
Jiaming Yuan
e0a279114e
Unify logging facilities. (#3982)
* Unify logging facilities.

* Enhance `ConsoleLogger` to handle different verbosity.
* Override macros from `dmlc`.
* Don't use specialized gamma when building with GPU.
* Remove verbosity cache in monitor.
* Test monitor.
* Deprecate `silent`.
* Fix doc and messages.
* Fix python test.
* Fix silent tests.
2018-12-14 19:29:58 +08:00
Sam Wilkinson
fd722d60cd Deprecation warning for lists passed into DMatrix (#3970)
* Ensure lists cannot be passed into DMatrix

The documentation does not include lists as an allowed type for the data inputted into DMatrix. Despite this, a list can be passed in without an error. This change would prevent a list form being passed in directly.
2018-12-14 19:26:11 +08:00
lyxthe
53f695acf2 scikit-learn api section documentation correction (#3967)
* update description of early stopping rounds

the description of early stopping round was quite inconsistent in the scikit-learn api section since the fit paragraph tells that when early stopping rounds occurs, the last iteration is returned not the best one, but the predict paragraph tells that when the predict is called without ntree_limit specified, then ntree_limit is equals to best_ntree_limit.

Thus, when reading the fit part, one could think that it is needed to specify what is the best iter when calling the predict, but when reading the predict part, then the best iter is given by default, it is the last iter that you have to specify if needed.

* Update sklearn.py

* Update sklearn.py

fix doc according to the python_lightweight_test error
2018-12-14 00:27:04 -08:00
Philip Hyunsu Cho
c5130e487a
Fix #3894: Allow loading pickles without self.booster attributes (redux) (#3944) 2018-11-28 09:31:46 -08:00
Philip Hyunsu Cho
f9302a56fb
Fix #3894: Allow loading pickles without self.booster attributes (#3938)
The addition of self.booster attribute broke backward compatibility.
2018-11-23 12:15:50 -08:00
Philip Hyunsu Cho
7d3149a21f
Add AUC-PR to list of metrics to maximize for early stopping (#3936) 2018-11-23 12:15:34 -08:00
Jiaming Yuan
93f63324e6
Address deprecation of Python ABC. (#3909) 2018-11-16 19:43:32 +13:00
Joey Gao
0cd326c1bc Add parameter to make node type configurable in plot tree (#3859)
* add parameters 'conditionNodeParams' and 'leafNodeParams' to function `to_graphviz` enable to configure node type
2018-11-16 17:29:37 +13:00
Philip Hyunsu Cho
c76d993681
Enforce naming style in Python lint (#3896) 2018-11-14 10:35:25 -08:00
Dr. Kashif Rasul
143475b27b use gain for sklearn feature_importances_ (#3876)
* use gain for sklearn feature_importances_

`gain` is a better feature importance criteria than the currently used `weight`

* added importance_type to class

* fixed test

* white space

* fix variable name

* fix deprecation warning

* fix exp array

* white spaces
2018-11-13 03:30:40 -08:00
Philip Hyunsu Cho
ad6e0d55f1
Fix coef_ and intercept_ signature to be compatible with sklearn.RFECV (#3873)
* Fix coef_ and intercept_ signature to be compatible with sklearn.RFECV

* Fix lint

* Fix lint
2018-11-08 19:41:35 -08:00
Jelle Zijlstra
d9642cf757 handle $PATH not being set in python library (#3845)
Fixes #3844
2018-11-06 15:27:02 -08:00
Nikita Titov
1bf4083dc6 open README with utf-8 and add gcc-8 (#3867) 2018-11-06 14:53:33 -08:00
Philip Hyunsu Cho
78ec77fa97
Release 0.81 version (#3864)
* Release 0.81 version

* Update NEWS.md
2018-11-04 05:49:11 -08:00
Philip Hyunsu Cho
e04ab56b57
Fix #3747: Add coef_ and intercept_ as properties of sklearn wrapper (#3855)
* Fix #3747: Add coef_ and intercept_ as properties of sklearn wrapper

Scikit-learn expects linear learners to expose `coef_` and `intercept_`
as properties.

Closes #3747.

* Fix lint
2018-11-02 01:44:37 -07:00
Rory Mitchell
42200ec03e
Allow XGBRanker sklearn interface to use other xgboost ranking objectives (#3848) 2018-11-01 13:34:25 +13:00
Philip Hyunsu Cho
d83c818000
Recommend pickling as the way to save XGBClassifier / XGBRegressor / XGBRanker (#3829)
The `save_model()` and `load_model()` method only saves the part of the model
that's common to all language interfaces and do not preserve Python-specific
attributes, such as `feature_names`. More crucially, label encoder is not
preserved either; this is needed for the scikit-learn wrapper, since you may
have string labels.

Fix: Explicitly recommend pickling as the way to save scikit-learn model
objects.
2018-10-25 11:12:41 -07:00
Rory Mitchell
5d6baed998
Allow sklearn grid search over parameters specified as kwargs (#3791) 2018-10-14 12:44:53 +13:00
Philip Hyunsu Cho
ea99b53d8e
Document behavior of get_fscore() for zero-importance features (#3763) 2018-10-08 01:52:25 -07:00
Philip Hyunsu Cho
10cd7c8447
Fix #3714: preserve feature names when slicing DMatrix (#3766)
* Fix #3714: preserve feature names when slicing DMatrix

* Add test
2018-10-08 01:04:33 -07:00
Philip Hyunsu Cho
c23783a0d1
Add notes to doc (#3765) 2018-10-07 14:09:09 -07:00
Takahiro Kojima
2405c59352 remove extra of (#3713) 2018-09-21 11:55:39 -07:00
Philip Hyunsu Cho
bd41bd6605
Better error message for failed library loading (#3690)
* Better error message for failed lib loading

* Address review comment + fix lint
2018-09-12 22:37:26 -07:00
Philip Hyunsu Cho
3564b68b98
Fix #3397: early_stop callback does not maximize metric of form NDCG@n- (#3685)
* Fix #3397: early_stop callback does not maximize metric of form NDCG@n-

Early stopping callback makes splits with '-' letter, which interferes
with metrics of form NDCG@n-. As a result, XGBoost tries to minimize
NDCG@n-, where it should be maximized instead.

Fix. Specify maxsplit=1.

* Python 2.x compatibility fix
2018-09-08 19:46:25 -07:00
mrgutkun
4b43810f51 Fix #3663: Allow sklearn API to use callbacks (#3682)
* Fix #3663: Allow sklearn API to use callbacks

* Fix lint

* Add Callback API to Python API doc
2018-09-07 13:51:26 -07:00
Philip Hyunsu Cho
5a8bbb39a1
Revert #3677 and #3674 (#3678)
* Revert "Add scikit-learn as dependency for doc build (#3677)"

This reverts commit 308f664ade0547242608e21f6198c895415f03da.

* Revert "Add scikit-learn tests (#3674)"

This reverts commit d176a0fbc8165e3afe3e42ff464ab7b253211555.
2018-09-06 20:43:17 -07:00
Philip Hyunsu Cho
d176a0fbc8
Add scikit-learn tests (#3674)
* Add scikit-learn tests

Goal is to pass scikit-learn's check_estimator() for XGBClassifier,
XGBRegressor, and XGBRanker. It is actually not possible to do so
entirely, since check_estimator() assumes that NaN is disallowed,
but XGBoost allows for NaN as missing values. However, it is always
good ideas to add some checks inspired by check_estimator().

* Fix lint

* Fix lint
2018-09-06 09:55:28 -07:00
Philip Hyunsu Cho
86d88c0758
Fix #3648: XGBClassifier.predict() should return margin scores when output_margin=True (#3651)
* Fix #3648: XGBClassifier.predict() should return margin scores when output_margin=True

* Fix tests to reflect correct implementation of XGBClassifier.predict(output_margin=True)

* Fix flaky test test_with_sklearn.test_sklearn_api_gblinear
2018-08-30 21:05:05 -07:00
Philip Hyunsu Cho
7b1427f926
Add validate_features parameter to sklearn API (#3653) 2018-08-29 23:21:46 -07:00
Philip Hyunsu Cho
4ed8a88240
Update Python API doc (#3619)
* Add XGBRanker to Python API doc

* Show inherited members of XGBRegressor in API doc, since XGBRegressor uses default methods from XGBModel

* Add table of contents to Python API doc

* Skip JVM doc download if not available

* Show inherited members for XGBRegressor and XGBRanker

* Expose XGBRanker to Python XGBoost module directory

* Add docstring to XGBRegressor.predict() and XGBRanker.predict()

* Fix rendering errors in Python docstrings

* Fix lint
2018-08-22 18:59:30 -07:00
Shiki-H
24a268a2e3 sklearn api for ranking (#3560)
* added xgbranker

* fixed predict method and ranking test

* reformatted code in accordance with pep8

* fixed lint error

* fixed docstring and added checks on objective

* added ranking demo for python

* fixed suffix in rank.py
2018-08-21 08:26:48 -07:00
Grace Lam
993e62b9e7 Add JSON model dump functionality (#3603)
* Add JSON model dump functionality

* Fix lint
2018-08-17 16:18:43 -07:00
trivialfis
7c82dc92b2 Fix accessing DMatrix.handle before set. (#3599)
Close #3597.
2018-08-16 15:26:06 -07:00
Philip Hyunsu Cho
96826a3515
Release version 0.80 (#3541)
* Up versions

* Write release note for 0.80
2018-08-13 01:38:37 -07:00
Philip Hyunsu Cho
3c72654e3b
Revert "Fix #3485, #3540: Don't use dropout for predicting test sets" (#3563)
* Revert "Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)"

This reverts commit 44811f233071c5805d70c287abd22b155b732727.

* Document behavior of predict() for DART booster

* Add notice to parameter.rst
2018-08-08 09:48:55 -07:00
wenduowang
3b62e75f2e Fix bug of using list(x) function when x is string (#3432)
* Fix bug of using list(x) function when x is string

list('abcdcba') = ['a', 'b', 'c', 'd', 'c', 'b', 'a']

* Allow feature_names/feature_types to be of any type

If feature_names/feature_types is iterable, e.g. tuple, list, then convert the value to list, except for string; otherwise construct a list with a single value

* Delete excess whitespace

* Fix whitespace to pass lint
2018-07-30 07:36:34 -07:00
jqmp
e9a97e0d88 Add total_gain and total_cover importance measures (#3498)
Add `'total_gain'` and `'total_cover'` as possible `importance_type`
arguments to `Booster.get_score` in the Python package.

`get_score` already accepts a `'gain'` argument, which returns each
feature's average gain over all of its splits.  `'total_gain'` does the
same, but returns a total rather than an average.  This seems more
intuitively meaningful, and also matches the behavior of the R package's
`xgb.importance` function.

I also added an analogous `'total_cover'` command for consistency.

This should resolve #3484.
2018-07-23 00:30:55 -07:00
KOLANICH
a393d44c5d Improved library loading a bit (#3481)
* Improved library loading a bit

* Fixed indentation.

* Fixes according to the discussion

* Moved the comment to a separate line.
* specified exception type
2018-07-20 16:03:44 -07:00
Philip Hyunsu Cho
8e90b60c4d
Fix relpath in setup.py on Windows (#3493)
* Fix relpath in setup.py on Windows

Fixes #3480.

* Use only one lib file; use 4 space indent
2018-07-20 12:28:08 -07:00
kodonnell
6bed54ac39 python sklearn api: defaulting to best_ntree_limit if defined, otherwise current behaviour (#3445)
* python sklearn api: defaulting to best_ntree_limit if defined, otherwise current behaviour

* Fix whitespace
2018-07-08 14:35:52 -07:00
Philip Hyunsu Cho
66e74d2223 Fix get_uint_info() (#3442)
* Add regression test
2018-07-05 20:06:59 -07:00
Philip Hyunsu Cho
48d6e68690
Add callback interface to re-direct console output (#3438)
* Add callback interface to re-direct console output

* Exempt TrackerLogger from custom logging

* Fix lint
2018-07-05 11:32:30 -07:00
Oliver Laslett
18813a26ab allow arbitrary cross validation fold indices (#3353)
* allow arbitrary cross validation fold indices

 - use training indices passed to `folds` parameter in `training.cv`
 - update doc string

* add tests for arbitrary fold indices
2018-06-30 19:23:49 +00:00
Mike Liu
594bcea83e Save and load model in sklearn API (#3192)
* Add (load|save)_model to XGBModel

* Add docstring

* Fix docstring

* Fix mixed use of space and tab

* Add a test

* Fix Flake8 style errors
2018-06-30 19:21:49 +00:00