152 Commits

Author SHA1 Message Date
Mike Liu
594bcea83e Save and load model in sklearn API (#3192)
* Add (load|save)_model to XGBModel

* Add docstring

* Fix docstring

* Fix mixed use of space and tab

* Add a test

* Fix Flake8 style errors
2018-06-30 19:21:49 +00:00
Yanbo Liang
b018ef104f Remove output_margin from XGBClassifier.predict_proba argument list. (#3343) 2018-05-28 10:30:21 -07:00
pdavalo
480e3fd764 Sklearn: validation set weights (#2354)
* Add option to use weights when evaluating metrics in validation sets

* Add test for validation-set weights functionality

* simplify case with no weights for test sets

* fix lint issues
2018-05-23 17:06:20 -07:00
Felipe Arruda Pontes
81d1b17f9c adding some docs based on core.Boost.predict (#1865) 2018-02-09 06:38:38 -08:00
csgwma
33ac8a0927 delete duplicated code in python-package (#2985) 2017-12-30 20:26:35 +08:00
Julian Niedermeier
9a81c74a7b Add xgb_model parameter to sklearn fit (#2623)
Adding the xgb_model parameter allows model training to be continued.
The model must first be saved by calling `model.get_booster().save_model(path)`
2017-10-01 08:47:17 -04:00
Andrew Hannigan
5c9f0ff9d9 Check existence of seed/nthread keys before checking their value. (#2669) 2017-09-27 03:05:59 -04:00
Tsukasa OMOTO
8d15024ac7 python: follow the default warning filters of Python (#2666)
* python: follow the default warning filters of Python

https://docs.python.org/3/library/warnings.html#default-warning-filters

* update tests

* update tests
2017-09-27 03:03:01 -04:00
PSEUDOTENSOR / Jonathan McKinney
0664298bb2 Update sklearn API to pass along n_jobs to DMatrix creation (#2658) 2017-08-31 15:24:59 +12:00
René Scheibe
a0c5bde024 Fix typo in sklearn documentation (#2580) 2017-08-07 19:06:11 +02:00
wxchan
65d2513714 [python-package] fix sklearn n_jobs/nthreads and seed/random_state bug (#2378)
* add a testcase causing RuntimeError

* move seed/random_state/nthread/n_jobs check to get_xgb_params()

* fix failed test
2017-06-12 09:33:42 -04:00
gaw89
0f3a404d91 Sklearn kwargs (#2338)
* Added kwargs support for Sklearn API

* Updated NEWS and CONTRIBUTORS

* Fixed CONTRIBUTORS.md

* Added clarification of **kwargs and test for proper usage

* Fixed lint error

* Fixed more lint errors and clf assigned but never used

* Fixed more lint errors

* Fixed more lint errors

* Fixed issue with changes from different branch bleeding over

* Fixed issue with changes from other branch bleeding over

* Added note that kwargs may not be compatible with Sklearn

* Fixed linting on kwargs note
2017-05-23 21:47:53 -05:00
gaw89
6cea1e3fb7 Sklearn convention update (#2323)
* Added n_jobs and random_state to keep up to date with sklearn API.
Deprecated nthread and seed.  Added tests for new params and
deprecations.

* Fixed docstring to reflect updates to n_jobs and random_state.

* Fixed whitespace issues and removed nose import.

* Added deprecation note for nthread and seed in docstring.

* Attempted fix of deprecation tests.

* Second attempted fix to tests.

* Set n_jobs to 1.
2017-05-22 08:22:05 -05:00
jayzed82
29289d2302 Add option to choose booster in scikit interface (gbtree by default) (#2303)
* Add option to choose booster in scikit interface (gbtree by default)

* Add option to choose booster in scikit interface: complete docstring.

* Fix XGBClassifier to work with booster option

* Added test case for gblinear booster
2017-05-18 23:12:27 -04:00
Srivatsan Ramanujam
036ee55fe0 adding sample weights for XGBRegressor (was this forgotten?) (#1874) 2017-01-21 11:58:03 -08:00
ccphillippi
dd477ac903 Move feature_importances_ to base XGBModel for XGBRegressor access (#1591) 2016-12-01 10:17:37 -08:00
AbdealiJK
6f16f0ef58 Use bst_float consistently throughout (#1824)
* Fix various typos

* Add override to functions that are overridden

gcc gives warnings about functions that are overridden without
being marked as overridden. This fixes it.

* Use bst_float consistently

Use bst_float for all the variables that involve weight,
leaf value, gradient, hessian, gain, loss_chg, predictions,
base_margin, feature values.

In some cases, where additions and similar operations can make the
value larger, double is used.

This ensures that type conversions are minimal and reduces loss of
precision.
2016-11-30 10:02:10 -08:00
Titouan Lorieul
75d9be55de [py] fix label encoding of eval sets in sklearn API (#1244) 2016-07-11 05:29:46 -05:00
Antonio Augusto Santos
19129b289c Preserve the actual objective used on the booster
Save the actual objective used on xgboost.train.

Not saving it caused problems in predict_proba, as reported in issue #1215
2016-05-31 19:01:10 -03:00
Alberto Torres
af2e9ebd82 Update sklearn.py 2016-05-25 15:00:11 +02:00
tqchen
149589c583 [PYTHON] Refactor training API to use callback 2016-05-19 21:31:23 -07:00
Borun Dev Chowdhury
fc02f8a2dc cosmetic change
cosmetic change of putting space after comma compared to previous edit.
2016-05-07 12:33:37 +02:00
borundev
95bcff90af XGBModel.fit had a call to DMatrix without missing=self.missing. fixed that 2016-05-07 12:32:03 +02:00
Titouan Lorieul
3ab8f0b13d [py] added apply function in sklearn API to return the predicted leaves 2016-05-04 12:27:30 +02:00
Alistair Johnson
6750c8b743 Added other feature importances in python package (#1135)
* added new function to calculate other feature importances

* added capability to plot other feature importance measures

* changed plotting default to fscore

* added info on importance_type to boilerplate comment

* updated text of error statement

* added self module name to fix call

* added unit test for feature importances

* style fixes
2016-05-02 12:25:24 -05:00
sinhrks
9da2f3e613 DOC/TST: Fix Python sklearn dep 2016-05-01 17:27:43 +09:00
sinhrks
6bab164d80 Bug mixing DMatrix's with and without feature names 2016-04-30 14:42:57 +09:00
sinhrks
c55cc809e5 BUG: XGBClassifier.feature_importances_ raises ValueError if input is pandas DataFrame 2016-04-27 21:50:03 +09:00
sinhrks
8fc2456c87 Enable flake8 2016-04-24 17:32:31 +09:00
DAndrey
311f7c8f47 change type of xgbclassifier.classes_ from list to numpy array 2016-03-17 16:54:33 +03:00
Paulo Alves
592004b38f XGBClassifier.feature_importances_ compatible with sklearn RFECV 2016-02-26 08:56:07 -03:00
catena
790dc877c3 return best_ntree_limit if early stopped 2016-02-25 13:42:19 +05:30
Alexis Mignon
a46706c82e Merge branch 'master' into master 2016-02-17 09:35:30 +01:00
Alexis Mignon
07bd149b68 Created decorator function so that custom objective functions passed to the constructor are more consistent with the sklearn conventions. Added comments in the docstring 2016-02-16 10:58:22 +01:00
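The decorator idea in this commit can be illustrated with a minimal sketch. It assumes the sklearn-style objective takes `(y_true, y_pred)` and that the training loop calls objectives as `(preds, dmatrix)`. `FakeDMatrix`, `objective_decorator` and `squared_error` are hypothetical names used only for illustration; they are not taken from the commit itself.

```python
class FakeDMatrix:
    """Hypothetical stand-in for a DMatrix; only get_label() is modelled."""

    def __init__(self, label):
        self._label = list(label)

    def get_label(self):
        return self._label


def objective_decorator(func):
    """Wrap a sklearn-style objective so it can be called as (preds, dmatrix)."""

    def inner(preds, dmatrix):
        # Pull the labels out of the data container and swap the argument
        # order to the (y_true, y_pred) convention.
        labels = dmatrix.get_label()
        return func(labels, preds)

    return inner


def squared_error(y_true, y_pred):
    """Gradient and hessian of 0.5 * (y_pred - y_true) ** 2 per sample."""
    grad = [p - t for t, p in zip(y_true, y_pred)]
    hess = [1.0 for _ in y_pred]
    return grad, hess


wrapped = objective_decorator(squared_error)
grad, hess = wrapped([0.5, 2.0], FakeDMatrix([1.0, 1.0]))
```

The wrapper only adapts the calling convention; the objective itself stays a plain function of labels and predictions.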
Alexis Mignon
c8714f587a Added the possibility to use custom objective function in the sklearn API 2016-02-15 17:13:13 +01:00
Pavel Gladkov
31c0408cb4 add feature_importances_ property for XGBClassifier 2016-02-10 23:01:33 +03:00
Yuan (Terry) Tang
7606bf8156 Fixes #725 2016-01-06 18:21:29 -06:00
terrytangyuan
0eb6240fd0 Fixed all lint errors 2015-12-11 18:46:15 -06:00
sinhrks
25c4fbd0cb Cleanup pandas support 2015-11-13 06:55:04 +09:00
Faron
79813097b5 sklearn_wrapper additions
added output_margin & ntree_limit to predict and predict_proba
2015-11-02 17:41:30 +01:00
Faron
738e420128 correcting wrong default values 2015-10-25 11:26:33 +01:00
Faron
b80d5d6b33 fixed too long lines 2015-10-25 11:17:35 +01:00
Faron
422febd18e added missing params 2015-10-25 10:58:07 +01:00
Takahisa Shimoda
607599f2a1 fix sklearn.py for evals_result in python3 2015-10-23 05:40:31 +09:00
Johan Manders
00387cb645 Removed the last few trailing whitespaces 2015-10-14 14:26:18 +02:00
Johan Manders
82c2ba4c44 Removed trailing whitespaces and Change Error to XGBoostError 2015-10-14 14:17:57 +02:00
Johan Manders
e960a09ff4 Made eval_results for sklearn output the same structure as in the new training.py
Changed the name of eval_results to evals_result, so that the naming is the same in training.py and sklearn.py

Made the structure of evals_result the same as in training.py, the names of the keys are different:

In sklearn.py you cannot name your evals_result, but they are automatically called 'validation_0', 'validation_1' etc.
The dict evals_result will output something like: {'validation_0': {'logloss': ['0.674800', '0.657121']}, 'validation_1': {'logloss': ['0.63776', '0.58372']}}

In training.py you can name your multiple evals_result with a watchlist like: watchlist  = [(dtest,'eval'), (dtrain,'train')]
The dict evals_result will output something like: {'train': {'logloss': ['0.68495', '0.67691']}, 'eval': {'logloss': ['0.684877', '0.676767']}}

You can access the evals_result using the evals_result() function.
2015-10-14 12:51:46 +02:00
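The nested structure described in this commit message can be walked with plain Python. The sketch below reuses the example dict quoted above, in which values are strings, and assumes a lower metric value is better, as with logloss. The helper names `to_float_history` and `best_round` are hypothetical and not part of the package.

```python
def to_float_history(evals_result):
    """Convert {eval_name: {metric: [str, ...]}} into the same shape with floats."""
    return {
        eval_name: {
            metric: [float(value) for value in values]
            for metric, values in metrics.items()
        }
        for eval_name, metrics in evals_result.items()
    }


def best_round(history, eval_name, metric):
    """Return the 0-based round with the lowest value of the given metric."""
    scores = history[eval_name][metric]
    return min(range(len(scores)), key=scores.__getitem__)


# Example dict in the sklearn.py layout, copied from the commit message.
sklearn_style = {
    'validation_0': {'logloss': ['0.674800', '0.657121']},
    'validation_1': {'logloss': ['0.63776', '0.58372']},
}

history = to_float_history(sklearn_style)
best = best_round(history, 'validation_1', 'logloss')
```

The same helpers work on the training.py layout, since only the top-level keys ('train', 'eval') differ.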
Johan Manders
e339cdec52 Too many branches and unused key 2015-10-12 16:47:24 +02:00
Johan Manders
40566cdbba update sklearn.py because evals_result in training.py changed
Because I changed training.py, sklearn.py also had to be changed to be able to read all the data from evals_result.
2015-10-12 16:31:23 +02:00
Den Raskovalov
35944a13b4 make XGBClassifier.score compatible with arrays 2015-09-06 20:41:55 -07:00