877 Commits

Author SHA1 Message Date
Philip Hyunsu Cho
32ea70c1c9
Documenting CSV loading into DMatrix (#3137)
* Support CSV file in DMatrix

We'd just need to expose the CSV parser in dmlc-core to the Python wrapper

* Revert extra code; document existing CSV support

CSV support is already there but undocumented

* Add notice about categorical features
2018-02-28 18:41:10 -08:00
Oleg Panichev
cf19caa46a Fix for ZeroDivisionError when verbose_eval equals to 0. (#3115) 2018-02-15 17:58:06 -06:00
Felipe Arruda Pontes
81d1b17f9c adding some docs based on core.Boost.predict (#1865) 2018-02-09 06:38:38 -08:00
Scott Lundberg
d878c36c84 Add SHAP interaction effects, fix minor bug, and add cox loss (#3043)
* Add interaction effects and cox loss

* Minimize whitespace changes

* Cox loss now no longer needs a pre-sorted dataset.

* Address code review comments

* Remove mem check, rename to pred_interactions, include bias

* Make lint happy

* More lint fixes

* Fix cox loss indexing

* Fix main effects and tests

* Fix lint

* Use half interaction values on the off-diagonals

* Fix lint again
2018-02-07 20:38:01 -06:00
Zhirui Wang
bf43671841 update macOS gcc@5 installation guide (#3003)
After installing ``gcc@5``, ``CMAKE_C_COMPILER`` will not be set to gcc-5 in some macOS environment automatically and the installation of xgboost will still fail. Manually setting the compiler will solve the problem.
2018-01-04 11:28:26 -08:00
Philip Cho
4aa346c10b
Update PyPI maintainer; use VERSION for binary wheels (#2992) 2017-12-31 12:03:08 +09:00
csgwma
33ac8a0927 delete duplicated code in python-package (#2985) 2017-12-30 20:26:35 +08:00
Philip Cho
8d35c09c55 Tag version 0.7 (#2975)
* Tag version 0.7

* Document all changes made in year 2016
2017-12-30 20:16:41 +08:00
Yuchao Dai
eedca8c8ec fix the typo in core.py (#2978) 2017-12-25 21:08:27 -08:00
jac-stripe
1e3aabbadc Include symlinks to make wheel build work (#2909) 2017-12-01 11:27:58 -05:00
Jerry Dumblauskas
5867c1b96d update doc string for grid parameter (#2647)
* update doc string for grid parameter

* update doc string for grid parameter
2017-11-29 11:22:46 -08:00
Rajiv Abraham
77715d5c62 Update to correct brew gcc command (#1931)
The previous command did not work for me. This one did.
2017-11-29 11:20:49 -08:00
Sam O
602b34ab91 Fix performance of c_array in python core.py (#2786) 2017-11-29 11:12:49 -08:00
Joe Nyland
88177691b8 Update README (#2204)
I found the installation of the Python XGBoost package to be problematic as the documentation around compiler requirements was unclear, as discussed in #1501. I decided that I would improve the README.
2017-11-19 17:12:16 -08:00
Rory Mitchell
16c63f30d0
Fix MultiIndex detection (breaks for latest pandas==0.21.0). (#2872) 2017-11-11 11:12:23 +13:00
caoyi
3610025fb6 Fix typo (#2818)
Fix typo
2017-10-23 10:45:49 -05:00
Scott Lundberg
78c4188cec SHAP values for feature contributions (#2438)
* SHAP values for feature contributions

* Fix commenting error

* New polynomial time SHAP value estimation algorithm

* Update API to support SHAP values

* Fix merge conflicts with updates in master

* Correct submodule hashes

* Fix variable sized stack allocation

* Make lint happy

* Add docs

* Fix typo

* Adjust tolerances

* Remove unneeded def

* Fixed cpp test setup

* Updated R API and cleaned up

* Fixed test typo
2017-10-12 12:35:51 -07:00
Julian Niedermeier
9a81c74a7b Add xgb_model parameter to sklearn fit (#2623)
Adding xgb_model paramter allows the continuation of model training.
Model has to be saved by calling `model.get_booster().save_model(path)`
2017-10-01 08:47:17 -04:00
Andrew Hannigan
5c9f0ff9d9 Check existance of seed/nthread keys before checking their value. (#2669) 2017-09-27 03:05:59 -04:00
Philip Cho
31ad40b963 Make __del__ method idempotent (#2627)
Addresses Issue #2533.
2017-09-27 03:03:55 -04:00
Tsukasa OMOTO
8d15024ac7 python: follow the default warning filters of Python (#2666)
* python: follow the default warning filters of Python

https://docs.python.org/3/library/warnings.html#default-warning-filters

* update tests

* update tests
2017-09-27 03:03:01 -04:00
Icyblade Dai
0e85b30fdd Fix issue 2670 (#2671)
* fix issue 2670

* add python<3.6 compatibility

* fix Index

* fix Index/MultiIndex

* fix lint

* fix W0622

really nonsense

* fix lambda

* Trigger Travis

* add test for MultiIndex

* remove tailing whitespace
2017-09-19 15:49:41 -04:00
SimonAB
2e9d06443e Add show_values option to feature importances plot (#2351)
Adding an option to remove the values from the features importances plot in Python.
2017-08-31 12:26:54 -05:00
PSEUDOTENSOR / Jonathan McKinney
0664298bb2 Update sklearn API to pass along n_jobs to DMatrix creation (#2658) 2017-08-31 15:24:59 +12:00
René Scheibe
a0c5bde024 Fix typo in sklearn documentation (#2580) 2017-08-07 19:06:11 +02:00
Vadim Khotilovich
2b3a4318c5 Several fixes (#2572)
* repared serialization after update process; fixes #2545

* non-stratified folds in python could omit some data instances

* Makefile: fixes for older makes on windows; clean R-package too

* make cub to be a shallow submodule

* improve $(MAKE) recovery
2017-08-06 13:03:50 -05:00
PSEUDOTENSOR / Jonathan McKinney
6b375f6ad8 Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation (#2530)
* Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation from numpy arrays for python interface.
2017-07-21 14:43:17 +12:00
Rory Mitchell
56550ff3f1 Fix pylint (#2537) 2017-07-21 11:41:56 +12:00
Sergei Lebedev
88488fdbb9 Fixed shared library loading in the Python package (#2461)
* Fixed DLL name on Windows in ``xgboost.libpath``

* Added support for OS X to ``xgboost.libpath``

* Use .dylib for shared library on OS X

This does not affect the JNI library, because it is not trully
cross-platform in the Makefile-build anyway.
2017-06-29 11:50:50 +12:00
Alfredo Cambera
46b9889cc5 Update build_trouble_shooting.md (#2430)
I had to fight with my linux box for a day to find the solution to the problem. I hope than this may help other users to save some time.
2017-06-20 21:36:10 -07:00
wxchan
65d2513714 [python-package] fix sklearn n_jobs/nthreads and seed/random_state bug (#2378)
* add a testcase causing RuntimeError

* move seed/random_state/nthread/n_jobs check to get_xgb_params()

* fix failed test
2017-06-12 09:33:42 -04:00
Jakub Zakrzewski
ed6384ecbf [Python] Use appropriate integer types when calling native code. (#2361)
Don't use implicit conversions to c_int, which incidentally happen to work
on (some) 64-bit platforms, but:
* may lead to truncation of the input value to a 32-bit signed int,
* cause segfaults on some 32-bit architectures (tested on Ubuntu ARM,
  but is also the likely cause of issue #1707).

Also, when passing references use explicit 64-bit integers, where needed,
instead of c_ulong, which is not guaranteed to be this large.
2017-06-02 10:16:54 -07:00
Juang, Yi-Lin
6776292951 Minor cleanup (#2342)
* Clean up demo of multiclass classification

* Remove extra space
2017-05-26 09:40:41 -04:00
gaw89
0f3a404d91 Sklearn kwargs (#2338)
* Added kwargs support for Sklearn API

* Updated NEWS and CONTRIBUTORS

* Fixed CONTRIBUTORS.md

* Added clarification of **kwargs and test for proper usage

* Fixed lint error

* Fixed more lint errors and clf assigned but never used

* Fixed more lint errors

* Fixed more lint errors

* Fixed issue with changes from different branch bleeding over

* Fixed issue with changes from other branch bleeding over

* Added note that kwargs may not be compatible with Sklearn

* Fixed linting on kwargs note
2017-05-23 21:47:53 -05:00
gaw89
6cea1e3fb7 Sklearn convention update (#2323)
* Added n_jobs and random_state to keep up to date with sklearn API.
Deprecated nthread and seed.  Added tests for new params and
deprecations.

* Fixed docstring to reflect updates to n_jobs and random_state.

* Fixed whitespace issues and removed nose import.

* Added deprecation note for nthread and seed in docstring.

* Attempted fix of deprecation tests.

* Second attempted fix to tests.

* Set n_jobs to 1.
2017-05-22 08:22:05 -05:00
jayzed82
29289d2302 Add option to choose booster in scikit intreface (gbtree by default) (#2303)
* Add option to choose booster in scikit intreface (gbtree by default)

* Add option to choose booster in scikit intreface: complete docstring.

* Fix XGBClassifier to work with booster option

* Added test case for gblinear booster
2017-05-18 23:12:27 -04:00
Maurus Cuelenaere
6bd1869026 Add prediction of feature contributions (#2003)
* Add prediction of feature contributions

This implements the idea described at http://blog.datadive.net/interpreting-random-forests/
which tries to give insight in how a prediction is composed of its feature contributions
and a bias.

* Support multi-class models

* Calculate learning_rate per-tree instead of using the one from the first tree

* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly

* Add simple test for contributions feature

* Check against param.num_nodes instead of checking for non-zero length

* Loop over all roots instead of only the first
2017-05-14 00:58:10 -05:00
Liam Huang
3a2b8332a6 bugfix: when metric's name contains - (#2090)
When metric's name contains `-`, Python will complain about insufficient arguments to unpack.
2017-03-16 10:36:39 -07:00
Matthew R. Becker
4a63f4ab43 BUG make sure to specify no openmp for some mac osx builds properly (#2095) 2017-03-10 18:36:15 -08:00
Holger Peters
95510b9667 Inform setuptools that this is a binary package (#1996)
* Inform setuptools that this is a binary package that needs platform-tags in wheel names.

This fixes issue #1995 .

* PEP8 Formatting

* Add docstring
2017-03-07 09:26:04 -06:00
Eric Liu
7927031ffe print_evaluation callback output on last iteration (#2036)
verbose_eval docs claim it will log the last iteration (http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.train). this is also consistent w/the behavior from 0.4. not a huge deal but I found it handy to see the last iter's result b/c my period is usually large.

this doesn't address logging the last stage found by early_stopping (as noted in docs) as I'm not sure how to do that.
2017-02-24 23:06:48 -05:00
yexu15
179b384e39 A fix regarding the compatibility with python 2.6 (#1981)
* A fix regarding the compatibility with python 2.6

the syntax of {n: self.attr(n) for n in attr_names} is illegal in python 2.6

* Update core.py

add a space after comma
2017-01-29 20:18:28 -08:00
Srivatsan Ramanujam
036ee55fe0 adding sample weights for XGBRegressor (was this forgotten?) (#1874) 2017-01-21 11:58:03 -08:00
wxchan
a073a2c3d4 fix ylim with max_num_features in python plot_importance (#1974) 2017-01-18 11:59:50 -08:00
Félix MIKAELIAN
a7d2833766 added the max_features parameter to the plot_importance function. (#1963)
* added the max_features parameter to the plot_importance function.

* renamed max_features parameter to max_num_features for better understanding

* removed unwanted character in docstring
2017-01-16 14:49:47 -08:00
Andrey Tereskin
cfb9b11aa4 Make lib path relatrive to fix setup error #1932 (#1947) 2017-01-09 10:40:24 -08:00
jokari69
fb0fc0c580 option to shuffle data in mknfolds (#1459)
* option to shuffle data in mknfolds

* removed possibility to run as stand alone test

* split function def in 2 lines for lint

* option to shuffle data in mknfolds

* removed possibility to run as stand alone test

* split function def in 2 lines for lint
2016-12-23 07:53:30 +08:00
Ian
167864da75 python package tree plotting support fmap (#1856)
* to_graphviz and plot_tree support fmap

* [python-package] add model_plot docstring
2016-12-13 07:36:17 -06:00
ccphillippi
dd477ac903 Move feature_importances_ to base XGBModel for XGBRegressor access (#1591) 2016-12-01 10:17:37 -08:00
AbdealiJK
6f16f0ef58 Use bst_float consistently throughout (#1824)
* Fix various typos

* Add override to functions that are overridden

gcc gives warnings about functions that are being overridden by not
being marked as oveirridden. This fixes it.

* Use bst_float consistently

Use bst_float for all the variables that involve weight,
leaf value, gradient, hessian, gain, loss_chg, predictions,
base_margin, feature values.

In some cases, when due to additions and so on the value can
take a larger value, double is used.

This ensures that type conversions are minimal and reduces loss of
precision.
2016-11-30 10:02:10 -08:00