376 Commits

Author SHA1 Message Date
Yuan (Terry) Tang
ca0069b708 Fix typo - eval_metric in param should be dictionary (#1791) 2016-11-20 18:52:41 -06:00
Nan Zhu
5217e53156 stylistic fix (#1789)
* stylistic fix

* try multiple repos

* fix

* fix
2016-11-19 22:03:10 -05:00
baderbuddy
c52b2faba4 Added license information (#1783)
Added license information to the setup.py
2016-11-17 13:36:47 -08:00
Zhongxiao Ma
55bfc29942 keep builtin evaluations while using customized evaluation function (#1624)
* keep builtin evaluations while using customized evaluation function

* fix concat bytes to str
2016-11-10 12:40:48 -08:00
AbdealiJK
b94fcab4dc Add dump_format=json option (#1726)
* Add format to the params accepted by DumpModel

Currently, only the test format is supported when trying to dump
a model. The plan is to add more such formats like JSON which are
easy to read and/or parse by machines. And to make the interface
for this even more generic to allow other formats to be added.

Hence, we make some modifications to make these function generic
and accept a new parameter "format" which signifies the format of
the dump to be created.

* Fix typos and errors in docs

* plugin: Mention all the register macros available

Document the register macros currently available to the plugin
writers so they know what exactly can be extended using hooks.

* sparce_page_source: Use same arg name in .h and .cc

* gbm: Add JSON dump

The dump_format argument can be used to specify what type
of dump file should be created. Add functionality to dump
gblinear and gbtree into a JSON file.

The JSON file has an array, each item is a JSON object for the tree.
For gblinear:
 - The item is the bias and weights vectors
For gbtree:
 - The item is the root node. The root node has a attribute "children"
   which holds the children nodes. This happens recursively.

* core.py: Add arg dump_format for get_dump()
2016-11-04 09:55:25 -07:00
Eric Liu
9b2e41340b make DMatrix._init_from_npy2d only copy data when necessary (#1637)
* make DMatrix._init_from_npy2d only copy data when necessary

When creating DMatrix from a 2d ndarray, it can unnecessarily copy the input data. This can be problematic when the data is already very large--running out of memory. The copy is temporary (going out of scope at the end of this function) but it still adds to peak memory usage.

``numpy.array`` copies its input no matter what by default. By adding ``copy=False``, it will only do so when necessary. Since XGDMatrixCreateFromMat is readonly on the input buffer, this copy is not needed.

Also added comments explaining when a copy can happen (if data ordering/layout is wrong or if type is not 32-bit float).

* remove whitespace
2016-10-20 09:30:52 -07:00
ziguang1216
94a9e3222e [python-package] Fix the issue #1439 (#1666)
*Fix 1439
        *Fix python_wrapper when eval set name contain '-' will cause early_stop maximize variable con't set to True propely

Change-Id: Ib0595afd4ae7b445a84c00a3a8faeccc506c6d13
2016-10-18 10:22:51 -07:00
Yuan (Terry) Tang
63829d656c Fix mknfold using new StratifiedKFold API (#1660) 2016-10-12 14:43:37 -07:00
Jonathan Rahn
c8ae52f17a add scikit-learn v0.18 compatibility (#1636)
* add scikit-learn v0.18 compatibility

import KFold & StratifiedKFold from sklearn.model_selection instead of sklearn.cross_validation

* change DeprecationWarning to ImportError

DeprecationWarning isn't an exception, so it should work the other way around.
2016-10-09 20:37:28 -07:00
Vadim Khotilovich
693ddb860e More robust DMatrix creation from a sparse matrix (#1606)
* [CORE] DMatrix from sparse w/ explicit #col #row; safer arg types

* [python-package] c-api change for _init_from_csr _init_from_csc

* fix spaces

* [R-package] adopt the new XGDMatrixCreateFromCSCEx interface

* [CORE] redirect old sparse creators to new ones
2016-09-25 10:01:22 -07:00
chanis
62830be29d [python-package] modify libpath.py and fix typos (#1594)
* Update Makefile

* Update Makefile

* modify __init__.py

* modified libpath.py and fixed typos
2016-09-21 10:12:19 -07:00
chanis
d8876b0b73 [python-package] modify __init__.py (#1587)
* Update Makefile

* Update Makefile

* modify __init__.py
2016-09-19 09:43:36 -07:00
闻波
8cdfec71b3 remove a redundant sentence, and a word 'and' (#1526)
* fix a typo

* fix a typo and some code format

* Update training.py

* delete redundant sentence
2016-08-31 11:51:40 -07:00
kiselev1189
53ce511be3 Fix how maximize_metric value is determined in early_stop (#1451)
* Update callback.py

* Update callback.py
2016-08-27 13:09:24 -07:00
Preston Parry
0627213544 Fixes typo "candicate" (#1508) 2016-08-26 14:00:27 -07:00
Preston Parry
cf4951b0b0 Fixes another typo "candicate" (#1509) 2016-08-26 14:00:23 -07:00
Dan Harbin
78ae772f2c Make python package wheelable (#1500)
Currently xgboost can only be installed by running:

    python setup.py install

Now it can be packaged (in binary form) as a wheel and installed like:

    pip install xgboost-0.6-py2-none-any.whl

distutils and wheel install `data_files` differently than setuptools.
setuptools will install the `data_files` in the package directory whereas the
others install it in `sys.prefix`. By adding `sys.prefix` to the list of
directories to check for the shared library, xgboost can now be distributed as
a wheel.
2016-08-26 14:00:11 -07:00
Hongliang Liu
c5a2b79558 PyPI (pip installation) setup for 0.6 code (#1445)
* force gcc-5 or clang-omp for Mac OS, prepare for pip pack

* add sklearn dep, make -j4

* finalize PyPI submission

* revert to Xcode clang for passing build #1468

* force to clang, try to solve cmake travis error

* remove sklearn dependency
2016-08-10 07:45:56 -05:00
Tianqi Chen
4a8d63b6c8 Tag version 0.6 (#1422) 2016-07-29 11:23:06 -07:00
Yuan (Terry) Tang
c60a356273 Remove pypi downloads badge (#1365) 2016-07-16 13:36:05 -04:00
Titouan Lorieul
75d9be55de [py] fix label encoding of eval sets in sklearn API (#1244) 2016-07-11 05:29:46 -05:00
Vladimir
aaf0a73486 fixed error when eval False (#1271) 2016-06-12 09:36:36 -07:00
Antonio Augusto Santos
19129b289c Preserve the actal objective used on the booster
Save the actual objective used on xgboost.train.

Not saving it was giving problem in predict_proba, as issue  #1215
2016-05-31 19:01:10 -03:00
Alberto Torres
af2e9ebd82 Update sklearn.py 2016-05-25 15:00:11 +02:00
tqchen
149589c583 [PYTHON] Refactor trainnig API to use callback 2016-05-19 21:31:23 -07:00
Vadim Khotilovich
ffed95eec0 py: replace attr_names() with attributes() 2016-05-15 22:04:38 -05:00
Vadim Khotilovich
a13a3a4d76 attr_names for python interface; attribute deletion via set_attr 2016-05-15 02:05:10 -05:00
Shayne Kang
bf24d6ae98 fix VisibleDeprecationWarning 2016-05-08 01:44:04 +09:00
Borun Dev Chowdhury
fc02f8a2dc cosmetic change
cosmetic change of putting space after comma compared to previous edit.
2016-05-07 12:33:37 +02:00
borundev
95bcff90af XGBModel.fit had a call to DMatrix without missing=self.missing. fixed that 2016-05-07 12:32:03 +02:00
Titouan Lorieul
3ab8f0b13d [py] added apply function in sklearn API to return the predicted leaves 2016-05-04 12:27:30 +02:00
Alistair Johnson
6750c8b743 Added other feature importances in python package (#1135)
* added new function to calculate other feature importances

* added capability to plot other feature importance measures

* changed plotting default to fscore

* added info on importance_type to boilerplate comment

* updated text of error statement

* added self module name to fix call

* added unit test for feature importances

* style fixes
2016-05-02 12:25:24 -05:00
sinhrks
9da2f3e613 DOC/TST: Fix Python sklearn dep 2016-05-01 17:27:43 +09:00
Faron
ad3f49e881 [py] eta decay bugfix 2016-04-30 15:51:57 +02:00
sinhrks
6bab164d80 Bug mixing DMatrix's with and without feature names 2016-04-30 14:42:57 +09:00
Faron
cf607e2448 [py] split value histograms 2016-04-28 20:26:21 +02:00
sinhrks
c55cc809e5 BUG: XGBClassifier.feature_importances_ raises ValueError if input is pandas DataFrame 2016-04-27 21:50:03 +09:00
Tianqi Chen
4149854633 Merge pull request #1068 from Laurae2/master
Updated obsolete installation instructions
2016-04-26 19:50:06 -07:00
sinhrks
8fc2456c87 Enable flake8 2016-04-24 17:32:31 +09:00
tqchen
49f3892942 allow common python output in single node 2016-04-11 15:48:16 -07:00
zyxue
79b35da308 improved docstring for folds in cv function 2016-04-09 10:21:56 -07:00
Laurae2
77136baf2c Updated obsolete installation instructions
Fixed local compilation, and installation for R package and Python
package. Modified the according documents.
2016-03-30 17:43:54 +02:00
Julian Quick
bbb9ce1641 Verbose message: which fields have impropper data types
A more verbose error message letting the user know which fields have impropper data types
2016-03-22 14:13:29 -06:00
Julian Quick
2cd109fb98 a more verbose field mismatch error message
This error message can be hard to understand when there are several fields, as shown in the example below. This improves the error message, letting the user know which fields were unexpected or missing.

    import xgboost as xgb
    import pandas as pd
    train = pd.DataFrame({'a':[1], 'b':[2], 'c':[3], 'd':[4], 'f':[2], 'g':2, 'etc etc etc':[11]})
    dtrain = xgb.DMatrix(train.drop('d', axis=1), train.d)
    test = pd.DataFrame({'a':[1], 'b':[2], 'c':[1], 'd':[4], 'e':[2], 'f':[2], 'g':2, 'etc etc etc':[11]})
    dtest = xgb.DMatrix(test)
    modl = xgb.train({}, dtrain)
    modl.predict(dtest)
    
    
    # ValueError: feature_names mismatch: [u'a', u'b', u'c', u'etc etc etc', u'f', u'g'] [u'a', u'b', u'c', u'd', u'e', u'etc etc etc', u'f', u'g']
2016-03-17 18:13:30 -06:00
DAndrey
311f7c8f47 change type of xgbclassifier.classes_ from list to numpy array 2016-03-17 16:54:33 +03:00
tqchen
bec7332eea Fix rabit 2016-03-06 20:54:27 -08:00
tqchen
ced6d45e01 Update rabit 2016-03-02 20:53:34 -08:00
tqchen
ecb3a271be [PYTHON-DIST] Distributed xgboost python training API. 2016-02-29 16:54:13 -08:00
Yuan (Terry) Tang
fdd520d774 Merge branch 'master' into sklearn 2016-02-29 10:50:55 -06:00
Paulo Alves
b7985466a4 Merge remote-tracking branch 'upstream/master' 2016-02-29 08:53:48 -03:00