xgboost

Author	SHA1	Message	Date
jokari69	fb0fc0c580	option to shuffle data in mknfolds (#1459 ) * option to shuffle data in mknfolds * removed possibility to run as stand alone test * split function def in 2 lines for lint	2016-12-23 07:53:30 +08:00
Ian	167864da75	python package tree plotting support fmap (#1856 ) * to_graphviz and plot_tree support fmap * [python-package] add model_plot docstring	2016-12-13 07:36:17 -06:00
ccphillippi	dd477ac903	Move feature_importances_ to base XGBModel for XGBRegressor access (#1591 )	2016-12-01 10:17:37 -08:00
AbdealiJK	6f16f0ef58	Use bst_float consistently throughout (#1824 ) * Fix various typos * Add override to functions that are overridden gcc gives warnings about functions that are overridden but not marked as overridden. This fixes it. * Use bst_float consistently Use bst_float for all the variables that involve weight, leaf value, gradient, hessian, gain, loss_chg, predictions, base_margin, feature values. In some cases, when due to additions and so on the value can take a larger value, double is used. This ensures that type conversions are minimal and reduces loss of precision.	2016-11-30 10:02:10 -08:00
Jivan Roquet	0c19d4b029	[python-package] Provide a learning_rates parameter to xgb.cv() (#1770 ) * Allow using learning_rates parameter when doing CV - Create a new `callback_cv` method working when called from `xgb.cv()` - Rename existing `callback` into `callback_train` and make it the default callback - Get the logic out of the callbacks and place it into a common helper * Add a learning_rates parameter to cv() * lint * remove caller explicit reference * callback is aware of its calling context * remove caller argument * remove learning_rates param * restore learning_rates for training, but deprecated * lint * lint line too long * quick example for predefined callbacks	2016-11-24 09:49:07 -08:00
Yuan (Terry) Tang	ca0069b708	Fix typo - eval_metric in param should be dictionary (#1791 )	2016-11-20 18:52:41 -06:00
Nan Zhu	5217e53156	stylistic fix (#1789 ) * stylistic fix * try multiple repos * fix * fix	2016-11-19 22:03:10 -05:00
baderbuddy	c52b2faba4	Added license information (#1783 ) Added license information to the setup.py	2016-11-17 13:36:47 -08:00
Zhongxiao Ma	55bfc29942	keep builtin evaluations while using customized evaluation function (#1624 ) * keep builtin evaluations while using customized evaluation function * fix concat bytes to str	2016-11-10 12:40:48 -08:00
AbdealiJK	b94fcab4dc	Add dump_format=json option (#1726 ) * Add format to the params accepted by DumpModel Currently, only the test format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has a attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump()	2016-11-04 09:55:25 -07:00
Eric Liu	9b2e41340b	make DMatrix._init_from_npy2d only copy data when necessary (#1637 ) * make DMatrix._init_from_npy2d only copy data when necessary When creating DMatrix from a 2d ndarray, it can unnecessarily copy the input data. This can be problematic when the data is already very large--running out of memory. The copy is temporary (going out of scope at the end of this function) but it still adds to peak memory usage. ``numpy.array`` copies its input no matter what by default. By adding ``copy=False``, it will only do so when necessary. Since XGDMatrixCreateFromMat is readonly on the input buffer, this copy is not needed. Also added comments explaining when a copy can happen (if data ordering/layout is wrong or if type is not 32-bit float). * remove whitespace	2016-10-20 09:30:52 -07:00
ziguang1216	94a9e3222e	[python-package] Fix the issue #1439 (#1666 ) Fix 1439 Fix python_wrapper: when an eval set name contains '-', the early_stop maximize variable can't be set to True properly Change-Id: Ib0595afd4ae7b445a84c00a3a8faeccc506c6d13	2016-10-18 10:22:51 -07:00
Yuan (Terry) Tang	63829d656c	Fix mknfold using new StratifiedKFold API (#1660 )	2016-10-12 14:43:37 -07:00
Jonathan Rahn	c8ae52f17a	add scikit-learn v0.18 compatibility (#1636 ) * add scikit-learn v0.18 compatibility import KFold & StratifiedKFold from sklearn.model_selection instead of sklearn.cross_validation * change DeprecationWarning to ImportError DeprecationWarning isn't an exception, so it should work the other way around.	2016-10-09 20:37:28 -07:00
Vadim Khotilovich	693ddb860e	More robust DMatrix creation from a sparse matrix (#1606 ) * [CORE] DMatrix from sparse w/ explicit #col #row; safer arg types * [python-package] c-api change for _init_from_csr _init_from_csc * fix spaces * [R-package] adopt the new XGDMatrixCreateFromCSCEx interface * [CORE] redirect old sparse creators to new ones	2016-09-25 10:01:22 -07:00
chanis	62830be29d	[python-package] modify libpath.py and fix typos (#1594 ) * Update Makefile * Update Makefile * modify __init__.py * modified libpath.py and fixed typos	2016-09-21 10:12:19 -07:00
chanis	d8876b0b73	[python-package] modify __init__.py (#1587 ) * Update Makefile * Update Makefile * modify __init__.py	2016-09-19 09:43:36 -07:00
闻波	8cdfec71b3	remove a redundant sentence, and a word 'and' (#1526 ) * fix a typo * fix a typo and some code format * Update training.py * delete redundant sentence	2016-08-31 11:51:40 -07:00
kiselev1189	53ce511be3	Fix how maximize_metric value is determined in early_stop (#1451 ) * Update callback.py * Update callback.py	2016-08-27 13:09:24 -07:00
Preston Parry	0627213544	Fixes typo "candicate" (#1508 )	2016-08-26 14:00:27 -07:00
Preston Parry	cf4951b0b0	Fixes another typo "candicate" (#1509 )	2016-08-26 14:00:23 -07:00
Dan Harbin	78ae772f2c	Make python package wheelable (#1500 ) Currently xgboost can only be installed by running: python setup.py install Now it can be packaged (in binary form) as a wheel and installed like: pip install xgboost-0.6-py2-none-any.whl distutils and wheel install `data_files` differently than setuptools. setuptools will install the `data_files` in the package directory whereas the others install it in `sys.prefix`. By adding `sys.prefix` to the list of directories to check for the shared library, xgboost can now be distributed as a wheel.	2016-08-26 14:00:11 -07:00
Hongliang Liu	c5a2b79558	PyPI (pip installation) setup for 0.6 code (#1445 ) * force gcc-5 or clang-omp for Mac OS, prepare for pip pack * add sklearn dep, make -j4 * finalize PyPI submission * revert to Xcode clang for passing build #1468 * force to clang, try to solve cmake travis error * remove sklearn dependency	2016-08-10 07:45:56 -05:00
Tianqi Chen	4a8d63b6c8	Tag version 0.6 (#1422 )	2016-07-29 11:23:06 -07:00
Yuan (Terry) Tang	c60a356273	Remove pypi downloads badge (#1365 )	2016-07-16 13:36:05 -04:00
Titouan Lorieul	75d9be55de	[py] fix label encoding of eval sets in sklearn API (#1244 )	2016-07-11 05:29:46 -05:00
Vladimir	aaf0a73486	fixed error when eval False (#1271 )	2016-06-12 09:36:36 -07:00
Antonio Augusto Santos	19129b289c	Preserve the actual objective used on the booster Save the actual objective used on xgboost.train. Not saving it caused problems in predict_proba, as in issue #1215	2016-05-31 19:01:10 -03:00
Alberto Torres	af2e9ebd82	Update sklearn.py	2016-05-25 15:00:11 +02:00
tqchen	149589c583	[PYTHON] Refactor training API to use callback	2016-05-19 21:31:23 -07:00
Vadim Khotilovich	ffed95eec0	py: replace attr_names() with attributes()	2016-05-15 22:04:38 -05:00
Vadim Khotilovich	a13a3a4d76	attr_names for python interface; attribute deletion via set_attr	2016-05-15 02:05:10 -05:00
Shayne Kang	bf24d6ae98	fix VisibleDeprecationWarning	2016-05-08 01:44:04 +09:00
Borun Dev Chowdhury	fc02f8a2dc	cosmetic change cosmetic change of putting space after comma compared to previous edit.	2016-05-07 12:33:37 +02:00
borundev	95bcff90af	XGBModel.fit had a call to DMatrix without missing=self.missing. fixed that	2016-05-07 12:32:03 +02:00
Titouan Lorieul	3ab8f0b13d	[py] added apply function in sklearn API to return the predicted leaves	2016-05-04 12:27:30 +02:00
Alistair Johnson	6750c8b743	Added other feature importances in python package (#1135 ) * added new function to calculate other feature importances * added capability to plot other feature importance measures * changed plotting default to fscore * added info on importance_type to boilerplate comment * updated text of error statement * added self module name to fix call * added unit test for feature importances * style fixes	2016-05-02 12:25:24 -05:00
sinhrks	9da2f3e613	DOC/TST: Fix Python sklearn dep	2016-05-01 17:27:43 +09:00
Faron	ad3f49e881	[py] eta decay bugfix	2016-04-30 15:51:57 +02:00
sinhrks	6bab164d80	Bug mixing DMatrix's with and without feature names	2016-04-30 14:42:57 +09:00
Faron	cf607e2448	[py] split value histograms	2016-04-28 20:26:21 +02:00
sinhrks	c55cc809e5	BUG: XGBClassifier.feature_importances_ raises ValueError if input is pandas DataFrame	2016-04-27 21:50:03 +09:00
Tianqi Chen	4149854633	Merge pull request #1068 from Laurae2/master Updated obsolete installation instructions	2016-04-26 19:50:06 -07:00
sinhrks	8fc2456c87	Enable flake8	2016-04-24 17:32:31 +09:00
tqchen	49f3892942	allow common python output in single node	2016-04-11 15:48:16 -07:00
zyxue	79b35da308	improved docstring for folds in cv function	2016-04-09 10:21:56 -07:00
Laurae2	77136baf2c	Updated obsolete installation instructions Fixed local compilation, and installation for R package and Python package. Modified the according documents.	2016-03-30 17:43:54 +02:00
Julian Quick	bbb9ce1641	Verbose message: which fields have improper data types A more verbose error message letting the user know which fields have improper data types	2016-03-22 14:13:29 -06:00
Julian Quick	2cd109fb98	a more verbose field mismatch error message This error message can be hard to understand when there are several fields, as shown in the example below. This improves the error message, letting the user know which fields were unexpected or missing. import xgboost as xgb import pandas as pd train = pd.DataFrame({'a':[1], 'b':[2], 'c':[3], 'd':[4], 'f':[2], 'g':2, 'etc etc etc':[11]}) dtrain = xgb.DMatrix(train.drop('d', axis=1), train.d) test = pd.DataFrame({'a':[1], 'b':[2], 'c':[1], 'd':[4], 'e':[2], 'f':[2], 'g':2, 'etc etc etc':[11]}) dtest = xgb.DMatrix(test) modl = xgb.train({}, dtrain) modl.predict(dtest) # ValueError: feature_names mismatch: [u'a', u'b', u'c', u'etc etc etc', u'f', u'g'] [u'a', u'b', u'c', u'd', u'e', u'etc etc etc', u'f', u'g']	2016-03-17 18:13:30 -06:00
DAndrey	311f7c8f47	change type of xgbclassifier.classes_ from list to numpy array	2016-03-17 16:54:33 +03:00

181 Commits