xgboost

Author	SHA1	Message	Date
ngoyal2707	902ecbade8	added python doc string for nthreads to dmatrix (#3363 )	2018-06-08 14:16:30 +12:00
Kristian Gampong	a510e68dda	Add validate_features option for Booster predict (#3323 ) * Add validate_features option for Booster predict * Fix trailing whitespace in docstring	2018-05-29 11:40:49 -07:00
Rory Mitchell	a185ddfe03	Implement GPU accelerated coordinate descent algorithm (#3178 ) * Implement GPU accelerated coordinate descent algorithm. * Exclude external memory tests for GPU	2018-04-20 14:56:35 +12:00
Philip Hyunsu Cho	32ea70c1c9	Documenting CSV loading into DMatrix (#3137 ) * Support CSV file in DMatrix We'd just need to expose the CSV parser in dmlc-core to the Python wrapper * Revert extra code; document existing CSV support CSV support is already there but undocumented * Add notice about categorical features	2018-02-28 18:41:10 -08:00
Scott Lundberg	d878c36c84	Add SHAP interaction effects, fix minor bug, and add cox loss (#3043 ) * Add interaction effects and cox loss * Minimize whitespace changes * Cox loss now no longer needs a pre-sorted dataset. * Address code review comments * Remove mem check, rename to pred_interactions, include bias * Make lint happy * More lint fixes * Fix cox loss indexing * Fix main effects and tests * Fix lint * Use half interaction values on the off-diagonals * Fix lint again	2018-02-07 20:38:01 -06:00
Yuchao Dai	eedca8c8ec	fix the typo in core.py (#2978 )	2017-12-25 21:08:27 -08:00
Sam O	602b34ab91	Fix performance of c_array in python core.py (#2786 )	2017-11-29 11:12:49 -08:00
Rory Mitchell	16c63f30d0	Fix MultiIndex detection (breaks for latest pandas==0.21.0). (#2872 )	2017-11-11 11:12:23 +13:00
Scott Lundberg	78c4188cec	SHAP values for feature contributions (#2438 ) * SHAP values for feature contributions * Fix commenting error * New polynomial time SHAP value estimation algorithm * Update API to support SHAP values * Fix merge conflicts with updates in master * Correct submodule hashes * Fix variable sized stack allocation * Make lint happy * Add docs * Fix typo * Adjust tolerances * Remove unneeded def * Fixed cpp test setup * Updated R API and cleaned up * Fixed test typo	2017-10-12 12:35:51 -07:00
Philip Cho	31ad40b963	Make __del__ method idempotent (#2627 ) Addresses Issue #2533.	2017-09-27 03:03:55 -04:00
Icyblade Dai	0e85b30fdd	Fix issue 2670 (#2671 ) * fix issue 2670 * add python<3.6 compatibility * fix Index * fix Index/MultiIndex * fix lint * fix W0622 really nonsense * fix lambda * Trigger Travis * add test for MultiIndex * remove trailing whitespace	2017-09-19 15:49:41 -04:00
PSEUDOTENSOR / Jonathan McKinney	6b375f6ad8	Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation (#2530 ) * Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation from numpy arrays for python interface.	2017-07-21 14:43:17 +12:00
Jakub Zakrzewski	ed6384ecbf	[Python] Use appropriate integer types when calling native code. (#2361 ) Don't use implicit conversions to c_int, which incidentally happen to work on (some) 64-bit platforms, but: * may lead to truncation of the input value to a 32-bit signed int, * cause segfaults on some 32-bit architectures (tested on Ubuntu ARM, but is also the likely cause of issue #1707). Also, when passing references use explicit 64-bit integers, where needed, instead of c_ulong, which is not guaranteed to be this large.	2017-06-02 10:16:54 -07:00
Juang, Yi-Lin	6776292951	Minor cleanup (#2342 ) * Clean up demo of multiclass classification * Remove extra space	2017-05-26 09:40:41 -04:00
Maurus Cuelenaere	6bd1869026	Add prediction of feature contributions (#2003 ) * Add prediction of feature contributions This implements the idea described at http://blog.datadive.net/interpreting-random-forests/ which tries to give insight in how a prediction is composed of its feature contributions and a bias. * Support multi-class models * Calculate learning_rate per-tree instead of using the one from the first tree * Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly * Add simple test for contributions feature * Check against param.num_nodes instead of checking for non-zero length * Loop over all roots instead of only the first	2017-05-14 00:58:10 -05:00
yexu15	179b384e39	A fix regarding the compatibility with python 2.6 (#1981 ) * A fix regarding the compatibility with python 2.6 the syntax of {n: self.attr(n) for n in attr_names} is illegal in python 2.6 * Update core.py add a space after comma	2017-01-29 20:18:28 -08:00
AbdealiJK	6f16f0ef58	Use bst_float consistently throughout (#1824 ) * Fix various typos * Add override to functions that are overridden gcc gives warnings about functions that are overridden but not marked as overridden. This fixes it. * Use bst_float consistently Use bst_float for all the variables that involve weight, leaf value, gradient, hessian, gain, loss_chg, predictions, base_margin, feature values. In some cases, when due to additions and so on the value can take a larger value, double is used. This ensures that type conversions are minimal and reduces loss of precision.	2016-11-30 10:02:10 -08:00
Zhongxiao Ma	55bfc29942	keep builtin evaluations while using customized evaluation function (#1624 ) * keep builtin evaluations while using customized evaluation function * fix concat bytes to str	2016-11-10 12:40:48 -08:00
AbdealiJK	b94fcab4dc	Add dump_format=json option (#1726 ) * Add format to the params accepted by DumpModel Currently, only the text format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has an attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump()	2016-11-04 09:55:25 -07:00
Eric Liu	9b2e41340b	make DMatrix._init_from_npy2d only copy data when necessary (#1637 ) * make DMatrix._init_from_npy2d only copy data when necessary When creating DMatrix from a 2d ndarray, it can unnecessarily copy the input data. This can be problematic when the data is already very large--running out of memory. The copy is temporary (going out of scope at the end of this function) but it still adds to peak memory usage. ``numpy.array`` copies its input no matter what by default. By adding ``copy=False``, it will only do so when necessary. Since XGDMatrixCreateFromMat is readonly on the input buffer, this copy is not needed. Also added comments explaining when a copy can happen (if data ordering/layout is wrong or if type is not 32-bit float). * remove whitespace	2016-10-20 09:30:52 -07:00
Vadim Khotilovich	693ddb860e	More robust DMatrix creation from a sparse matrix (#1606 ) * [CORE] DMatrix from sparse w/ explicit #col #row; safer arg types * [python-package] c-api change for _init_from_csr _init_from_csc * fix spaces * [R-package] adopt the new XGDMatrixCreateFromCSCEx interface * [CORE] redirect old sparse creators to new ones	2016-09-25 10:01:22 -07:00
tqchen	149589c583	[PYTHON] Refactor training API to use callback	2016-05-19 21:31:23 -07:00
Vadim Khotilovich	ffed95eec0	py: replace attr_names() with attributes()	2016-05-15 22:04:38 -05:00
Vadim Khotilovich	a13a3a4d76	attr_names for python interface; attribute deletion via set_attr	2016-05-15 02:05:10 -05:00
Shayne Kang	bf24d6ae98	fix VisibleDeprecationWarning	2016-05-08 01:44:04 +09:00
Alistair Johnson	6750c8b743	Added other feature importances in python package (#1135 ) * added new function to calculate other feature importances * added capability to plot other feature importance measures * changed plotting default to fscore * added info on importance_type to boilerplate comment * updated text of error statement * added self module name to fix call * added unit test for feature importances * style fixes	2016-05-02 12:25:24 -05:00
sinhrks	6bab164d80	Bug mixing DMatrix's with and without feature names	2016-04-30 14:42:57 +09:00
Faron	cf607e2448	[py] split value histograms	2016-04-28 20:26:21 +02:00
sinhrks	8fc2456c87	Enable flake8	2016-04-24 17:32:31 +09:00
Julian Quick	bbb9ce1641	Verbose message: which fields have improper data types A more verbose error message letting the user know which fields have improper data types	2016-03-22 14:13:29 -06:00
Julian Quick	2cd109fb98	a more verbose field mismatch error message This error message can be hard to understand when there are several fields, as shown in the example below. This improves the error message, letting the user know which fields were unexpected or missing. import xgboost as xgb import pandas as pd train = pd.DataFrame({'a':[1], 'b':[2], 'c':[3], 'd':[4], 'f':[2], 'g':2, 'etc etc etc':[11]}) dtrain = xgb.DMatrix(train.drop('d', axis=1), train.d) test = pd.DataFrame({'a':[1], 'b':[2], 'c':[1], 'd':[4], 'e':[2], 'f':[2], 'g':2, 'etc etc etc':[11]}) dtest = xgb.DMatrix(test) modl = xgb.train({}, dtrain) modl.predict(dtest) # ValueError: feature_names mismatch: [u'a', u'b', u'c', u'etc etc etc', u'f', u'g'] [u'a', u'b', u'c', u'd', u'e', u'etc etc etc', u'f', u'g']	2016-03-17 18:13:30 -06:00
tqchen	ecb3a271be	[PYTHON-DIST] Distributed xgboost python training API.	2016-02-29 16:54:13 -08:00
tqchen	a71ba04109	[DIST] Add Distributed XGBoost on AWS Tutorial	2016-02-25 21:51:37 -08:00
Maxim Grechkin	f5e96eba72	Make missing handling consistent with sklearn's portion of the python package	2016-01-28 14:16:11 -08:00
Kai Luo	d9e50fd7f3	__copy__ calls __deepcopy__ with an argument	2016-01-20 19:57:20 +08:00
Kai Luo	5cd765e935	fix signature of __deepcopy__ method	2016-01-20 17:18:11 +08:00
terrytangyuan	0eb6240fd0	Fixed all lint errors	2015-12-11 18:46:15 -06:00
terrytangyuan	a7e79e089b	fix lint errors in core	2015-12-11 18:37:13 -06:00
sinhrks	25c4fbd0cb	Cleanup pandas support	2015-11-13 06:55:04 +09:00
antonymayi	8c7b18daed	python 2.6 compatibility tweak replacing set literal {} with set() for python 2.6 compatibility (plus reformatting the line)	2015-11-10 14:50:54 +01:00
Yuan (Terry) Tang	1dd96b6cdc	Merge pull request #597 from JohanManders/python-pandas-dtypes Python pandas dtypes	2015-11-09 18:08:41 -06:00
FrozenFingerz	b59018aa05	python: multiple eval_metrics changes - allows feval to return a list of tuples (name, error/score value) - changed behavior for multiple eval_metrics in conjunction with early_stopping: Instead of raising an error, the last passed eval_metric (or last entry in return value of feval) is used for early stopping - allows list of eval_metrics in dict-typed params - unittest for new features / behavior documentation updated - example for assigning a list to 'eval_metric' - note about early stopping on last passed eval metric - info msg for used eval metric added	2015-11-08 11:23:54 +01:00
Johan Manders	5f0f8749d9	Cleaned up some code	2015-11-04 18:05:47 +01:00
Johan Manders	f9e1b2b7b7	Added back feature names	2015-11-03 21:26:11 +01:00
Johan Manders	96f221e0d0	Merge pull request #5 from dmlc/master Update to latest version	2015-11-03 20:37:20 +01:00
Preston Parry	b3bb54da73	fixes typo in error message	2015-10-27 23:34:50 -07:00
Johan Manders	7c79c9ac3a	Bool gets mapped to i instead of int	2015-10-19 17:36:57 +02:00
Johan Manders	9bbc3901ee	More Pandas dtypes and more flexible variable naming - Pandas DataFrame supports more dtypes than 'int64', 'float64' and 'bool', therefore added a bunch of extra dtypes for the data variable. - From now on the label variable can be a Pandas DataFrame with the same dtypes as the data variable. - If label is a Pandas DataFrame, it will be converted to float. - If no feature_types is set, the data dtypes will be converted to 'int' or 'float'. - The feature_names may contain every character except [, ] or <	2015-10-17 15:13:42 +02:00
sinhrks	dbcb4c8729	Support non-str column names	2015-10-04 13:30:01 +09:00
sinhrks	b943becc61	python DMatrix now accepts pandas DataFrame	2015-10-01 22:52:32 +09:00

114 Commits