xgboost

Author	SHA1	Message	Date
Jiaming Yuan	2443275891	Cleanup Python code. (#6223 ) * Remove pathlike as XGBoost 1.2 requires Python 3.6. * Move conditional import of dask/distributed into dask module.	2020-10-12 15:44:41 +08:00
Christian Lorentzen	cf4f019ed6	[Breaking] Change default evaluation metric for classification to logloss / mlogloss (#6183 ) * Change DefaultEvalMetric of classification from error to logloss * Change default binary metric in plugin/example/custom_obj.cc * Set old error metric in python tests * Set old error metric in R tests * Fix missed eval metrics and typos in R tests * Fix setting eval_metric twice in R tests * Add warning for empty eval_metric for classification * Fix Dask tests Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-10-02 12:06:47 -07:00
Jiaming Yuan	b9ebbffc57	Fix plotting test. (#6040 ) Previously the test loads a model generated by `test_basic.py`, now we generate the model explicitly. * Cleanup saved files for basic tests.	2020-08-22 13:18:48 +08:00
Jiaming Yuan	029a8b533f	Simplify the data backends. (#5893 )	2020-07-16 15:17:31 +08:00
Jiaming Yuan	a461a9a90a	Define lazy isinstance for Python compat. (#5364 ) * Avoid importing datatable. * Fix #5363.	2020-02-26 14:23:33 +08:00
Jiaming Yuan	c35cdecddd	Move prediction cache to Learner. (#5220 ) * Move prediction cache into Learner. * Clean-ups - Remove duplicated cache in Learner and GBM. - Remove ad-hoc fix of invalid cache. - Remove `PredictFromCache` in predictors. - Remove prediction cache for linear altogether, as it's only moving the prediction into training process but doesn't provide any actual overall speed gain. - The cache is now unique to Learner, which means the ownership is no longer shared by any other components. * Changes - Add version to prediction cache. - Use weak ptr to check expired DMatrix. - Pass shared pointer instead of raw pointer.	2020-02-14 13:04:23 +08:00
Jiaming Yuan	911a902835	Merge model compatibility fixes from 1.0rc branch. (#5305 ) * Port test model compatibility. * Port logit model fix. https://github.com/dmlc/xgboost/pull/5248 https://github.com/dmlc/xgboost/pull/5281	2020-02-13 20:41:58 +08:00
Jiaming Yuan	64af1ecf86	[Breaking] Remove num roots. (#5059 )	2019-12-05 21:58:43 +08:00
Rory Mitchell	e3c34c79be	External data adapters (#5044 ) * Use external data adapters as lightweight intermediate layer between external data and DMatrix	2019-12-04 10:56:17 +13:00
Jiaming Yuan	5374f52531	Complete cudf support. (#4850 ) * Handles missing value. * Accept all floating point and integer types. * Move to cudf 9.0 API. * Remove requirement on `null_count`. * Arbitrary column types support.	2019-09-16 23:52:00 -04:00
Evan Kepner	53d4272c2a	add os.PathLike support for file paths to DMatrix and Booster Python classes (#4757 )	2019-08-15 04:46:25 -04:00
Jiaming Yuan	29a1356669	Deprecate `reg:linear` in favor of `reg:squarederror`. (#4267 ) * Deprecate `reg:linear` in favor of `reg:squarederror`. * Replace the use of `reg:linear`. * Replace the use of `silent`.	2019-03-17 17:55:04 +08:00
Philip Hyunsu Cho	2aaae2e7bb	Fix #4163 : always copy sliced data (#4165 ) * Revert "Accept numpy array view. (#4147)" This reverts commit a985a99cf0dacb26a5d734835473d492d3c2a0df. * Fix #4163: always copy sliced data * Remove print() from the test; check shape equality * Check if 'base' attribute exists * Fix lint * Address reviewer comment * Fix lint	2019-02-20 14:46:34 -08:00
Jiaming Yuan	a985a99cf0	Accept numpy array view. (#4147 ) * Accept array view (slice) in metainfo.	2019-02-18 22:21:34 +08:00
Jiaming Yuan	e0a279114e	Unify logging facilities. (#3982 ) * Unify logging facilities. * Enhance `ConsoleLogger` to handle different verbosity. * Override macros from `dmlc`. * Don't use specialized gamma when building with GPU. * Remove verbosity cache in monitor. * Test monitor. * Deprecate `silent`. * Fix doc and messages. * Fix python test. * Fix silent tests.	2018-12-14 19:29:58 +08:00
Jiaming Yuan	2ea0f887c1	Refactor Python tests. (#3897 ) * Deprecate nose tests. * Format python tests.	2018-11-15 13:56:33 +13:00
Philip Hyunsu Cho	10cd7c8447	Fix #3714 : preserve feature names when slicing DMatrix (#3766 ) * Fix #3714: preserve feature names when slicing DMatrix * Add test	2018-10-08 01:04:33 -07:00
Philip Hyunsu Cho	66e74d2223	Fix get_uint_info() (#3442 ) * Add regression test	2018-07-05 20:06:59 -07:00
Oliver Laslett	18813a26ab	allow arbitrary cross validation fold indices (#3353 ) * allow arbitrary cross validation fold indices - use training indices passed to `folds` parameter in `training.cv` - update doc string * add tests for arbitrary fold indices	2018-06-30 19:23:49 +00:00
Scott Lundberg	d878c36c84	Add SHAP interaction effects, fix minor bug, and add cox loss (#3043 ) * Add interaction effects and cox loss * Minimize whitespace changes * Cox loss now no longer needs a pre-sorted dataset. * Address code review comments * Remove mem check, rename to pred_interactions, include bias * Make lint happy * More lint fixes * Fix cox loss indexing * Fix main effects and tests * Fix lint * Use half interaction values on the off-diagonals * Fix lint again	2018-02-07 20:38:01 -06:00
Scott Lundberg	78c4188cec	SHAP values for feature contributions (#2438 ) * SHAP values for feature contributions * Fix commenting error * New polynomial time SHAP value estimation algorithm * Update API to support SHAP values * Fix merge conflicts with updates in master * Correct submodule hashes * Fix variable sized stack allocation * Make lint happy * Add docs * Fix typo * Adjust tolerances * Remove unneeded def * Fixed cpp test setup * Updated R API and cleaned up * Fixed test typo	2017-10-12 12:35:51 -07:00
PSEUDOTENSOR / Jonathan McKinney	6b375f6ad8	Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation (#2530 ) * Multi-threaded XGDMatrixCreateFromMat for faster DMatrix creation from numpy arrays for python interface.	2017-07-21 14:43:17 +12:00
Maurus Cuelenaere	6bd1869026	Add prediction of feature contributions (#2003 ) * Add prediction of feature contributions This implements the idea described at http://blog.datadive.net/interpreting-random-forests/ which tries to give insight in how a prediction is composed of its feature contributions and a bias. * Support multi-class models * Calculate learning_rate per-tree instead of using the one from the first tree * Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly * Add simple test for contributions feature * Check against param.num_nodes instead of checking for non-zero length * Loop over all roots instead of only the first	2017-05-14 00:58:10 -05:00
jokari69	fb0fc0c580	option to shuffle data in mknfolds (#1459 ) * option to shuffle data in mknfolds * removed possibility to run as stand alone test * split function def in 2 lines for lint * option to shuffle data in mknfolds * removed possibility to run as stand alone test * split function def in 2 lines for lint	2016-12-23 07:53:30 +08:00
AbdealiJK	b94fcab4dc	Add dump_format=json option (#1726 ) * Add format to the params accepted by DumpModel Currently, only the text format is supported when trying to dump a model. The plan is to add more such formats like JSON which are easy to read and/or parse by machines. And to make the interface for this even more generic to allow other formats to be added. Hence, we make some modifications to make these function generic and accept a new parameter "format" which signifies the format of the dump to be created. * Fix typos and errors in docs * plugin: Mention all the register macros available Document the register macros currently available to the plugin writers so they know what exactly can be extended using hooks. * sparce_page_source: Use same arg name in .h and .cc * gbm: Add JSON dump The dump_format argument can be used to specify what type of dump file should be created. Add functionality to dump gblinear and gbtree into a JSON file. The JSON file has an array, each item is a JSON object for the tree. For gblinear: - The item is the bias and weights vectors For gbtree: - The item is the root node. The root node has an attribute "children" which holds the children nodes. This happens recursively. * core.py: Add arg dump_format for get_dump()	2016-11-04 09:55:25 -07:00
tqchen	149589c583	[PYTHON] Refactor training API to use callback	2016-05-19 21:31:23 -07:00
Alistair Johnson	6750c8b743	Added other feature importances in python package (#1135 ) * added new function to calculate other feature importances * added capability to plot other feature importance measures * changed plotting default to fscore * added info on importance_type to boilerplate comment * updated text of error statement * added self module name to fix call * added unit test for feature importances * style fixes	2016-05-02 12:25:24 -05:00
sinhrks	6bab164d80	Bug mixing DMatrix's with and without feature names	2016-04-30 14:42:57 +09:00
sinhrks	8fc2456c87	Enable flake8	2016-04-24 17:32:31 +09:00
tqchen	ec2fb5bc48	Fix multi-class loading	2016-03-10 19:22:26 -08:00
terrytangyuan	803a6fe474	Separate dependencies and lightweight test env for Python	2016-02-28 20:11:10 -06:00
tqchen	90bc7f8f6b	[TEST] Fix travis test when reading hdfs	2016-02-27 18:15:32 -08:00
Tianqi Chen	758a77de9c	Fix testcase after update and allow hdfs load	2016-02-26 17:04:51 -08:00
ivallesp	ed5c98f0ee	re-using the verbose-eval parameter in the cv and aggcv methods and tests adapted	2016-02-19 17:14:57 +01:00
FrozenFingerz	177259a0a7	unittest for cv bugfixes added	2015-12-29 14:13:40 +01:00
sinhrks	25c4fbd0cb	Cleanup pandas support	2015-11-13 06:55:04 +09:00
Johan Manders	b0f38e9352	Changed 4 tests Changed symbol test to give error on < sign, not on = sign Changed 3 other functions, so that float is used instead of q	2015-11-03 21:32:47 +01:00
sinhrks	1f19b78287	Python: adjusts plot_importance ylim	2015-10-25 03:16:53 +09:00
Tianqi Chen	d4d36eed45	Merge pull request #528 from terrytangyuan/test More Unit Tests for Python Package	2015-10-22 08:39:32 -07:00
terrytangyuan	ec2cdafec5	Added fixed random seed for tests (+1 squashed commit) Squashed commits: [76e3664] Added fixed random seed for tests	2015-10-21 23:38:41 -05:00
sinhrks	6f046327ac	Allow plot function to handle XGBModel	2015-10-22 01:00:54 +09:00
sinhrks	dbcb4c8729	Support non-str column names	2015-10-04 13:30:01 +09:00
Tianqi Chen	2859c190cd	Merge pull request #522 from sinhrks/pandas python DMatrix now accepts pandas DataFrame	2015-10-02 10:19:14 -07:00
sinhrks	b958c55ac6	CV returns ndarray or DataFrame	2015-10-02 22:38:03 +09:00
sinhrks	b943becc61	python DMatrix now accepts pandas DataFrame	2015-10-01 22:52:32 +09:00
sinhrks	f6f3473d17	Change to properties	2015-09-28 22:36:39 +09:00
sinhrks	db692a30e5	Add feature_types	2015-09-28 22:25:35 +09:00
sinhrks	f7d434aec2	Fix numpy array check logic	2015-09-17 22:51:44 +09:00
sinhrks	bb6b7ded55	Cleanup str roundtrip using ctypes	2015-09-17 04:10:19 +09:00
sinhrks	db0c9e1c2d	BUG: incorrect model_file results in segfault	2015-09-16 22:02:30 +09:00

55 Commits