xgboost

Author	SHA1	Message	Date
Jiaming Yuan	3136185bc5	JSON configuration IO. (#5111 ) * Add saving/loading JSON configuration. * Implement Python pickle interface with new IO routines. * Basic tests for training continuation.	2019-12-15 17:31:53 +08:00
mitama	374648c21a	Add better error message for invalid feature names (#5024 )	2019-11-10 01:58:14 +08:00
Jiaming Yuan	7663de956c	Run training with empty DMatrix. (#4990 ) This makes GPU Hist robust in distributed environment as some workers might not be associated with any data in either training or evaluation. * Disable rabit mock test for now: See #5012 . * Disable dask-cudf test at prediction for now: See #5003 * Launch dask job for all workers despite they might not have any data. * Check 0 rows in elementwise evaluation metrics. Using AUC and AUC-PR still throws an error. See #4663 for a robust fix. * Add tests for edge cases. * Add `LaunchKernel` wrapper handling zero sized grid. * Move some parts of allreducer into a cu file. * Don't validate feature names when the booster is empty. * Sync number of columns in DMatrix. As num_feature is required to be the same across all workers in data split mode. * Filtering in dask interface now by default syncs all booster that's not empty, instead of using rank 0. * Fix Jenkins' GPU tests. * Install dask-cuda from source in Jenkins' test. Now all tests are actually running. * Restore GPU Hist tree synchronization test. * Check UUID of running devices. The check is only performed on CUDA version >= 10.x, as 9.x doesn't have UUID field. * Fix CMake policy and project variables. Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13. * Fix copying data to CPU * Fix race condition in cpu predictor. * Fix duplicated DMatrix construction. * Don't download extra nccl in CI script.	2019-11-06 16:13:13 +08:00
Philip Hyunsu Cho	741fbf47c4	[CI] Update lint configuration to support latest pylint convention (#4971 ) * Update lint configuration * Use gcc 8 consistently in build instruction	2019-10-21 16:40:57 -07:00
Jacob Kim	a78d4e7aa8	Follow PEP 257 -- Docstring Conventions (#4959 )	2019-10-17 23:45:25 -04:00
Jiaming Yuan	7e72a12871	Don't `set_params` at the end of `set_state`. (#4947 ) * Don't set_params at the end of set_state. * Also fix another issue found in dask prediction. * Add note about prediction. Don't support other prediction modes at the moment.	2019-10-15 10:08:26 -04:00
Jiaming Yuan	d30e63a0a5	Support feature names/types for cudf. (#4902 ) * Implement most of the pandas procedure for cudf except for type conversion. * Requires an array of interfaces in metainfo.	2019-09-29 15:07:51 -04:00
Vibhu Jawa	2fa8b359e0	Add support for cudf.Series (#4891 )	2019-09-25 23:52:28 -04:00
Jiaming Yuan	b8433c455a	Rewrite Dask interface. (#4819 )	2019-09-25 01:30:14 -04:00
Jiaming Yuan	c7416002e9	Fix DMatrix doc. (#4884 )	2019-09-23 01:55:04 -04:00
Jiaming Yuan	d669ea1eaa	Deprecate set group (#4864 ) * Convert jvm package and R package. * Restore for compatibility.	2019-09-17 21:26:54 -04:00
Jiaming Yuan	5374f52531	Complete cudf support. (#4850 ) * Handles missing value. * Accept all floating point and integer types. * Move to cudf 9.0 API. * Remove requirement on `null_count`. * Arbitrary column types support.	2019-09-16 23:52:00 -04:00
Jiaming Yuan	9700776597	Cudf support. (#4745 ) * Initial support for cudf integration. * Add two C APIs for consuming data and metainfo. * Add CopyFrom for SimpleCSRSource as a generic function to consume the data. * Add FromDeviceColumnar for consuming device data. * Add new MetaInfo::SetInfo for consuming label, weight etc.	2019-08-19 16:51:40 +12:00
Evan Kepner	53d4272c2a	add os.PathLike support for file paths to DMatrix and Booster Python classes (#4757 )	2019-08-15 04:46:25 -04:00
Marcos	562d9ae963	Eliminate FutureWarning: Series.base is deprecated (#4337 ) * Remove all references to data.base Should eliminate the deprecation warning in issue #4300 * Fix lint	2019-07-04 21:06:23 -07:00
Oleksandr Pryimak	986fee6022	pytest tests/python fails if no pandas installed (#4620 ) * _maybe_pandas_xxx should return their arguments unchanged if no pandas installed * Tests should not assume pandas is installed * Mark tests which require pandas as such	2019-07-01 02:54:08 +08:00
Jiaming Yuan	8bdf15120a	Implement tree model dump with code generator. (#4602 ) * Implement tree model dump with a code generator. * Split up generators. * Implement graphviz generator. * Use pattern matching. * [Breaking] Return a Source in `to_graphviz` instead of Digraph in Python package. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-06-26 15:20:44 +08:00
Bryan Woods	278562db13	Add support for cross-validation using query ID (#4474 ) * adding support for matrix slicing with query ID for cross-validation * hail mary test of unrar installation for windows tests * trying to modify tests to run in Github CI * Remove dependency on wget and unrar * Save error log from R test * Relax assertion in test_training * Use int instead of bool in C function interface * Revise R interface * Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility	2019-05-23 10:45:02 -07:00
Sean Owen	5a567ec249	Ensure pandas DataFrame column names are treated as strings in type error message (#4481 )	2019-05-21 16:19:35 +08:00
Philip Hyunsu Cho	bbe0dbd7ec	Migrate pylint check to Python 3 (#4381 ) * Migrate lint to Python 3 * Fix lint errors * Use Miniconda3 to use Python 3.7 * Use latest pylint and astroid	2019-04-21 01:01:54 -07:00
Jiaming Yuan	82dca3c108	Don't store DMatrix handle until it's initialized. (#4317 ) * Use a temporary variable to store the handle. * Decode c++ error message. * Simple note about saved binary.	2019-04-01 18:29:28 +08:00
Patrick Ford	74009afcac	Added trees_to_df() method for Booster class (#4153 ) * add test_parse_tree.py to tests/python * Fix formatting * Fix pylint error * Ignore 'no member' error for Pandas dataframe	2019-02-26 13:28:24 -08:00
Philip Hyunsu Cho	2aaae2e7bb	Fix #4163 : always copy sliced data (#4165 ) * Revert "Accept numpy array view. (#4147)" This reverts commit a985a99cf0dacb26a5d734835473d492d3c2a0df. * Fix #4163: always copy sliced data * Remove print() from the test; check shape equality * Check if 'base' attribute exists * Fix lint * Address reviewer comment * Fix lint	2019-02-20 14:46:34 -08:00
Jiaming Yuan	a985a99cf0	Accept numpy array view. (#4147 ) * Accept array view (slice) in metainfo.	2019-02-18 22:21:34 +08:00
Pasha Stetsenko	ff2d4c99fa	Update datatable usage (#4123 )	2019-02-17 03:44:09 +08:00
Philip Hyunsu Cho	99a290489c	Update Python docstring for ranking functions (#4121 ) * Update Python docstring for ranking functions * Fix formatting	2019-02-10 12:22:02 -08:00
Jiaming Yuan	1088dff42c	Prevent training without setting up caches. (#4066 ) * Prevent training without setting up caches. * Add warning for internal functions. * Check number of features. * Address reviewer's comment.	2019-02-03 01:03:29 -08:00
Jiaming Yuan	4fac9874e0	Check booster for dart in feature importance. (#4073 ) * Check booster for dart in feature importance.	2019-01-22 16:03:54 +08:00
Sam Wilkinson	fd722d60cd	Deprecation warning for lists passed into DMatrix (#3970 ) * Ensure lists cannot be passed into DMatrix The documentation does not include lists as an allowed type for the data inputted into DMatrix. Despite this, a list can be passed in without an error. This change would prevent a list from being passed in directly.	2018-12-14 19:26:11 +08:00
Philip Hyunsu Cho	c5130e487a	Fix #3894 : Allow loading pickles without self.booster attributes (redux) (#3944 )	2018-11-28 09:31:46 -08:00
Jiaming Yuan	93f63324e6	Address deprecation of Python ABC. (#3909 )	2018-11-16 19:43:32 +13:00
Jelle Zijlstra	d9642cf757	handle $PATH not being set in python library (#3845 ) Fixes #3844	2018-11-06 15:27:02 -08:00
Philip Hyunsu Cho	ea99b53d8e	Document behavior of get_fscore() for zero-importance features (#3763 )	2018-10-08 01:52:25 -07:00
Philip Hyunsu Cho	10cd7c8447	Fix #3714 : preserve feature names when slicing DMatrix (#3766 ) * Fix #3714: preserve feature names when slicing DMatrix * Add test	2018-10-08 01:04:33 -07:00
Philip Hyunsu Cho	c23783a0d1	Add notes to doc (#3765 )	2018-10-07 14:09:09 -07:00
Takahiro Kojima	2405c59352	remove extra of (#3713 )	2018-09-21 11:55:39 -07:00
Philip Hyunsu Cho	bd41bd6605	Better error message for failed library loading (#3690 ) * Better error message for failed lib loading * Address review comment + fix lint	2018-09-12 22:37:26 -07:00
Philip Hyunsu Cho	5a8bbb39a1	Revert #3677 and #3674 (#3678 ) * Revert "Add scikit-learn as dependency for doc build (#3677)" This reverts commit 308f664ade0547242608e21f6198c895415f03da. * Revert "Add scikit-learn tests (#3674)" This reverts commit d176a0fbc8165e3afe3e42ff464ab7b253211555.	2018-09-06 20:43:17 -07:00
Philip Hyunsu Cho	d176a0fbc8	Add scikit-learn tests (#3674 ) * Add scikit-learn tests Goal is to pass scikit-learn's check_estimator() for XGBClassifier, XGBRegressor, and XGBRanker. It is actually not possible to do so entirely, since check_estimator() assumes that NaN is disallowed, but XGBoost allows for NaN as missing values. However, it is always a good idea to add some checks inspired by check_estimator(). * Fix lint * Fix lint	2018-09-06 09:55:28 -07:00
Philip Hyunsu Cho	4ed8a88240	Update Python API doc (#3619 ) * Add XGBRanker to Python API doc * Show inherited members of XGBRegressor in API doc, since XGBRegressor uses default methods from XGBModel * Add table of contents to Python API doc * Skip JVM doc download if not available * Show inherited members for XGBRegressor and XGBRanker * Expose XGBRanker to Python XGBoost module directory * Add docstring to XGBRegressor.predict() and XGBRanker.predict() * Fix rendering errors in Python docstrings * Fix lint	2018-08-22 18:59:30 -07:00
Grace Lam	993e62b9e7	Add JSON model dump functionality (#3603 ) * Add JSON model dump functionality * Fix lint	2018-08-17 16:18:43 -07:00
trivialfis	7c82dc92b2	Fix accessing DMatrix.handle before set. (#3599 ) Close #3597.	2018-08-16 15:26:06 -07:00
Philip Hyunsu Cho	3c72654e3b	Revert "Fix #3485 , #3540 : Don't use dropout for predicting test sets" (#3563 ) * Revert "Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)" This reverts commit 44811f233071c5805d70c287abd22b155b732727. * Document behavior of predict() for DART booster * Add notice to parameter.rst	2018-08-08 09:48:55 -07:00
wenduowang	3b62e75f2e	Fix bug of using list(x) function when x is string (#3432 ) * Fix bug of using list(x) function when x is string list('abcdcba') = ['a', 'b', 'c', 'd', 'c', 'b', 'a'] * Allow feature_names/feature_types to be of any type If feature_names/feature_types is iterable, e.g. tuple, list, then convert the value to list, except for string; otherwise construct a list with a single value * Delete excess whitespace * Fix whitespace to pass lint	2018-07-30 07:36:34 -07:00
jqmp	e9a97e0d88	Add total_gain and total_cover importance measures (#3498 ) Add `'total_gain'` and `'total_cover'` as possible `importance_type` arguments to `Booster.get_score` in the Python package. `get_score` already accepts a `'gain'` argument, which returns each feature's average gain over all of its splits. `'total_gain'` does the same, but returns a total rather than an average. This seems more intuitively meaningful, and also matches the behavior of the R package's `xgb.importance` function. I also added an analogous `'total_cover'` command for consistency. This should resolve #3484.	2018-07-23 00:30:55 -07:00
KOLANICH	a393d44c5d	Improved library loading a bit (#3481 ) * Improved library loading a bit * Fixed indentation. * Fixes according to the discussion * Moved the comment to a separate line. * specified exception type	2018-07-20 16:03:44 -07:00
Philip Hyunsu Cho	66e74d2223	Fix get_uint_info() (#3442 ) * Add regression test	2018-07-05 20:06:59 -07:00
Philip Hyunsu Cho	48d6e68690	Add callback interface to re-direct console output (#3438 ) * Add callback interface to re-direct console output * Exempt TrackerLogger from custom logging * Fix lint	2018-07-05 11:32:30 -07:00
cinqS	8bec8d5e9a	Better doc for save_model() / load_model() (#3143 ) Be clear that they do not save Python-specific attributes	2018-06-29 04:24:33 +00:00
PSEUDOTENSOR / Jonathan McKinney	9ac163d0bb	Allow import via python datatable. (#3272 ) * Allow import via python datatable. * Write unit tests * Refactor dt API functions * Refactor python code * Lint fixes * Address review comments	2018-06-20 13:16:18 -07:00


114 Commits