xgboost

Author	SHA1	Message	Date
Ram Rachum	02884b08aa	Fix exception causes all over the codebase (#5787 )	2020-06-15 21:06:07 +08:00
Jiaming Yuan	35e2205256	[dask] Return GPU Series when input is from cuDF. (#5710 ) * Refactor predict function.	2020-05-28 17:51:20 +08:00
Jiaming Yuan	f145241593	Let XGBoostError inherit ValueError. (#5696 )	2020-05-26 08:34:56 +08:00
Jiaming Yuan	5af8161a1a	Implement Python data handler. (#5689 ) * Define data handlers for DMatrix. * Throw ValueError in scikit learn interface.	2020-05-22 11:53:55 +08:00
Philip Hyunsu Cho	4fd95272c8	Instruct Mac users to install libomp (#5606 )	2020-04-25 15:50:30 -07:00
Jiaming Yuan	9c1103e06c	[Breaking] Set output margin to True for custom objective. (#5564 ) * Set output margin to True for custom objective in Python and R. * Add a demo for writing multi-class custom objective function. * Run tests on selected demos.	2020-04-20 20:44:12 +08:00
Jiaming Yuan	cfee9fae91	Don't use uint for threads. (#5542 )	2020-04-17 09:45:42 +08:00
Jiaming Yuan	8b04736b81	[dask] dask cudf inplace prediction. (#5512 ) * Add inplace prediction for dask-cudf. * Remove Dockerfile.release, since it's not used anywhere * Use Conda exclusively in CUDF and GPU containers * Improve cupy memory copying. * Add skip marks to tests. * Add mgpu-cudf category on the CI to run all distributed tests. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-04-15 18:15:51 +08:00
Jiaming Yuan	c90119eb67	Update Python doc. [skip ci] (#5517 ) * Update doc for copying booster. [skip ci] The issue is resolved in #5312 . * Add version for new APIs. [skip ci]	2020-04-14 16:25:20 +08:00
Jiaming Yuan	c218d8ffbf	Enable parameter validation for skl. (#5477 )	2020-04-03 10:23:58 +08:00
Rory Mitchell	15f40e51e9	Add support for dlpack, expose python docs for DeviceQuantileDMatrix (#5465 )	2020-04-01 23:34:32 +13:00
Jiaming Yuan	6601a641d7	Thread safe, inplace prediction. (#5389 ) Normal prediction with DMatrix is now thread safe with locks. Added inplace prediction is lock free thread safe. When data is on device (cupy, cudf), the returned data is also on device. * Implementation for numpy, csr, cudf and cupy. * Implementation for dask. * Remove sync in simple dmatrix.	2020-03-30 15:35:28 +08:00
Rory Mitchell	13b10a6370	Device dmatrix (#5420 )	2020-03-28 14:42:21 +13:00
Jiaming Yuan	abca9908ba	Support pandas SparseArray. (#5431 )	2020-03-20 21:40:22 +08:00
Jiaming Yuan	8d06878bf9	Deterministic GPU histogram. (#5361 ) * Use pre-rounding based method to obtain reproducible floating point summation. * GPU Hist for regression and classification are bit-by-bit reproducible. * Add doc. * Switch to thrust reduce for `node_sum_gradient`.	2020-03-04 15:13:28 +08:00
Jiaming Yuan	a461a9a90a	Define lazy isinstance for Python compat. (#5364 ) * Avoid importing datatable. * Fix #5363.	2020-02-26 14:23:33 +08:00
Jiaming Yuan	0fd455e162	Restore loading model from buffer. (#5360 )	2020-02-26 11:30:13 +08:00
OrdoAbChao	b4f952bd22	[Breaking] Remove Scikit-Learn default parameters (#5130 ) * Simplify Scikit-Learn parameter management. * Copy base class for removing duplicated parameter signatures. * Set all parameters to None. * Handle None in set_param. * Extract the doc. Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-01-23 20:25:20 +08:00
Jiaming Yuan	1891cc766d	Fix metainfo from DataFrame. (#5216 ) * Fix metainfo from DataFrame. * Unify helper functions for data and meta.	2020-01-22 16:29:44 +08:00
Rory Mitchell	5d4c24a1fc	Fix cupy without cudf import (#5219 )	2020-01-22 18:02:39 +13:00
Rory Mitchell	9c56480c61	Support dmatrix construction from cupy array (#5206 )	2020-01-22 13:15:27 +13:00
Kodi Arfer	f100b8d878	[Breaking] Don't drop trees during DART prediction by default (#5115 ) * Simplify DropTrees calling logic * Add `training` parameter for prediction method. * [Breaking]: Add `training` to C API. * Change for R and Python custom objective. * Correct comment. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-01-13 21:48:30 +08:00
Jiaming Yuan	ebc86a3afa	Disable parameter validation for Scikit-Learn interface. (#5167 ) * Disable parameter validation for now. Scikit-Learn passes all parameters down to XGBoost, whether they are used or not. * Add option `validate_parameters`.	2020-01-07 11:17:31 +08:00
K.O	018df6004e	Fix feature_name crated from int64index dataframe. (#5081 )	2019-12-30 12:26:22 +08:00
Jiaming Yuan	6848d0426f	Clean up Python 2 compatibility code. (#5161 )	2019-12-27 18:34:53 +08:00
Jiaming Yuan	0202e04a8e	Add base margin to sklearn interface. (#5151 )	2019-12-24 09:43:41 +08:00
Jiaming Yuan	3136185bc5	JSON configuration IO. (#5111 ) * Add saving/loading JSON configuration. * Implement Python pickle interface with new IO routines. * Basic tests for training continuation.	2019-12-15 17:31:53 +08:00
mitama	374648c21a	Add better error message for invalid feature names (#5024 )	2019-11-10 01:58:14 +08:00
Jiaming Yuan	7663de956c	Run training with empty DMatrix. (#4990 ) This makes GPU Hist robust in distributed environment as some workers might not be associated with any data in either training or evaluation. * Disable rabit mock test for now: See #5012 . * Disable dask-cudf test at prediction for now: See #5003 * Launch dask job for all workers despite they might not have any data. * Check 0 rows in elementwise evaluation metrics. Using AUC and AUC-PR still throws an error. See #4663 for a robust fix. * Add tests for edge cases. * Add `LaunchKernel` wrapper handling zero sized grid. * Move some parts of allreducer into a cu file. * Don't validate feature names when the booster is empty. * Sync number of columns in DMatrix. As num_feature is required to be the same across all workers in data split mode. * Filtering in dask interface now by default syncs all booster that's not empty, instead of using rank 0. * Fix Jenkins' GPU tests. * Install dask-cuda from source in Jenkins' test. Now all tests are actually running. * Restore GPU Hist tree synchronization test. * Check UUID of running devices. The check is only performed on CUDA version >= 10.x, as 9.x doesn't have UUID field. * Fix CMake policy and project variables. Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13. * Fix copying data to CPU * Fix race condition in cpu predictor. * Fix duplicated DMatrix construction. * Don't download extra nccl in CI script.	2019-11-06 16:13:13 +08:00
Philip Hyunsu Cho	741fbf47c4	[CI] Update lint configuration to support latest pylint convention (#4971 ) * Update lint configuration * Use gcc 8 consistently in build instruction	2019-10-21 16:40:57 -07:00
Jacob Kim	a78d4e7aa8	Follow PEP 257 -- Docstring Conventions (#4959 )	2019-10-17 23:45:25 -04:00
Jiaming Yuan	7e72a12871	Don't `set_params` at the end of `set_state`. (#4947 ) * Don't set_params at the end of set_state. * Also fix another issue found in dask prediction. * Add note about prediction. Don't support other prediction modes at the moment.	2019-10-15 10:08:26 -04:00
Jiaming Yuan	d30e63a0a5	Support feature names/types for cudf. (#4902 ) * Implement most of the pandas procedure for cudf except for type conversion. * Requires an array of interfaces in metainfo.	2019-09-29 15:07:51 -04:00
Vibhu Jawa	2fa8b359e0	Add support for cudf.Series (#4891 )	2019-09-25 23:52:28 -04:00
Jiaming Yuan	b8433c455a	Rewrite Dask interface. (#4819 )	2019-09-25 01:30:14 -04:00
Jiaming Yuan	c7416002e9	Fix DMatrix doc. (#4884 )	2019-09-23 01:55:04 -04:00
Jiaming Yuan	d669ea1eaa	Deprecate set group (#4864 ) * Convert jvm package and R package. * Restore for compatibility.	2019-09-17 21:26:54 -04:00
Jiaming Yuan	5374f52531	Complete cudf support. (#4850 ) * Handles missing value. * Accept all floating point and integer types. * Move to cudf 9.0 API. * Remove requirement on `null_count`. * Arbitrary column types support.	2019-09-16 23:52:00 -04:00
Jiaming Yuan	9700776597	Cudf support. (#4745 ) * Initial support for cudf integration. * Add two C APIs for consuming data and metainfo. * Add CopyFrom for SimpleCSRSource as a generic function to consume the data. * Add FromDeviceColumnar for consuming device data. * Add new MetaInfo::SetInfo for consuming label, weight etc.	2019-08-19 16:51:40 +12:00
Evan Kepner	53d4272c2a	add os.PathLike support for file paths to DMatrix and Booster Python classes (#4757 )	2019-08-15 04:46:25 -04:00
Marcos	562d9ae963	Eliminate FutureWarning: Series.base is deprecated (#4337 ) * Remove all references to data.base Should eliminate the deprecation warning in issue #4300 * Fix lint	2019-07-04 21:06:23 -07:00
Oleksandr Pryimak	986fee6022	pytest tests/python fails if no pandas installed (#4620 ) * _maybe_pandas_xxx should return their arguments unchanged if no pandas installed * Tests should not assume pandas is installed * Mark tests which require pandas as such	2019-07-01 02:54:08 +08:00
Jiaming Yuan	8bdf15120a	Implement tree model dump with code generator. (#4602 ) * Implement tree model dump with a code generator. * Split up generators. * Implement graphviz generator. * Use pattern matching. * [Breaking] Return a Source in `to_graphviz` instead of Digraph in Python package. Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-06-26 15:20:44 +08:00
Bryan Woods	278562db13	Add support for cross-validation using query ID (#4474 ) * adding support for matrix slicing with query ID for cross-validation * hail mary test of unrar installation for windows tests * trying to modify tests to run in Github CI * Remove dependency on wget and unrar * Save error log from R test * Relax assertion in test_training * Use int instead of bool in C function interface * Revise R interface * Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility	2019-05-23 10:45:02 -07:00
Sean Owen	5a567ec249	Ensure pandas DataFrame column names are treated as strings in type error message (#4481 )	2019-05-21 16:19:35 +08:00
Philip Hyunsu Cho	bbe0dbd7ec	Migrate pylint check to Python 3 (#4381 ) * Migrate lint to Python 3 * Fix lint errors * Use Miniconda3 to use Python 3.7 * Use latest pylint and astroid	2019-04-21 01:01:54 -07:00
Jiaming Yuan	82dca3c108	Don't store DMatrix handle until it's initialized. (#4317 ) * Use a temporary variable to store the handle. * Decode c++ error message. * Simple note about saved binary.	2019-04-01 18:29:28 +08:00
Patrick Ford	74009afcac	Added trees_to_df() method for Booster class (#4153 ) * add test_parse_tree.py to tests/python * Fix formatting * Fix pylint error * Ignore 'no member' error for Pandas dataframe	2019-02-26 13:28:24 -08:00
Philip Hyunsu Cho	2aaae2e7bb	Fix #4163 : always copy sliced data (#4165 ) * Revert "Accept numpy array view. (#4147)" This reverts commit a985a99cf0dacb26a5d734835473d492d3c2a0df. * Fix #4163: always copy sliced data * Remove print() from the test; check shape equality * Check if 'base' attribute exists * Fix lint * Address reviewer comment * Fix lint	2019-02-20 14:46:34 -08:00
Jiaming Yuan	a985a99cf0	Accept numpy array view. (#4147 ) * Accept array view (slice) in metainfo.	2019-02-18 22:21:34 +08:00

290 Commits