Federated learning plugin for XGBoost:
* A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers.
* A Rabit engine for the federated environment.
* An integration test to simulate federated learning.
Follow-up work is needed to address GPU support, stronger security and privacy, and more.
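A minimal sketch of how the pieces fit together, modeled on the integration test; the helper `run_federated_server` and the `federated_*` Rabit environment keys are illustrative and may differ from the final API:

```python
import multiprocessing

import xgboost as xgb
import xgboost.federated  # plugin entry point; import path is illustrative


def run_server(port: int, world_size: int) -> None:
    # gRPC server that aggregates allgather/allreduce/broadcast requests
    # from the federated workers.
    xgboost.federated.run_federated_server(port, world_size)


def run_worker(port: int, world_size: int, rank: int) -> None:
    # The federated Rabit engine routes collective calls to the server.
    rabit_env = [
        f"federated_server_address=localhost:{port}",
        f"federated_world_size={world_size}",
        f"federated_rank={rank}",
    ]
    with xgb.rabit.RabitContext([e.encode() for e in rabit_env]):
        # Each worker trains on its own local shard only; gradients are
        # synchronized through the server, so all workers end up with
        # the same model.
        dtrain = xgb.DMatrix(f"train-{rank}.libsvm?format=libsvm")
        bst = xgb.train({"objective": "binary:logistic"}, dtrain,
                        num_boost_round=10)
        if rank == 0:
            bst.save_model("federated-model.json")


if __name__ == "__main__":
    port, world_size = 9091, 2
    procs = [multiprocessing.Process(target=run_server, args=(port, world_size))]
    procs += [
        multiprocessing.Process(target=run_worker, args=(port, world_size, rank))
        for rank in range(world_size)
    ]
    for p in procs:
        p.start()
    for p in procs[1:]:
        p.join()
    procs[0].terminate()
```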
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize and update the documentation.
* Remove code for negative Hessian values.
* Fix the copy of the callback list in `cv`. This prevents default callbacks from being inserted into the user-supplied list (see the sketch after this list).
* Clarify the behavior of callbacks in training/`cv`.
* Fix typos in doc.
* Clean up some pylint errors.
* Clean up pylint errors in the rabit modules.
* Make the data iterator an abstract class and clean up private member access.
* Clean up `no-self-use` pylint errors for the booster.
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation. Currently, it builds on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed state.
* Add some CUDA stream/event utilities.
* Update the documentation with references.
* Fix a bug in the approx evaluator when the number of data points is less than the number of categories.
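For the `cv` callback copy fix above, a minimal sketch of the intended behavior, using the public `xgboost` Python API: the caller's callback list must be left untouched after the call.

```python
import numpy as np
import xgboost as xgb

X = np.random.default_rng(0).normal(size=(256, 8))
y = X @ np.arange(8)
dtrain = xgb.DMatrix(X, label=y)

callbacks = [xgb.callback.EarlyStopping(rounds=3)]
xgb.cv({"objective": "reg:squarederror"}, dtrain,
       num_boost_round=20, nfold=3, callbacks=callbacks)

# cv() copies the list before appending its internal default callbacks,
# so the user-supplied list still holds exactly one entry.
assert len(callbacks) == 1
```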
An empty partition is different from an empty dataset. In the former case, each worker has
a non-empty Dask collection, but a collection might contain empty partitions.
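A small illustration of the distinction using plain Dask (the filter below is just one way an empty partition can arise); XGBoost's Dask interface has to accept such inputs without treating the whole collection as empty:

```python
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({"x": [0, 0, 1, 2], "y": [1.0, 2.0, 3.0, 4.0]})
ddf = dd.from_pandas(df, npartitions=2)

# The collection is non-empty, but after this filter the first
# partition holds zero rows while the second still holds two.
ddf = ddf[ddf.x > 0]
print(ddf.map_partitions(len).compute().tolist())  # [0, 2]
```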
* Add a new utility for mapping a function onto workers.
* Unify the type for feature names.
* Clean up the iterator.
* Fix prediction with the `DaskDMatrix` worker specification.
* Fix base margin with `DeviceQuantileDMatrix`.
* Support Visual Studio 2022 in `setup.py`.
* Add user configuration.
* Bring back the logic of using the scheduler address from Dask. This was removed when we were trying to support GKE; now we restore it and let XGBoost try it if the direct guess or the host IP from the user configuration fails.
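A sketch of the resulting lookup order, assuming the user configuration lives under the Dask config key `xgboost.scheduler_address` (addresses below are placeholders):

```python
import dask
import dask.array as da
from dask.distributed import Client

import xgboost as xgb

# User configuration: pin the host/IP the rabit tracker should use.
with dask.config.set({"xgboost.scheduler_address": "192.0.0.100"}):
    with Client("tcp://192.0.0.100:8786") as client:
        X = da.random.random((1000, 8), chunks=(250, 8))
        y = da.random.random(1000, chunks=250)
        dtrain = xgb.dask.DaskDMatrix(client, X, y)
        # The tracker address is resolved in order: direct guess, host
        # IP from the user configuration above, and, restored by this
        # change, the scheduler address reported by Dask itself.
        output = xgb.dask.train(client, {"tree_method": "hist"}, dtrain,
                                num_boost_round=10)
        booster = output["booster"]
```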
This PR rewrites the `approx` tree method to use the codebase from `hist` for better performance and code sharing.
The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for monotonic constraints.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance on most datasets (often many times faster).
- Support for prediction cache.
- Significantly better performance for external memory.
- A unified code base between `approx` and `hist`.
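The newly supported options can be exercised directly with standard XGBoost parameters that previously required `hist`; a short sketch:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 4))
y = X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=1024)
dtrain = xgb.DMatrix(X, label=y,
                     feature_weights=np.array([1.0, 1.0, 0.5, 0.5]))

params = {
    "tree_method": "approx",
    "grow_policy": "lossguide",           # now supported by approx
    "max_leaves": 64,                     # alongside max_depth
    "max_bin": 128,                       # easier bin configuration
    "monotone_constraints": "(1,0,0,0)",  # monotonic constraints
    "colsample_bynode": 0.8,              # makes feature weights take effect
}
booster = xgb.train(params, dtrain, num_boost_round=32)
```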
* Add a num-target model parameter, which is configured from the input labels (see the sketch after this list).
* Change the element-wise metrics and the indexing for weights.
* Add demo.
* Add tests.
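A minimal sketch of the multi-target setup, assuming the sklearn interface accepts a 2-D label matrix: the num-target model parameter is inferred from the label shape rather than set by the user.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
# Two targets per sample; the number of targets is configured from
# the shape of this label matrix.
y = np.stack([X[:, 0] ** 2, -X[:, 1]], axis=1)

reg = xgb.XGBRegressor(tree_method="hist", n_estimators=16)
reg.fit(X, y)
assert reg.predict(X).shape == (256, 2)
```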