xgboost

Author	SHA1	Message	Date
Jiaming Yuan	c859764d29	[doc] Clarify that states in callbacks are mutated. (#7685 ) * Fix copy for cv. This prevents inserting default callbacks into the input list. * Clarify the behavior of callbacks in training/cv. * Fix typos in doc.	2022-02-22 11:45:00 +08:00
Jiaming Yuan	e56d1779e1	Require Python 3.7. (#7682 ) * Update setup.py.	2022-02-21 05:46:48 +08:00
Jiaming Yuan	f08c5dcb06	Cleanup some pylint errors. (#7667 ) * Cleanup some pylint errors. * Cleanup pylint errors in rabit modules. * Make data iter an abstract class and cleanup private access. * Cleanup no-self-use for booster.	2022-02-19 18:53:12 +08:00
Jiaming Yuan	b76c5d54bf	Define export symbols in callback module. (#7665 )	2022-02-19 18:52:41 +08:00
Jiaming Yuan	0d0abe1845	Support optimal partitioning for GPU hist. (#7652 ) * Implement `MaxCategory` in quantile. * Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function. * Extract an evaluator from GPU Hist to store the needed states. * Added some CUDA stream/event utilities. * Update document with references. * Fixed a bug in approx evaluator where the number of data points is less than the number of categories.	2022-02-15 03:03:12 +08:00
Jiaming Yuan	5cd1f71b51	[dask] Improve configuration for port. (#7645 ) - Try port 0 to let the OS return the available port. - Add port configuration.	2022-02-14 21:34:34 +08:00
Jiaming Yuan	b52c4e13b0	[dask] Fix empty partition with pandas input. (#7644 ) An empty partition is different from an empty dataset. In the former case, each worker has non-empty dask collections, but a collection might contain empty partitions.	2022-02-14 19:35:51 +08:00
Jiaming Yuan	fe4ce920b2	[dask] Cleanup dask module. (#7634 ) * Add a new utility for mapping function onto workers. * Unify the type for feature names. * Clean up the iterator. * Fix prediction with DaskDMatrix worker specification. * Fix base margin with DeviceQuantileDMatrix. * Support vs 2022 in setup.py.	2022-02-08 20:41:46 +08:00
Jiaming Yuan	926af9951e	Add missing train parameter for sklearn interface. (#7629 ) Some other parameters are still missing and rely on **kwargs, for instance parameters from dart.	2022-02-08 13:20:19 +08:00
Jiaming Yuan	3e693e4f97	[dask] Fix nthread config with dask sklearn wrapper. (#7633 )	2022-02-08 06:38:32 +08:00
Philip Hyunsu Cho	f6e6d0b2c0	[CI] Build Python wheels for MacOS (x86_64 and arm64) (#7621 ) * Build Python wheels for OSX (x86_64 and arm64) * Use Conda's libomp when running Python tests * fix * Add comment to explain CIBW_TARGET_OSX_ARM64 * Update release script * Add comments in build_python_wheels.sh * Document wheel pipeline	2022-02-02 17:35:48 -08:00
Philip Hyunsu Cho	b4340abf56	Add special handling for multi:softmax in sklearn predict (#7607 ) * Add special handling for multi:softmax in sklearn predict * Add test coverage	2022-01-29 15:54:49 -08:00
Jiaming Yuan	24789429fd	Support latest pandas Index type. (#7595 )	2022-01-26 18:20:10 +08:00
Jiaming Yuan	f84291c1e1	Fix `max_cat_to_onehot` doc annotation [skip ci] (#7592 )	2022-01-23 16:33:23 +08:00
Jiaming Yuan	ef4dae4c0e	[dask] Add scheduler address to dask config. (#7581 ) - Add user configuration. - Bring back the logic of using the scheduler address from dask. This was removed when we were trying to support GKE; now we bring it back and let xgboost try it if the direct guess or the host IP from user config fails.	2022-01-22 01:56:32 +08:00
Jiaming Yuan	b4ec1682c6	Update document for multi output and categorical. (#7574 ) * Group together categorical related parameters. * Update documents about multioutput and categorical.	2022-01-19 04:35:17 +08:00
Jiaming Yuan	dac9eb13bd	Implement new `save_raw` in Python. (#7572 ) * Expose the new C API function to Python. * Remove old document and helper script. * Small optimization to the `save_raw` and Json ctors.	2022-01-19 02:27:51 +08:00
Jiaming Yuan	13b0fa4b97	Implement `get_group`. (#7564 )	2022-01-16 02:07:42 +08:00
Jiaming Yuan	52277cc3da	Rename build info function to be consistent with rest of the API. (#7553 )	2022-01-14 00:39:28 +08:00
Jiaming Yuan	001503186c	Rewrite approx (#7214 ) This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing. The rewrite has many benefits: - Support for both `max_leaves` and `max_depth`. - Support for `grow_policy`. - Support for mono constraint. - Support for feature weights. - Support for easier bin configuration (`max_bin`). - Support for categorical data. - Faster performance for most of the datasets. (many times faster) - Support for prediction cache. - Significantly better performance for external memory. - Unites the code base between approx and hist.	2022-01-10 21:15:05 +08:00
Jiaming Yuan	54582f641a	[doc] Use cross references in sphinx doc. (#7522 ) * Use cross references instead of URL. * Fix auto doc for callback.	2022-01-05 03:21:25 +08:00
Jiaming Yuan	eb1efb54b5	Define `feature_names_in_`. (#7526 ) * Define `feature_names_in_`. * Raise attribute error if it's not defined.	2022-01-05 01:35:34 +08:00
Jiaming Yuan	8f0a42a266	Initial support for multi-label classification. (#7521 ) * Add support in sklearn classifier.	2022-01-04 23:58:21 +08:00
Jiaming Yuan	58a6723eb1	Initial support for multioutput regression. (#7514 ) * Add num target model parameter, which is configured from input labels. * Change elementwise metric and indexing for weights. * Add demo. * Add tests.	2021-12-18 09:28:38 +08:00
Jiaming Yuan	6f8a4633b7	Fix Python typehint with upgraded mypy. (#7513 )	2021-12-16 23:08:08 +08:00
Jiaming Yuan	70b12d898a	[dask] Fix ddqdm with empty partition. (#7510 ) * Fix empty partition. * war.	2021-12-16 20:37:29 +08:00
Jiaming Yuan	05497a9141	[dask] Fix asyncio. (#7508 )	2021-12-13 01:48:25 +08:00
Jiaming Yuan	021f8bf28b	Fix pylint. (#7498 )	2021-12-07 13:23:30 +08:00
Jiaming Yuan	b124a27f57	Support scipy sparse in dask. (#7457 )	2021-11-23 16:45:36 +08:00
Harvey	0552ca8021	Fix typo (#7469 )	2021-11-23 08:58:45 +08:00
Jiaming Yuan	d33854af1b	[Breaking] Accept multi-dim meta info. (#7405 ) This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.	2021-11-18 23:02:54 +08:00
Jiaming Yuan	e27f543deb	Set use_logger in tracker to false. (#7438 )	2021-11-16 05:12:42 +08:00
Kian Meng Ang	d27a11ff87	Fix typos in python package (#7432 )	2021-11-14 17:20:19 +08:00
Jiaming Yuan	46726ec176	Expose build info (#7399 )	2021-11-12 18:22:46 +08:00
Jiaming Yuan	97d7582457	Delay breaking changes to 1.6. (#7420 ) The patch is too big to be backported.	2021-11-12 16:46:03 +08:00
Jiaming Yuan	c74df31bf9	Cleanup the `train` function. (#7377 ) * Move attribute setter to callback. * Remove the internal train function. * Remove unnecessary initialization.	2021-11-02 18:00:26 +08:00
Jiaming Yuan	154b15060e	Move callbacks from `fit` to `__init__`. (#7375 )	2021-11-02 17:51:42 +08:00
Jiaming Yuan	a13321148a	Support multi-class with base margin. (#7381 ) This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support for most of the data types along with tests.	2021-11-02 13:38:00 +08:00
Jiaming Yuan	c6769488b3	Typehint for subset of core API. (#7348 )	2021-10-28 20:47:04 +08:00
Jiaming Yuan	45aef75cca	Move skl `eval_metric` and `early_stopping_rounds` to model params. (#6751 ) A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`, and `feval` is deprecated. The new `custom_metric` receives transformed prediction when the built-in objective is used. This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function. `eval_metric` and `early_stopping_rounds` in the sklearn interface are moved from `fit` to `__init__` and are now saved as part of the scikit-learn model. The old ones in the `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`. Added more detailed documents for the behaviour of custom objective and metric.	2021-10-28 17:20:20 +08:00
Jiaming Yuan	6b074add66	Update setup.py. (#7360 ) * Add new classifiers. * Typehint.	2021-10-28 14:58:31 +08:00
Jiaming Yuan	3c4aa9b2ea	[breaking] Remove label encoder deprecated in 1.3. (#7357 )	2021-10-28 13:24:29 +08:00
Jiaming Yuan	ac9bfaa4f2	Handle missing values in dataframe with category dtype. (#7331 ) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching.	2021-10-28 03:33:54 +08:00
Jiaming Yuan	f999897615	[dask] Use nthread in DMatrix construction. (#7337 ) This is consistent with the thread overriding behavior.	2021-10-20 15:16:40 +08:00
Jiaming Yuan	376b448015	[doc] Fix broken links. (#7341 ) * Fix most of the link checks from sphinx. * Remove duplicate explicit target name.	2021-10-20 14:45:30 +08:00
Jiaming Yuan	f53da412aa	Add typehint to tracker. (#7338 )	2021-10-20 12:49:36 +08:00
Jiaming Yuan	c42e3fbcf3	[doc] Fix early stopping document. (#7334 )	2021-10-18 11:21:16 -07:00
Jiaming Yuan	f56e2e9a66	Support categorical data with pandas Dataframe in inplace prediction (#7322 )	2021-10-17 14:32:06 +08:00
Jiaming Yuan	5b17bb0031	Fix prediction with cat data in sklearn interface. (#7306 ) * Specify DMatrix parameter for pre-processing dataframe. * Add document about the behaviour of prediction.	2021-10-12 14:31:12 +08:00
Jiaming Yuan	69d3b1b8b4	Remove old callback deprecated in 1.3. (#7280 )	2021-10-08 17:24:59 +08:00

856 Commits