xgboost

Author	SHA1	Message	Date
Jiaming Yuan	a13321148a	Support multi-class with base margin. (#7381 ) This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support for most of the data types along with tests.	2021-11-02 13:38:00 +08:00
Jiaming Yuan	c6769488b3	Typehint for subset of core API. (#7348 )	2021-10-28 20:47:04 +08:00
Jiaming Yuan	45aef75cca	Move skl `eval_metric` and `early_stopping_rounds` to model params. (#6751 ) A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`. And `feval` is deprecated. The new `custom_metric` receives transformed prediction when the built-in objective is used. This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function. `eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and are now saved as part of the scikit-learn model. The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`. Added more detailed documents for the behaviour of custom objective and metric.	2021-10-28 17:20:20 +08:00
Jiaming Yuan	ac9bfaa4f2	Handle missing values in dataframe with category dtype. (#7331 ) * Replace -1 in pandas initializer. * Unify `IsValid` functor. * Mimic pandas data handling in cuDF glue code. * Check invalid categories. * Fix DDM sketching.	2021-10-28 03:33:54 +08:00
Jiaming Yuan	376b448015	[doc] Fix broken links. (#7341 ) * Fix most of the link checks from sphinx. * Remove duplicate explicit target name.	2021-10-20 14:45:30 +08:00
Jiaming Yuan	f56e2e9a66	Support categorical data with pandas Dataframe in inplace prediction (#7322 )	2021-10-17 14:32:06 +08:00
Jiaming Yuan	69d3b1b8b4	Remove old callback deprecated in 1.3. (#7280 )	2021-10-08 17:24:59 +08:00
Jiaming Yuan	e48e05e6e2	Add typehint to rabit module. (#7240 )	2021-09-17 18:31:02 +08:00
Jiaming Yuan	b18f5f61b0	Fix pylint (#7241 )	2021-09-17 11:50:36 +08:00
Jiaming Yuan	0ed979b096	Support more input types for categorical data. (#7220 ) * Support more input types for categorical data. * Shorten the type name from "categorical" to "c". * Tests for np/cp array and scipy csr/csc/coo. * Specify the type for feature info.	2021-09-16 20:39:30 +08:00
Jiaming Yuan	ee8d1f5ed8	Fix histogram truncation. (#7181 ) * Fix truncation. * Lint. * lint.	2021-08-24 18:34:32 -07:00
Jiaming Yuan	8a84be37b8	Pass scikit learn estimator checks for regressor. (#7130 ) * Check data shape. * Check labels.	2021-08-03 18:58:20 +08:00
Jiaming Yuan	778135f657	Fix parameter loading with training continuation. (#7121 ) * Add a demo for training continuation.	2021-07-23 10:51:47 +08:00
Jiaming Yuan	e6088366df	Export Python Interface for external memory. (#7070 ) * Add Python iterator interface. * Add tests. * Add demo. * Add documents. * Handle empty dataset.	2021-07-22 15:15:53 +08:00
Jiaming Yuan	5d7cdf2e36	[Breaking] Rename Quantile DMatrix C API. (#7082 ) The role of ProxyDMatrix is going beyond what it was designed for. Now it's used by both QuantileDeviceDMatrix and inplace prediction. After the refactoring of sparse DMatrix it will also be used for external memory. Renaming the C API to extract it from QuantileDeviceDMatrix.	2021-07-08 11:34:14 +08:00
Jiaming Yuan	a5d222fcdb	Handle categorical split in model histogram and dataframe. (#7065 ) * Error on get_split_value_histogram when feature is categorical * Add a category column to output dataframe	2021-07-02 13:10:36 +08:00
Jiaming Yuan	663136aa08	Implement feature score for linear model. (#7048 ) * Add feature score support for linear model. * Port R interface to the new implementation. * Add linear model support in Python. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2021-06-25 14:34:02 +08:00
Jiaming Yuan	da1ad798ca	Convert numpy float to Python float in feat score. (#7047 )	2021-06-21 20:58:43 +08:00
Jiaming Yuan	86715e4cd4	Support categorical data for dask functional interface and DQM. (#7043 ) * Support categorical data for dask functional interface and DQM. * Implement categorical data support for GPU GK-merge. * Add support for dask functional interface. * Add support for DQM. * Get newer cupy.	2021-06-18 13:06:52 +08:00
Jiaming Yuan	7dd29ffd47	Implement feature score in GBTree. (#7041 ) * Categorical data support. * Eliminate text parsing during feature score computation.	2021-06-18 11:53:16 +08:00
Jiaming Yuan	d9799b09d0	Categorical data support for cuDF. (#7042 ) * Add support in DMatrix. * Add support in DQM, except for iterator.	2021-06-17 13:54:33 +08:00
Jiaming Yuan	b56614e9b8	[R] Use new predict function. (#6819 ) * Call new C prediction API. * Add `strict_shape`. * Add `iterationrange`. * Update document.	2021-06-11 13:03:29 +08:00
Jiaming Yuan	ee4f51a631	Support for all primitive types from array. (#7003 ) * Change C API name. * Test for all primitive types from array. * Add native support for CPU 128 float. * Convert boolean and float16 in Python. * Fix dask version for now.	2021-06-01 08:34:48 +08:00
vslaykovsky	2a9979e256	Fixed incorrect feature mismatch error message (#6949 ) data.shape[0] denotes the number of samples, data.shape[1] is the number of features	2021-05-11 13:52:11 +08:00
Jiaming Yuan	37ad60fe25	Enforce input data is not `object`. (#6927 ) * Check for object data type. * Allow strided arrays with greater underlying buffer size.	2021-05-02 00:09:01 +08:00
Jiaming Yuan	ef473b1f09	Disable pylint error. (#6911 )	2021-04-29 01:01:37 +08:00
Jiaming Yuan	a2ecbdaa31	Add an API guard to prevent global variables being changed. (#6891 )	2021-04-23 10:27:57 +08:00
Jiaming Yuan	a5d7094a45	Update documents. (#6856 ) * Add early stopping section to prediction doc. * Remove best_ntree_limit. * Better doxygen output.	2021-04-16 12:41:03 +08:00
Jiaming Yuan	dee5ef2dfd	Typehint for Sklearn. (#6799 )	2021-04-14 06:55:21 +08:00
giladmaya	aa0d8f20c1	Support configuring constraints by feature names (#6783 ) Co-authored-by: fis <jm.yuan@outlook.com>	2021-04-04 06:53:33 +08:00
Jiaming Yuan	7e06c81894	Fix approximated predict contribution. (#6811 )	2021-04-03 02:15:03 +08:00
Jiaming Yuan	0cced530ea	[doc] Clarify prediction function. (#6813 )	2021-04-03 02:12:04 +08:00
Jiaming Yuan	a5c852660b	Update document for sklearn model IO. (#6809 ) * Update the use of JSON. * Remove unnecessary type cast.	2021-04-01 15:52:36 +08:00
Jiaming Yuan	9da2287ab8	[breaking] Save booster feature info in JSON, remove feature name generation. (#6605 ) * Save feature info in booster in JSON model. * [breaking] Remove automatic feature name generation in `DMatrix`. This PR is to enable reliable feature validation in Python package.	2021-02-25 18:54:16 +08:00
Jiaming Yuan	c375173dca	Support pylint 2.7.0 (#6726 )	2021-02-25 12:49:58 +08:00
Jiaming Yuan	bdedaab8d1	Fix pylint. (#6714 )	2021-02-19 11:53:27 +08:00
Roffild	4c5d2608e0	[python-package] Fix class Booster: feature_types = None (#6705 )	2021-02-13 17:50:23 +08:00
Jiaming Yuan	4656b09d5d	[breaking] Add prediction function for DMatrix and use inplace predict for dask. (#6668 ) * Add a new API function for predicting on `DMatrix`. This function aligns with rest of the `XGBoosterPredictFrom` functions on semantic of function arguments. Purge `ntree_limit` from libxgboost, use iteration instead. * [dask] Use `inplace_predict` by default for dask sklearn models. * [dask] Run prediction shape inference on worker instead of client. The breaking change is in the Python sklearn `apply` function, I made it to be consistent with other prediction functions where `best_iteration` is used by default.	2021-02-08 18:26:32 +08:00
Jiaming Yuan	411592a347	Enhance inplace prediction. (#6653 ) * Accept array interface for csr and array. * Accept an optional proxy dmatrix for metainfo. This constructs an explicit `_ProxyDMatrix` type in Python. * Remove unused doc. * Add strict output.	2021-02-02 11:41:46 +08:00
Jiaming Yuan	4bf23c2391	Specify shape in prediction contrib and interaction. (#6614 )	2021-01-26 02:08:22 +08:00
Jiaming Yuan	8942c98054	Define metainfo and other parameters for all DMatrix interfaces. (#6601 ) This PR ensures all DMatrix types have a common interface. * Fix logic in avoiding duplicated DMatrix in sklearn. * Check for consistency between DMatrix types. * Add doc for bounds.	2021-01-25 16:06:06 +08:00
Jiaming Yuan	d356b7a071	Restore unknown data support. (#6595 )	2021-01-14 04:51:16 +08:00
Jiaming Yuan	0027220aa0	[breaking] Remove duplicated predict functions, Fix attributes IO. (#6593 ) * Fix attributes not being restored. * Rename all `data` to `X`. [breaking]	2021-01-13 16:56:49 +08:00
Jiaming Yuan	80065d571e	[dask] Add DaskXGBRanker (#6576 ) * Initial support for distributed LTR using dask. * Support `qid` in libxgboost. * Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]` to avoid duplicated code. * Define `DaskXGBRanker`. The dask ranker doesn't support group structure, instead it uses query id and converts to group ptr internally.	2021-01-08 18:35:09 +08:00
Jiaming Yuan	de8fd852a5	[dask] Add type hints. (#6519 ) * Add validate_features. * Show type hints in doc. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-29 19:41:02 +08:00
Philip Hyunsu Cho	fbb980d9d3	Expand `~` into the home directory on Linux and MacOS (#6531 )	2020-12-19 23:35:13 -08:00
Jiaming Yuan	ca3da55de4	Support early stopping with training continuation, correct num boosted rounds. (#6506 ) * Implement early stopping with training continuation. * Add new C API for obtaining boosted rounds. * Fix off by 1 in `save_best`. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-17 19:59:19 +08:00
Jiaming Yuan	0e97d97d50	Fix merge conflict. (#6512 )	2020-12-16 18:02:25 +08:00
Jiaming Yuan	347f593169	Accept numpy array for DMatrix slice index. (#6368 )	2020-12-16 14:42:52 +08:00
Jiaming Yuan	ef4a0e0aac	Fix DMatrix feature names/types IO. (#6507 ) * Fix feature names/types IO Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2020-12-16 14:24:27 +08:00

310 Commits