Jiaming Yuan
c2b3a13e70
[breaking][skl] Remove parameter serialization. ( #8963 )
...
- Remove parameter serialization in the scikit-learn interface.
The scikit-learn interface `save_model` will save only the model and discard all
hyper-parameters. This is to align with the native XGBoost interface, which distinguishes
between hyper-parameters and model parameters.
With the scikit-learn interface, model parameters are attributes of the estimator. For
instance, `n_features_in_` and `n_classes_` are always accessible as
`estimator.n_features_in_` and `estimator.n_classes_`, but not through
`estimator.get_params`.
- Define a `load_model` method for the classifier to load its own attributes.
- Set n_estimators to None by default.
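
The split between hyper-parameters and model parameters described above can be sketched with a toy class. This is an illustration only: `ToyEstimator` and its JSON layout are hypothetical and are not XGBoost code. It assumes nothing beyond the convention stated in the message, namely that `get_params` reports constructor arguments while fitted attributes carry a trailing underscore and travel with the saved model.

```python
import json


class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn style estimator (not XGBoost code)."""

    def __init__(self, n_estimators=None, max_depth=None):
        # Hyper-parameters: constructor arguments, reported by get_params().
        self.n_estimators = n_estimators
        self.max_depth = max_depth

    def get_params(self):
        return {"n_estimators": self.n_estimators, "max_depth": self.max_depth}

    def fit(self, X, y):
        # Model parameters: learned from data, stored with a trailing underscore.
        self.n_features_in_ = len(X[0])
        self.n_classes_ = len(set(y))
        return self

    def save_model(self):
        # Only the model state is serialized; hyper-parameters are discarded.
        return json.dumps(
            {"n_features_in_": self.n_features_in_, "n_classes_": self.n_classes_}
        )

    def load_model(self, raw):
        # Restore the estimator's own attributes from the saved model.
        for name, value in json.loads(raw).items():
            setattr(self, name, value)
        return self


clf = ToyEstimator(n_estimators=10, max_depth=3).fit([[0.0, 1.0], [1.0, 0.0]], [0, 1])
restored = ToyEstimator().load_model(clf.save_model())
```

In this sketch `restored` has the fitted attributes of `clf` but keeps the default hyper-parameters, because the hyper-parameters were never written out.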
2023-03-27 21:34:10 +08:00
Jiaming Yuan
21a52c7f98
[doc] Add introduction and notes for the sklearn interface. ( #8948 )
2023-03-23 13:30:42 +08:00
Jiaming Yuan
bf88dadb61
[doc] Fix callback example. ( #8944 )
2023-03-23 03:27:04 +08:00
Jiaming Yuan
151882dd26
Initial support for multi-target tree. ( #8616 )
...
* Implement multi-target for hist.
- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm based on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
5891f752c8
Rework the MAP metric. ( #8931 )
...
- The new implementation is stricter: only binary labels are accepted. The previous implementation converted values greater than 1 to 1.
- Deterministic GPU. (no atomic add).
- Fix top-k handling.
- Precise definition of MAP. (There are other variants on how to handle top-k).
- Refactor GPU ranking tests.
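
A minimal pure-Python sketch of mean average precision with strict binary labels and a top-k cutoff follows. It is not the XGBoost implementation. Since the message notes that several top-k variants exist, this sketch assumes one of them: precision is summed at each relevant position within the top k and divided by the number of relevant items found there. The function names are hypothetical.

```python
def average_precision_at_k(labels, scores, k=None):
    """Average precision for one query group; labels must be 0 or 1."""
    if any(label not in (0, 1) for label in labels):
        raise ValueError("MAP expects binary relevance labels (0 or 1).")
    # Rank documents by predicted score, highest first.
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    top = order if k is None else order[:k]
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(top, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    # Assumed variant: normalize by the relevant items found in the top k.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(groups, k=None):
    """Mean of the per-group average precision over (labels, scores) pairs."""
    return sum(average_precision_at_k(l, s, k) for l, s in groups) / len(groups)
```

Under this variant, labels `[1, 0, 1]` ranked in that order give (1/1 + 2/3) / 2, and a label such as 2 raises an error instead of being clipped to 1.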
2023-03-22 17:45:20 +08:00
Jiaming Yuan
f186c87cf9
Check inf in data for all types of DMatrix. ( #8911 )
2023-03-15 11:24:35 +08:00
Jiaming Yuan
7eba285a1e
Support sklearn cross validation for ranker. ( #8859 )
...
* Support sklearn cross validation for ranker.
- Add a convention for X to include a special `qid` column.
sklearn utilities consider only `X`, `y` and `sample_weight` for supervised learning
algorithms, but we need an additional qid array for ranking.
It's important to be able to support the cross validation function in sklearn since all
other tuning functions like grid search are based on cross validation.
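
Why the convention helps can be sketched without any library. A generic splitter selects rows of `X` and `y` by position and knows nothing about a separate query-id array, so carrying `qid` inside `X` keeps it aligned with each fold. The helpers `split_qid` and `take` and the hand-written fold indices are hypothetical and only illustrate the idea, not how XGBoost implements it.

```python
def split_qid(X):
    """Separate the special `qid` column from the remaining feature columns."""
    qid = [row["qid"] for row in X]
    features = [{k: v for k, v in row.items() if k != "qid"} for row in X]
    return features, qid


def take(rows, indices):
    """Row selection as a generic CV splitter would do it: by position only."""
    return [rows[i] for i in indices]


# Each row carries its query id alongside the features.
X = [
    {"qid": 0, "f0": 0.1},
    {"qid": 0, "f0": 0.4},
    {"qid": 1, "f0": 0.3},
    {"qid": 1, "f0": 0.9},
]
y = [1, 0, 0, 1]

train_idx, test_idx = [0, 1], [2, 3]  # a splitter only ever sees X and y
X_train, y_train = take(X, train_idx), take(y, train_idx)

# The estimator can recover qid from the fold it receives.
features, qid = split_qid(X_train)
```

Because `qid` is selected together with the feature rows, the estimator receives matching query ids for every fold without the splitter having to support an extra argument.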
2023-03-07 00:22:08 +08:00
Jiaming Yuan
6a892ce281
Specify src path for isort. ( #8867 )
2023-03-06 17:30:27 +08:00
mzzhang95
6cef9a08e9
[pyspark] Update eval_metric validation to support list of strings ( #8826 )
2023-03-02 08:24:12 +08:00
Jiaming Yuan
cce4af4acf
Initial support for quantile loss. ( #8750 )
...
- Add support for Python.
- Add objective.
2023-02-16 02:30:18 +08:00
WeichenXu
f27a7258c6
Fix feature types param ( #8772 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2023-02-14 02:16:42 +08:00
Jiaming Yuan
457f704e3d
Add quantile metric. ( #8761 )
2023-02-13 19:07:40 +08:00
Jiaming Yuan
225b3158f6
Support custom metric in sklearn ranker. ( #8786 )
2023-02-12 13:14:07 +08:00
Jiaming Yuan
8a16944664
Fix ranking with quantile dmatrix and group weight. ( #8762 )
2023-02-10 20:32:35 +08:00
Jiaming Yuan
c4802bfcd0
Cleanup booster param types. ( #8756 )
2023-02-07 15:52:19 +08:00
Jiaming Yuan
0f37a01dd9
Require black formatter for the python package. ( #8748 )
2023-02-07 01:53:33 +08:00
Jiaming Yuan
a2e433a089
Fix empty DMatrix with categorical features. ( #8739 )
2023-02-07 00:40:11 +08:00
Jiaming Yuan
c1786849e3
Use array interface for CSC matrix. ( #8672 )
...
* Use array interface for CSC matrix.
Use array interface for CSC matrix and align the interface with CSR and dense.
- Fix nthread issue in the R package DMatrix.
- Unify the behavior of handling `missing` with other inputs.
- Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix.
- Expose `num_non_missing` to the JVM interface.
- Deprecate old CSR and CSC constructors.
2023-02-05 01:59:46 +08:00
BenEfrati
213b5602d9
Add sample_weight to eval_metric ( #8706 )
2023-02-05 00:06:38 +08:00
Jiaming Yuan
0e61ba57d6
Fix GPU L1 error. ( #8749 )
2023-02-04 03:02:00 +08:00
Jiaming Yuan
1325ba9251
Support primitive types of pyarrow-backed pandas dataframe. ( #8653 )
...
Categorical data (dictionary) is not supported at the moment.
2023-01-30 17:53:29 +08:00
Jiaming Yuan
9fb12b20a4
Cleanup the callback module. ( #8702 )
...
- Cleanup pylint markers.
- Run formatter.
- Update examples of using callback.
2023-01-22 00:13:49 +08:00
James Lamb
6933240837
[python-package] remove unused functions in xgboost.data ( #8695 )
2023-01-19 08:02:54 +08:00
Jiaming Yuan
31b9cbab3d
Make sure input numpy array is aligned. ( #8690 )
...
- use `np.require` to specify that the alignment is required.
- scipy csr as well.
- validate input pointer in `ArrayInterface`.
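
The `np.require` call mentioned above can be sketched as follows. The way the misaligned array is built here (a one-byte offset into a `uint8` buffer) is only an assumption used for demonstration and is not taken from the commit.

```python
import numpy as np

# Build a float64 view that starts one byte into a buffer, so its data
# pointer is typically not aligned to the 8-byte boundary float64 expects.
buf = np.zeros(8 * 4 + 1, dtype=np.uint8)
misaligned = buf[1:].view(np.float64)

# Ask for an aligned array; NumPy copies only if the requirement is not met.
aligned = np.require(misaligned, requirements=["ALIGNED"])

print(misaligned.flags.aligned, aligned.flags.aligned)
```

An input that is already aligned is returned without a copy, so the call adds no cost in the common case.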
2023-01-18 08:12:13 +08:00
Jiaming Yuan
175986b739
[doc] Add missing document for pyspark ranker. [skip ci] ( #8692 )
2023-01-18 07:52:18 +08:00
Jiaming Yuan
247946a875
Cache transformed in QuantileDMatrix for efficiency. ( #8666 )
2023-01-17 06:02:40 +08:00
Jiaming Yuan
d6018eb4b9
Remove all use of DeviceQuantileDMatrix. ( #8665 )
2023-01-17 00:04:10 +08:00
Bobby Wang
72ec0c5484
[pyspark] support pred_contribs ( #8633 )
2023-01-11 16:51:12 +08:00
Jiaming Yuan
cfa994d57f
Multi-target support for L1 error. ( #8652 )
...
- Add matrix support to the median function.
- Iterate through each target for quantile computation.
2023-01-11 05:51:14 +08:00
Jiaming Yuan
badeff1d74
Init estimation for regression. ( #8272 )
2023-01-11 02:04:56 +08:00
Jiaming Yuan
1b58d81315
[doc] Document Python inputs. ( #8643 )
2023-01-10 15:39:32 +08:00
Jiaming Yuan
e68a152d9e
Do not return internal value for get_params. ( #8634 )
2023-01-05 17:48:26 +08:00
Bobby Wang
d3ad0524e7
[pyspark] Re-work _fit function ( #8630 )
2023-01-04 18:21:57 +08:00
Rong Ou
3ceeb8c61c
Add data split mode to DMatrix MetaInfo ( #8568 )
2022-12-25 20:37:37 +08:00
Rong Ou
77b069c25d
Support bitwise allreduce operations in the communicator ( #8623 )
2022-12-25 06:40:05 +08:00
Jiaming Yuan
c430ae52f3
Fix mypy errors with the latest numpy. ( #8617 )
2022-12-21 01:42:05 -08:00
Jiaming Yuan
f6effa1734
Support Series and Python primitives in inplace_predict and QDM ( #8547 )
2022-12-17 00:15:15 +08:00
Jiaming Yuan
001e663d42
Set enable_categorical to True in predict. ( #8592 )
2022-12-15 05:27:06 +08:00
James Lamb
06ea6c7e79
[python] remove unnecessary conversions between data structures ( #8546 )
...
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2022-12-14 18:32:02 +08:00
Jiaming Yuan
40343c8ee1
Test dask demos. ( #8557 )
...
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-12-13 18:37:31 +08:00
Jiaming Yuan
deb3edf562
Support list and tuple for QDM. ( #8542 )
2022-12-10 01:14:44 +08:00
Bobby Wang
40a1a2ffa8
[pyspark] check use_qdm across all the workers ( #8496 )
2022-12-08 18:09:17 +08:00
Gianfrancesco Angelini
5540019373
feat(py, plot_importance): + values_format as arg ( #8540 )
2022-12-08 00:47:28 +08:00
Matthew Rocklin
b7ffdcdbb9
Properly await async method client.wait_for_workers ( #8558 )
...
* Properly await async method client.wait_for_workers
* ignore mypy error.
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-12-07 21:49:30 +08:00
Bobby Wang
8e41ad24f5
[pyspark] sort qid for SparkRanker ( #8497 )
...
* [pyspark] sort qid for SparkRanker
* resolve comments
2022-12-01 16:40:35 -08:00
Jiaming Yuan
d666ba775e
Support all pandas nullable integer types. ( #8480 )
...
- Enumerate all pandas integer types.
- Tests for `None`, `nan`, and `pd.NA`
2022-11-28 22:38:16 +08:00
Jiaming Yuan
f2209c1fe4
Don't shuffle columns in categorical tests. ( #8446 )
2022-11-28 20:28:06 +08:00
WeichenXu
67ea1c3435
[pyspark] Make QDM optional based on cuDF check ( #8471 )
2022-11-27 14:58:54 +08:00
Jiaming Yuan
8f97c92541
Support half type for pandas. ( #8481 )
2022-11-24 12:47:40 +08:00
Jiaming Yuan
e07245f110
Take datatable as row major input. ( #8472 )
...
* Take datatable as row major input.
Try to avoid a transform with a dense table.
2022-11-24 09:20:13 +08:00