116 Commits

Author SHA1 Message Date
Jiaming Yuan
317d7be6ee
Always use partition based categorical splits. (#7857) 2022-05-03 22:30:32 +08:00
Jiaming Yuan
52d4eda786
Deprecate use_label_encoder in XGBClassifier. (#7822)
* Deprecate `use_label_encoder` in XGBClassifier.

* We have removed the encoder, now prepare to remove the indicator.
2022-04-21 13:14:02 +08:00
Bobby Wang
89d6419fd5
[jvm-packages] add doc for xgboost4j-spark-gpu (#7779)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2022-04-07 11:35:01 +08:00
giuliohome
c467e90ac1
[doc] Update doc for Kubernetes Operator (#7777) 2022-03-31 23:10:49 +08:00
Jiaming Yuan
a50b84244e
Cleanup configuration for constraints. (#7758) 2022-03-29 04:22:46 +08:00
Jiaming Yuan
4d81c741e9
External memory support for hist (#7531)
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
2022-03-22 00:13:20 +08:00
Jiaming Yuan
18a4af63aa
Update documents and tests. (#7659)
* Revise documents after recent refactoring and cat support.
* Add tests for behavior of max_depth and max_leaves.
2022-02-26 03:57:47 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
93eebe8664
[doc] Fix broken link. [skip ci] (#7655) 2022-02-15 14:07:34 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
5cd1f71b51
[dask] Improve configuration for port. (#7645)
- Try port 0 to let the OS return the available port.
- Add port configuration.
2022-02-14 21:34:34 +08:00
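
The "port 0" technique named in the commit above can be illustrated with the standard library alone. This is a minimal sketch of the general OS mechanism, not the code from the patch; `find_free_port` is a hypothetical helper name, and the loopback host is an assumption chosen for the example.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0.

    Hypothetical helper for illustration; not part of xgboost.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to pick any currently available port.
        sock.bind((host, 0))
        # getsockname() returns (host, port) with the port the OS chose.
        return sock.getsockname()[1]


port = find_free_port()
print(f"OS-assigned port: {port}")
```

Note that the port is released when the socket closes, so another process could claim it before it is reused; a real tracker would typically keep the socket open.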
Jiaming Yuan
ef4dae4c0e
[dask] Add scheduler address to dask config. (#7581)
- Add user configuration.
- Bring back the logic of using the scheduler address from dask. This was removed when we were trying to support GKE; now we bring it back and let xgboost try it if the direct guess or the host IP from user config fails.
2022-01-22 01:56:32 +08:00
Jiaming Yuan
b4ec1682c6
Update document for multi output and categorical. (#7574)
* Group together categorical related parameters.
* Update documents about multioutput and categorical.
2022-01-19 04:35:17 +08:00
Jiaming Yuan
dac9eb13bd
Implement new save_raw in Python. (#7572)
* Expose the new C API function to Python.
* Remove old document and helper script.
* Small optimization to the `save_raw` and Json ctors.
2022-01-19 02:27:51 +08:00
Jiaming Yuan
deab0e32ba
Validate out of range categorical value. (#7576)
* Use float in CPU categorical set to preserve the input value.
* Check out of range values.
2022-01-18 20:16:19 +08:00
Jiaming Yuan
e5e47c3c99
Clarify the behavior of invalid categorical value handling. (#7529) 2022-01-13 16:11:52 +08:00
Jiaming Yuan
ec56d5869b
[doc] Include dask examples into doc. (#7530) 2022-01-05 03:27:22 +08:00
Jiaming Yuan
54582f641a
[doc] Use cross references in sphinx doc. (#7522)
* Use cross references instead of URL.
* Fix auto doc for callback.
2022-01-05 03:21:25 +08:00
Jiaming Yuan
8f0a42a266
Initial support for multi-label classification. (#7521)
* Add support in sklearn classifier.
2022-01-04 23:58:21 +08:00
Randall Britten
a4a0ebb85d
[doc] Lowercase omega for per tree complexity (#7532)
As suggested on issue #7480
2021-12-29 23:05:54 +08:00
Jiaming Yuan
a512b4b394
[doc] Promote dask from experimental. [skip ci] (#7509) 2021-12-16 14:17:06 +08:00
Jiaming Yuan
c024c42dce
Modernize XGBoost Python document. (#7468)
* Use sphinx gallery to integrate examples.
* Remove mock objects.
* Add dask doc inventory.
2021-11-23 23:24:52 +08:00
Jiaming Yuan
97d7582457
Delay breaking changes to 1.6. (#7420)
The patch is too big to be backported.
2021-11-12 16:46:03 +08:00
Jiaming Yuan
45aef75cca
Move skl eval_metric and early_stopping rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`, which is now deprecated. The new `custom_metric` receives the transformed prediction when a built-in objective is used. This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in the sklearn interface are moved from `fit` to `__init__` and are now saved as part of the scikit-learn model. The old ones in the `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documents for the behaviour of custom objective and metric.
2021-10-28 17:20:20 +08:00
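
The distinction described in the commit above can be sketched without XGBoost itself. The example assumes a logistic objective, whose inverse link is the sigmoid, and assumes that a metric written for the older `feval` style sees raw margins while a `custom_metric`-style function sees probabilities. All function names and sample values here are hypothetical and chosen for illustration only.

```python
import math


def sigmoid(margin: float) -> float:
    """Inverse link for a logistic objective: raw margin -> probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def log_loss(labels, probs, eps=1e-15):
    """Mean binary log loss over probabilities, clipped for stability."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def margin_metric(margins, labels):
    """Metric that receives raw margins and must apply the link itself."""
    return log_loss(labels, [sigmoid(m) for m in margins])


def transformed_metric(probs, labels):
    """Metric that receives already transformed predictions."""
    return log_loss(labels, probs)


# Made-up values for illustration.
raw_margins = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
transformed = [sigmoid(m) for m in raw_margins]

loss_from_margins = margin_metric(raw_margins, labels)
loss_from_probs = transformed_metric(transformed, labels)
print(loss_from_margins, loss_from_probs)
```

Both calls compute the same loss; the second form needs no knowledge of the link function, which is why a probability-based metric from another library can be passed in directly.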
Jiaming Yuan
15685996fc
[doc] Small improvements for categorical data document. (#7330) 2021-10-20 18:04:32 +08:00
Jiaming Yuan
376b448015
[doc] Fix broken links. (#7341)
* Fix most of the link checks from sphinx.
* Remove duplicate explicit target name.
2021-10-20 14:45:30 +08:00
Jiaming Yuan
406c70ba0e
[doc] Fix typo. [skip ci] (#7311) 2021-10-12 19:10:18 +08:00
Jiaming Yuan
0bd8f21e4e
Add document for categorical data. (#7307) 2021-10-12 16:10:59 +08:00
Christian Lorentzen
a0dcf6f5c1
[DOC] Improve tutorial on feature interactions (#7219) 2021-09-12 21:40:02 +08:00
Jiaming Yuan
ba47eda61b
[doc] Use figure directive. (#7143) 2021-08-03 15:56:25 +08:00
Jiaming Yuan
e6088366df
Export Python Interface for external memory. (#7070)
* Add Python iterator interface.
* Add tests.
* Add demo.
* Add documents.
* Handle empty dataset.
2021-07-22 15:15:53 +08:00
ZabelTech
1d91f71119
fix typo in XGDMatrixSetFloatInfo example (#7097) 2021-07-10 21:40:25 +08:00
Jeff H
d22b293f2f
Update reference to treelite website (#7084)
treelite.io is no longer a valid site and re-directs users to a parked domain. Re-directing to the documentation is safer at this point.
2021-07-06 22:15:07 -07:00
Jiaming Yuan
cf06a266a8
[dask][doc] Wrap the example in main guard. (#6979) 2021-05-25 08:24:47 +08:00
Jiaming Yuan
5cb51a191e
[dask][doc] Add small example for sklearn interface. (#6970) 2021-05-19 13:50:45 +08:00
Andrew Ziem
3e7e426b36
Fix spelling in documents (#6948)
* Update roxygen2 doc.

Co-authored-by: fis <jm.yuan@outlook.com>
2021-05-11 20:44:36 +08:00
Kai Fricke
c8cc3eacc9
[docs] Add tutorial for XGBoost-Ray (#6884)
* Add XGBoost-Ray tutorial

* Add link to modin
2021-04-22 02:07:13 +08:00
Jiaming Yuan
a5d7094a45
Update documents. (#6856)
* Add early stopping section to prediction doc.
* Remove best_ntree_limit.
* Better doxygen output.
2021-04-16 12:41:03 +08:00
Jiaming Yuan
9d62b14591
Fix document. [skip ci] (#6669) 2021-02-02 20:43:31 +08:00
Jiaming Yuan
87ab1ad607
[dask] Accept Future of model for prediction. (#6650)
This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatedly.

* Document is updated to reflect functionality additions in recent changes.
2021-02-02 08:45:52 +08:00
Jiaming Yuan
d8ec7aad5a
[dask] Add a 1 line sample to infer output shape. (#6645)
* [dask] Use a 1 line sample to infer output shape.

This is for inferring shape with direct prediction (without DaskDMatrix).
There are a few things that require a known output shape before carrying out
actual prediction, including dask metadata and output dataframe columns.

* Infer output shape based on local prediction.
* Remove set param in predict function as it's neither thread safe nor necessary,
since we now let dask decide the parallelism.
* Simplify prediction on `DaskDMatrix`.
2021-01-30 18:55:50 +08:00
Jiaming Yuan
4bf23c2391
Specify shape in prediction contrib and interaction. (#6614) 2021-01-26 02:08:22 +08:00
Jiaming Yuan
c5876277a8
Drop saving binary format for memory snapshot. (#6513) 2020-12-17 00:14:57 +08:00
James Lamb
1e2c3ade9e
[doc] [dask] Add example on early stopping with Dask (#6501)
Co-authored-by: fis <jm.yuan@outlook.com>
2020-12-15 22:23:23 +08:00
James Lamb
afc4567268
[doc] [dask] fix partitioning in Dask example (#6389) 2020-12-14 18:37:49 +08:00
Jiaming Yuan
a30461cf87
[dask] Support all parameters in regressor and classifier. (#6471)
* Add eval_metric.
* Add callback.
* Add feature weights.
* Add custom objective.
2020-12-14 07:35:56 +08:00
hzy001
c2ba4fb957
Fix broken links. (#6455)
Co-authored-by: Hao Ziyu <haoziyu@qiyi.com>
Co-authored-by: fis <jm.yuan@outlook.com>
2020-12-02 17:39:12 +08:00
Jiaming Yuan
00218d065a
[dask] Update document. [skip ci] (#6413) 2020-11-20 19:16:19 +08:00
James Lamb
12d27f43ff
[doc] make Dask distributed example copy-pastable (#6345) 2020-11-11 20:22:17 -08:00
Jean Lescut-Muller
9564886d9f
Update custom_metric_obj.rst (#6367) 2020-11-10 22:29:22 +08:00