Commit Graph

171 Commits

Author SHA1 Message Date
WeichenXu
f23cc92130 [pyspark] User guide doc and tutorials (#8082)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2022-07-19 22:25:14 +08:00
Jiaming Yuan
b90c6d25e8 Implement max_cat_threshold for CPU. (#7957) 2022-06-04 11:02:46 +08:00
Bobby Wang
5a7dc41351 [doc] update doc for dumping model to be json or ubj for jvm packages (#7955) 2022-05-31 14:43:13 +08:00
Philip Hyunsu Cho
6f424d8d6c [Doc] Warn against loading JSON from external source (#7918) 2022-05-18 17:02:36 -07:00
Jiaming Yuan
1b6538b4e5 [breaking] Drop single precision histogram (#7892)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-05-13 19:54:55 +08:00
Jiaming Yuan
317d7be6ee Always use partition based categorical splits. (#7857) 2022-05-03 22:30:32 +08:00
Jiaming Yuan
52d4eda786 Deprecate use_label_encoder in XGBClassifier. (#7822)
* Deprecate `use_label_encoder` in XGBClassifier.

* We have removed the encoder, now prepare to remove the indicator.
2022-04-21 13:14:02 +08:00
Bobby Wang
89d6419fd5 [jvm-packages] add doc for xgboost4j-spark-gpu (#7779)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2022-04-07 11:35:01 +08:00
giuliohome
c467e90ac1 [doc] Update doc for Kubernetes Operator (#7777) 2022-03-31 23:10:49 +08:00
Jiaming Yuan
a50b84244e Cleanup configuration for constraints. (#7758) 2022-03-29 04:22:46 +08:00
Jiaming Yuan
4d81c741e9 External memory support for hist (#7531)
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
2022-03-22 00:13:20 +08:00
Jiaming Yuan
18a4af63aa Update documents and tests. (#7659)
* Revise documents after recent refactoring and cat support.
* Add tests for behavior of max_depth and max_leaves.
2022-02-26 03:57:47 +08:00
Jiaming Yuan
83a66b4994 Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
93eebe8664 [doc] Fix broken link. [skip ci] (#7655) 2022-02-15 14:07:34 +08:00
Jiaming Yuan
0d0abe1845 Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
5cd1f71b51 [dask] Improve configuration for port. (#7645)
- Try port 0 to let the OS return the available port.
- Add port configuration.
2022-02-14 21:34:34 +08:00
Jiaming Yuan
ef4dae4c0e [dask] Add scheduler address to dask config. (#7581)
- Add user configuration.
- Bring back to the logic of using scheduler address from dask.  This was removed when we were trying to support GKE, now we bring it back and let xgboost try it if direct guess or host IP from user config failed.
2022-01-22 01:56:32 +08:00
Jiaming Yuan
b4ec1682c6 Update document for multi output and categorical. (#7574)
* Group together categorical related parameters.
* Update documents about multioutput and categorical.
2022-01-19 04:35:17 +08:00
Jiaming Yuan
dac9eb13bd Implement new save_raw in Python. (#7572)
* Expose the new C API function to Python.
* Remove old document and helper script.
* Small optimization to the `save_raw` and Json ctors.
2022-01-19 02:27:51 +08:00
Jiaming Yuan
deab0e32ba Validate out of range categorical value. (#7576)
* Use float in CPU categorical set to preserve the input value.
* Check out of range values.
2022-01-18 20:16:19 +08:00
Jiaming Yuan
e5e47c3c99 Clarify the behavior of invalid categorical value handling. (#7529) 2022-01-13 16:11:52 +08:00
Jiaming Yuan
ec56d5869b [doc] Include dask examples into doc. (#7530) 2022-01-05 03:27:22 +08:00
Jiaming Yuan
54582f641a [doc] Use cross references in sphinx doc. (#7522)
* Use cross references instead of URL.
* Fix auto doc for callback.
2022-01-05 03:21:25 +08:00
Jiaming Yuan
8f0a42a266 Initial support for multi-label classification. (#7521)
* Add support in sklearn classifier.
2022-01-04 23:58:21 +08:00
Randall Britten
a4a0ebb85d [doc] Lowercase omega for per tree complexity (#7532)
As suggested on issue #7480
2021-12-29 23:05:54 +08:00
Jiaming Yuan
a512b4b394 [doc] Promote dask from experimental. [skip ci] (#7509) 2021-12-16 14:17:06 +08:00
Jiaming Yuan
c024c42dce Modernize XGBoost Python document. (#7468)
* Use sphinx gallery to integrate examples.
* Remove mock objects.
* Add dask doc inventory.
2021-11-23 23:24:52 +08:00
Jiaming Yuan
97d7582457 Delay breaking changes to 1.6. (#7420)
The patch is too big to be backported.
2021-11-12 16:46:03 +08:00
Jiaming Yuan
45aef75cca Move skl eval_metric and early_stopping rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`.  And `feval` is deprecated.  The new `custom_metric` receives transformed prediction when the built-in objective is used.  This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and is now saved as part of the scikit-learn model.  The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documents for the behaviour of custom objective and metric.
2021-10-28 17:20:20 +08:00
Jiaming Yuan
15685996fc [doc] Small improvements for categorical data document. (#7330) 2021-10-20 18:04:32 +08:00
Jiaming Yuan
376b448015 [doc] Fix broken links. (#7341)
* Fix most of the link checks from sphinx.
* Remove duplicate explicit target name.
2021-10-20 14:45:30 +08:00
Jiaming Yuan
406c70ba0e [doc] Fix typo. [skip ci] (#7311) 2021-10-12 19:10:18 +08:00
Jiaming Yuan
0bd8f21e4e Add document for categorical data. (#7307) 2021-10-12 16:10:59 +08:00
Christian Lorentzen
a0dcf6f5c1 [DOC] Improve tutorial on feature interactions (#7219) 2021-09-12 21:40:02 +08:00
Jiaming Yuan
ba47eda61b [doc] Use figure directive. (#7143) 2021-08-03 15:56:25 +08:00
Jiaming Yuan
e6088366df Export Python Interface for external memory. (#7070)
* Add Python iterator interface.
* Add tests.
* Add demo.
* Add documents.
* Handle empty dataset.
2021-07-22 15:15:53 +08:00
ZabelTech
1d91f71119 fix typo in XGDMatrixSetFloatInfo example (#7097) 2021-07-10 21:40:25 +08:00
Jeff H
d22b293f2f Update reference to treelite website (#7084)
treelite.io is no longer a valid site and re-directs users to a parked domain. Re-directing to the documentation is safer at this point.
2021-07-06 22:15:07 -07:00
Jiaming Yuan
cf06a266a8 [dask][doc] Wrap the example in main guard. (#6979) 2021-05-25 08:24:47 +08:00
Jiaming Yuan
5cb51a191e [dask][doc] Add small example for sklearn interface. (#6970) 2021-05-19 13:50:45 +08:00
Andrew Ziem
3e7e426b36 Fix spelling in documents (#6948)
* Update roxygen2 doc.

Co-authored-by: fis <jm.yuan@outlook.com>
2021-05-11 20:44:36 +08:00
Kai Fricke
c8cc3eacc9 [docs] Add tutorial for XGBoost-Ray (#6884)
* Add XGBoost-Ray tutorial

* Add link to modin
2021-04-22 02:07:13 +08:00
Jiaming Yuan
a5d7094a45 Update documents. (#6856)
* Add early stopping section to prediction doc.
* Remove best_ntree_limit.
* Better doxygen output.
2021-04-16 12:41:03 +08:00
Jiaming Yuan
9d62b14591 Fix document. [skip ci] (#6669) 2021-02-02 20:43:31 +08:00
Jiaming Yuan
87ab1ad607 [dask] Accept Future of model for prediction. (#6650)
This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatably.

* Document is updated to reflect functionality additions in recent changes.
2021-02-02 08:45:52 +08:00
Jiaming Yuan
d8ec7aad5a [dask] Add a 1 line sample to infer output shape. (#6645)
* [dask] Use a 1 line sample to infer output shape.

This is for inferring shape with direct prediction (without DaskDMatrix).
There are a few things that requires known output shape before carrying out
actual prediction, including dask meta data, output dataframe columns.

* Infer output shape based on local prediction.
* Remove set param in predict function as it's not thread safe nor necessary as
we now let dask to decide the parallelism.
* Simplify prediction on `DaskDMatrix`.
2021-01-30 18:55:50 +08:00
Jiaming Yuan
4bf23c2391 Specify shape in prediction contrib and interaction. (#6614) 2021-01-26 02:08:22 +08:00
Jiaming Yuan
c5876277a8 Drop saving binary format for memory snapshot. (#6513) 2020-12-17 00:14:57 +08:00
James Lamb
1e2c3ade9e [doc] [dask] Add example on early stopping with Dask (#6501)
Co-authored-by: fis <jm.yuan@outlook.com>
2020-12-15 22:23:23 +08:00
James Lamb
afc4567268 [doc] [dask] fix partitioning in Dask example (#6389) 2020-12-14 18:37:49 +08:00