Federated learning plugin for xgboost:
* A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers.
* A Rabit engine for the federated environment.
* Integration test to simulate federated learning.
Additional follow-ups are needed to address GPU support, stronger security, privacy, and so on.
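As a rough illustration of the aggregation the server performs, the following in-process
sketch mimics the three MPI-style operations over a list of per-worker payloads. It uses no
gRPC and is not the plugin's code; the function names `allgather`, `allreduce_sum`, and
`broadcast` are hypothetical, and summation is only one possible reduce operation.

```python
from typing import List, Sequence


def allgather(payloads: Sequence[bytes]) -> List[bytes]:
    """Every worker receives the concatenation of all workers' payloads."""
    gathered = b"".join(payloads)
    return [gathered for _ in payloads]


def allreduce_sum(buffers: Sequence[Sequence[float]]) -> List[List[float]]:
    """Every worker receives the element-wise sum of all buffers."""
    if len({len(b) for b in buffers}) > 1:
        raise ValueError("all buffers must have the same length")
    total = [sum(column) for column in zip(*buffers)]
    return [list(total) for _ in buffers]


def broadcast(payloads: Sequence[bytes], root: int) -> List[bytes]:
    """Every worker receives the payload held by the root worker."""
    return [payloads[root] for _ in payloads]
```

Each function returns one result per worker, which is what a server would send back once all
workers in a round have submitted their request.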
* Use the name `Context`.
* Pass a context object into `SetInfo`.
* Add context to proxy matrix.
* Add context to iterative DMatrix.
This removes the use of the default number of threads during `SetInfo`, as a follow-up to
removing the global omp variable, and prepares for CUDA stream semantics. Currently, XGBoost
uses the legacy CUDA stream; we will gradually replace it with non-blocking streams.
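The idea of passing an explicit context into `SetInfo` instead of reading a global thread
count can be sketched as follows. This is an illustrative Python analogue, not XGBoost code:
the `Context` dataclass, its `n_threads` field, and `set_info` are hypothetical stand-ins.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Context:
    """Per-call configuration carried explicitly (hypothetical stand-in)."""

    n_threads: int = 0  # 0 means "use all available cores" in this sketch

    def threads(self) -> int:
        return self.n_threads if self.n_threads > 0 else (os.cpu_count() or 1)


def set_info(ctx: Context, labels: List[float]) -> dict:
    """Validate labels using the thread count from ctx, not a global."""
    # A real implementation would split the work over ctx.threads() workers;
    # here we only record the value to show where it comes from.
    if any(x != x for x in labels):  # NaN is the only value not equal to itself
        raise ValueError("labels must not contain NaN")
    return {"n_labels": len(labels), "threads_used": ctx.threads()}
```

Because the thread count travels with the call, two callers with different settings cannot
interfere with each other through shared state.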
* [R] Fix global feature importance.
* Add implementation for tree index. The parameter is not documented in the C API because
we should port model slicing to R rather than expand the use of the tree
index.
* Fix the difference between "gain" and "total_gain".
* debug.
* Fix prediction.
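To make the distinction concrete: "total_gain" sums the gain of every split that uses a
feature, while "gain" is that sum divided by the number of such splits. A minimal sketch
over a hypothetical list of `(feature, gain)` split records (the record format and the
helper name `feature_scores` are assumptions for illustration, not the library's code):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def feature_scores(
    splits: Iterable[Tuple[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Aggregate per-split gains into weight, total_gain, and average gain."""
    total = defaultdict(float)
    count = defaultdict(int)
    for feature, gain in splits:
        total[feature] += gain
        count[feature] += 1
    return {
        f: {
            "weight": float(count[f]),          # number of splits using f
            "total_gain": total[f],             # sum of gains over those splits
            "gain": total[f] / count[f],        # average gain per split
        }
        for f in total
    }
```

A feature used in many weak splits can have a high "total_gain" but a low "gain", which is
why the two must not be conflated.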
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache cannot be violated.
- Provide an interface for internal algorithms to process data asynchronously.
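One way to keep the number of in-memory pages bounded while still loading asynchronously is
a bounded queue between a loader thread and the consumer. The sketch below is a generic
illustration of that pattern, not the XGBoost implementation; `prefetch_pages`, `load_page`,
and `max_in_flight` are hypothetical names.

```python
import queue
import threading
from typing import Callable, Iterator, List

_DONE = object()  # sentinel marking the end of the stream


def prefetch_pages(
    load_page: Callable[[int], List[float]],
    n_pages: int,
    max_in_flight: int = 2,
) -> Iterator[List[float]]:
    """Yield pages in order while a worker thread loads ahead.

    The queue holds at most `max_in_flight` loaded pages, so the loader
    blocks instead of reading the whole dataset into memory.
    """
    pages: queue.Queue = queue.Queue(maxsize=max_in_flight)

    def producer() -> None:
        try:
            for i in range(n_pages):
                pages.put(load_page(i))  # blocks while the queue is full
            pages.put(_DONE)
        except Exception as err:  # forward loader errors to the consumer
            pages.put(err)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = pages.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

The consumer sees an ordinary iterator; the bound on memory comes entirely from the queue
size, independent of how many pages the dataset has.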
The role of ProxyDMatrix has grown beyond what it was designed for. It is now used by both
QuantileDeviceDMatrix and inplace prediction, and after the refactoring of sparse DMatrix it
will also be used for external memory. This change renames the C API to decouple it from
QuantileDeviceDMatrix.
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Change C API name.
* Test for all primitive types from array.
* Add native support for 128-bit floats on CPU.
* Convert boolean and float16 in Python.
* Fix dask version for now.
The guard protects the global variable from being changed by XGBoost. However, this
introduces a bug: the `n_threads` parameter is no longer used after the first iteration,
because `omp_set_num_threads` is only called once, in `Learner::Configure`, at
the beginning of the training process.
The guard is still useful for `gpu_id`, since that is set throughout our codebase
regardless of which iteration is currently running.
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.
This PR enables reliable feature validation in the Python package.
* Add a new API function for predicting on `DMatrix`. This function follows
the same argument semantics as the rest of the `XGBoosterPredictFrom*`
functions.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.
The breaking change is in the Python sklearn `apply` function, which is now
consistent with other prediction functions in that `best_iteration` is used by
default.
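Part of the reason an iteration count is easier to reason about than `ntree_limit` is that
one boosting round can add several trees. The sketch below assumes a round adds
`num_class * num_parallel_tree` trees and that trees are stored round by round; the helper
`tree_range` is hypothetical and only illustrates the arithmetic.

```python
def tree_range(
    begin: int, end: int, num_class: int = 1, num_parallel_tree: int = 1
) -> range:
    """Map the half-open iteration range [begin, end) to tree indices."""
    if not 0 <= begin <= end:
        raise ValueError("expected 0 <= begin <= end")
    if num_class < 1 or num_parallel_tree < 1:
        raise ValueError("num_class and num_parallel_tree must be >= 1")
    trees_per_round = num_class * num_parallel_tree
    return range(begin * trees_per_round, end * trees_per_round)
```

With 3 classes and 2 parallel trees, the first 10 iterations cover 60 trees, a number users
should not have to compute by hand.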
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.
This constructs an explicit `_ProxyDMatrix` type in Python.
* Remove unused doc.
* Add strict output.
* Initial support for distributed LTR using dask.
* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
to avoid duplicated code.
* Define `DaskXGBRanker`.
The dask ranker doesn't support group structure; instead, it uses query ids and
converts them to a group ptr internally.
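The conversion from query ids to a group pointer amounts to recording the row offset at
which each new query begins. A minimal sketch, assuming the rows of each query are stored
contiguously (the function name `qid_to_group_ptr` is hypothetical, not the library's):

```python
from typing import List, Sequence


def qid_to_group_ptr(qid: Sequence[int]) -> List[int]:
    """Convert per-row query ids to CSR-style group offsets.

    Group k spans rows group_ptr[k] .. group_ptr[k + 1] - 1.
    """
    group_ptr = [0]
    seen = set()
    for i, q in enumerate(qid):
        if i > 0 and q != qid[i - 1]:
            if q in seen:
                raise ValueError("rows of the same query must be contiguous")
            group_ptr.append(i)  # a new query starts at row i
        seen.add(q)
    if len(qid) > 0:
        group_ptr.append(len(qid))  # close the last group
    return group_ptr
```

For example, `qid_to_group_ptr([1, 1, 2, 2, 2, 5])` yields `[0, 2, 5, 6]`, that is, groups of
size 2, 3, and 1.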
* Implement early stopping with training continuation.
* Add new C API for obtaining boosted rounds.
* Fix off by 1 in `save_best`.
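The arithmetic behind `save_best` is easy to get wrong by one: the best iteration is a
zero-based index, so keeping the model up to and including it means keeping
`best_iteration + 1` rounds. The sketch below illustrates this on a list of validation
scores; `early_stop` is a hypothetical helper, assumes lower scores are better, and is not
the callback's actual code.

```python
from typing import List, Tuple


def early_stop(
    scores: List[float], rounds: int, save_best: bool = False
) -> Tuple[int, int]:
    """Return (best_iteration, number of boosting rounds kept).

    Training halts once the score has failed to improve for `rounds`
    consecutive iterations.
    """
    if not scores:
        raise ValueError("need at least one score")
    best_iter, best_score = 0, float("inf")
    trained = 0
    for i, score in enumerate(scores):
        trained = i + 1
        if score < best_score:
            best_iter, best_score = i, score
        elif i - best_iter >= rounds:
            break
    # best_iter is an index, so the slice that ends at it has best_iter + 1 rounds.
    kept = best_iter + 1 if save_best else trained
    return best_iter, kept
```

With scores `[0.9, 0.5, 0.6, 0.7, 0.8]` and `rounds=2`, training stops after 4 rounds with
the best iteration at index 1; `save_best=True` keeps 2 rounds, not 1.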
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Add management functions for global configuration: XGBSetGlobalConfig(), XGBGetGlobalConfig().
* Add Python interface: set_config(), get_config(), and config_context().
* Add unit tests for Python
* Add R interface: xgb.set.config(), xgb.get.config()
* Add unit tests for R
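The relationship between the three Python functions can be sketched with a stdlib-only toy.
This mirrors the names above but is not the package's implementation, which goes through
the C API; the module-level `_config` dict and the `verbosity` default used here are
assumptions chosen for the example.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator

_config: Dict[str, Any] = {"verbosity": 1}  # toy global configuration


def set_config(**new_config: Any) -> None:
    """Update known global settings; reject unknown keys."""
    unknown = set(new_config) - set(_config)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    _config.update(new_config)


def get_config() -> Dict[str, Any]:
    """Return a copy so callers cannot mutate the global state directly."""
    return dict(_config)


@contextmanager
def config_context(**new_config: Any) -> Iterator[None]:
    """Apply settings temporarily and restore the previous ones on exit."""
    old = get_config()
    set_config(**new_config)
    try:
        yield
    finally:
        set_config(**old)
```

Inside `with config_context(verbosity=3): ...` the new value is visible through
`get_config()`, and the previous value is restored afterwards, even if the body raises.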
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
This PR is meant to end the confusion around best_ntree_limit and unify model slicing. With multi-class models and random forests, asking users to understand how to set ntree_limit is difficult and error-prone.
* Implement the save_best option in early stopping.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Add thread local return entry for DMatrix.
* Save feature name and feature type in binary file.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>