Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on residue of labels and predictions after construction.
For l1 error, the optimal value is the median (50 percentile).
This is marked as experimental support for the following reasons:
- The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors.
- Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.
* Use the name `Context`.
* Pass a context object into `SetInfo`.
* Add context to proxy matrix.
* Add context to iterative DMatrix.
This is to remove the use of the default number of threads during `SetInfo` as a follow-up on
removing the global omp variable while preparing for CUDA stream semantic. Currently, XGBoost
uses the legacy CUDA stream, we will gradually remove them in the future in favor of non-blocking streams.
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter.
Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
This is the one last PR for removing omp global variable.
* Add context object to the `DMatrix`. This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 .
* Require context to be available at the construction time of booster.
* Add `n_threads` support for R csc DMatrix constructor.
* Remove `omp_get_max_threads` in R glue code.
* Remove threading utilities that rely on omp global variable.
* Implement ubjson.
This is a partial implementation of UBJSON with support for typed arrays. Some missing
features are `f64`, typed object, and the no-op.
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
On GPU we use rouding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to fixed point number with exponent aligned with rounding factor.
[breaking] Drop non-deterministic histogram.
Use fixed point for shared memory.
This PR is to improve the performance of GPU Hist.
Co-authored-by: Andy Adinets <aadinets@nvidia.com>
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache can not be violated.
- Provide an interface for internal algorithms to process data asynchronously.
* Re-implement ROC-AUC.
* Binary
* MultiClass
* LTR
* Add documents.
This PR resolves a few issues:
- Define a value when the dataset is invalid, which can happen if there's an
empty dataset, or when the dataset contains only positive or negative values.
- Define ROC-AUC for multi-class classification.
- Define weighted average value for distributed setting.
- A correct implementation for learning to rank task. Previous
implementation is just binary classification with averaging across groups,
which doesn't measure ordered learning to rank.
* Stop printing out message.
* Remove R specialization.
The printed message is not really useful anyway, without a reproducible example
there's no way to fix it. But if there's a reproducible example, we can always
obtain these information by a debugger. Removing the `printf` function avoids
creating the context in kernel.
* Make external memory data partitioning deterministic.
* Change the meaning of `page_size` from bytes to number of rows.
* Design a data pool.
* Note for external memory.
* Enable unity build on Windows CI.
* Force garbage collect on test.