- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache can not be violated.
- Provide an interface for internal algorithms to process data asynchronously.
* Disable JSON serialization for now.
* Multi-class classification is checkpointing for each iteration.
This brings significant overhead.
Revert: 90355b4f007ae
* Set R tests to use binary.
* fix type error
* Validate number of features.
* resolve comments
* add feature size for LabelPoint and DataBatch
* pass the feature size to native
* move feature size validating tests into a separate suite
* resolve comments
Co-authored-by: fis <jm.yuan@outlook.com>
Move this function into gbtree, and uses only updater for doing so. As now the predictor knows exactly how many trees to predict, there's no need for it to update the prediction cache.
* Fix syncing DMatrix columns.
* notes for tree method.
* Enable feature validation for all interfaces except for jvm.
* Better tests for boosting from predictions.
* Disable validation on JVM.
* Disable parameter validation for now.
Scikit-Learn passes all parameters down to XGBoost, whether they are used or
not.
* Add option `validate_parameters`.
* Pass pointer to model parameters.
This PR de-duplicates most of the model parameters except the one in
`tree_model.h`. One difficulty is `base_score` is a model property but can be
changed at runtime by objective function. Hence when performing model IO, we
need to save the one provided by users, instead of the one transformed by
objective. Here we created an immutable version of `LearnerModelParam` that
represents the value of model parameter after configuration.
* Use `UpdateAllowUnknown' for non-model related parameter.
Model parameter can not pack an additional boolean value due to binary IO
format. This commit deals only with non-model related parameter configuration.
* Add tidy command line arg for use-dmlc-gtest.
* Apply Configurable to objective functions.
* Apply Model to Learner and Regtree, gbm.
* Add Load/SaveConfig to objs.
* Refactor obj tests to use smart pointer.
* Dummy methods for Save/Load Model.
* Refactor configuration [Part II].
* General changes:
** Remove `Init` methods to avoid ambiguity.
** Remove `Configure(std::map<>)` to avoid redundant copying and prepare for
parameter validation. (`std::vector` is returned from `InitAllowUnknown`).
** Add name to tree updaters for easier debugging.
* Learner changes:
** Make `LearnerImpl` the only source of configuration.
All configurations are stored and carried out by `LearnerImpl::Configure()`.
** Remove booster in C API.
Originally kept for "compatibility reason", but did not state why. So here
we just remove it.
** Add a `metric_names_` field in `LearnerImpl`.
** Remove `LazyInit`. Configuration will always be lazy.
** Run `Configure` before every iteration.
* Predictor changes:
** Allocate both cpu and gpu predictor.
** Remove cpu_predictor from gpu_predictor.
`GBTree` is now used to dispatch the predictor.
** Remove some GPU Predictor tests.
* IO
No IO changes. The binary model format stability is tested by comparing
hashing value of save models between two commits
This is part 1 of refactoring configuration.
* Move tree heuristic configurations.
* Split up declarations and definitions for GBTree.
* Implement UseGPU in gbm.
* Only define `gpu_id` and `n_gpus` in `LearnerTrainParam`
* Pass LearnerTrainParam through XGBoost vid factory method.
* Disable all GPU usage when GPU related parameters are not specified (fixes XGBoost choosing GPU over aggressively).
* Test learner train param io.
* Fix gpu pickling.