This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
* [CI] Drop CUDA 10.1; Require 11.0
* Change NCCL version
* Use CUDA 10.1 for clang-tidy, for now
* Remove JDK 11 and 12
* Fix NCCL version
* Don't require 11.0 just yet, until clang-tidy is fixed
* Skip MultiClassesSerializationTest.GpuHist
* Extend array interface to handle ndarray.
The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this
class handles only 2-dim (vector is also matrix). This PR specifies the expected
dimension at compile-time and the array interface can perform various checks automatically
for input data. Also, adapters like CSR are more rigorous about their input. Lastly, row
vector and column vector are handled without intervention from the caller.
* [R] Fix global feature importance.
* Add implementation for tree index. The parameter is not documented in C API since we
should work on porting the model slicing to R instead of supporting more use of tree
index.
* Fix the difference between "gain" and "total_gain".
* debug.
* Fix prediction.
Change from system Python to environment python3. For Ubuntu 20.04, only `python3` is
available and there's no `python`. So at least `python3` is consistent with Python
virtual env, Ubuntu and anaconda.
This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support
for most of the data types along with tests.
Generated using `clang-format -style=google -dump-config > .clang-format`, with column
width changed from 80 to 100 to be consistent with existing cpplint check.
Spark 3.2 depends on 3.7.0-M11 which has changed some implicited functions'
signatures. And it will result the xgboost4j built against spark 3.0/3.1
failed when saving the model.
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`. And `feval` is deprecated. The new `custom_metric` receives transformed prediction when the built-in objective is used. This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.
`eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and is now saved as part of the scikit-learn model. The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.
Added more detailed documents for the behaviour of custom objective and metric.