* [dask] Use a 1-line sample to infer output shape.
This is for inferring shape with direct prediction (without DaskDMatrix).
A few things require a known output shape before the actual prediction is
carried out, including dask metadata and output dataframe columns.
* Infer output shape based on local prediction.
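A minimal sketch of the shape-inference idea in the two items above, assuming a `dask.array` input and plain prediction output; `predict_direct` and `_predict_block` are illustrative names, not the actual `xgboost.dask` internals:

```python
import numpy as np
import dask.array as da
import xgboost as xgb


def predict_direct(booster: xgb.Booster, X: da.Array) -> da.Array:
    # Predict on a single row locally to learn the output shape and dtype;
    # this is what dask needs as ``meta`` (and, for DataFrame output, the
    # column names) before the distributed prediction is scheduled.
    sample = np.asarray(X[:1].compute())
    test_pred = booster.predict(xgb.DMatrix(sample))
    meta = np.empty((0,) + test_pred.shape[1:], dtype=test_pred.dtype)

    def _predict_block(block: np.ndarray) -> np.ndarray:
        return booster.predict(xgb.DMatrix(block))

    if test_pred.ndim == 1:
        # Single-output prediction: the feature axis is dropped per block.
        return X.map_blocks(_predict_block, drop_axis=1, meta=meta)
    # Multi-output (e.g. multi-class) prediction; assumes the features sit in
    # a single chunk along axis 1 (rechunk first otherwise).
    return X.map_blocks(
        _predict_block,
        chunks=(X.chunks[0], (test_pred.shape[1],)),
        meta=meta,
    )
```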
* Remove the set-param call in the predict function, as it's neither thread safe
nor necessary now that we let dask decide the parallelism.
* Simplify prediction on `DaskDMatrix`.
This PR ensures all DMatrix types have a common interface.
* Fix the logic for avoiding duplicated DMatrix construction in sklearn.
* Check for consistency between DMatrix types.
* Add doc for bounds.
* [java] Extend the library loader to use both OS and CPU architecture.
* Simplify create_jni.py's architecture detection.
* Tidy up the architecture detection in create_jni.py.
The old (pre-fix) best_ntree_limit ignored the num_class parameter, which is incorrect. Previously we worked around this in the C++ layer to avoid possible breaking changes in other language bindings, but the Python interpretation stayed incorrect. The PR fixed that in Python by taking num_class into account, but didn't remove the old workaround, so the tree calculation in the predictor is incorrect; see PredictBatch in CPUPredictor.
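A hedged illustration of the relationship described above, not the library's exact code: with early stopping, the number of trees kept at prediction time must scale with the number of classes, because multi-class training grows `num_class` trees (times `num_parallel_tree`) per boosting round.

```python
# Illustrative only; the real attribute is computed inside the sklearn wrapper.
def best_ntree_limit(best_iteration: int, num_parallel_tree: int, num_class: int) -> int:
    num_groups = max(num_class, 1)  # num_class is 0/1 for non-multiclass objectives
    return (best_iteration + 1) * num_parallel_tree * num_groups


# Example: 3 classes, best_iteration=9, one parallel tree:
# 10 rounds * 1 tree * 3 classes = 30 trees used for prediction.
assert best_ntree_limit(9, 1, 3) == 30
```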
* Initial support for distributed LTR using dask.
* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
to avoid duplicated code.
* Define `DaskXGBRanker`.
The dask ranker doesn't support the group structure; instead it uses query ids and
converts them to a group pointer internally, as sketched below.
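A minimal sketch of that qid-to-group-pointer conversion, assuming rows are already sorted by query id; the function name is illustrative and not part of the dask ranker's API.

```python
import numpy as np


def qid_to_group_ptr(qid: np.ndarray) -> np.ndarray:
    """Turn per-row query ids into a CSR-style group pointer.

    e.g. qid = [0, 0, 0, 1, 1, 2] -> ptr = [0, 3, 5, 6]: three groups of
    sizes 3, 2 and 1.
    """
    qid = np.asarray(qid)
    # Indices where the query id changes, plus the two ends.
    boundaries = np.nonzero(qid[1:] != qid[:-1])[0] + 1
    return np.concatenate([[0], boundaries, [qid.size]])


assert qid_to_group_ptr(np.array([0, 0, 0, 1, 1, 2])).tolist() == [0, 3, 5, 6]
```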
* Update dmlc-core submodule and conform to new API
* Remove unsupported parameter from method signature
* Update dmlc-core
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
* For sklearn:
- Handle user-defined objective functions.
- Handle `softmax`.
* For dask:
- Use the implementation from sklearn; the previous implementation didn't perform any extra handling.
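A hedged sketch of the idea behind the sklearn handling above, not the wrapper's actual code: when the objective is `multi:softmax` or a user-defined callable, the booster's output cannot be taken as probabilities, so the margin output is converted with a softmax instead. `handle_proba` is an illustrative name.

```python
import numpy as np


def softmax(margin: np.ndarray) -> np.ndarray:
    # Row-wise, numerically stable softmax over class margins.
    shifted = margin - margin.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def handle_proba(raw_prediction: np.ndarray, objective) -> np.ndarray:
    # With ``multi:softmax`` or a user-defined callable objective, raw margins
    # are requested and converted to probabilities here; ``multi:softprob``
    # already yields probabilities and is passed through.
    if callable(objective) or objective == "multi:softmax":
        return softmax(raw_prediction)
    return raw_prediction
```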
* Calling XGBModel.fit() should clear the Booster by default
* Document the behavior of fit()
* Allow an sklearn object to be passed in directly via the xgb_model argument
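Illustrative usage of the behaviour described in the items above (a sketch, not a test from the PR; parameter values are arbitrary):

```python
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

first = XGBClassifier(n_estimators=10).fit(X, y)

# Calling fit() again starts from a fresh Booster by default ...
first.fit(X, y)

# ... unless training is explicitly continued from an existing model, which
# may now be the fitted sklearn estimator itself rather than a raw Booster.
continued = XGBClassifier(n_estimators=10).fit(X, y, xgb_model=first)
```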
* Fix lint
For the `gamma-nloglik` eval metric, small positive values in the labels cause `NaN`s in the outputs, as reported in https://github.com/dmlc/xgboost/issues/5349. This adds clipping on the labels, similar to what is done in other metrics such as `poisson-nloglik` and `logloss`.
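A hedged Python transcription of the per-row gamma negative log-likelihood with the added label clipping; the epsilon value and the exact form used in the C++ metric may differ.

```python
import math
import numpy as np

EPS = 1e-16  # illustrative threshold only


def gamma_nloglik(y: np.ndarray, pred: np.ndarray) -> np.ndarray:
    y = np.maximum(y, EPS)  # the added clipping: keeps log(y) finite
    psi = 1.0               # dispersion fixed at 1, as in the built-in metric
    theta = -1.0 / pred
    a = psi
    b = -np.log(-theta)
    c = (1.0 / psi) * np.log(y / psi) - np.log(y) - math.lgamma(1.0 / psi)
    return -((y * theta - b) / a + c)
```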