From b9a4f3336af7c78f9f7b66f7c91a419108b7ec83 Mon Sep 17 00:00:00 2001
From: Jiaming Yuan <jm.yuan@outlook.com>
Date: Tue, 13 Apr 2021 08:38:27 +0800
Subject: [PATCH] 1.4 release notes. (#6843)

---
 NEWS.md | 227 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 227 insertions(+)

diff --git a/NEWS.md b/NEWS.md
index 4486a83bf..4d7452c9b 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -3,6 +3,233 @@ XGBoost Change Log
 
 This file records the changes in xgboost library in reverse chronological order.
 
+## v1.4.0 (2021.04.12)
+
+### Introduction of pre-built binary package for R, with GPU support
+Starting with release 1.4.0, users now have the option of installing `{xgboost}` without
+having to build it from source. This is particularly useful for users who want to use
+the GPU algorithm (`gpu_hist`), as previously they had to build
+`{xgboost}` from source using CMake and NVCC. Now installing `{xgboost}` with GPU
+support is as easy as: `R CMD INSTALL ./xgboost_r_gpu_linux.tar.gz`. (#6827)
+
+See the instructions at https://xgboost.readthedocs.io/en/latest/build.html
+
+### Improvements on prediction functions
+XGBoost has many prediction types, including SHAP value computation and inplace prediction.
+In 1.4 we overhauled the underlying prediction functions for the C API and Python API with
+a unified interface. (#6777, #6693, #6653, #6662, #6648, #6668, #6804)
+* Starting with 1.4, sklearn interface prediction will use inplace predict by default when
+  input data is supported.
+* Users can use inplace predict with `dart` booster and enable GPU acceleration just
+  like `gbtree`.
+* Also all prediction functions with tree models are now thread-safe.  Inplace predict is
+  improved with `base_margin` support.
+* A new set of C predict functions is exposed in the public interface.
+* A user-visible change is a newly added parameter called `strict_shape`.  See
+  https://xgboost.readthedocs.io/en/latest/prediction.html for more details.
+
+
+### Improvement on Dask interface
+* Starting with 1.4, the Dask interface is considered to be feature-complete, which means
+  all of the models found in the single node Python interface are now supported in Dask,
+  including but not limited to ranking and random forest.  Also, the prediction function
+  is significantly faster and supports SHAP value computation.
+  - Most of the parameters found in single node sklearn interface are supported by
+    Dask interface. (#6471, #6591)
+  - Implements learning to rank.  On the Dask interface, we use the newly added support of
+    query ID to enable group structure. (#6576)
+  - The Dask interface has Python type hints support. (#6519)
+  - All models can be safely pickled. (#6651)
+  - Random forest estimators are now supported. (#6602)
+  - SHAP value computation is now supported. (#6575, #6645, #6614)
+  - Evaluation result is printed on the scheduler process. (#6609)
+  - `DaskDMatrix` (and device quantile dmatrix) now accepts all meta-information. (#6601)
+
+* Prediction optimization.  We enhanced and sped up the prediction function for the
+  Dask interface.  See the latest Dask tutorial page in our documentation for an overview
+  of how you can optimize it even further. (#6650, #6645, #6648, #6668)
+
+* Bug fixes
+  - If you are using the latest Dask and distributed where `distributed.MultiLock` is
+    present, XGBoost supports training multiple models on the same cluster in
+    parallel. (#6743)
+  - Fix a bug where XGBoost might use a different client object internally when
+    `dask.client` is used to launch an async task. (#6722)
+
+* Other improvements on documents, blogs, tutorials, and demos. (#6389, #6366, #6687,
+  #6699, #6532, #6501)
+
+### Python package
+With changes from Dask and general improvement on prediction, we have made some
+enhancements on the general Python interface and IO for booster information.  Starting
+from 1.4, booster feature names and types can be saved into the JSON model.  Also some
+model attributes like `best_iteration` and `best_score` are restored upon model load.  On
+the sklearn interface, some attributes are now implemented as Python object properties
+with better documentation.
+
+* Breaking change: All `data` parameters in prediction functions are renamed to `X`
+  for better compliance with sklearn estimator interface guidelines.
+* Breaking change: XGBoost used to generate some pseudo feature names with `DMatrix`
+  when inputs like `np.ndarray` don't have column names.  The procedure is removed to
+  avoid conflict with other inputs. (#6605)
+* Early stopping with training continuation is now supported. (#6506)
+* Optional imports for Dask and cuDF are now lazy. (#6522)
+* As mentioned in the prediction improvement summary, the sklearn interface uses inplace
+  prediction whenever possible. (#6718)
+* Booster information like feature names and feature types is now saved into the JSON
+  model file. (#6605)
+* All `DMatrix` interfaces including `DeviceQuantileDMatrix` and counterparts in Dask
+  interface (as mentioned in the Dask changes summary) now accept all the meta-information
+  like `group` and `qid` in their constructor for better consistency. (#6601)
+* Booster attributes are restored upon model load so users don't have to call `attr`
+  manually. (#6593)
+* On sklearn interface, all models accept `base_margin` for evaluation datasets. (#6591)
+* Improvements to the setup script, including smaller sdist size and faster installation
+  if the C++ library is already built. (#6611, #6694, #6565)
+
+* Bug fixes for Python package:
+  - Don't validate feature when number of rows is 0. (#6472)
+  - Move metric configuration into booster. (#6504)
+  - Calling XGBModel.fit() should clear the Booster by default (#6562)
+  - Support `_estimator_type`. (#6582)
+  - [dask, sklearn] Fix predict proba. (#6566, #6817)
+  - Restore unknown data support. (#6595)
+  - Fix learning rate scheduler with cv. (#6720)
+  - Fixes small typo in sklearn documentation (#6717)
+  - [python-package] Fix class Booster: feature_types = None (#6705)
+  - Fix divide by 0 in feature importance when no split is found. (#6676)
+
+
+### JVM package
+* [jvm-packages] Fix early stopping not working even without custom_eval setting (#6738)
+* Fix TaskFailedListener's callback potentially not being called (#6612)
+* [jvm] Add ability to load booster directly from byte array (#6655)
+* [jvm-packages] JVM library loader extensions (#6630)
+
+### R package
+* R documentation: Make construction of DMatrix consistent.
+* Fix R documentation for xgb.train. (#6764)
+
+### ROC-AUC
+We re-implemented the ROC-AUC metric in XGBoost.  The new implementation supports
+multi-class classification and has better support for learning to rank tasks that are not
+binary.  Also, it has a better-defined average on distributed environments with additional
+handling for invalid datasets. (#6749, #6747, #6797)
+
+### Global configuration
+Starting from 1.4, XGBoost's Python, R and C interfaces support a new global configuration
+model where users can specify some global parameters.  Currently supported parameters are
+`verbosity` and `use_rmm`.  The latter is experimental; see the RMM plugin demo and
+related README file for details. (#6414, #6656)
+
+### Other new features
+* Better handling for input data types that support `__array_interface__`.  For some
+  data types including GPU inputs and `scipy.sparse.csr_matrix`, XGBoost employs
+  `__array_interface__` for processing the underlying data.  Starting from 1.4, XGBoost
+  can accept arbitrary array strides (which means column-major is supported) without
+  making data copies, potentially reducing a significant amount of memory consumption.
+  Also version 3 of `__cuda_array_interface__` is now supported.  (#6776, #6765, #6459,
+  #6675)
+* Improved parameter validation: feeding XGBoost with parameters that contain
+  whitespace now triggers an error. (#6769)
+* For Python and R packages, file paths containing the home indicator `~` are supported.
+* As mentioned in the Python changes summary, the JSON model can now save feature
+  information of the trained booster.  The JSON schema is updated accordingly. (#6605)
+* Development of categorical data support continues, with newly added support for
+  weighted data and the `dart` booster. (#6508, #6693)
+* As mentioned in Dask change summary, ranking now supports the `qid` parameter for
+  query groups. (#6576)
+* `DMatrix.slice` can now consume a numpy array. (#6368)
+
+### Other breaking changes
+* Aside from the feature name generation, there are two other breaking changes:
+  - Drop saving binary format for memory snapshot. (#6513, #6640)
+  - Change default evaluation metric for binary:logitraw objective to logloss (#6647)
+
+### CPU Optimization
+* Aside from the general changes to the predict function, some optimizations are applied
+  to the CPU implementation. (#6683, #6550, #6696, #6700)
+* Also, performance for sampling initialization in `hist` is improved. (#6410)
+
+### Notable fixes in the core library
+These fixes do not reside in particular language bindings:
+* Fixes for gamma regression.  This includes checking for invalid input values, fixes for
+  gamma deviance metric, and better floating point guard for gamma negative log-likelihood
+  metric. (#6778, #6537, #6761)
+* Fix random forest with `gpu_hist`, which might produce low accuracy in previous versions. (#6755)
+* Fix a bug in GPU sketching when data size exceeds limit of 32-bit integer. (#6826)
+* Memory consumption fix for row-major adapters (#6779)
+* Don't estimate sketch batch size when rmm is used. (#6807) (#6830)
+* Fix in-place predict with missing value. (#6787)
+* Re-introduce double buffer in UpdatePosition, to fix perf regression in gpu_hist (#6757)
+* Pass correct split_type to GPU predictor (#6491)
+* Fix DMatrix feature names/types IO. (#6507)
+* Use view for `SparsePage` exclusively to avoid some data access races. (#6590)
+* Check for invalid data. (#6742)
+* Fix relocatable include in CMakeList (#6734) (#6737)
+* Fix DMatrix slice with feature types. (#6689)
+
+### Other deprecation notices:
+
+* This release will be the last release to support CUDA 10.0. (#6642)
+
+* Starting in the next release, the Python package will require Pip 19.3+ due to the use
+  of manylinux2014 tag. Also, CentOS 6, RHEL 6 and other old distributions will not be
+  supported.
+
+### Known issue:
+
+The macOS build of the JVM packages doesn't support multi-threading out of the box. To enable
+multi-threading with JVM packages, macOS users will need to build the JVM packages from
+source. See https://xgboost.readthedocs.io/en/latest/jvm/index.html#installation-from-source
+
+
+### Doc
+* Dedicated page for `tree_method` parameter is added. (#6564, #6633)
+* [doc] Add FLAML as a fast tuning tool for XGBoost  (#6770)
+* Add document for tests directory. [skip ci] (#6760)
+* Fix doc string of config.py to use correct `versionadded` (#6458)
+* Update demo for prediction. (#6789)
+* [Doc] Document that AUCPR is for binary classification/ranking (#5899)
+* Update the C API comments (#6457)
+* Fix document. [skip ci] (#6669)
+
+### Maintenance: Testing, continuous integration
+* Use CPU input for test_boost_from_prediction. (#6818)
+* [CI] Upload xgboost4j.dll to S3 (#6781)
+* Update dmlc-core submodule (#6745)
+* [CI] Use manylinux2010_x86_64 container to vendor libgomp (#6485)
+* Add conda-forge badge (#6502)
+* Fix merge conflict. (#6512)
+* [CI] Split up main.yml, add mypy. (#6515)
+* [Breaking] Upgrade cuDF and RMM to 0.18 nightlies; require RMM 0.18+ for RMM plugin (#6510)
+* "featue_map" typo changed to  "feature_map" (#6540)
+* Add script for generating release tarball. (#6544)
+* Add credentials to .gitignore (#6559)
+* Remove warnings in tests. (#6554)
+* Update dmlc-core submodule and conform to new API (#6431)
+* Suppress hypothesis health check for dask client. (#6589)
+* Fix pylint. (#6714)
+* [CI] Clear R package cache (#6746)
+* Exclude dmlc test on github action. (#6625)
+* Tests for regression metrics with weights. (#6729)
+* Add helper script and doc for releasing pip package. (#6613)
+* Support pylint 2.7.0 (#6726)
+* Remove R cache in github action. (#6695)
+* [CI] Do not mix up stashed executable built for ARM and x86_64 platforms (#6646)
+* [CI] Add ARM64 test to Jenkins pipeline (#6643)
+* Disable s390x and arm64 tests on travis for now. (#6641)
+* Move sdist test to action. (#6635)
+* [dask] Rework base margin test. (#6627)
+
+
+### Maintenance: Refactor code for legibility and maintainability
+* Improve OpenMP exception handling (#6680)
+* Improve string view to reduce string allocation. (#6644)
+* Simplify Span checks. (#6685)
+* Use generic dispatching routine for array interface. (#6672)
+
+
 ## v1.3.0 (2020.12.08)
 
 ### XGBoost4J-Spark: Exceptions should cancel jobs gracefully instead of killing SparkContext (#6019).