XGBoost Change Log
This file records the changes in the xgboost library in reverse chronological order.
v0.72.1 (2018.07.08)
This version applies only to the Python package; its content is identical to that of v0.72.
v0.72 (2018.06.01)
- Starting with this release, we plan to make a new release every two months. See #3252 for more details.
- Fix a pathological behavior (near-zero second-order gradients) in multiclass objective (#3304)
- Tree dumps now use high precision in storing floating-point values (#3298)
- Submodules `rabit` and `dmlc-core` have been brought up to date, bringing bug fixes (#3330, #3221).
- GPU support
- Continuous integration tests for GPU code (#3294, #3309)
- GPU accelerated coordinate descent algorithm (#3178)
- Abstract 1D vector class now works with multiple GPUs (#3287)
- Generate PTX code for most recent architecture (#3316)
- Fix a memory bug on NVIDIA K80 cards (#3293)
- Address performance instability for single-GPU, multi-core machines (#3324)
- Python package
- FreeBSD support (#3247)
- Validation of feature names in `Booster.predict()` is now optional (#3323)
- Updated Sklearn API
- Validation sets now support instance weights (#2354)
- `XGBClassifier.predict_proba()` no longer supports the `output_margin` option (#3343). See BREAKING CHANGES below.
- R package:
- Better handling of NULL in `print.xgb.Booster()` (#3338)
- Comply with CRAN policy by removing compiler warning suppression (#3329)
- Updated CRAN submission
- JVM packages
- JVM packages will now use the same versioning scheme as other packages (#3253)
- Update Spark to 2.3 (#3254)
- Add scripts to cross-build and deploy artifacts (#3276, #3307)
- Fix a compilation error for Scala 2.10 (#3332)
- BREAKING CHANGES: `XGBClassifier.predict_proba()` no longer accepts the parameter `output_margin`. The parameter makes no sense for `predict_proba()`, because the method is meant to predict class probabilities, not raw margin scores.
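A minimal migration sketch in Python (hypothetical data and names; it assumes the Booster-level `predict()`, whose `output_margin` option is unchanged, and the `get_booster()` accessor):

```python
import numpy as np
import xgboost as xgb
from xgboost import XGBClassifier

# Hypothetical data, for illustration only.
X = np.random.rand(100, 5)
y = np.random.randint(0, 3, 100)

clf = XGBClassifier(n_estimators=10).fit(X, y)

# predict_proba() now returns class probabilities only.
proba = clf.predict_proba(X)

# For raw margin scores, drop down to the Booster-level predict().
margins = clf.get_booster().predict(xgb.DMatrix(X), output_margin=True)
```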
v0.71 (2018.04.11)
- This is a minor release, mainly motivated by issues concerning `pip install`, e.g. #2426, #3189, #3118, and #3194. With this release, users of Linux and macOS will be able to run `pip install` for the most part.
- Refactored linear booster class (`gblinear`) to support multiple coordinate descent updaters (#3103, #3134). See BREAKING CHANGES below.
- Fix slow training for multiclass classification with a high number of classes (#3109)
- Fix a corner case in approximate quantile sketch (#3167). Applicable to the 'hist' and 'gpu_hist' algorithms
- Fix memory leak in DMatrix (#3182)
- New functionality
- Better linear booster class (#3103, #3134)
- Pairwise SHAP interaction effects (#3043); see the sketch after this list
- Cox loss (#3043)
- AUC-PR metric for ranking task (#3172)
- Monotonic constraints for 'hist' algorithm (#3085)
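A brief sketch of querying the new pairwise SHAP interaction effects (hypothetical data; assumes the `pred_interactions` option of `Booster.predict()` that accompanied #3043):

```python
import numpy as np
import xgboost as xgb

# Hypothetical data, for illustration only.
X = np.random.rand(200, 4)
y = np.random.rand(200)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'max_depth': 3}, dtrain, num_boost_round=20)

# One (n_features + 1) x (n_features + 1) matrix per instance;
# the extra row/column holds the bias term.
inter = bst.predict(dtrain, pred_interactions=True)
print(inter.shape)  # (200, 5, 5)
```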
- GPU support
- Create an abstract 1D vector class that moves data seamlessly between main and GPU memory (#2935, #3116, #3068). This eliminates unnecessary PCIe data transfers during training.
- Fix minor bugs (#3051, #3217)
- Fix compatibility error for CUDA 9.1 (#3218)
- Python package:
- Correctly handle parameter `verbose_eval=0` (#3115)
- R package:
- Eliminate segmentation fault on 32-bit Windows platform (#2994)
- JVM packages
- Fix a memory bug involving double-freeing Booster objects (#3005, #3011)
- Handle empty partition in predict (#3014)
- Update docs and unify terminology (#3024)
- Delete cache files after job finishes (#3022)
- Compatibility fixes for latest Spark versions (#3062, #3093)
- BREAKING CHANGES: Updated linear modelling algorithms. In particular, L1/L2 regularisation penalties are now normalised to the number of training examples. This makes the implementation consistent with sklearn/glmnet. L2 regularisation has also been removed from the intercept. To produce linear models with the old regularisation behaviour, the alpha/lambda regularisation parameters can be manually scaled by dividing them by the number of training examples, as in the sketch below.
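A minimal sketch of the suggested rescaling (hypothetical data and penalty values):

```python
import numpy as np
import xgboost as xgb

# Hypothetical data, for illustration only.
X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)
n = dtrain.num_row()

old_alpha, old_lambda = 0.1, 1.0  # penalties tuned against pre-v0.71 releases
params = {
    'booster': 'gblinear',
    # Penalties are now normalised to the number of training examples,
    # so dividing the old values by n approximates the old behaviour.
    'alpha': old_alpha / n,
    'lambda': old_lambda / n,
}
bst = xgb.train(params, dtrain, num_boost_round=10)
```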
v0.7 (2017.12.30)
- This version represents a major change from the last release (v0.6), which was released one and a half years ago.
- Updated Sklearn API
- Add compatibility layer for scikit-learn v0.18: `sklearn.cross_validation` is now deprecated
- Updated to allow use of all XGBoost parameters via `**kwargs`
- Renamed `nthread` to `n_jobs` and `seed` to `random_state` (as per Sklearn convention); `nthread` and `seed` are now marked as deprecated
- Updated to allow choice of booster (`gbtree`, `gblinear`, or `dart`)
- `XGBRegressor` now supports instance weights (specify the `sample_weight` parameter)
- Pass the `n_jobs` parameter to the `DMatrix` constructor
- Add `xgb_model` parameter to the `fit` method, to allow continuation of training (see the sketch after this list)
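A short sketch of the updated interface (hypothetical data; assumes `get_booster()`, which may be named `booster()` on some releases):

```python
import numpy as np
from xgboost import XGBRegressor

# Hypothetical data and weights, for illustration only.
X = np.random.rand(200, 5)
y = np.random.rand(200)
w = np.random.rand(200)  # per-instance weights

# n_jobs and random_state replace the deprecated nthread and seed.
reg = XGBRegressor(booster='gbtree', n_estimators=50, n_jobs=4, random_state=0)
reg.fit(X, y, sample_weight=w)

# Continue training the same model for another 50 rounds via xgb_model.
reg2 = XGBRegressor(n_estimators=50, n_jobs=4, random_state=0)
reg2.fit(X, y, sample_weight=w, xgb_model=reg.get_booster())
```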
- Refactored gbm to allow more friendly cache strategy
- Specialized some prediction routines
- Robust `DMatrix` construction from a sparse matrix
- Faster construction of `DMatrix` from 2D NumPy matrices: elide copies, use multiple threads
- Automatically remove NaN from input data when it is sparse
- This can solve some user-reported problems of `istart != hist.size`
- Fix the single-instance prediction function to obtain correct predictions
- Minor fixes
- Thread-local variables are now automatically freed at thread exit.
- Fix saving and loading of `count::poisson` models
- Fix CalcDCG to use base-2 logarithm
- Messages are now written to stderr instead of stdout
- Keep built-in evaluations while using customized evaluation functions
- Use `bst_float` consistently to minimize type conversion
- Copy the base margin when slicing `DMatrix`
- Evaluation metrics are now saved to the model file
- Use `int32_t` explicitly when serializing version
- In distributed training, synchronize the number of features after loading a data matrix.
- Migrate to C++11
- The current master version now requires a C++11-enabled compiler (g++ 4.8 or higher)
- Predictor interface was factored out (in a manner similar to the updater interface).
- Makefile support for Solaris and ARM
- Test code coverage using Codecov
- Add CPP tests
- Add `Dockerfile` and `Jenkinsfile` to support continuous integration for GPU code
- New functionality
- Ability to adjust tree model's statistics to a new dataset without changing tree structures.
- Ability to extract feature contributions from individual predictions
- Faster, histogram-based tree algorithm (`tree_method='hist'`)
- GPU/CUDA accelerated tree algorithms (`tree_method='gpu_hist'` or `'gpu_exact'`), including the GPU-based predictor (see the sketch after this list)
- Monotonic constraints: when other features are fixed, force the prediction to be monotonically increasing with respect to a specified feature
- Faster gradient calculation using AVX SIMD
- Ability to export models in JSON format
- Support for Tweedie regression
- Additional dropout options for DART: binomial+1, epsilon
- Ability to update an existing model in-place: this is useful for many applications, such as determining feature importance
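A combined sketch of the histogram algorithm, monotonic constraints, and feature contributions (hypothetical data; note that monotonic-constraint support for 'hist' only landed in v0.71, per the notes above):

```python
import numpy as np
import xgboost as xgb

# Hypothetical data, for illustration only.
X = np.random.rand(500, 3)
y = 2.0 * X[:, 0] + np.random.rand(500)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'tree_method': 'hist',              # histogram-based algorithm ('gpu_hist' uses CUDA)
    'monotone_constraints': '(1,0,0)',  # prediction increasing in feature 0, others free
    'max_depth': 4,
    'eta': 0.1,
}
bst = xgb.train(params, dtrain, num_boost_round=50)

# Per-instance feature contributions; the last column holds the bias term.
contribs = bst.predict(dtrain, pred_contribs=True)
```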
- Python package:
- New parameters: `learning_rates` in `cv()`, `shuffle` in `mknfold()`, `max_features` and `show_values` in `plot_importance()`, and `sample_weight` in `XGBRegressor.fit()`
- Support binary wheel builds
- Fix `MultiIndex` detection to support Pandas 0.21.0 and higher
- Support metrics and evaluation sets whose names contain `-`
- Support feature maps when plotting trees
- Compatibility fix for Python 2.6
- Call `print_evaluation` callback at last iteration
- Use appropriate integer types when calling native code, to prevent truncation and memory errors
- Fix shared library loading on Mac OS X
- R package:
- New parameters: `silent` in `xgb.DMatrix()`, `use_int_id` in `xgb.model.dt.tree()`, `predcontrib` in `predict()`, and `monotone_constraints` in `xgb.train()`
- Default value of the `save_period` parameter in `xgboost()` changed to NULL (consistent with `xgb.train()`).
- It's possible to custom-build the R package with GPU acceleration support.
- Enable JVM build for Mac OS X and Windows
- Integration with AppVeyor CI
- Improved safety for garbage collection
- Store numeric attributes with higher precision
- Easier installation for devel version
- Improved `xgb.plot.tree()`
- Various minor fixes to improve user experience and robustness
- Register native code to pass CRAN check
- Updated CRAN submission
- JVM packages
- Add Spark pipeline persistence API
- Fix data persistence: loss evaluation on test data had wrongly used caches for training data.
- Clean external cache after training
- Implement early stopping
- Enable training of multiple models by distinguishing stage IDs
- Better Spark integration: support RDD / dataframe / dataset, integrate with Spark ML package
- XGBoost4j now supports ranking task
- Support training with missing data
- Refactor JVM package to separate regression and classification models to be consistent with other machine learning libraries
- Support XGBoost4j compilation on Windows
- Parameter tuning tool
- Publish source code for XGBoost4j to maven local repo
- Scala implementation of the Rabit tracker (drop-in replacement for the Java implementation)
- Better exception handling for the Rabit tracker
- Persist `num_class`, the number of classes (for classification task)
- `XGBoostModel` now holds `BoosterParams`
- libxgboost4j is now part of CMake build
- Release `DMatrix` when no longer needed, to conserve memory
- Expose `baseMargin`, to allow initialization of boosting with predictions from an external model
- Support instance weights
- Use `SparkParallelismTracker` to prevent jobs from hanging forever
- Expose train-time evaluation metrics via `XGBoostModel.summary`
- Option to specify `host-ip` explicitly in the Rabit tracker
- Documentation
- Better math notation for gradient boosting
- Updated build instructions for Mac OS X
- Template for GitHub issues
- Add `CITATION` file for citing XGBoost in scientific writing
- Fix dropdown menu in xgboost.readthedocs.io
- Document `updater_seq` parameter
- Style fixes for Python documentation
- Links to additional examples and tutorials
- Clarify installation requirements
- Changes that break backward compatibility
v0.6 (2016.07.29)
- Version 0.5 is skipped due to major improvements in the core
- Major refactor of core library.
- Goal: more flexible and modular code as a portable library.
- Switch to using the C++11 standard.
- Random number generator defaults to `std::mt19937`.
- Share the data loading pipeline and logging module from dmlc-core.
- Enable registry pattern to allow optional plugins for objective, metric, tree constructor, and data loader.
- Future plugin modules can be put into xgboost/plugin and register back to the library.
- Replace most raw pointers with smart pointers, for RAII safety.
- Add official option for the approximate algorithm via the `tree_method` parameter.
- Change the default behavior to prefer the faster algorithm.
- User will get a message when the approximate algorithm is chosen.
- Change library name to libxgboost.so
- Backward compatibility
- The binary buffer file is not backward compatible with previous versions.
- The model file is backward compatible on 64-bit platforms.
- The model file is compatible between 64/32-bit platforms (not yet tested).
- External memory version and other advanced features will be exposed to the R library as well, on Linux.
- Previously some of the features were blocked due to C++11 and threading limits.
- The Windows version is still blocked because Rtools does not support `std::thread`.
- rabit and dmlc-core are maintained through git submodules
- Anyone can open PR to update these dependencies now.
- Improvements
- Rabit and xgboost libs are now thread-safe and use thread-local PRNGs
- This could fix some of the previous problems with running xgboost on multiple threads.
- JVM Package
- Enable xgboost4j for Java and Scala
- XGBoost distributed now runs on Flink and Spark.
- Support listing of model attributes for metadata.
- Support callback API
- Support new booster DART (dropout in tree boosting)
- Add CMake build system
v0.47 (2016.01.14)
- Changes in R library
- fixed a possible problem with Poisson regression.
- switched from 0 to NA for missing values.
- exposed access to additional model parameters.
- Changes in Python library
- throws an exception instead of crashing the terminal when a parameter error happens.
- has importance plot and tree plot functions.
- accepts different learning rates for each boosting round.
- allows model training continuation from previously saved model.
- allows early stopping in CV.
- allows feval to return a list of tuples (see the sketch after this list).
- allows eval_metric to handle additional formats.
- improved compatibility in sklearn module.
- additional parameters added for sklearn wrapper.
- added pip installation functionality.
- supports more Pandas DataFrame dtypes.
- added best_ntree_limit attribute, in addition to best_score and best_iteration.
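A short sketch of a custom feval returning several metrics, combined with early stopping in CV (hypothetical data and metric names):

```python
import numpy as np
import xgboost as xgb

def multi_metric(preds, dtrain):
    # Hypothetical feval returning a list of (name, value) tuples.
    labels = dtrain.get_label()
    err = float(np.mean((preds > 0.5) != labels))
    mae = float(np.mean(np.abs(preds - labels)))
    return [('error', err), ('mae', mae)]

# Hypothetical data, for illustration only.
X = np.random.rand(300, 4)
y = np.random.randint(0, 2, 300)
dtrain = xgb.DMatrix(X, label=y)

# Cross-validation with the custom metric and early stopping.
res = xgb.cv({'objective': 'binary:logistic'}, dtrain,
             num_boost_round=20, nfold=3,
             feval=multi_metric, early_stopping_rounds=5)
```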
- Java API is ready for use
- Added more test cases and continuous integration to make each build more robust.
v0.4 (2015.05.11)
- Distributed version of xgboost that runs on YARN, scales to billions of examples
- Direct save/load data and model from/to S3 and HDFS
- Feature importance visualization in R module, by Michael Benesty
- Predict leaf index
- Poisson regression for counts data
- Early stopping option in training (see the sketch at the end of this section)
- Native save load support in R and python
- xgboost models now can be saved using save/load in R
- xgboost python model is now picklable
- sklearn wrapper is supported in python module
- Experimental external memory version
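A minimal sketch of early stopping plus native save/load (hypothetical data and file name; 'reg:linear' was the regression objective name at the time):

```python
import numpy as np
import xgboost as xgb

# Hypothetical train/validation split, for illustration only.
X = np.random.rand(400, 6)
y = np.random.rand(400)
dtrain = xgb.DMatrix(X[:300], label=y[:300])
dvalid = xgb.DMatrix(X[300:], label=y[300:])

bst = xgb.train({'objective': 'reg:linear'}, dtrain,
                num_boost_round=200,
                evals=[(dvalid, 'valid')],
                early_stopping_rounds=10)  # stop once 'valid' stops improving

bst.save_model('model.bin')                 # native save
bst2 = xgb.Booster(model_file='model.bin')  # native load
```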
v0.3 (2014.09.07)
- Faster tree construction module
- Allows subsampling columns during tree construction via `bst:col_samplebytree=ratio`
- Support for boosting from initial predictions
- Experimental version of LambdaRank
- Linear booster is now parallelized, using parallel coordinate descent.
- Add Code Guide for customizing objective function and evaluation
- Add R module
v0.2x (2014.05.20)
- Python module
- Weighted sample instances
- Initial version of pairwise rank
v0.1 (2014.03.26)
- Initial release