Anthony D'Amato f58e41bad8 Fix deterministic partitioning with dataset containing Double.NaN (#5996 )

The functions featureValueOfSparseVector or featureValueOfDenseVector could return Float.NaN if the input vector contained any missing values. This made the partition key computation fail, and most of the vectors ended up in the same partition. We fix this by never returning NaN and using the row hashCode instead in this case.
We added a test to ensure that repartitioning is now uniform on an input dataset containing missing values, by checking that the variance of the partition sizes is below a certain threshold.
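
The failure mode and the fix described above can be illustrated with a minimal Python sketch. This is not the code from jvm-packages: the names `row_hash`, `feature_value`, `partition_key`, `partition_size_variance` and the `nan_safe` flag are hypothetical, and `zlib.crc32` stands in for the JVM row hashCode. The sketch assumes the key is derived from the row hash combined with one feature value, which is enough to show why a NaN collapses the keys.

```python
import math
import statistics
import struct
import zlib
from collections import Counter


def row_hash(row):
    """Deterministic hash of a non-empty row of floats (stand-in for the row hashCode)."""
    return zlib.crc32(struct.pack(f"{len(row)}d", *row))


def feature_value(row, nan_safe=True):
    """Pick one feature by row hash; with nan_safe, replace NaN by the row hash."""
    h = row_hash(row)
    value = row[h % len(row)]
    if nan_safe and math.isnan(value):
        return float(h)
    return value


def partition_key(row, num_partitions, nan_safe=True):
    """Map a row to a partition index in [0, num_partitions)."""
    # If feature_value returns NaN, the sum is NaN and its repr is always "nan",
    # so every such row would receive the same key.
    combined = repr(row_hash(row) + feature_value(row, nan_safe))
    return zlib.crc32(combined.encode()) % num_partitions


def partition_size_variance(rows, num_partitions, nan_safe=True):
    """Population variance of the partition sizes (empty partitions count as 0)."""
    counts = Counter(partition_key(r, num_partitions, nan_safe) for r in rows)
    sizes = [counts.get(p, 0) for p in range(num_partitions)]
    return statistics.pvariance(sizes)
```

A uniformity check in the spirit of the added test would call `partition_size_variance(rows, num_partitions)` on a dataset with missing values and assert that the result stays below a chosen threshold. The threshold itself depends on the dataset size and is not specified here.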

Signed-off-by: Anthony D'Amato <anthony.damato@hotmail.fr>

2020-08-18 18:55:37 -07:00

.github

[CI] Improve R linter script (#5944 )

2020-07-27 00:55:35 -07:00

amalgamation

Feature weights (#5962 )

2020-08-18 19:55:41 +08:00

cmake

Fix mingw build with R. (#5918 )

2020-07-22 02:56:49 +08:00

cub @ c3cceac115

Refactor gpu_hist split evaluation (#5610 )

2020-04-30 08:58:12 +12:00

demo

Feature weights (#5962 )

2020-08-18 19:55:41 +08:00

dev

Add release note for 1.0.0 in NEWS.md (#5329 )

2020-03-03 21:35:43 -08:00

dmlc-core @ 5df8305fe6

Ensure that configured dmlc/build_config.h is picked up by Rabit and XGBoost (#5514 )

2020-04-11 23:48:28 -07:00

doc

Feature weights (#5962 )

2020-08-18 19:55:41 +08:00

include/xgboost

Swap byte-order in binary serializer to support big-endian arch (#5813 )

2020-08-18 14:47:17 -07:00

jvm-packages

Fix deterministic partitioning with dataset containing Double.NaN (#5996 )

2020-08-18 18:55:37 -07:00

plugin

RMM integration plugin (#5873 )

2020-08-12 01:26:02 -07:00

python-package

Swap byte-order in binary serializer to support big-endian arch (#5813 )

2020-08-18 14:47:17 -07:00

R-package

Add SHAP summary plot using ggplot2 (#5882 )

2020-08-18 18:04:09 -07:00

rabit

Merge rabit

2020-08-18 03:52:33 +08:00

src

Swap byte-order in binary serializer to support big-endian arch (#5813 )

2020-08-18 14:47:17 -07:00

tests

Swap byte-order in binary serializer to support big-endian arch (#5813 )

2020-08-18 14:47:17 -07:00

.clang-tidy

Upgrade clang-tidy on CI. (#5469 )

2020-04-05 04:42:29 +08:00

.editorconfig

Added configuration for python into .editorconfig (#3494 )

2018-07-23 00:24:10 -07:00

.gitignore

[R] Remove dependency on gendef for Visual Studio builds (fixes #5608 ) (#5764 )

2020-06-15 00:20:44 +00:00

.gitmodules

Remove rabit.

2020-08-18 03:48:36 +08:00

.travis.yml

Swap byte-order in binary serializer to support big-endian arch (#5813 )

2020-08-18 14:47:17 -07:00

appveyor.yml

Remove R and JVM from appveyor. (#5922 )

2020-07-23 03:26:48 +08:00

CITATION

simplify software citation (#2912 )

2017-12-01 02:58:13 -08:00

CMakeLists.txt

RMM integration plugin (#5873 )

2020-08-12 01:26:02 -07:00

CONTRIBUTORS.md

Rename Ant Financial to Ant Group (#5827 )

2020-06-25 15:25:36 -04:00

Jenkinsfile

[CI] Cancel builds on subsequent pushes (#6011 )

2020-08-13 11:17:39 -07:00

Jenkinsfile-win64

[CI] Cancel builds on subsequent pushes (#6011 )

2020-08-13 11:17:39 -07:00

LICENSE

fixed year to 2019 in conf.py, helpers.h and LICENSE (#4661 )

2019-07-15 12:29:12 -04:00

Makefile

C++14 for xgboost (#5664 )

2020-05-21 12:26:40 +12:00

NEWS.md

Add release note for 1.1.0 in NEWS.md (#5763 )

2020-06-08 14:16:10 -07:00

README.md

Update README.md (#5346 )

2020-02-23 02:52:37 +08:00

README.md

eXtreme Gradient Boosting

Community | Documentation | Resources | Contributors | Release Notes

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Kubernetes, Hadoop, SGE, MPI, Dask) and can solve problems beyond billions of examples.

License

Contribute to XGBoost

XGBoost has been developed and used by a group of active community members. Your help is very valuable to make the package better for everyone. Check out the Community Page.

Reference

Tianqi Chen and Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. In 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016.
XGBoost originates from a research project at the University of Washington.