Philip Hyunsu Cho 109473dae2
Fix #3545: XGDMatrixCreateFromCSCEx silently discards empty trailing rows (#3553)
* Fix #3545: XGDMatrixCreateFromCSCEx silently discards empty trailing rows

Description: The bug is triggered when

1. The data matrix has empty rows at the bottom. More precisely, rows
   `n-k+1`, `n-k+2`, ..., `n` of the matrix have missing values in every
   dimension (where `n` is the number of instances and `k` is the number
   of trailing empty rows).
2. The data matrix is given in Compressed Sparse Column (CSC) format.

Diagnosis: When the CSC matrix is converted to Compressed Sparse Row (CSR)
format (the common internal format for DMatrix), the trailing empty rows
are silently dropped. More specifically, the row pointer (`offset`) of the
newly created CSR matrix does not account for these rows.

Fix: Modify the row pointer so that it also covers the trailing empty rows.
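
The essence of the fix can be illustrated outside of XGBoost's internals. Below is a minimal sketch in Python with a hypothetical helper (not the actual C++ code behind `XGDMatrixCreateFromCSCEx`): the CSR row pointer must always have `num_row + 1` entries, so an empty trailing row repeats the last offset instead of being dropped.

```python
import numpy as np

# Hypothetical helper for illustration only; XGBoost's real conversion
# lives in the C++ core behind XGDMatrixCreateFromCSCEx.
def csr_row_pointer(row_indices, num_row):
    """Build a CSR row pointer from CSC row indices.

    `num_row` must be passed explicitly: it cannot be recovered from
    `row_indices` alone when the bottom rows contain no entries at all.
    """
    counts = np.bincount(row_indices, minlength=num_row)
    # The prefix sum yields num_row + 1 offsets; an empty trailing row
    # simply repeats the previous value instead of being dropped.
    return np.concatenate(([0], np.cumsum(counts)))

# 3x3 matrix in which row 2 has no entries at all:
print(csr_row_pointer(row_indices=[0, 1], num_row=3))  # -> [0 1 2 2]
```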

* Add regression test
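
For reference, here is a sketch of how the bug can be reproduced from the Python side (illustrative only, not the regression test added in this PR), assuming SciPy's `csc_matrix` and xgboost's `DMatrix`:

```python
import numpy as np
import scipy.sparse
import xgboost as xgb

# 3x3 CSC matrix whose last row is entirely missing: column 0 has an
# entry in row 0, column 1 has an entry in row 1, column 2 is empty.
data = np.array([1.0, 2.0])
row_indices = np.array([0, 1])
col_pointers = np.array([0, 1, 2, 2])
X = scipy.sparse.csc_matrix((data, row_indices, col_pointers), shape=(3, 3))

dtrain = xgb.DMatrix(X)

# Before the fix, the trailing empty row was silently discarded and the
# DMatrix reported only 2 rows; with the fix it keeps all 3.
assert dtrain.num_row() == 3
```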

eXtreme Gradient Boosting

Community | Documentation | Resources | Contributors | Release Notes

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.

License

© Contributors, 2016. Licensed under an Apache-2 license.

Contribute to XGBoost

XGBoost has been developed and used by a group of active community members. Your help is very valuable in making the package better for everyone. Check out the Community Page.

Reference

  • Tianqi Chen and Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
  • XGBoost originated from a research project at the University of Washington.