Fix CPU hist init for sparse dataset. (#4625)

* Fix CPU hist init for sparse dataset.

* Implement sparse histogram cut.
* Allow empty features.

* Fix Windows build; don't use sparse in distributed environment.

* Comments.

* Smaller threshold.

* Fix Windows OpenMP.

* Fix MSVC lambda capture.

* Fix MSVC macro.

* Fix MSVC initialization list.

* Fix MSVC initialization list x2.

* Preserve categorical feature behavior.

* Rename matrix to sparse cuts.
* Reuse UseGroup.
* Check for categorical data when adding cut.

Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>

* Sanity check.

* Fix comments.

* Fix comment.

Author: Jiaming Yuan
Date: 2019-07-04 19:27:03 -04:00
committed by Philip Hyunsu Cho
parent b7a1f22d24
commit d9a47794a5
33 changed files with 681 additions and 299 deletions

@@ -52,14 +52,14 @@ class SparsePageSource : public DataSource {
* \param page_size Page size for external memory.
*/
static void CreateRowPage(dmlc::Parser<uint32_t>* src,
-                            const std::string& cache_info,
-                            const size_t page_size = DMatrix::kPageSize);
+                            const std::string& cache_info,
+                            const size_t page_size = DMatrix::kPageSize);
/*!
* \brief Create source cache by copy content from DMatrix.
* \param cache_info The cache_info of cache file location.
*/
static void CreateRowPage(DMatrix* src,
-                            const std::string& cache_info);
+                            const std::string& cache_info);
/*!
* \brief Create source cache by copy content from DMatrix. Creates transposed column page, may be sorted or not.
@@ -67,7 +67,7 @@ class SparsePageSource : public DataSource {
* \param sorted Whether columns should be pre-sorted
*/
static void CreateColumnPage(DMatrix* src,
-                               const std::string& cache_info, bool sorted);
+                               const std::string& cache_info, bool sorted);
/*!
* \brief Check if the cache file already exists.
* \param cache_info The cache prefix of files.