Fix bugs in multithreaded ApplySplitSparseData() (#2161)

* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData()

When there are more threads than rows in the rowset, some threads end
up with empty ranges and crash, because the algorithm needs element
iend - 1 of each range to be accessible.

Fix: run only those threads with nonempty ranges.
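
A minimal sketch of the idea behind this fix, not the code from the patch: split the rows into contiguous chunks and keep only the chunks that contain at least one row, so no worker ever touches iend - 1 of an empty range. The helper name NonEmptyRanges and the ceil-division chunking are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the patch): split [0, nrows) into at most
// nthread contiguous chunks and return only the nonempty [ibegin, iend) pairs.
std::vector<std::pair<std::size_t, std::size_t>>
NonEmptyRanges(std::size_t nrows, std::size_t nthread) {
  assert(nthread > 0);
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  const std::size_t chunk = (nrows + nthread - 1) / nthread;  // ceil division
  for (std::size_t tid = 0; tid < nthread; ++tid) {
    const std::size_t ibegin = std::min(tid * chunk, nrows);
    const std::size_t iend = std::min((tid + 1) * chunk, nrows);
    if (ibegin < iend) {  // skip threads that would get no rows
      ranges.emplace_back(ibegin, iend);
    }
  }
  return ranges;
}
```

A parallel loop would then iterate over the returned ranges instead of over all thread ids, so every worker is guaranteed a valid iend - 1.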

* Add regression test for Bugfix 1

* Move python_omp_test to existing python test group

It turns out that setting "OMP_NUM_THREADS" is not needed to enable
multithreading; passing the nthread parameter is enough.

* Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature

When the split value is less than all cut points, split_cond is set
incorrectly.

Fix: set split_cond = -1 to indicate this scenario
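
The sentinel can be illustrated with a small sketch. This is an assumption-laden example, not the patch's implementation: the helper SplitCondFor and the use of std::upper_bound are hypothetical, and the bst_int typedef simply mirrors the signed type shown in the diff, which is what allows -1 to be stored.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

typedef std::int32_t bst_int;  // signed index type, mirroring the diff's typedef

// Hypothetical helper: index of the last cut point <= split_pt in a sorted
// cut vector, or -1 when split_pt is less than every cut point.
bst_int SplitCondFor(const std::vector<float>& cuts, float split_pt) {
  // upper_bound returns the first cut strictly greater than split_pt;
  // the element just before it (if any) is the last cut <= split_pt.
  auto it = std::upper_bound(cuts.begin(), cuts.end(), split_pt);
  return static_cast<bst_int>(it - cuts.begin()) - 1;
}
```

With an unsigned index type the "below all cut points" case would wrap around to a large positive value, which is why a signed type is needed for the sentinel.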

* Bugfix 3: Initialize data layout indicator before using it

data_layout_ is accessed before being set; this variable determines
whether feature 0 is included in feat_set.

Fix: re-order code in InitData() to initialize data_layout_ first
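
The ordering issue can be sketched as follows. The struct LayoutSketch, its enum values, and the InitData signature are all hypothetical stand-ins for illustration; only the names data_layout_ and feat_set come from the commit message.

```cpp
#include <vector>

// Hypothetical stand-in for the builder: the layout flag decides whether
// feature 0 belongs to feat_set, so it must be assigned before feat_set
// is populated.
struct LayoutSketch {
  enum Layout { kZeroBased, kOneBased };
  Layout data_layout_;
  std::vector<unsigned> feat_set;

  void InitData(bool one_based, unsigned nfeature) {
    // Step 1: set the layout indicator first.
    data_layout_ = one_based ? kOneBased : kZeroBased;
    // Step 2: only now read it to decide where feature indices start.
    feat_set.clear();
    const unsigned first = (data_layout_ == kOneBased) ? 1u : 0u;
    for (unsigned fid = first; fid < nfeature; ++fid) {
      feat_set.push_back(fid);
    }
  }
};
```

Swapping the two steps would read data_layout_ before it holds a value, which is the undefined behavior this bugfix removes.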

* Add regression test for Bugfix 2

Unfortunately, there is no regression test for Bugfix 3, as there is
no way to deterministically assign a value to an uninitialized variable.
Philip Cho
2017-04-02 11:37:40 -07:00
committed by Tianqi Chen
parent ed5e75de2f
commit 2715baef64
3 changed files with 113 additions and 67 deletions


@@ -64,6 +64,7 @@ namespace xgboost {
  * used for feature index and row index.
  */
 typedef uint32_t bst_uint;
+typedef int32_t bst_int;
 /*! \brief long integers */
 typedef uint64_t bst_ulong;  // NOLINT(*)
 /*! \brief float type, used for storing statistics */