Fix bugs in multithreaded ApplySplitSparseData() (#2161)

* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData()

When there are more threads than rows in the rowset, some threads are
assigned empty ranges and crash, because the algorithm needs to access
element iend - 1 of each range.

Fix: run only those threads with nonempty ranges.
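
A minimal sketch of the idea behind this fix, not the code from the commit: the helper name NonEmptyRanges and the ceiling-division chunking are assumptions made for illustration. It splits the rows into per-thread ranges and keeps only those with ibegin < iend, so a later access to iend - 1 is always valid.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the commit): split [0, nrows) into at most
// nthread contiguous chunks and return only the chunks that hold >= 1 row.
std::vector<std::pair<std::size_t, std::size_t>> NonEmptyRanges(
    std::size_t nrows, std::size_t nthread) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  if (nthread == 0) return ranges;
  // Ceiling division so that every row is covered by some chunk.
  const std::size_t chunk = (nrows + nthread - 1) / nthread;
  for (std::size_t tid = 0; tid < nthread; ++tid) {
    const std::size_t ibegin = std::min(tid * chunk, nrows);
    const std::size_t iend = std::min((tid + 1) * chunk, nrows);
    // Skip empty ranges: with ibegin == iend, index iend - 1 is not valid.
    if (ibegin < iend) {
      ranges.emplace_back(ibegin, iend);
    }
  }
  return ranges;
}
```

With 3 rows and 8 threads this yields three one-row ranges, and the remaining five threads have nothing to run.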

* Add regression test for Bugfix 1

* Move python_omp_test to the existing Python test group

Setting "OMP_NUM_THREADS" is not needed to enable multithreading;
passing the nthread parameter is enough.

* Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature

When the split value is less than all cut points, split_cond is set
incorrectly.

Fix: set split_cond = -1 to indicate this scenario
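
A minimal sketch of the corner case, under assumptions: the helper FindSplitCond and the rule "index of the last cut point not greater than the split value" are illustrative and not taken from the commit. It only shows why a signed sentinel of -1 is useful when no cut point qualifies.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the library's code): return the index of the last
// cut point that is <= split_value. `cuts` is assumed to be sorted ascending.
// Returns -1 when split_value is below every cut point.
std::int32_t FindSplitCond(const std::vector<float>& cuts, float split_value) {
  std::int32_t split_cond = -1;  // sentinel: no cut point qualifies
  for (std::size_t i = 0; i < cuts.size(); ++i) {
    if (cuts[i] <= split_value) {
      split_cond = static_cast<std::int32_t>(i);
    } else {
      break;  // cuts are sorted, so no later cut point can qualify
    }
  }
  return split_cond;
}
```

An unsigned index type cannot represent the -1 sentinel, so the result is held in a signed 32-bit integer here.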

* Bugfix 3: Initialize data layout indicator before using it

data_layout_ is accessed before being set; this variable determines
whether feature 0 is included in feat_set.

Fix: re-order code in InitData() to initialize data_layout_ first
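
A minimal sketch of the ordering issue, not the commit's code: the class Builder, the enum Layout, and the rule that a one-based layout excludes feature 0 are all assumptions for illustration. The point is only that the layout indicator is assigned before anything reads it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout indicator; the real enumerators may differ.
enum class Layout { kZeroBased, kOneBased };

class Builder {
 public:
  // feature0_is_empty and nfeature are illustrative parameters.
  void InitData(bool feature0_is_empty, std::size_t nfeature) {
    // Step 1: set the layout indicator before any code depends on it.
    layout_ = feature0_is_empty ? Layout::kOneBased : Layout::kZeroBased;

    // Step 2: build the feature set, which reads layout_.
    feat_set_.clear();
    const std::size_t first = (layout_ == Layout::kOneBased) ? 1 : 0;
    for (std::size_t f = first; f < nfeature; ++f) {
      feat_set_.push_back(f);
    }
  }

  const std::vector<std::size_t>& feat_set() const { return feat_set_; }

 private:
  // A default member initializer also guards against reads before InitData().
  Layout layout_ = Layout::kZeroBased;
  std::vector<std::size_t> feat_set_;
};
```

Giving the member a default initializer is an extra safeguard in this sketch; the commit message itself describes only the re-ordering.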

* Add regression test for Bugfix 2

Unfortunately, there is no regression test for Bugfix 3, as there is no
way to deterministically assign a value to an uninitialized variable.

Philip Cho, 2017-04-02 11:37:40 -07:00, committed by Tianqi Chen

parent ed5e75de2f

commit 2715baef64

3 changed files with 113 additions and 67 deletions

									
include/xgboost/base.h
									
@@ -64,6 +64,7 @@ namespace xgboost {
 *  used for feature index and row index.
 */
typedef uint32_t bst_uint;
typedef int32_t bst_int;
/*! \brief long integers */
typedef uint64_t bst_ulong;  // NOLINT(*)
/*! \brief float type, used for storing statistics */
