Fix bugs in multithreaded ApplySplitSparseData() (#2161)
* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData()

  When there are more threads than rows in rowset, some threads end up
  with empty ranges, causing them to crash. (iend - 1 needs to be
  accessible as part of algorithm)

  Fix: run only those threads with nonempty ranges.

* Add regression test for Bugfix 1

* Moving python_omp_test to existing python test group

  Turns out you don't need to set "OMP_NUM_THREADS" to enable
  multithreading. Just add nthread parameter.

* Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature

  When split value is less than all cut points, split_cond is set
  incorrectly.

  Fix: set split_cond = -1 to indicate this scenario

* Bugfix 3: Initialize data layout indicator before using it

  data_layout_ is accessed before being set; this variable determines
  whether feature 0 is included in feat_set.

  Fix: re-order code in InitData() to initialize data_layout_ first

* Adding regression test for Bugfix 2

  Unfortunately, no regression test for Bugfix 3, as there is no way to
  deterministically assign value to an uninitialized variable.
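
The idea behind Bugfix 1 can be illustrated with a minimal standalone sketch. It is not the code from this commit: the helper name NonEmptyRanges and the ceil-division chunking scheme are assumptions made for illustration only. It splits a row set into per-thread [ibegin, iend) ranges and keeps only the nonempty ones, so no worker ever touches iend - 1 on an empty range.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of xgboost): split [0, nrows) into at most
// nthread contiguous chunks and return only the nonempty [ibegin, iend) ranges.
std::vector<std::pair<std::size_t, std::size_t>>
NonEmptyRanges(std::size_t nrows, std::size_t nthread) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  if (nrows == 0 || nthread == 0) return ranges;
  // Assumed chunking scheme: ceil(nrows / nthread) rows per thread.
  const std::size_t chunk = (nrows + nthread - 1) / nthread;
  for (std::size_t tid = 0; tid < nthread; ++tid) {
    const std::size_t ibegin = std::min(tid * chunk, nrows);
    const std::size_t iend = std::min(ibegin + chunk, nrows);
    // Skip threads whose range is empty; only these ranges are safe to
    // process with logic that reads element iend - 1.
    if (ibegin < iend) ranges.emplace_back(ibegin, iend);
  }
  return ranges;
}
```

With 3 rows and 8 threads, this sketch yields three single-row ranges and skips the remaining five threads, which is the situation the regression test for Bugfix 1 is meant to cover.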
@@ -64,6 +64,7 @@ namespace xgboost {
 * used for feature index and row index.
 */
typedef uint32_t bst_uint;
typedef int32_t bst_int;
/*! \brief long integers */
typedef uint64_t bst_ulong;  // NOLINT(*)
/*! \brief float type, used for storing statistics */