Reported in issue #2165. Dynamic scheduling of OpenMP loops involve
implicit synchronization. To implement synchronization, libgomp uses futex
(fast userspace mutex), whereas MinGW uses kernel-space mutex, which is more
costly. With chunk size of 1, synchronization overhead may become prohibitive
on Windows machines.
Solution: use 'guided' schedule to minimize the number of syncs