* Initial performance optimizations for xgboost
* remove includes
* revert float->double
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* Check existence of _mm_prefetch and __builtin_prefetch
* Fix lint