[R package] GPU support (#2732)

* [R] MSVC compatibility
* [GPU] allow seed in BernoulliRng up to size_t and scale to uint32_t
* R package build with cmake and CUDA
* R package CUDA build fixes and cleanups
* always export the R package native initialization routine on windows
* update the install instructions doc
* fix lint
* use static_cast directly to set BernoulliRng seed
* [R] demo for GPU accelerated algorithm
* tidy up the R package cmake stuff
* R pack cmake: installs main dependency packages if needed
* [R] version bump in DESCRIPTION
* update NEWS
* added short missing/sparse values explanations to FAQ
2017-09-28 18:15:28 -05:00
parent 5c9f01d0a9
commit 74db9757b3
14 changed files with 394 additions and 30 deletions
--- a/doc/build.md
+++ b/doc/build.md
@@ -134,9 +134,9 @@ Other versions of Visual Studio may work but are untested.

 ### Building with GPU support

-XGBoost can be built with GPU support for both Linux and Windows using cmake. GPU support works with the Python package as well as the CLI version. The R package is not yet supported.
+XGBoost can be built with GPU support for both Linux and Windows using cmake. GPU support works with the Python package as well as the CLI version. See [Installing R package with GPU support](#installing-r-package-with-gpu-support) for special instructions for R.

-An up-to-date version of the cuda toolkit is required.
+An up-to-date version of the CUDA toolkit is required.

 From the command line on Linux starting from the xgboost directory:

@@ -146,7 +146,9 @@ $ cd build
 $ cmake .. -DUSE_CUDA=ON
 $ make -j
 ```
-On Windows using cmake, see what options for Generators you have for cmake, and choose one with [arch] replaced by Win64:
+**Windows requirements** for GPU build: only Visual C++ 2015 or 2013 with CUDA v8.0 were fully tested. Either install Visual C++ 2015 Build Tools separately, or as a part of Visual Studio 2015. If you already have Visual Studio 2017, the Visual C++ 2015 Toolchain component has to be installed using the VS 2017 Installer. You will likely need to use the VS2015 x64 Native Tools command prompt to run the cmake commands given below. In some situations, however, they also run fine from the MSYS2 bash command line.
+
+On Windows, check which Generators are available to cmake, and choose one with [arch] replaced by Win64:
 ```bash
 cmake -help
 ```
@@ -156,9 +158,15 @@ $ mkdir build
 $ cd build
 $ cmake .. -G"Visual Studio 14 2015 Win64" -DUSE_CUDA=ON
 ```
-Cmake will create an xgboost.sln solution file in the build directory. Build this solution in release mode as a x64 build.
+To speed up compilation, the compute version specific to your GPU can be passed to cmake, e.g., `-DGPU_COMPUTE_VER=50`.
+The above cmake configuration run will create an xgboost.sln solution file in the build directory. Build this solution in release mode as a x64 build, either from Visual Studio or from the command line:
+```
+cmake --build . --target xgboost --config Release
+```
+If the build seems to use only a single process, try appending an option like ` -- /m:6` to the above command.

 ### Windows Binaries
+
 Unofficial windows binaries and instructions on how to use them are hosted on [Guido Tapia's blog](http://www.picnet.com.au/blogs/guido/post/2016/09/22/xgboost-windows-x64-binaries-for-download/)

 ### Customized Building
@@ -273,8 +281,42 @@ setwd('wherever/you/cloned/it/xgboost/R-package/')
 install.packages('.', repos = NULL, type="source")
 ```

+The package can also be built and installed with cmake (and Visual C++ 2015 on Windows) using the instructions from the next section, but without GPU support (omit the `-DUSE_CUDA=ON` cmake parameter).
+
 If all fails, try [building the shared library](#build-the-shared-library) to see whether a problem is specific to R package or not.

+### Installing R package with GPU support
+
+The procedure and requirements are similar to those in [Building with GPU support](#building-with-gpu-support), so make sure to read that section first.
+
+On Linux, starting from the xgboost directory:
+
+```bash
+mkdir build
+cd build
+cmake .. -DUSE_CUDA=ON -DR_LIB=ON
+make install -j
+```
+When the default target is used, the R package shared library is built in the `build` directory.
+The `install` target additionally assembles the package files with this shared library under `build/R-package` and runs `R CMD INSTALL`.
+
+On Windows, cmake with Visual C++ Build Tools (or Visual Studio) has to be used to build an R package with GPU support. Rtools must also be installed (other MinGW distributions that provide `gendef.exe` and `dlltool.exe` might also work, but that was not tested).
+```bash
+mkdir build
+cd build
+cmake .. -G"Visual Studio 14 2015 Win64" -DUSE_CUDA=ON -DR_LIB=ON
+cmake --build . --target install --config Release
+```
+When `--target xgboost` is used, the R package dll is built under `build/Release`.
+The `--target install` option additionally assembles the package files with this dll under `build/R-package` and runs `R CMD INSTALL`.
+
+If cmake can't find your R installation during the configuration step, you can pass the location of the R executable to cmake like this: `-DLIBR_EXECUTABLE="C:/Program Files/R/R-3.4.1/bin/x64/R.exe"`.
+
+If on Windows you get a "permission denied" error when trying to write to ...Program Files/R/... during the package installation, create a `.Rprofile` file in your personal home directory (if you don't already have one there), and add a line to it that specifies the location of your user library for R packages, like the following:
+```r
+.libPaths( unique(c("C:/Users/USERNAME/Documents/R/win-library/3.4", .libPaths())))
+```
+You can find the exact location by running `.libPaths()` in the R GUI or RStudio.

 ## Trouble Shooting

--- a/doc/faq.md
+++ b/doc/faq.md
@@ -57,9 +57,17 @@ Yes, xgboost implements LambdaMART. Checkout the objective section in [parameter
 How to deal with Missing Value
 ------------------------------
 xgboost supports missing value by default.
+In tree algorithms, branch directions for missing values are learned during training.
+Note that the gblinear booster treats missing values as zeros.


 Slightly different result between runs
 --------------------------------------
 This could happen, due to non-determinism in floating point summation order and multi-threading.
 Though the general accuracy will usually remain the same.
+
+
+Why do I see different results with sparse and dense data?
+--------------------------------------------------------
+"Sparse" elements are treated as if they were "missing" by the tree booster, and as zeros by the linear booster.
+For tree models, it is important to use consistent data formats during training and scoring.
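
The sparse/dense FAQ entry added above can be illustrated with a small standalone sketch. This is a toy model in plain Python, not XGBoost code: `stump_predict`, `from_dense`, `from_sparse`, and the stump parameters are all hypothetical names and values chosen for illustration. It assumes a single split with a learned default direction for missing values, and shows how the same raw row can be scored differently depending on whether its zero is stored explicitly (dense) or omitted (sparse).

```python
# Toy illustration only, not XGBoost internals: one decision stump with a
# default direction for missing values (assumed to be learned in training).

def stump_predict(row, feature, threshold, default_left, left_value, right_value):
    """Score one row, given as a dict mapping feature index -> stored value."""
    if feature not in row:
        # Missing value: follow the default direction.
        go_left = default_left
    else:
        go_left = row[feature] < threshold
    return left_value if go_left else right_value


def from_dense(values):
    """Dense row: every entry, including zeros, is an explicit value."""
    return dict(enumerate(values))


def from_sparse(values):
    """Sparse row: zero entries are not stored, so they look missing."""
    return {i: v for i, v in enumerate(values) if v != 0.0}


# Hypothetical stump: split on feature 0 at 0.5, missing values go right.
stump = dict(feature=0, threshold=0.5, default_left=False,
             left_value=-1.0, right_value=1.0)

raw = [0.0, 3.2]
dense_score = stump_predict(from_dense(raw), **stump)    # 0.0 < 0.5: goes left
sparse_score = stump_predict(from_sparse(raw), **stump)  # feature 0 absent: goes right
print(dense_score, sparse_score)
```

In this sketch the dense row follows the comparison while the sparse row follows the default direction, which is why the FAQ text recommends using the same data format for training and scoring.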