Compare commits


33 Commits
v0.71 ... v0.72

Author SHA1 Message Date
Philip Hyunsu Cho
1214081f99 Release version 0.72 (#3337) 2018-06-01 16:00:31 -07:00
Ryota Suzuki
b7cbec4d4b Fix print.xgb.Booster for R (#3338)
* Fix print.xgb.Booster

valid_handle should be TRUE when x$handle is NOT null

* Update xgb.Booster.R

Modify is.null.handle to return TRUE for NULL handle
2018-05-29 11:44:55 -07:00
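The two fixes above correct an inverted predicate: `is.null.handle` must return TRUE for a NULL handle, and a *valid* handle is one that is NOT null. A minimal Python sketch of the corrected logic (names and the dict-based handle are hypothetical stand-ins for the R objects):

```python
def is_null_handle(handle):
    """Return True when the handle is missing or points to nothing.

    Mirrors the corrected R logic: a None (NULL) handle short-circuits
    to True before any pointer check is attempted on it.
    """
    if handle is None:  # the first fix: NULL now yields True
        return True
    return handle.get("ptr") is None  # stand-in for the XGCheckNullPtr_R call


def describe_booster(booster):
    # the second fix: "valid" means NOT null, so the result must be negated
    valid_handle = not is_null_handle(booster.get("handle"))
    return "ok" if valid_handle else "Handle is invalid!"
```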
Kristian Gampong
a510e68dda Add validate_features option for Booster predict (#3323)
* Add validate_features option for Booster predict

* Fix trailing whitespace in docstring
2018-05-29 11:40:49 -07:00
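The `validate_features` option above makes the feature-name check before prediction optional. A hedged sketch of the idea in Python (function and parameter names here are illustrative, not the library's actual signature):

```python
def predict(model_feature_names, data_feature_names, validate_features=True):
    # optional pre-prediction check: the model's feature names must match
    # the input's, unless the caller explicitly opts out
    if validate_features and model_feature_names != data_feature_names:
        raise ValueError(
            "feature_names mismatch: %r vs %r"
            % (model_feature_names, data_feature_names))
    return "prediction would run here"
```

Opting out (`validate_features=False`) skips the check entirely, which is useful when the caller guarantees column order by construction.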
Yanbo Liang
b018ef104f Remove output_margin from XGBClassifier.predict_proba argument list. (#3343) 2018-05-28 10:30:21 -07:00
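The removal above reflects that class probabilities are a deterministic transform (softmax) of the raw margin scores, so an `output_margin` flag on `predict_proba` is contradictory. A small sketch of that transform:

```python
import math


def softmax(margins):
    # probabilities are fully determined by the raw margins; a method named
    # predict_proba therefore has no sensible "return margins" mode
    m = max(margins)
    exps = [math.exp(v - m) for v in margins]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]
```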
trivialfis
34aeee2961 Fix test_param.cc header path (#3317) 2018-05-28 10:26:29 -07:00
Dave Challis
8efbadcde4 Point rabit submodule at latest commit from master. (#3330) 2018-05-28 10:21:10 -07:00
pdavalo
480e3fd764 Sklearn: validation set weights (#2354)
* Add option to use weights when evaluating metrics in validation sets

* Add test for validation-set weights functionality

* simplify case with no weights for test sets

* fix lint issues
2018-05-23 17:06:20 -07:00
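Weighting a validation metric, as added above, means each sample's contribution is scaled by its weight and the total is normalized by the weight sum. A sketch with weighted RMSE (illustrative helper, not the library's internals):

```python
import math


def weighted_rmse(y_true, y_pred, weights=None):
    # with no weights, fall back to the plain unweighted formula
    if weights is None:
        weights = [1.0] * len(y_true)
    num = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return math.sqrt(num / sum(weights))
```

A zero weight removes a sample from the metric entirely; uniform weights reproduce the unweighted value.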
Philip Hyunsu Cho
71e226120a For CRAN submission, remove all #pragma's that suppress compiler warnings (#3329)
* For CRAN submission, remove all #pragma's that suppress compiler warnings

A few headers in dmlc-core contain #pragma's that disable compiler warnings,
which is against the CRAN submission policy. Fix the problem by removing
the offending #pragma's as part of the command `make Rbuild`.

This addresses issue #3322.

* Fix script to improve Cygwin/MSYS compatibility

We need this to pass rmingw CI test

* Remove remove_warning_suppression_pragma.sh from packaged tarball
2018-05-23 09:58:39 -07:00
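The pragma removal described above is done with `sed` in the repository; a rough Python equivalent of the same substitution (an illustrative sketch, not the project's script) looks like this:

```python
import re

# matches the same lines the shell script blanks out:
# "#pragma GCC diagnostic", "#pragma clang diagnostic", "#pragma warning"
PRAGMAS = re.compile(
    r"^.*#pragma (?:GCC diagnostic|clang diagnostic|warning).*$", re.M)


def strip_warning_suppression(source):
    # blank out the offending lines but keep their newlines, so line
    # numbers in compiler messages stay stable (as sed's s/.../\/\// does)
    return PRAGMAS.sub("", source)
```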
Thejaswi
d367e4fc6b Fix for issue 3306. (#3324) 2018-05-23 13:42:20 +12:00
Sergei Lebedev
8f6aadd4b7 [jvm-packages] Fixed CheckpointManagerSuite for Scala 2.10 (#3332)
As before, the compilation error is caused by mixing positional and
labelled arguments.
2018-05-19 18:28:11 -07:00
Rory Mitchell
3ee725e3bb Add cuda forwards compatibility (#3316) 2018-05-17 10:59:22 +12:00
Rory Mitchell
f8b7686719 Add cuda 8/9.1 centos 6 builds, test GPU wheel on CPU only container. (#3309)
* Add cuda 8/9.1 centos 6 builds, test GPU wheel on CPU only container.

* Add Google test
2018-05-17 10:57:01 +12:00
Tong He
098075b81b CRAN Submission for 0.71.1 (#3311)
* fix for CRAN manual checks

* fix for CRAN manual checks

* pass local check

* fix variable naming style

* Adding Philip's record
2018-05-14 17:32:39 -07:00
Nan Zhu
49b9f39818 [jvm-packages] update xgboost4j cross build script to be compatible with older glibc (#3307)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* static glibc glibc++

* update to build with glib 2.12

* remove unsupported flags

* update version number

* remove properties

* remove unnecessary command

* update poms
2018-05-10 06:39:44 -07:00
Philip Hyunsu Cho
9a8211f668 Update dmlc-core submodule (#3221)
* Update dmlc-core submodule

* Fix dense_parser to work with the latest dmlc-core

* Specify location of Google Test

* Add more source files in dmlc-minimum to get latest dmlc-core working

* Update dmlc-core submodule
2018-05-09 18:55:29 -07:00
mallniya
039dbe6aec freebsd support in libpath.py (#3247) 2018-05-09 16:13:30 -07:00
Clive Chan
0c0a78c255 Suggest git submodule update instead of delete + reclone (#3214) 2018-05-09 14:39:17 -07:00
Will Storey
747381b520 Improve .gitignore patterns (#3184)
* Adjust xgboost entries in .gitignore

They were overly broad. In particular, this was inconvenient when
working with tools such as fzf that use the .gitignore to decide what to
include. As written, we'd not look into /include/xgboost.

* Make cosmetic improvements to .gitignore

* Remove dmlc-core from .gitignore

This seems unnecessary and has the drawback that tools which use
.gitignore to decide which files to skip will not look here, yet being
able to inspect the submodule files with them is useful.
2018-05-09 14:31:59 -07:00
Samuel O. Ronsin
cc79a65ab9 Increase precision of bst_float values in tree dumps (#3298)
* Increase precision of bst_float values in tree dumps

* Increase precision of bst_float values in tree dumps

* Fix lint error and switch precision to right float variable

* Fix clang-tidy error
2018-05-09 14:12:21 -07:00
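The precision increase above matters because a 32-bit float needs up to 9 significant decimal digits to round-trip exactly; 6 digits can silently change the model a dump reconstructs. A stdlib-only demonstration:

```python
import struct


def f32(x):
    # round a Python float to the nearest 32-bit float, like bst_float
    return struct.unpack("f", struct.pack("f", x))[0]


v = f32(1.0 / 3.0)
# 6 significant digits lose information for this value...
assert f32(float("%.6g" % v)) != v
# ...while 9 significant digits round-trip any 32-bit float exactly
assert f32(float("%.9g" % v)) == v
```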
Brandon Greenwell
d13f1a0f16 Fix typo (#3305) 2018-05-09 10:18:36 -07:00
Rory Mitchell
088bb4b27c Prevent multiclass Hessian approaching 0 (#3304)
* Prevent Hessian in multiclass objective becoming zero

* Set default learning rate to 0.5 for "coord_descent" linear updater
2018-05-09 20:25:51 +12:00
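For softmax, the per-class Hessian is roughly 2p(1-p), which vanishes as a predicted probability saturates toward 0 or 1; flooring it away from zero keeps the Newton-style weight update finite. A sketch under those assumptions (the exact constants and epsilon are illustrative):

```python
import math


def softmax_grad_hess(margins, label, eps=1e-16):
    # per-class gradient p - 1{y=k} and Hessian 2p(1-p), floored at eps
    # so saturated probabilities cannot drive the Hessian to zero
    m = max(margins)
    exps = [math.exp(v - m) for v in margins]
    total = sum(exps)
    probs = [e / total for e in exps]
    grads = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    hess = [max(2.0 * p * (1.0 - p), eps) for p in probs]
    return grads, hess
```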
Andrew V. Adinetz
b8a0d66fe6 Multi-GPU HostDeviceVector. (#3287)
* Multi-GPU HostDeviceVector.

- HostDeviceVector instances can now span multiple devices, defined by GPUSet struct
- the interface of HostDeviceVector has been modified accordingly
- GPU objective functions are now multi-GPU
- GPU predicting from cache is now multi-GPU
- avoiding omp_set_num_threads() calls
- other minor changes
2018-05-05 08:00:05 +12:00
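A vector spanning several devices needs a rule assigning each device a contiguous shard of the data. A hypothetical sketch of such a partition (not the actual GPUSet implementation):

```python
def partition(n, devices):
    # split n elements into near-equal contiguous shards, one per device;
    # the first (n mod k) devices each take one extra element
    k = len(devices)
    base, extra = divmod(n, k)
    shards, start = {}, 0
    for i, dev in enumerate(devices):
        size = base + (1 if i < extra else 0)
        shards[dev] = (start, start + size)
        start += size
    return shards
```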
Rory Mitchell
90a5c4db9d Update Jenkins CI for GPU (#3294) 2018-05-04 16:50:59 +12:00
Thejaswi
c80d51ccb3 Fix issue #3264, accuracy issues on k80 GPUs. (#3293) 2018-05-04 13:14:08 +12:00
Nan Zhu
e1f57b4417 [jvm-packages] scripts to cross-build and deploy artifacts to github (#3276)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* cross building files

* update

* build with docker

* remove

* temp

* update build script

* update pom

* update

* update version

* upload build

* fix path

* update README.md

* fix compiler version to 4.8.5
2018-04-28 07:41:30 -07:00
Yanbo Liang
4850f67b85 Fix broken link for xgboost-spark example. (#3275) 2018-04-26 06:45:01 -07:00
Thomas J. Leeper
c2b647f26e fix typo in README (#3263) 2018-04-22 09:24:38 -04:00
Nan Zhu
25b2919c44 [jvm-packages] change version of jvm to keep consistent with other pkgs (#3253)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* change version of jvm to keep consistent with other pkgs
2018-04-19 20:48:50 -07:00
Nan Zhu
d9dd485313 [jvm-packages] upgrade spark version to 2.3 (#3254)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* update default spark version to 2.3
2018-04-19 20:15:19 -07:00
Rory Mitchell
a185ddfe03 Implement GPU accelerated coordinate descent algorithm (#3178)
* Implement GPU accelerated coordinate descent algorithm. 

* Exclude external memory tests for GPU
2018-04-20 14:56:35 +12:00
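Coordinate descent, as in the updater above, optimizes one weight at a time against the current residual; each single-coordinate update is cheap and embarrassingly parallel over rows, which is what makes a GPU version attractive. A pure-Python sketch for least squares (illustrative, not the library's updater):

```python
def coordinate_descent(X, y, n_sweeps=200, reg_lambda=0.0):
    # cyclic coordinate descent for ||Xw - y||^2 + lambda * ||w||^2:
    # update one weight at a time, keeping the residual r = y - Xw current
    n, d = len(X), len(X[0])
    w = [0.0] * d
    resid = list(y)
    for _ in range(n_sweeps):
        for j in range(d):
            col = [X[i][j] for i in range(n)]
            num = sum(c * r for c, r in zip(col, resid))
            den = sum(c * c for c in col) + reg_lambda
            delta = num / den
            w[j] += delta
            resid = [r - delta * c for r, c in zip(resid, col)]
    return w
```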
Rory Mitchell
ccf80703ef Clang-tidy static analysis (#3222)
* Clang-tidy static analysis

* Modernise checks

* Google coding standard checks

* Identifier renaming according to Google style
2018-04-19 18:57:13 +12:00
Michal Josífko
3242b0a378 Update rabit submodule to latest version. (#3246) 2018-04-19 13:58:09 +12:00
Philip Hyunsu Cho
842e28fdcd Fix RMinGW build error: dependency 'data.table' not available (#3257)
The R package dependency 'data.table' is apparently unavailable in Windows binary format, resulting in the following build errors:
* https://ci.appveyor.com/project/tqchen/xgboost/build/1.0.1810/job/hhanvg0c2cqpn7bc
* https://ci.appveyor.com/project/tqchen/xgboost/build/1.0.1811/job/hg65t9wb3rt1f5k8

Fix: use type='both' to fall back to source when binary is unavailable
2018-04-18 10:56:44 -07:00
149 changed files with 4870 additions and 3832 deletions

.clang-tidy (new file)

@@ -0,0 +1,22 @@
Checks: 'modernize-*,-modernize-make-*,-modernize-raw-string-literal,google-*,-google-default-arguments,-clang-diagnostic-#pragma-messages,readability-identifier-naming'
CheckOptions:
- { key: readability-identifier-naming.ClassCase, value: CamelCase }
- { key: readability-identifier-naming.StructCase, value: CamelCase }
- { key: readability-identifier-naming.TypeAliasCase, value: CamelCase }
- { key: readability-identifier-naming.TypedefCase, value: CamelCase }
- { key: readability-identifier-naming.TypeTemplateParameterCase, value: CamelCase }
- { key: readability-identifier-naming.LocalVariableCase, value: lower_case }
- { key: readability-identifier-naming.MemberCase, value: lower_case }
- { key: readability-identifier-naming.PrivateMemberSuffix, value: '_' }
- { key: readability-identifier-naming.ProtectedMemberSuffix, value: '_' }
- { key: readability-identifier-naming.EnumCase, value: CamelCase }
- { key: readability-identifier-naming.EnumConstant, value: CamelCase }
- { key: readability-identifier-naming.EnumConstantPrefix, value: k }
- { key: readability-identifier-naming.GlobalConstantCase, value: CamelCase }
- { key: readability-identifier-naming.GlobalConstantPrefix, value: k }
- { key: readability-identifier-naming.StaticConstantCase, value: CamelCase }
- { key: readability-identifier-naming.StaticConstantPrefix, value: k }
- { key: readability-identifier-naming.ConstexprVariableCase, value: CamelCase }
- { key: readability-identifier-naming.ConstexprVariablePrefix, value: k }
- { key: readability-identifier-naming.FunctionCase, value: CamelCase }
- { key: readability-identifier-naming.NamespaceCase, value: lower_case }

.gitignore

@@ -15,7 +15,6 @@
*.Rcheck
*.rds
*.tar.gz
#*txt*
*conf
*buffer
*model
@@ -47,13 +46,12 @@ Debug
*.cpage.col
*.cpage
*.Rproj
./xgboost
./xgboost.mpi
./xgboost.mock
#.Rbuildignore
R-package.Rproj
*.cache*
#java
# java
java/xgboost4j/target
java/xgboost4j/tmp
java/xgboost4j-demo/target
@@ -68,10 +66,9 @@ nb-configuration*
.settings/
build
config.mk
xgboost
/xgboost
*.data
build_plugin
dmlc-core
.idea
recommonmark/
tags


@@ -44,10 +44,12 @@ matrix:
addons:
apt:
sources:
- llvm-toolchain-trusty-5.0
- ubuntu-toolchain-r-test
- george-edison55-precise-backports
packages:
- cmake
- clang
- clang-tidy-5.0
- cmake-data
- doxygen
- wget


@@ -14,8 +14,8 @@ option(USE_NCCL "Build using NCCL for multi-GPU. Also requires USE_CUDA")
option(JVM_BINDINGS "Build JVM bindings" OFF)
option(GOOGLE_TEST "Build google tests" OFF)
option(R_LIB "Build shared library for R package" OFF)
set(GPU_COMPUTE_VER 35;50;52;60;61 CACHE STRING
"Space separated list of compute versions to be built against")
set(GPU_COMPUTE_VER "" CACHE STRING
"Space separated list of compute versions to be built against, e.g. '35 61'")
# Deprecation warning
if(PLUGIN_UPDATER_GPU)
@@ -106,7 +106,7 @@ endif()
# dmlc-core
add_subdirectory(dmlc-core)
set(LINK_LIBRARIES dmlccore rabit)
set(LINK_LIBRARIES dmlc rabit)
if(USE_CUDA)
@@ -122,16 +122,13 @@ if(USE_CUDA)
add_definitions(-DXGBOOST_USE_NCCL)
endif()
if((CUDA_VERSION_MAJOR EQUAL 9) OR (CUDA_VERSION_MAJOR GREATER 9))
message("CUDA 9.0 detected, adding Volta compute capability (7.0).")
set(GPU_COMPUTE_VER "${GPU_COMPUTE_VER};70")
endif()
set(GENCODE_FLAGS "")
format_gencode_flags("${GPU_COMPUTE_VER}" GENCODE_FLAGS)
message("cuda architecture flags: ${GENCODE_FLAGS}")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};--expt-extended-lambda;--expt-relaxed-constexpr;${GENCODE_FLAGS};-lineinfo;")
if(NOT MSVC)
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};-Xcompiler -fPIC; -std=c++11")
set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS};-Xcompiler -fPIC; -Xcompiler -Werror; -std=c++11")
endif()
if(USE_NCCL)


@@ -7,8 +7,8 @@ Committers
Committers are people who have made substantial contribution to the project and granted write access to the project.
* [Tianqi Chen](https://github.com/tqchen), University of Washington
- Tianqi is a PhD working on large-scale machine learning, he is the creator of the project.
* [Tong He](https://github.com/hetong007), Simon Fraser University
- Tong is a master student working on data mining, he is the maintainer of xgboost R package.
* [Tong He](https://github.com/hetong007), Amazon AI
- Tong is an applied scientist in Amazon AI, he is the maintainer of xgboost R package.
* [Vadim Khotilovich](https://github.com/khotilov)
- Vadim contributes many improvements in R and core packages.
* [Bing Xu](https://github.com/antinucleon)
@@ -54,7 +54,8 @@ List of Contributors
* [Masaaki Horikoshi](https://github.com/sinhrks)
- Masaaki is the initial creator of xgboost python plotting module.
* [Hongliang Liu](https://github.com/phunterlau)
- Hongliang is the maintainer of xgboost python PyPI package for pip installation.
* [Hyunsu Cho](http://hyunsu-cho.io/)
- Hyunsu is the maintainer of the XGBoost Python package. He is in charge of submitting the Python package to Python Package Index (PyPI). He is also the initial author of the CPU 'hist' updater.
* [daiyl0320](https://github.com/daiyl0320)
- daiyl0320 contributed a patch making the xgboost distributed version more robust, scaling stably on TB-scale datasets.
* [Huayi Zhang](https://github.com/irachex)

Jenkinsfile

@@ -7,9 +7,9 @@
dockerRun = 'tests/ci_build/ci_build.sh'
def buildMatrix = [
[ "enabled": true, "os" : "linux", "withGpu": true, "withOmp": true, "pythonVersion": "2.7" ],
[ "enabled": true, "os" : "linux", "withGpu": false, "withOmp": true, "pythonVersion": "2.7" ],
[ "enabled": false, "os" : "osx", "withGpu": false, "withOmp": false, "pythonVersion": "2.7" ],
[ "enabled": true, "os" : "linux", "withGpu": true, "withNccl": true, "withOmp": true, "pythonVersion": "2.7", "cudaVersion": "9.1" ],
[ "enabled": true, "os" : "linux", "withGpu": true, "withNccl": true, "withOmp": true, "pythonVersion": "2.7", "cudaVersion": "8.0" ],
[ "enabled": false, "os" : "linux", "withGpu": false, "withNccl": false, "withOmp": true, "pythonVersion": "2.7", "cudaVersion": "" ],
]
pipeline {
@@ -69,8 +69,7 @@ def buildFactory(buildName, conf) {
def os = conf["os"]
def nodeReq = conf["withGpu"] ? "${os} && gpu" : "${os}"
def dockerTarget = conf["withGpu"] ? "gpu" : "cpu"
[ ("cmake_${buildName}") : { buildPlatformCmake("cmake_${buildName}", conf, nodeReq, dockerTarget) },
("make_${buildName}") : { buildPlatformMake("make_${buildName}", conf, nodeReq, dockerTarget) }
[ ("${buildName}") : { buildPlatformCmake("${buildName}", conf, nodeReq, dockerTarget) }
]
}
@@ -81,6 +80,10 @@ def buildPlatformCmake(buildName, conf, nodeReq, dockerTarget) {
def opts = cmakeOptions(conf)
// Destination dir for artifacts
def distDir = "dist/${buildName}"
def dockerArgs = ""
if(conf["withGpu"]){
dockerArgs = "--build-arg CUDA_VERSION=" + conf["cudaVersion"]
}
// Build node - this is returned result
node(nodeReq) {
unstash name: 'srcs'
@@ -92,58 +95,33 @@ def buildPlatformCmake(buildName, conf, nodeReq, dockerTarget) {
""".stripMargin('|')
// Invoke command inside docker
sh """
${dockerRun} ${dockerTarget} tests/ci_build/build_via_cmake.sh ${opts}
${dockerRun} ${dockerTarget} tests/ci_build/test_${dockerTarget}.sh
${dockerRun} ${dockerTarget} bash -c "cd python-package; python setup.py bdist_wheel"
${dockerRun} ${dockerTarget} ${dockerArgs} tests/ci_build/build_via_cmake.sh ${opts}
${dockerRun} ${dockerTarget} ${dockerArgs} tests/ci_build/test_${dockerTarget}.sh
${dockerRun} ${dockerTarget} ${dockerArgs} bash -c "cd python-package; python setup.py bdist_wheel"
rm -rf "${distDir}"; mkdir -p "${distDir}/py"
cp xgboost "${distDir}"
cp -r lib "${distDir}"
cp -r python-package/dist "${distDir}/py"
# Test the wheel for compatibility on a barebones CPU container
${dockerRun} release ${dockerArgs} bash -c " \
auditwheel show xgboost-*-py2-none-any.whl
pip install --user python-package/dist/xgboost-*-py2-none-any.whl && \
python -m nose tests/python"
"""
archiveArtifacts artifacts: "${distDir}/**/*.*", allowEmptyArchive: true
}
}
/**
* Build platform via make
*/
def buildPlatformMake(buildName, conf, nodeReq, dockerTarget) {
def opts = makeOptions(conf)
// Destination dir for artifacts
def distDir = "dist/${buildName}"
// Build node
node(nodeReq) {
unstash name: 'srcs'
echo """
|===== XGBoost Make build =====
| dockerTarget: ${dockerTarget}
| makeOpts : ${opts}
|=========================
""".stripMargin('|')
// Invoke command inside docker
sh """
${dockerRun} ${dockerTarget} tests/ci_build/build_via_make.sh ${opts}
"""
}
}
def makeOptions(conf) {
return ([
conf["withGpu"] ? 'PLUGIN_UPDATER_GPU=ON' : 'PLUGIN_UPDATER_GPU=OFF',
conf["withOmp"] ? 'USE_OPENMP=1' : 'USE_OPENMP=0']
).join(" ")
}
def cmakeOptions(conf) {
return ([
conf["withGpu"] ? '-DPLUGIN_UPDATER_GPU:BOOL=ON' : '',
conf["withGpu"] ? '-DUSE_CUDA=ON' : '-DUSE_CUDA=OFF',
conf["withNccl"] ? '-DUSE_NCCL=ON' : '-DUSE_NCCL=OFF',
conf["withOmp"] ? '-DOPEN_MP:BOOL=ON' : '']
).join(" ")
}
def getBuildName(conf) {
def gpuLabel = conf['withGpu'] ? "_gpu" : "_cpu"
def gpuLabel = conf['withGpu'] ? "_cuda" + conf['cudaVersion'] : "_cpu"
def ompLabel = conf['withOmp'] ? "_omp" : ""
def pyLabel = "_py${conf['pythonVersion']}"
return "${conf['os']}${gpuLabel}${ompLabel}${pyLabel}"


@@ -261,13 +261,15 @@ Rpack: clean_all
cat R-package/src/Makevars.in|sed '2s/.*/PKGROOT=./' | sed '3s/.*/ENABLE_STD_THREAD=0/' > xgboost/src/Makevars.in
cp xgboost/src/Makevars.in xgboost/src/Makevars.win
sed -i -e 's/@OPENMP_CXXFLAGS@/$$\(SHLIB_OPENMP_CFLAGS\)/g' xgboost/src/Makevars.win
bash R-package/remove_warning_suppression_pragma.sh
rm xgboost/remove_warning_suppression_pragma.sh
Rbuild: Rpack
R CMD build --no-build-vignettes xgboost
rm -rf xgboost
Rcheck: Rbuild
R CMD check xgboost*.tar.gz
R CMD check xgboost*.tar.gz
-include build/*.d
-include build/*/*.d

NEWS.md

@@ -3,6 +3,36 @@ XGBoost Change Log
This file records the changes in xgboost library in reverse chronological order.
## v0.72 (2018.06.01)
* Starting with this release, we plan to make a new release every two months. See #3252 for more details.
* Fix a pathological behavior (near-zero second-order gradients) in multiclass objective (#3304)
* Tree dumps now use high precision in storing floating-point values (#3298)
* Submodules `rabit` and `dmlc-core` have been brought up to date, bringing bug fixes (#3330, #3221).
* GPU support
- Continuous integration tests for GPU code (#3294, #3309)
- GPU accelerated coordinate descent algorithm (#3178)
- Abstract 1D vector class now works with multiple GPUs (#3287)
- Generate PTX code for most recent architecture (#3316)
- Fix a memory bug on NVIDIA K80 cards (#3293)
- Address performance instability for single-GPU, multi-core machines (#3324)
* Python package
- FreeBSD support (#3247)
- Validation of feature names in `Booster.predict()` is now optional (#3323)
* Updated Sklearn API
- Validation sets now support instance weights (#2354)
- `XGBClassifier.predict_proba()` should not support `output_margin` option. (#3343) See BREAKING CHANGES below.
* R package:
- Better handling of NULL in `print.xgb.Booster()` (#3338)
- Comply with CRAN policy by removing compiler warning suppression (#3329)
- Updated CRAN submission
* JVM packages
- JVM packages will now use the same versioning scheme as other packages (#3253)
- Update Spark to 2.3 (#3254)
- Add scripts to cross-build and deploy artifacts (#3276, #3307)
- Fix a compilation error for Scala 2.10 (#3332)
* BREAKING CHANGES
- `XGBClassifier.predict_proba()` no longer accepts parameter `output_margin`. The parameter makes no sense for `predict_proba()` because the method is to predict class probabilities, not raw margin scores.
## v0.71 (2018.04.11)
* This is a minor release, mainly motivated by issues concerning `pip install`, e.g. #2426, #3189, #3118, and #3194.
With this release, users of Linux and MacOS will be able to run `pip install` for the most part.


@@ -2,7 +2,7 @@ Package: xgboost
Type: Package
Title: Extreme Gradient Boosting
Version: 0.71.1
Date: 2018-04-11
Date: 2018-05-11
Authors@R: c(
person("Tianqi", "Chen", role = c("aut"),
email = "tianqi.tchen@gmail.com"),
@@ -14,7 +14,20 @@ Authors@R: c(
email = "khotilovich@gmail.com"),
person("Yuan", "Tang", role = c("aut"),
email = "terrytangyuan@gmail.com",
comment = c(ORCID = "0000-0001-5243-233X"))
comment = c(ORCID = "0000-0001-5243-233X")),
person("Hyunsu", "Cho", role = c("aut"),
email = "chohyu01@cs.washington.edu"),
person("Kailong", "Chen", role = c("aut")),
person("Rory", "Mitchell", role = c("aut")),
person("Ignacio", "Cano", role = c("aut")),
person("Tianyi", "Zhou", role = c("aut")),
person("Mu", "Li", role = c("aut")),
person("Junyuan", "Xie", role = c("aut")),
person("Min", "Lin", role = c("aut")),
person("Yifeng", "Geng", role = c("aut")),
person("Yutian", "Li", role = c("aut")),
person("XGBoost contributors", role = c("cph"),
comment = "base XGBoost implementation")
)
Description: Extreme Gradient Boosting, which is an efficient implementation
of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>.
@@ -28,6 +41,7 @@ Description: Extreme Gradient Boosting, which is an efficient implementation
License: Apache License (== 2.0) | file LICENSE
URL: https://github.com/dmlc/xgboost
BugReports: https://github.com/dmlc/xgboost/issues
NeedsCompilation: yes
VignetteBuilder: knitr
Suggests:
knitr,


@@ -691,11 +691,6 @@ cb.gblinear.history <- function(sparse=FALSE) {
#' For an \code{xgb.cv} result, a list of such matrices is returned with the elements
#' corresponding to CV folds.
#'
#' @examples
#' \dontrun{
#' See \code{\link{cv.gblinear.history}}
#' }
#'
#' @export
xgb.gblinear.history <- function(model, class_index = NULL) {


@@ -37,11 +37,14 @@ xgb.handleToBooster <- function(handle, raw = NULL) {
# Check whether xgb.Booster.handle is null
# internal utility function
is.null.handle <- function(handle) {
if (is.null(handle)) return(TRUE)
if (!identical(class(handle), "xgb.Booster.handle"))
stop("argument type must be xgb.Booster.handle")
if (is.null(handle) || .Call(XGCheckNullPtr_R, handle))
if (.Call(XGCheckNullPtr_R, handle))
return(TRUE)
return(FALSE)
}
@@ -537,7 +540,7 @@ xgb.ntree <- function(bst) {
print.xgb.Booster <- function(x, verbose = FALSE, ...) {
cat('##### xgb.Booster\n')
valid_handle <- is.null.handle(x$handle)
valid_handle <- !is.null.handle(x$handle)
if (!valid_handle)
cat("Handle is invalid! Suggest using xgb.Booster.complete\n")


@@ -83,7 +83,7 @@
#' \item \code{params} parameters that were passed to the xgboost library. Note that it does not
#' capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
#' \item \code{callbacks} callback functions that were either automatically assigned or
#' explicitely passed.
#' explicitly passed.
#' \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
#' first column corresponding to iteration number and the rest corresponding to the
#' CV-based evaluation means and standard deviations for the training and test CV-sets.


@@ -30,7 +30,8 @@
#' bst <- xgboost(data = train$data, label = train$label, max_depth = 2,
#' eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
#' # save the model in file 'xgb.model.dump'
#' xgb.dump(bst, 'xgb.model.dump', with_stats = TRUE)
#' dump_path = file.path(tempdir(), 'model.dump')
#' xgb.dump(bst, dump_path, with_stats = TRUE)
#'
#' # print the model without saving it to a file
#' print(xgb.dump(bst, with_stats = TRUE))


@@ -30,4 +30,4 @@ Examples
Development
-----------
* See the [R Package section](https://xgboost.readthedocs.io/en/latest/how_to/contribute.html#r-package) of the contributiors guide.
* See the [R Package section](https://xgboost.readthedocs.io/en/latest/how_to/contribute.html#r-package) of the contributors guide.


@@ -99,7 +99,8 @@ err <- as.numeric(sum(as.integer(pred > 0.5) != label))/length(label)
print(paste("test-error=", err))
# You can dump the tree you learned using xgb.dump into a text file
xgb.dump(bst, "dump.raw.txt", with_stats = T)
dump_path = file.path(tempdir(), 'dump.raw.txt')
xgb.dump(bst, dump_path, with_stats = T)
# Finally, you can check which features are the most important.
print("Most important features (look at column Gain):")


@@ -99,7 +99,7 @@ An object of class \code{xgb.cv.synchronous} with the following elements:
\item \code{params} parameters that were passed to the xgboost library. Note that it does not
capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
\item \code{callbacks} callback functions that were either automatically assigned or
explicitely passed.
explicitly passed.
\item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
first column corresponding to iteration number and the rest corresponding to the
CV-based evaluation means and standard deviations for the training and test CV-sets.


@@ -44,7 +44,8 @@ test <- agaricus.test
bst <- xgboost(data = train$data, label = train$label, max_depth = 2,
eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
# save the model in file 'xgb.model.dump'
xgb.dump(bst, 'xgb.model.dump', with_stats = TRUE)
dump.path = file.path(tempdir(), 'model.dump')
xgb.dump(bst, dump.path, with_stats = TRUE)
# print the model without saving it to a file
print(xgb.dump(bst, with_stats = TRUE))


@@ -27,9 +27,3 @@ A helper function to extract the matrix of linear coefficients' history
from a gblinear model created while using the \code{cb.gblinear.history()}
callback.
}
\examples{
\dontrun{
See \\code{\\link{cv.gblinear.history}}
}
}


@@ -0,0 +1,14 @@
#!/bin/bash
# remove all #pragma's that suppress compiler warnings
set -e
set -x
for file in xgboost/src/dmlc-core/include/dmlc/*.h
do
sed -i.bak -e 's/^.*#pragma GCC diagnostic.*$//' -e 's/^.*#pragma clang diagnostic.*$//' -e 's/^.*#pragma warning.*$//' "${file}"
done
for file in xgboost/src/dmlc-core/include/dmlc/*.h.bak
do
rm "${file}"
done
set +x
set +e


@@ -42,9 +42,10 @@ mbst.GLM <- xgboost(data = as.matrix(iris[, -5]), label = mlabel, verbose = 0,
test_that("xgb.dump works", {
expect_length(xgb.dump(bst.Tree), 200)
expect_true(xgb.dump(bst.Tree, 'xgb.model.dump', with_stats = T))
expect_true(file.exists('xgb.model.dump'))
expect_gt(file.size('xgb.model.dump'), 8000)
dump_file = file.path(tempdir(), 'xgb.model.dump')
expect_true(xgb.dump(bst.Tree, dump_file, with_stats = T))
expect_true(file.exists(dump_file))
expect_gt(file.size(dump_file), 8000)
# JSON format
dmp <- xgb.dump(bst.Tree, dump_format = "json")


@@ -7,6 +7,8 @@
#include "../dmlc-core/src/io/recordio_split.cc"
#include "../dmlc-core/src/io/input_split_base.cc"
#include "../dmlc-core/src/io/local_filesys.cc"
#include "../dmlc-core/src/io/filesys.cc"
#include "../dmlc-core/src/io/indexed_recordio_split.cc"
#include "../dmlc-core/src/data.cc"
#include "../dmlc-core/src/io.cc"
#include "../dmlc-core/src/recordio.cc"


@@ -53,7 +53,7 @@ install:
Import-Module "$Env:TEMP\appveyor-tool.ps1"
Bootstrap
$DEPS = "c('data.table','magrittr','stringi','ggplot2','DiagrammeR','Ckmeans.1d.dp','vcd','testthat','igraph','knitr','rmarkdown')"
cmd.exe /c "R.exe -q -e ""install.packages($DEPS, repos='$CRAN', type='win.binary')"" 2>&1"
cmd.exe /c "R.exe -q -e ""install.packages($DEPS, repos='$CRAN', type='both')"" 2>&1"
}
build_script:


@@ -15,25 +15,21 @@ else
if [[ ! -e ./rabit/Makefile ]]; then
echo ""
echo "Please clone the rabit repository into this directory."
echo "Here are the commands:"
echo "rm -rf rabit"
echo "git clone https://github.com/dmlc/rabit.git rabit"
echo "Please init the rabit submodule:"
echo "git submodule update --init --recursive -- rabit"
not_ready=1
fi
if [[ ! -e ./dmlc-core/Makefile ]]; then
echo ""
echo "Please clone the dmlc-core repository into this directory."
echo "Here are the commands:"
echo "rm -rf dmlc-core"
echo "git clone https://github.com/dmlc/dmlc-core.git dmlc-core"
echo "Please init the dmlc-core submodule:"
echo "git submodule update --init --recursive -- dmlc-core"
not_ready=1
fi
if [[ "${not_ready}" == "1" ]]; then
echo ""
echo "Please fix the errors above and retry the build or reclone the repository with:"
echo "Please fix the errors above and retry the build, or reclone the repository with:"
echo "git clone --recursive https://github.com/dmlc/xgboost.git"
echo ""
exit 1


@@ -54,10 +54,25 @@ function(set_default_configuration_release)
endif()
endfunction(set_default_configuration_release)
# Generate nvcc compiler flags given a list of architectures
# Also generates PTX for the most recent architecture for forwards compatibility
function(format_gencode_flags flags out)
# Set up architecture flags
if(NOT flags)
if((CUDA_VERSION_MAJOR EQUAL 9) OR (CUDA_VERSION_MAJOR GREATER 9))
set(flags "35;50;52;60;61;70")
else()
set(flags "35;50;52;60;61")
endif()
endif()
# Generate SASS
foreach(ver ${flags})
set(${out} "${${out}}-gencode arch=compute_${ver},code=sm_${ver};")
endforeach()
# Generate PTX for last architecture
list(GET flags -1 ver)
set(${out} "${${out}}-gencode arch=compute_${ver},code=compute_${ver};")
set(${out} "${${out}}" PARENT_SCOPE)
endfunction(format_gencode_flags flags)
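The CMake function above can be mirrored in Python to make the generated flag string visible: one SASS entry per requested architecture, plus a PTX entry for the last one so newer GPUs can JIT-compile (a sketch; the real build passes these through `CUDA_NVCC_FLAGS`):

```python
def format_gencode_flags(flags):
    # build nvcc -gencode options: SASS (code=sm_X) for every requested
    # architecture, plus PTX (code=compute_X) for the last one so future
    # GPUs retain forward compatibility via JIT compilation
    out = ""
    for ver in flags:
        out += "-gencode arch=compute_%s,code=sm_%s;" % (ver, ver)
    last = flags[-1]
    out += "-gencode arch=compute_%s,code=compute_%s;" % (last, last)
    return out
```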


@@ -81,20 +81,19 @@ namespace xgboost {
* \brief unsigned integer type used in boost,
* used for feature index and row index.
*/
typedef uint32_t bst_uint;
typedef int32_t bst_int;
using bst_uint = uint32_t; // NOLINT
using bst_int = int32_t; // NOLINT
/*! \brief long integers */
typedef uint64_t bst_ulong; // NOLINT(*)
/*! \brief float type, used for storing statistics */
typedef float bst_float;
using bst_float = float; // NOLINT
namespace detail {
/*! \brief Implementation of gradient statistics pair. Template specialisation
* may be used to overload different gradients types e.g. low precision, high
* precision, integer, floating point. */
template <typename T>
class bst_gpair_internal {
class GradientPairInternal {
/*! \brief gradient statistics */
T grad_;
/*! \brief second order gradient statistics */
@@ -104,23 +103,23 @@ class bst_gpair_internal {
XGBOOST_DEVICE void SetHess(float h) { hess_ = h; }
public:
typedef T value_t;
using ValueT = T;
XGBOOST_DEVICE bst_gpair_internal() : grad_(0), hess_(0) {}
XGBOOST_DEVICE GradientPairInternal() : grad_(0), hess_(0) {}
XGBOOST_DEVICE bst_gpair_internal(float grad, float hess) {
XGBOOST_DEVICE GradientPairInternal(float grad, float hess) {
SetGrad(grad);
SetHess(hess);
}
// Copy constructor if of same value type
XGBOOST_DEVICE bst_gpair_internal(const bst_gpair_internal<T> &g)
: grad_(g.grad_), hess_(g.hess_) {}
XGBOOST_DEVICE GradientPairInternal(const GradientPairInternal<T> &g)
: grad_(g.grad_), hess_(g.hess_) {} // NOLINT
// Copy constructor if different value type - use getters and setters to
// perform conversion
template <typename T2>
XGBOOST_DEVICE bst_gpair_internal(const bst_gpair_internal<T2> &g) {
XGBOOST_DEVICE explicit GradientPairInternal(const GradientPairInternal<T2> &g) {
SetGrad(g.GetGrad());
SetHess(g.GetHess());
}
@@ -128,85 +127,85 @@ class bst_gpair_internal {
XGBOOST_DEVICE float GetGrad() const { return grad_; }
XGBOOST_DEVICE float GetHess() const { return hess_; }
XGBOOST_DEVICE bst_gpair_internal<T> &operator+=(
const bst_gpair_internal<T> &rhs) {
XGBOOST_DEVICE GradientPairInternal<T> &operator+=(
const GradientPairInternal<T> &rhs) {
grad_ += rhs.grad_;
hess_ += rhs.hess_;
return *this;
}
XGBOOST_DEVICE bst_gpair_internal<T> operator+(
const bst_gpair_internal<T> &rhs) const {
bst_gpair_internal<T> g;
XGBOOST_DEVICE GradientPairInternal<T> operator+(
const GradientPairInternal<T> &rhs) const {
GradientPairInternal<T> g;
g.grad_ = grad_ + rhs.grad_;
g.hess_ = hess_ + rhs.hess_;
return g;
}
XGBOOST_DEVICE bst_gpair_internal<T> &operator-=(
const bst_gpair_internal<T> &rhs) {
XGBOOST_DEVICE GradientPairInternal<T> &operator-=(
const GradientPairInternal<T> &rhs) {
grad_ -= rhs.grad_;
hess_ -= rhs.hess_;
return *this;
}
XGBOOST_DEVICE bst_gpair_internal<T> operator-(
const bst_gpair_internal<T> &rhs) const {
bst_gpair_internal<T> g;
XGBOOST_DEVICE GradientPairInternal<T> operator-(
const GradientPairInternal<T> &rhs) const {
GradientPairInternal<T> g;
g.grad_ = grad_ - rhs.grad_;
g.hess_ = hess_ - rhs.hess_;
return g;
}
XGBOOST_DEVICE bst_gpair_internal(int value) {
*this = bst_gpair_internal<T>(static_cast<float>(value),
XGBOOST_DEVICE explicit GradientPairInternal(int value) {
*this = GradientPairInternal<T>(static_cast<float>(value),
static_cast<float>(value));
}
friend std::ostream &operator<<(std::ostream &os,
const bst_gpair_internal<T> &g) {
const GradientPairInternal<T> &g) {
os << g.GetGrad() << "/" << g.GetHess();
return os;
}
};
template<>
inline XGBOOST_DEVICE float bst_gpair_internal<int64_t>::GetGrad() const {
inline XGBOOST_DEVICE float GradientPairInternal<int64_t>::GetGrad() const {
return grad_ * 1e-4f;
}
template<>
inline XGBOOST_DEVICE float bst_gpair_internal<int64_t>::GetHess() const {
inline XGBOOST_DEVICE float GradientPairInternal<int64_t>::GetHess() const {
return hess_ * 1e-4f;
}
template<>
inline XGBOOST_DEVICE void bst_gpair_internal<int64_t>::SetGrad(float g) {
inline XGBOOST_DEVICE void GradientPairInternal<int64_t>::SetGrad(float g) {
grad_ = static_cast<int64_t>(std::round(g * 1e4));
}
template<>
inline XGBOOST_DEVICE void bst_gpair_internal<int64_t>::SetHess(float h) {
inline XGBOOST_DEVICE void GradientPairInternal<int64_t>::SetHess(float h) {
hess_ = static_cast<int64_t>(std::round(h * 1e4));
}
} // namespace detail
/*! \brief gradient statistics pair usually needed in gradient boosting */
typedef detail::bst_gpair_internal<float> bst_gpair;
using GradientPair = detail::GradientPairInternal<float>;
/*! \brief High precision gradient statistics pair */
typedef detail::bst_gpair_internal<double> bst_gpair_precise;
using GradientPairPrecise = detail::GradientPairInternal<double>;
/*! \brief High precision gradient statistics pair with integer backed
* storage. Operators are associative where floating point versions are not
* associative. */
typedef detail::bst_gpair_internal<int64_t> bst_gpair_integer;
using GradientPairInteger = detail::GradientPairInternal<int64_t>;
/*! \brief small eps gap for minimum split decision. */
const bst_float rt_eps = 1e-6f;
const bst_float kRtEps = 1e-6f;
/*! \brief define unsigned long for openmp loop */
typedef dmlc::omp_ulong omp_ulong;
using omp_ulong = dmlc::omp_ulong; // NOLINT
/*! \brief define unsigned int for openmp loop */
typedef dmlc::omp_uint bst_omp_uint;
using bst_omp_uint = dmlc::omp_uint; // NOLINT
/*!
* \brief define compatible keywords in g++



@@ -30,16 +30,16 @@ typedef uint64_t bst_ulong; // NOLINT(*)
/*! \brief handle to DMatrix */
typedef void *DMatrixHandle;
typedef void *DMatrixHandle; // NOLINT(*)
/*! \brief handle to Booster */
typedef void *BoosterHandle;
typedef void *BoosterHandle; // NOLINT(*)
/*! \brief handle to a data iterator */
typedef void *DataIterHandle;
typedef void *DataIterHandle; // NOLINT(*)
/*! \brief handle to a internal data holder. */
typedef void *DataHolderHandle;
typedef void *DataHolderHandle; // NOLINT(*)
/*! \brief Mini batch used in XGBoost Data Iteration */
typedef struct {
typedef struct { // NOLINT(*)
/*! \brief number of rows in the minibatch */
size_t size;
/*! \brief row pointer to the rows in the data */
@@ -66,7 +66,7 @@ typedef struct {
* \param handle The handle to the callback.
* \param batch The data content to be set.
*/
XGB_EXTERN_C typedef int XGBCallbackSetData(
XGB_EXTERN_C typedef int XGBCallbackSetData( // NOLINT(*)
DataHolderHandle handle, XGBoostBatchCSR batch);
/*!
@@ -80,9 +80,8 @@ XGB_EXTERN_C typedef int XGBCallbackSetData(
* \param set_function_handle The handle to be passed to set function.
* \return 0 if we are reaching the end and batch is not returned.
*/
XGB_EXTERN_C typedef int XGBCallbackDataIterNext(
DataIterHandle data_handle,
XGBCallbackSetData* set_function,
XGB_EXTERN_C typedef int XGBCallbackDataIterNext( // NOLINT(*)
DataIterHandle data_handle, XGBCallbackSetData *set_function,
DataHolderHandle set_function_handle);
/*!
@@ -216,11 +215,9 @@ XGB_DLL int XGDMatrixCreateFromMat(const float *data,
* \param nthread number of threads (up to maximum cores available, if <=0 use all cores)
* \return 0 when success, -1 when failure happens
*/
XGB_DLL int XGDMatrixCreateFromMat_omp(const float *data,
bst_ulong nrow,
bst_ulong ncol,
float missing,
DMatrixHandle *out,
XGB_DLL int XGDMatrixCreateFromMat_omp(const float *data, // NOLINT
bst_ulong nrow, bst_ulong ncol,
float missing, DMatrixHandle *out,
int nthread);
/*!
* \brief create a new dmatrix from sliced content of existing matrix


@@ -30,44 +30,45 @@ enum DataType {
/*!
* \brief Meta information about dataset, always sit in memory.
*/
struct MetaInfo {
class MetaInfo {
public:
/*! \brief number of rows in the data */
uint64_t num_row;
uint64_t num_row_{0};
/*! \brief number of columns in the data */
uint64_t num_col;
uint64_t num_col_{0};
/*! \brief number of nonzero entries in the data */
uint64_t num_nonzero;
uint64_t num_nonzero_{0};
/*! \brief label of each instance */
std::vector<bst_float> labels;
std::vector<bst_float> labels_;
/*!
* \brief specified root index of each instance,
* can be used for multi task setting
*/
std::vector<bst_uint> root_index;
std::vector<bst_uint> root_index_;
/*!
* \brief the index of begin and end of a group
* needed when the learning task is ranking.
*/
std::vector<bst_uint> group_ptr;
std::vector<bst_uint> group_ptr_;
/*! \brief weights of each instance, optional */
std::vector<bst_float> weights;
std::vector<bst_float> weights_;
/*!
* \brief initialized margins,
* if specified, xgboost will start from this init margin
* can be used to specify initial prediction to boost from.
*/
std::vector<bst_float> base_margin;
std::vector<bst_float> base_margin_;
/*! \brief version flag, used to check version of this info */
static const int kVersion = 1;
/*! \brief default constructor */
MetaInfo() : num_row(0), num_col(0), num_nonzero(0) {}
MetaInfo() = default;
/*!
* \brief Get weight of each instances.
* \param i Instance index.
* \return The weight.
*/
inline bst_float GetWeight(size_t i) const {
return weights.size() != 0 ? weights[i] : 1.0f;
return weights_.size() != 0 ? weights_[i] : 1.0f;
}
/*!
* \brief Get the root index of i-th instance.
@@ -75,20 +76,20 @@ struct MetaInfo {
* \return The pre-defined root index of i-th instance.
*/
inline unsigned GetRoot(size_t i) const {
return root_index.size() != 0 ? root_index[i] : 0U;
return root_index_.size() != 0 ? root_index_[i] : 0U;
}
/*! \brief get sorted indexes (argsort) of labels by absolute value (used by cox loss) */
inline const std::vector<size_t>& LabelAbsSort() const {
if (label_order_cache.size() == labels.size()) {
return label_order_cache;
if (label_order_cache_.size() == labels_.size()) {
return label_order_cache_;
}
label_order_cache.resize(labels.size());
std::iota(label_order_cache.begin(), label_order_cache.end(), 0);
const auto l = labels;
XGBOOST_PARALLEL_SORT(label_order_cache.begin(), label_order_cache.end(),
label_order_cache_.resize(labels_.size());
std::iota(label_order_cache_.begin(), label_order_cache_.end(), 0);
const auto l = labels_;
XGBOOST_PARALLEL_SORT(label_order_cache_.begin(), label_order_cache_.end(),
[&l](size_t i1, size_t i2) {return std::abs(l[i1]) < std::abs(l[i2]);});
return label_order_cache;
return label_order_cache_;
}
/*! \brief clear all the information */
void Clear();
@@ -113,7 +114,7 @@ struct MetaInfo {
private:
/*! \brief argsort of labels */
mutable std::vector<size_t> label_order_cache;
mutable std::vector<size_t> label_order_cache_;
};
/*! \brief read-only sparse instance batch in CSR format */
@@ -125,7 +126,7 @@ struct SparseBatch {
/*! \brief feature value */
bst_float fvalue;
/*! \brief default constructor */
Entry() {}
Entry() = default;
/*!
* \brief constructor with index and value
* \param index The feature or row index.
@@ -141,11 +142,11 @@ struct SparseBatch {
/*! \brief an instance of sparse vector in the batch */
struct Inst {
/*! \brief pointer to the elements*/
const Entry *data;
const Entry *data{nullptr};
/*! \brief length of the instance */
bst_uint length;
bst_uint length{0};
/*! \brief constructor */
Inst() : data(0), length(0) {}
Inst() = default;
Inst(const Entry *data, bst_uint length) : data(data), length(length) {}
/*! \brief get i-th pair in the sparse vector*/
inline const Entry& operator[](size_t i) const {
@@ -167,7 +168,7 @@ struct RowBatch : public SparseBatch {
const Entry *data_ptr;
/*! \brief get i-th row from the batch */
inline Inst operator[](size_t i) const {
return Inst(data_ptr + ind_ptr[i], static_cast<bst_uint>(ind_ptr[i + 1] - ind_ptr[i]));
return {data_ptr + ind_ptr[i], static_cast<bst_uint>(ind_ptr[i + 1] - ind_ptr[i])};
}
};
@@ -206,16 +207,16 @@ class DataSource : public dmlc::DataIter<RowBatch> {
* \brief A vector-like structure to represent set of rows.
* But saves the memory when all rows are in the set (common case in xgb)
*/
struct RowSet {
class RowSet {
public:
/*! \return i-th row index */
inline bst_uint operator[](size_t i) const;
/*! \return the size of the set. */
inline size_t size() const;
inline size_t Size() const;
/*! \brief push the index back to the set */
inline void push_back(bst_uint i);
inline void PushBack(bst_uint i);
/*! \brief clear the set */
inline void clear();
inline void Clear();
/*!
* \brief save rowset to file.
* \param fo The file to be saved.
@@ -228,11 +229,11 @@ struct RowSet {
*/
inline bool Load(dmlc::Stream* fi);
/*! \brief constructor */
RowSet() : size_(0) {}
RowSet() = default;
private:
/*! \brief The internal data structure of size */
uint64_t size_;
uint64_t size_{0};
/*! \brief The internal data structure of row set if not all*/
std::vector<bst_uint> rows_;
};
@@ -250,11 +251,11 @@ struct RowSet {
class DMatrix {
public:
/*! \brief default constructor */
DMatrix() : cache_learner_ptr_(nullptr) {}
DMatrix() = default;
/*! \brief meta information of the dataset */
virtual MetaInfo& info() = 0;
virtual MetaInfo& Info() = 0;
/*! \brief meta information of the dataset */
virtual const MetaInfo& info() const = 0;
virtual const MetaInfo& Info() const = 0;
/*!
* \brief get the row iterator, reset to beginning position
* \note Only either RowIterator or column Iterator can be active.
@@ -291,9 +292,9 @@ class DMatrix {
/*! \brief get column density */
virtual float GetColDensity(size_t cidx) const = 0;
/*! \return reference of buffered rowset, in column access */
virtual const RowSet& buffered_rowset() const = 0;
virtual const RowSet& BufferedRowset() const = 0;
/*! \brief virtual destructor */
virtual ~DMatrix() {}
virtual ~DMatrix() = default;
/*!
* \brief Save DMatrix to local file.
* The saved file only works for non-sharded dataset(single machine training).
@@ -343,7 +344,7 @@ class DMatrix {
// allow learner class to access this field.
friend class LearnerImpl;
/*! \brief public field to back ref cached matrix. */
LearnerImpl* cache_learner_ptr_;
LearnerImpl* cache_learner_ptr_{nullptr};
};
// implementation of inline functions
@@ -351,15 +352,15 @@ inline bst_uint RowSet::operator[](size_t i) const {
return rows_.size() == 0 ? static_cast<bst_uint>(i) : rows_[i];
}
inline size_t RowSet::size() const {
inline size_t RowSet::Size() const {
return size_;
}
inline void RowSet::clear() {
inline void RowSet::Clear() {
rows_.clear(); size_ = 0;
}
inline void RowSet::push_back(bst_uint i) {
inline void RowSet::PushBack(bst_uint i) {
if (rows_.size() == 0) {
if (i == size_) {
++size_; return;


@@ -45,7 +45,7 @@ class FeatureMap {
*/
inline void PushBack(int fid, const char *fname, const char *ftype) {
CHECK_EQ(fid, static_cast<int>(names_.size()));
names_.push_back(std::string(fname));
names_.emplace_back(fname);
types_.push_back(GetType(ftype));
}
/*! \brief clear the feature map */
@@ -54,11 +54,11 @@ class FeatureMap {
types_.clear();
}
/*! \return number of known features */
inline size_t size() const {
inline size_t Size() const {
return names_.size();
}
/*! \return name of specific feature */
inline const char* name(size_t idx) const {
inline const char* Name(size_t idx) const {
CHECK_LT(idx, names_.size()) << "FeatureMap feature index exceed bound";
return names_[idx].c_str();
}
@@ -75,7 +75,7 @@ class FeatureMap {
* \return The translated type.
*/
inline static Type GetType(const char* tname) {
using namespace std;
using std::strcmp;
if (!strcmp("i", tname)) return kIndicator;
if (!strcmp("q", tname)) return kQuantitive;
if (!strcmp("int", tname)) return kInteger;


@@ -27,7 +27,7 @@ namespace xgboost {
class GradientBooster {
public:
/*! \brief virtual destructor */
virtual ~GradientBooster() {}
virtual ~GradientBooster() = default;
/*!
* \brief set configuration from pair iterators.
* \param begin The beginning iterator.
@@ -69,7 +69,7 @@ class GradientBooster {
* the booster may change content of gpair
*/
virtual void DoBoost(DMatrix* p_fmat,
HostDeviceVector<bst_gpair>* in_gpair,
HostDeviceVector<GradientPair>* in_gpair,
ObjFunction* obj = nullptr) = 0;
/*!


@@ -37,7 +37,7 @@ namespace xgboost {
class Learner : public rabit::Serializable {
public:
/*! \brief virtual destructor */
virtual ~Learner() {}
~Learner() override = default;
/*!
* \brief set configuration from pair iterators.
* \param begin The beginning iterator.
@@ -62,12 +62,12 @@ class Learner : public rabit::Serializable {
* \brief load model from stream
* \param fi input stream.
*/
virtual void Load(dmlc::Stream* fi) = 0;
void Load(dmlc::Stream* fi) override = 0;
/*!
* \brief save model to stream.
* \param fo output stream
*/
virtual void Save(dmlc::Stream* fo) const = 0;
void Save(dmlc::Stream* fo) const override = 0;
/*!
* \brief update the model for one iteration
* With the specified objective function.
@@ -84,7 +84,7 @@ class Learner : public rabit::Serializable {
*/
virtual void BoostOneIter(int iter,
DMatrix* train,
HostDeviceVector<bst_gpair>* in_gpair) = 0;
HostDeviceVector<GradientPair>* in_gpair) = 0;
/*!
* \brief evaluate the model for specific iteration using the configured metrics.
* \param iter iteration number
@@ -194,7 +194,7 @@ inline void Learner::Predict(const SparseBatch::Inst& inst,
bool output_margin,
HostDeviceVector<bst_float>* out_preds,
unsigned ntree_limit) const {
gbm_->PredictInstance(inst, &out_preds->data_h(), ntree_limit);
gbm_->PredictInstance(inst, &out_preds->HostVector(), ntree_limit);
if (!output_margin) {
obj_->PredTransform(out_preds);
}


@@ -11,6 +11,7 @@
#include <utility>
#include <vector>
#include "../../src/gbm/gblinear_model.h"
#include "../../src/common/host_device_vector.h"
namespace xgboost {
/*!
@@ -19,7 +20,7 @@ namespace xgboost {
class LinearUpdater {
public:
/*! \brief virtual destructor */
virtual ~LinearUpdater() {}
virtual ~LinearUpdater() = default;
/*!
* \brief Initialize the updater with given arguments.
* \param args arguments to the objective function.
@@ -36,7 +37,7 @@ class LinearUpdater {
* \param sum_instance_weight The sum instance weights, used to normalise l1/l2 penalty.
*/
virtual void Update(std::vector<bst_gpair>* in_gpair, DMatrix* data,
virtual void Update(HostDeviceVector<GradientPair>* in_gpair, DMatrix* data,
gbm::GBLinearModel* model,
double sum_instance_weight) = 0;


@@ -21,7 +21,7 @@ class BaseLogger {
log_stream_ << "[" << dmlc::DateLogger().HumanDate() << "] ";
#endif
}
std::ostream& stream() { return log_stream_; }
std::ostream& stream() { return log_stream_; } // NOLINT
protected:
std::ostringstream log_stream_;


@@ -35,7 +35,7 @@ class Metric {
/*! \return name of metric */
virtual const char* Name() const = 0;
/*! \brief virtual destructor */
virtual ~Metric() {}
virtual ~Metric() = default;
/*!
* \brief create a metric according to name.
* \param name name of the metric.


@@ -23,7 +23,7 @@ namespace xgboost {
class ObjFunction {
public:
/*! \brief virtual destructor */
virtual ~ObjFunction() {}
virtual ~ObjFunction() = default;
/*!
* \brief set configuration from pair iterators.
* \param begin The beginning iterator.
@@ -47,7 +47,7 @@ class ObjFunction {
virtual void GetGradient(HostDeviceVector<bst_float>* preds,
const MetaInfo& info,
int iteration,
HostDeviceVector<bst_gpair>* out_gpair) = 0;
HostDeviceVector<GradientPair>* out_gpair) = 0;
/*! \return the default evaluation metric for the objective */
virtual const char* DefaultEvalMetric() const = 0;


@@ -36,7 +36,7 @@ namespace xgboost {
class Predictor {
public:
virtual ~Predictor() {}
virtual ~Predictor() = default;
/**
* \fn virtual void Predictor::Init(const std::vector<std::pair<std::string,


@@ -71,70 +71,70 @@ template<typename TSplitCond, typename TNodeStat>
class TreeModel {
public:
/*! \brief data type to indicate split condition */
typedef TNodeStat NodeStat;
using NodeStat = TNodeStat;
/*! \brief auxiliary statistics of node to help tree building */
typedef TSplitCond SplitCond;
using SplitCond = TSplitCond;
/*! \brief tree node */
class Node {
public:
Node() : sindex_(0) {
Node() {
// assert compact alignment
static_assert(sizeof(Node) == 4 * sizeof(int) + sizeof(Info),
"Node: 64 bit align");
}
/*! \brief index of left child */
inline int cleft() const {
inline int LeftChild() const {
return this->cleft_;
}
/*! \brief index of right child */
inline int cright() const {
inline int RightChild() const {
return this->cright_;
}
/*! \brief index of default child when feature is missing */
inline int cdefault() const {
return this->default_left() ? this->cleft() : this->cright();
inline int DefaultChild() const {
return this->DefaultLeft() ? this->LeftChild() : this->RightChild();
}
/*! \brief feature index of split condition */
inline unsigned split_index() const {
inline unsigned SplitIndex() const {
return sindex_ & ((1U << 31) - 1U);
}
/*! \brief when feature is unknown, whether goes to left child */
inline bool default_left() const {
inline bool DefaultLeft() const {
return (sindex_ >> 31) != 0;
}
/*! \brief whether current node is leaf node */
inline bool is_leaf() const {
inline bool IsLeaf() const {
return cleft_ == -1;
}
/*! \return get leaf value of leaf node */
inline bst_float leaf_value() const {
inline bst_float LeafValue() const {
return (this->info_).leaf_value;
}
/*! \return get split condition of the node */
inline TSplitCond split_cond() const {
inline TSplitCond SplitCond() const {
return (this->info_).split_cond;
}
/*! \brief get parent of the node */
inline int parent() const {
inline int Parent() const {
return parent_ & ((1U << 31) - 1);
}
/*! \brief whether current node is left child */
inline bool is_left_child() const {
inline bool IsLeftChild() const {
return (parent_ & (1U << 31)) != 0;
}
/*! \brief whether this node is deleted */
inline bool is_deleted() const {
inline bool IsDeleted() const {
return sindex_ == std::numeric_limits<unsigned>::max();
}
/*! \brief whether current node is root */
inline bool is_root() const {
inline bool IsRoot() const {
return parent_ == -1;
}
/*!
* \brief set the right child
* \param nid node id to right child
*/
inline void set_right_child(int nid) {
inline void SetRightChild(int nid) {
this->cright_ = nid;
}
/*!
@@ -143,7 +143,7 @@ class TreeModel {
* \param split_cond split condition
* \param default_left the default direction when feature is unknown
*/
inline void set_split(unsigned split_index, TSplitCond split_cond,
inline void SetSplit(unsigned split_index, TSplitCond split_cond,
bool default_left = false) {
if (default_left) split_index |= (1U << 31);
this->sindex_ = split_index;
@@ -155,13 +155,13 @@ class TreeModel {
* \param right right index, could be used to store
* additional information
*/
inline void set_leaf(bst_float value, int right = -1) {
inline void SetLeaf(bst_float value, int right = -1) {
(this->info_).leaf_value = value;
this->cleft_ = -1;
this->cright_ = right;
}
/*! \brief mark that this node is deleted */
inline void mark_delete() {
inline void MarkDelete() {
this->sindex_ = std::numeric_limits<unsigned>::max();
}
@@ -181,11 +181,11 @@ class TreeModel {
// pointer to left, right
int cleft_, cright_;
// split feature index, left split or right split depends on the highest bit
unsigned sindex_;
unsigned sindex_{0};
// extra info
Info info_;
// set parent
inline void set_parent(int pidx, bool is_left_child = true) {
inline void SetParent(int pidx, bool is_left_child = true) {
if (is_left_child) pidx |= (1U << 31);
this->parent_ = pidx;
}
@@ -193,35 +193,35 @@ class TreeModel {
protected:
// vector of nodes
std::vector<Node> nodes;
std::vector<Node> nodes_;
// free node space, used during training process
std::vector<int> deleted_nodes;
std::vector<int> deleted_nodes_;
// stats of nodes
std::vector<TNodeStat> stats;
std::vector<TNodeStat> stats_;
// leaf vector, that is used to store additional information
std::vector<bst_float> leaf_vector;
std::vector<bst_float> leaf_vector_;
// allocate a new node,
// !!!!!! NOTE: may cause BUG here, nodes.resize
inline int AllocNode() {
if (param.num_deleted != 0) {
int nd = deleted_nodes.back();
deleted_nodes.pop_back();
int nd = deleted_nodes_.back();
deleted_nodes_.pop_back();
--param.num_deleted;
return nd;
}
int nd = param.num_nodes++;
CHECK_LT(param.num_nodes, std::numeric_limits<int>::max())
<< "number of nodes in the tree exceed 2^31";
nodes.resize(param.num_nodes);
stats.resize(param.num_nodes);
leaf_vector.resize(param.num_nodes * param.size_leaf_vector);
nodes_.resize(param.num_nodes);
stats_.resize(param.num_nodes);
leaf_vector_.resize(param.num_nodes * param.size_leaf_vector);
return nd;
}
// delete a tree node, keep the parent field to allow trace back
inline void DeleteNode(int nid) {
CHECK_GE(nid, param.num_roots);
deleted_nodes.push_back(nid);
nodes[nid].mark_delete();
deleted_nodes_.push_back(nid);
nodes_[nid].MarkDelete();
++param.num_deleted;
}
@@ -232,11 +232,11 @@ class TreeModel {
* \param value new leaf value
*/
inline void ChangeToLeaf(int rid, bst_float value) {
CHECK(nodes[nodes[rid].cleft() ].is_leaf());
CHECK(nodes[nodes[rid].cright()].is_leaf());
this->DeleteNode(nodes[rid].cleft());
this->DeleteNode(nodes[rid].cright());
nodes[rid].set_leaf(value);
CHECK(nodes_[nodes_[rid].LeftChild() ].IsLeaf());
CHECK(nodes_[nodes_[rid].RightChild()].IsLeaf());
this->DeleteNode(nodes_[rid].LeftChild());
this->DeleteNode(nodes_[rid].RightChild());
nodes_[rid].SetLeaf(value);
}
/*!
* \brief collapse a non leaf node to a leaf node, delete its children
@@ -244,12 +244,12 @@ class TreeModel {
* \param value new leaf value
*/
inline void CollapseToLeaf(int rid, bst_float value) {
if (nodes[rid].is_leaf()) return;
if (!nodes[nodes[rid].cleft() ].is_leaf()) {
CollapseToLeaf(nodes[rid].cleft(), 0.0f);
if (nodes_[rid].IsLeaf()) return;
if (!nodes_[nodes_[rid].LeftChild() ].IsLeaf()) {
CollapseToLeaf(nodes_[rid].LeftChild(), 0.0f);
}
if (!nodes[nodes[rid].cright() ].is_leaf()) {
CollapseToLeaf(nodes[rid].cright(), 0.0f);
if (!nodes_[nodes_[rid].RightChild() ].IsLeaf()) {
CollapseToLeaf(nodes_[rid].RightChild(), 0.0f);
}
this->ChangeToLeaf(rid, value);
}
@@ -262,47 +262,47 @@ class TreeModel {
param.num_nodes = 1;
param.num_roots = 1;
param.num_deleted = 0;
nodes.resize(1);
nodes_.resize(1);
}
/*! \brief get node given nid */
inline Node& operator[](int nid) {
return nodes[nid];
return nodes_[nid];
}
/*! \brief get node given nid */
inline const Node& operator[](int nid) const {
return nodes[nid];
return nodes_[nid];
}
/*! \brief get const reference to nodes */
inline const std::vector<Node>& GetNodes() const { return nodes; }
inline const std::vector<Node>& GetNodes() const { return nodes_; }
/*! \brief get node statistics given nid */
inline NodeStat& stat(int nid) {
return stats[nid];
inline NodeStat& Stat(int nid) {
return stats_[nid];
}
/*! \brief get node statistics given nid */
inline const NodeStat& stat(int nid) const {
return stats[nid];
inline const NodeStat& Stat(int nid) const {
return stats_[nid];
}
/*! \brief get leaf vector given nid */
inline bst_float* leafvec(int nid) {
if (leaf_vector.size() == 0) return nullptr;
return &leaf_vector[nid * param.size_leaf_vector];
inline bst_float* Leafvec(int nid) {
if (leaf_vector_.size() == 0) return nullptr;
return &leaf_vector_[nid * param.size_leaf_vector];
}
/*! \brief get leaf vector given nid */
inline const bst_float* leafvec(int nid) const {
if (leaf_vector.size() == 0) return nullptr;
return &leaf_vector[nid * param.size_leaf_vector];
inline const bst_float* Leafvec(int nid) const {
if (leaf_vector_.size() == 0) return nullptr;
return &leaf_vector_[nid * param.size_leaf_vector];
}
/*! \brief initialize the model */
inline void InitModel() {
param.num_nodes = param.num_roots;
nodes.resize(param.num_nodes);
stats.resize(param.num_nodes);
leaf_vector.resize(param.num_nodes * param.size_leaf_vector, 0.0f);
nodes_.resize(param.num_nodes);
stats_.resize(param.num_nodes);
leaf_vector_.resize(param.num_nodes * param.size_leaf_vector, 0.0f);
for (int i = 0; i < param.num_nodes; i ++) {
nodes[i].set_leaf(0.0f);
nodes[i].set_parent(-1);
nodes_[i].SetLeaf(0.0f);
nodes_[i].SetParent(-1);
}
}
/*!
@@ -311,35 +311,35 @@ class TreeModel {
*/
inline void Load(dmlc::Stream* fi) {
CHECK_EQ(fi->Read(&param, sizeof(TreeParam)), sizeof(TreeParam));
nodes.resize(param.num_nodes);
stats.resize(param.num_nodes);
nodes_.resize(param.num_nodes);
stats_.resize(param.num_nodes);
CHECK_NE(param.num_nodes, 0);
CHECK_EQ(fi->Read(dmlc::BeginPtr(nodes), sizeof(Node) * nodes.size()),
sizeof(Node) * nodes.size());
CHECK_EQ(fi->Read(dmlc::BeginPtr(stats), sizeof(NodeStat) * stats.size()),
sizeof(NodeStat) * stats.size());
CHECK_EQ(fi->Read(dmlc::BeginPtr(nodes_), sizeof(Node) * nodes_.size()),
sizeof(Node) * nodes_.size());
CHECK_EQ(fi->Read(dmlc::BeginPtr(stats_), sizeof(NodeStat) * stats_.size()),
sizeof(NodeStat) * stats_.size());
if (param.size_leaf_vector != 0) {
CHECK(fi->Read(&leaf_vector));
CHECK(fi->Read(&leaf_vector_));
}
// chg deleted nodes
deleted_nodes.resize(0);
deleted_nodes_.resize(0);
for (int i = param.num_roots; i < param.num_nodes; ++i) {
if (nodes[i].is_deleted()) deleted_nodes.push_back(i);
if (nodes_[i].IsDeleted()) deleted_nodes_.push_back(i);
}
CHECK_EQ(static_cast<int>(deleted_nodes.size()), param.num_deleted);
CHECK_EQ(static_cast<int>(deleted_nodes_.size()), param.num_deleted);
}
/*!
* \brief save model to stream
* \param fo output stream
*/
inline void Save(dmlc::Stream* fo) const {
CHECK_EQ(param.num_nodes, static_cast<int>(nodes.size()));
CHECK_EQ(param.num_nodes, static_cast<int>(stats.size()));
CHECK_EQ(param.num_nodes, static_cast<int>(nodes_.size()));
CHECK_EQ(param.num_nodes, static_cast<int>(stats_.size()));
fo->Write(&param, sizeof(TreeParam));
CHECK_NE(param.num_nodes, 0);
fo->Write(dmlc::BeginPtr(nodes), sizeof(Node) * nodes.size());
fo->Write(dmlc::BeginPtr(stats), sizeof(NodeStat) * nodes.size());
if (param.size_leaf_vector != 0) fo->Write(leaf_vector);
fo->Write(dmlc::BeginPtr(nodes_), sizeof(Node) * nodes_.size());
fo->Write(dmlc::BeginPtr(stats_), sizeof(NodeStat) * nodes_.size());
if (param.size_leaf_vector != 0) fo->Write(leaf_vector_);
}
/*!
* \brief add child nodes to node
@@ -348,10 +348,10 @@ class TreeModel {
inline void AddChilds(int nid) {
int pleft = this->AllocNode();
int pright = this->AllocNode();
nodes[nid].cleft_ = pleft;
nodes[nid].cright_ = pright;
nodes[nodes[nid].cleft() ].set_parent(nid, true);
nodes[nodes[nid].cright()].set_parent(nid, false);
nodes_[nid].cleft_ = pleft;
nodes_[nid].cright_ = pright;
nodes_[nodes_[nid].LeftChild() ].SetParent(nid, true);
nodes_[nodes_[nid].RightChild()].SetParent(nid, false);
}
/*!
* \brief only add a right child to a leaf node
@@ -359,8 +359,8 @@ class TreeModel {
*/
inline void AddRightChild(int nid) {
int pright = this->AllocNode();
nodes[nid].right = pright;
nodes[nodes[nid].right].set_parent(nid, false);
nodes_[nid].right = pright;
nodes_[nodes_[nid].right].SetParent(nid, false);
}
/*!
* \brief get current depth
@@ -369,9 +369,9 @@ class TreeModel {
*/
inline int GetDepth(int nid, bool pass_rchild = false) const {
int depth = 0;
while (!nodes[nid].is_root()) {
if (!pass_rchild || nodes[nid].is_left_child()) ++depth;
nid = nodes[nid].parent();
while (!nodes_[nid].IsRoot()) {
if (!pass_rchild || nodes_[nid].IsLeftChild()) ++depth;
nid = nodes_[nid].Parent();
}
return depth;
}
@@ -380,9 +380,9 @@ class TreeModel {
* \param nid node id
*/
inline int MaxDepth(int nid) const {
if (nodes[nid].is_leaf()) return 0;
return std::max(MaxDepth(nodes[nid].cleft())+1,
MaxDepth(nodes[nid].cright())+1);
if (nodes_[nid].IsLeaf()) return 0;
return std::max(MaxDepth(nodes_[nid].LeftChild())+1,
MaxDepth(nodes_[nid].RightChild())+1);
}
/*!
* \brief get maximum depth
@@ -395,7 +395,7 @@ class TreeModel {
return maxd;
}
/*! \brief number of extra nodes besides the root */
inline int num_extra_nodes() const {
inline int NumExtraNodes() const {
return param.num_nodes - param.num_roots - param.num_deleted;
}
};
@@ -421,7 +421,7 @@ struct PathElement {
bst_float zero_fraction;
bst_float one_fraction;
bst_float pweight;
PathElement() {}
PathElement() = default;
PathElement(int i, bst_float z, bst_float o, bst_float w) :
feature_index(i), zero_fraction(z), one_fraction(o), pweight(w) {}
};
@@ -457,19 +457,19 @@ class RegTree: public TreeModel<bst_float, RTreeNodeStat> {
* \brief returns the size of the feature vector
* \return the size of the feature vector
*/
inline size_t size() const;
inline size_t Size() const;
/*!
* \brief get ith value
* \param i feature index.
* \return the i-th feature value
*/
inline bst_float fvalue(size_t i) const;
inline bst_float Fvalue(size_t i) const;
/*!
* \brief check whether i-th entry is missing
* \param i feature index.
* \return whether i-th value is missing.
*/
inline bool is_missing(size_t i) const;
inline bool IsMissing(size_t i) const;
private:
/*!
@@ -480,7 +480,7 @@ class RegTree: public TreeModel<bst_float, RTreeNodeStat> {
bst_float fvalue;
int flag;
};
std::vector<Entry> data;
std::vector<Entry> data_;
};
/*!
* \brief get the leaf index
@@ -562,63 +562,63 @@ class RegTree: public TreeModel<bst_float, RTreeNodeStat> {
private:
inline bst_float FillNodeMeanValue(int nid);
std::vector<bst_float> node_mean_values;
std::vector<bst_float> node_mean_values_;
};
// implementations of inline functions
// do not need to read if only use the model
inline void RegTree::FVec::Init(size_t size) {
Entry e; e.flag = -1;
data.resize(size);
std::fill(data.begin(), data.end(), e);
data_.resize(size);
std::fill(data_.begin(), data_.end(), e);
}
inline void RegTree::FVec::Fill(const RowBatch::Inst& inst) {
for (bst_uint i = 0; i < inst.length; ++i) {
if (inst[i].index >= data.size()) continue;
data[inst[i].index].fvalue = inst[i].fvalue;
if (inst[i].index >= data_.size()) continue;
data_[inst[i].index].fvalue = inst[i].fvalue;
}
}
inline void RegTree::FVec::Drop(const RowBatch::Inst& inst) {
for (bst_uint i = 0; i < inst.length; ++i) {
if (inst[i].index >= data.size()) continue;
data[inst[i].index].flag = -1;
if (inst[i].index >= data_.size()) continue;
data_[inst[i].index].flag = -1;
}
}
inline size_t RegTree::FVec::size() const {
return data.size();
inline size_t RegTree::FVec::Size() const {
return data_.size();
}
inline bst_float RegTree::FVec::fvalue(size_t i) const {
return data[i].fvalue;
inline bst_float RegTree::FVec::Fvalue(size_t i) const {
return data_[i].fvalue;
}
inline bool RegTree::FVec::is_missing(size_t i) const {
return data[i].flag == -1;
inline bool RegTree::FVec::IsMissing(size_t i) const {
return data_[i].flag == -1;
}
inline int RegTree::GetLeafIndex(const RegTree::FVec& feat, unsigned root_id) const {
int pid = static_cast<int>(root_id);
while (!(*this)[pid].is_leaf()) {
unsigned split_index = (*this)[pid].split_index();
pid = this->GetNext(pid, feat.fvalue(split_index), feat.is_missing(split_index));
auto pid = static_cast<int>(root_id);
while (!(*this)[pid].IsLeaf()) {
unsigned split_index = (*this)[pid].SplitIndex();
pid = this->GetNext(pid, feat.Fvalue(split_index), feat.IsMissing(split_index));
}
return pid;
}
inline bst_float RegTree::Predict(const RegTree::FVec& feat, unsigned root_id) const {
int pid = this->GetLeafIndex(feat, root_id);
return (*this)[pid].leaf_value();
return (*this)[pid].LeafValue();
}
inline void RegTree::FillNodeMeanValues() {
size_t num_nodes = this->param.num_nodes;
if (this->node_mean_values.size() == num_nodes) {
if (this->node_mean_values_.size() == num_nodes) {
return;
}
this->node_mean_values.resize(num_nodes);
this->node_mean_values_.resize(num_nodes);
for (int root_id = 0; root_id < param.num_roots; ++root_id) {
this->FillNodeMeanValue(root_id);
}
@@ -627,40 +627,39 @@ inline void RegTree::FillNodeMeanValues() {
inline bst_float RegTree::FillNodeMeanValue(int nid) {
bst_float result;
auto& node = (*this)[nid];
if (node.is_leaf()) {
result = node.leaf_value();
if (node.IsLeaf()) {
result = node.LeafValue();
} else {
result = this->FillNodeMeanValue(node.cleft()) * this->stat(node.cleft()).sum_hess;
result += this->FillNodeMeanValue(node.cright()) * this->stat(node.cright()).sum_hess;
result /= this->stat(nid).sum_hess;
result = this->FillNodeMeanValue(node.LeftChild()) * this->Stat(node.LeftChild()).sum_hess;
result += this->FillNodeMeanValue(node.RightChild()) * this->Stat(node.RightChild()).sum_hess;
result /= this->Stat(nid).sum_hess;
}
this->node_mean_values[nid] = result;
this->node_mean_values_[nid] = result;
return result;
}
inline void RegTree::CalculateContributionsApprox(const RegTree::FVec& feat, unsigned root_id,
bst_float *out_contribs) const {
CHECK_GT(this->node_mean_values.size(), 0U);
CHECK_GT(this->node_mean_values_.size(), 0U);
// this follows the idea of http://blog.datadive.net/interpreting-random-forests/
bst_float node_value;
unsigned split_index;
int pid = static_cast<int>(root_id);
unsigned split_index = 0;
auto pid = static_cast<int>(root_id);
// update bias value
node_value = this->node_mean_values[pid];
out_contribs[feat.size()] += node_value;
if ((*this)[pid].is_leaf()) {
bst_float node_value = this->node_mean_values_[pid];
out_contribs[feat.Size()] += node_value;
if ((*this)[pid].IsLeaf()) {
// nothing to do anymore
return;
}
while (!(*this)[pid].is_leaf()) {
split_index = (*this)[pid].split_index();
pid = this->GetNext(pid, feat.fvalue(split_index), feat.is_missing(split_index));
bst_float new_value = this->node_mean_values[pid];
while (!(*this)[pid].IsLeaf()) {
split_index = (*this)[pid].SplitIndex();
pid = this->GetNext(pid, feat.Fvalue(split_index), feat.IsMissing(split_index));
bst_float new_value = this->node_mean_values_[pid];
// update feature weight
out_contribs[split_index] += new_value - node_value;
node_value = new_value;
}
bst_float leaf_value = (*this)[pid].leaf_value();
bst_float leaf_value = (*this)[pid].LeafValue();
// update leaf feature weight
out_contribs[split_index] += leaf_value - node_value;
}
@@ -749,33 +748,33 @@ inline void RegTree::TreeShap(const RegTree::FVec& feat, bst_float *phi,
ExtendPath(unique_path, unique_depth, parent_zero_fraction,
parent_one_fraction, parent_feature_index);
}
const unsigned split_index = node.split_index();
const unsigned split_index = node.SplitIndex();
// leaf node
if (node.is_leaf()) {
if (node.IsLeaf()) {
for (unsigned i = 1; i <= unique_depth; ++i) {
const bst_float w = UnwoundPathSum(unique_path, unique_depth, i);
const PathElement &el = unique_path[i];
phi[el.feature_index] += w * (el.one_fraction - el.zero_fraction)
* node.leaf_value() * condition_fraction;
* node.LeafValue() * condition_fraction;
}
// internal node
} else {
// find which branch is "hot" (meaning x would follow it)
unsigned hot_index = 0;
if (feat.is_missing(split_index)) {
hot_index = node.cdefault();
} else if (feat.fvalue(split_index) < node.split_cond()) {
hot_index = node.cleft();
if (feat.IsMissing(split_index)) {
hot_index = node.DefaultChild();
} else if (feat.Fvalue(split_index) < node.SplitCond()) {
hot_index = node.LeftChild();
} else {
hot_index = node.cright();
hot_index = node.RightChild();
}
const unsigned cold_index = (static_cast<int>(hot_index) == node.cleft() ?
node.cright() : node.cleft());
const bst_float w = this->stat(node_index).sum_hess;
const bst_float hot_zero_fraction = this->stat(hot_index).sum_hess / w;
const bst_float cold_zero_fraction = this->stat(cold_index).sum_hess / w;
const unsigned cold_index = (static_cast<int>(hot_index) == node.LeftChild() ?
node.RightChild() : node.LeftChild());
const bst_float w = this->Stat(node_index).sum_hess;
const bst_float hot_zero_fraction = this->Stat(hot_index).sum_hess / w;
const bst_float cold_zero_fraction = this->Stat(cold_index).sum_hess / w;
bst_float incoming_zero_fraction = 1;
bst_float incoming_one_fraction = 1;
@@ -820,13 +819,13 @@ inline void RegTree::CalculateContributions(const RegTree::FVec& feat, unsigned
unsigned condition_feature) const {
// find the expected value of the tree's predictions
if (condition == 0) {
bst_float node_value = this->node_mean_values[static_cast<int>(root_id)];
out_contribs[feat.size()] += node_value;
bst_float node_value = this->node_mean_values_[static_cast<int>(root_id)];
out_contribs[feat.Size()] += node_value;
}
// Preallocate space for the unique path data
const int maxd = this->MaxDepth(root_id) + 2;
PathElement *unique_path_data = new PathElement[(maxd * (maxd + 1)) / 2];
auto *unique_path_data = new PathElement[(maxd * (maxd + 1)) / 2];
TreeShap(feat, out_contribs, root_id, 0, unique_path_data,
1, 1, -1, condition, condition_feature, 1);
@@ -835,14 +834,14 @@ inline void RegTree::CalculateContributions(const RegTree::FVec& feat, unsigned
/*! \brief get next position of the tree given current pid */
inline int RegTree::GetNext(int pid, bst_float fvalue, bool is_unknown) const {
bst_float split_value = (*this)[pid].split_cond();
bst_float split_value = (*this)[pid].SplitCond();
if (is_unknown) {
return (*this)[pid].cdefault();
return (*this)[pid].DefaultChild();
} else {
if (fvalue < split_value) {
return (*this)[pid].cleft();
return (*this)[pid].LeftChild();
} else {
return (*this)[pid].cright();
return (*this)[pid].RightChild();
}
}
}
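The hunk above renames the accessors (`cdefault`→`DefaultChild`, `cleft`→`LeftChild`, and so on) without changing the traversal rule: a missing feature value follows the default child, otherwise the split condition decides left vs. right. A minimal Python sketch of that rule, using a hypothetical `Node` tuple rather than the real C++ types:

```python
from collections import namedtuple

# Hypothetical stand-in for a tree node; the actual code uses RegTree's node type.
Node = namedtuple("Node", "split_cond left right default")

def get_next(node, fvalue, is_unknown):
    """Pick the next node id: default child on missing, else compare split_cond."""
    if is_unknown:
        return node.default
    return node.left if fvalue < node.split_cond else node.right

root = Node(split_cond=0.5, left=1, right=2, default=1)
```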


@@ -25,7 +25,7 @@ namespace xgboost {
class TreeUpdater {
public:
/*! \brief virtual destructor */
virtual ~TreeUpdater() {}
virtual ~TreeUpdater() = default;
/*!
* \brief Initialize the updater with given arguments.
* \param args arguments to the objective function.
@@ -40,7 +40,7 @@ class TreeUpdater {
* but maybe different random seeds, usually one tree is passed in at a time,
* there can be multiple trees when we train random forest style model
*/
virtual void Update(HostDeviceVector<bst_gpair>* gpair,
virtual void Update(HostDeviceVector<GradientPair>* gpair,
DMatrix* data,
const std::vector<RegTree*>& trees) = 0;


@@ -16,6 +16,49 @@ Apache Flink and Apache Spark.
You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) and [Resource Page](../demo/README.md).
## Add Maven Dependency
XGBoost4J, XGBoost4J-Spark, etc., in the Maven repository are compiled with g++-4.8.5
### Access SNAPSHOT version
You need to add GitHub as a repository:
<b>maven</b>:
```xml
<repository>
<id>GitHub Repo</id>
<name>GitHub Repo</name>
<url>https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/</url>
</repository>
```
<b>sbt</b>:
```sbt
resolvers += "GitHub Repo" at "https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/"
```
then add the dependency as follows:
<b>maven</b>
```xml
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j</artifactId>
<version>latest_version_num</version>
</dependency>
```
<b>sbt</b>
```sbt
"ml.dmlc" % "xgboost4j" % "latest_version_num"
```
If you want to use `xgboost4j-spark`, simply replace `xgboost4j` with `xgboost4j-spark`.
## Examples
Full code examples for Scala, Java, Apache Spark, and Apache Flink can


@@ -0,0 +1,5 @@
#!/bin/bash
set -x
sudo docker run --rm -m 4g -e JAVA_OPTS='-Xmx6g' --attach stdin --attach stdout --attach stderr --volume `pwd`/../:/xgboost codingcat/xgbrelease:latest /xgboost/jvm-packages/dev/build.sh

jvm-packages/dev/build.sh Executable file

@@ -0,0 +1,21 @@
#!/usr/bin/env bash
set -x
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
export MAVEN_OPTS="-Xmx3000m"
export CMAKE_CXX_COMPILER=/opt/rh/devtoolset-2/root/usr/bin/gcc
export CXX=/opt/rh/devtoolset-2/root/usr/bin/g++
export CC=/opt/rh/devtoolset-2/root/usr/bin/gcc
export PATH=$CXX:$CC:/opt/rh/python27/root/usr/bin/python:$PATH
scl enable devtoolset-2 bash
scl enable python27 bash
rm /usr/bin/python
ln -s /opt/rh/python27/root/usr/bin/python /usr/bin/python
# build xgboost
cd /xgboost/jvm-packages;mvn package


@@ -6,7 +6,7 @@
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
<packaging>pom</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
@@ -14,7 +14,7 @@
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
<flink.version>0.10.2</flink.version>
<spark.version>2.2.1</spark.version>
<spark.version>2.3.0</spark.version>
<scala.version>2.11.8</scala.version>
<scala.binary.version>2.11</scala.binary.version>
</properties>
@@ -31,6 +31,86 @@
<module>xgboost4j-spark</module>
<module>xgboost4j-flink</module>
</modules>
<profiles>
<profile>
<id>assembly</id>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.6</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<skipAssembly>true</skipAssembly>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
<profile>
<id>release-to-github</id>
<distributionManagement>
<repository>
<id>github.repo</id>
<name>Temporary Staging Repository</name>
<url>file://${project.build.directory}/mvn-repo</url>
</repository>
</distributionManagement>
<properties>
<github.global.server>github</github.global.server>
</properties>
<build>
<plugins>
<plugin>
<groupId>com.github.github</groupId>
<artifactId>site-maven-plugin</artifactId>
<version>0.12</version>
<configuration>
<message>Maven artifacts for ${project.version}</message>
<noJekyll>true</noJekyll>
<outputDirectory>${project.build.directory}/mvn-repo</outputDirectory>
<branch>refs/heads/maven-repo</branch>
<excludes>
<exclude>*-with-dependencies.jar</exclude>
</excludes>
<repositoryName>xgboost</repositoryName>
<repositoryOwner>CodingCat</repositoryOwner>
<merge>true</merge>
</configuration>
<executions>
<!-- run site-maven-plugin's 'site' target as part of the build's normal 'deploy' phase -->
<execution>
<goals>
<goal>site</goal>
</goals>
<phase>deploy</phase>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
<configuration>
<altDeploymentRepository>internal.repo::default::file://${project.build.directory}/mvn-repo</altDeploymentRepository>
</configuration>
</plugin>
</plugins>
</build>
</profile>
</profiles>
<build>
<plugins>
<plugin>
@@ -158,27 +238,6 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.6</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<skipAssembly>true</skipAssembly>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>


@@ -23,7 +23,7 @@ XGBoost4J Code Examples
* [External Memory](src/main/scala/ml/dmlc/xgboost4j/scala/example/ExternalMemory.scala)
## Spark API
* [Distributed Training with Spark](src/main/scala/ml/dmlc/xgboost4j/scala/example/spark/DistTrainWithSpark.scala)
* [Distributed Training with Spark](src/main/scala/ml/dmlc/xgboost4j/scala/example/spark/SparkWithDataFrame.scala)
## Flink API
* [Distributed Training with Flink](src/main/scala/ml/dmlc/xgboost4j/scala/example/flink/DistTrainWithFlink.scala)


@@ -6,10 +6,10 @@
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j-example</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
<packaging>jar</packaging>
<build>
<plugins>
@@ -26,7 +26,7 @@
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
@@ -37,7 +37,7 @@
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-flink</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>


@@ -6,10 +6,10 @@
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j-flink</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
<build>
<plugins>
<plugin>
@@ -26,7 +26,7 @@
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>


@@ -6,7 +6,7 @@
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j-spark</artifactId>
<build>
@@ -24,7 +24,7 @@
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>


@@ -38,8 +38,8 @@ class CheckpointManagerSuite extends FunSuite with BeforeAndAfterAll {
val trainingRDD = sc.parallelize(Classification.train).map(_.asML).cache()
val paramMap = Map("eta" -> "1", "max_depth" -> "2", "silent" -> "1",
"objective" -> "binary:logistic")
(XGBoost.trainWithRDD(trainingRDD, paramMap, round = 2, sc.defaultParallelism),
XGBoost.trainWithRDD(trainingRDD, paramMap, round = 4, sc.defaultParallelism))
(XGBoost.trainWithRDD(trainingRDD, paramMap, round = 2, nWorkers = sc.defaultParallelism),
XGBoost.trainWithRDD(trainingRDD, paramMap, round = 4, nWorkers = sc.defaultParallelism))
}
test("test update/load models") {


@@ -6,10 +6,10 @@
<parent>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost-jvm</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
</parent>
<artifactId>xgboost4j</artifactId>
<version>0.8-SNAPSHOT</version>
<version>0.72-SNAPSHOT</version>
<packaging>jar</packaging>
<dependencies>


@@ -69,7 +69,7 @@ class DensifyParser : public dmlc::Parser<IndexType> {
std::vector<xgboost::bst_float> dense_value_;
};
template<typename IndexType>
template<typename IndexType, typename DType = real_t>
Parser<IndexType> *
CreateDenseLibSVMParser(const std::string& path,
const std::map<std::string, std::string>& args,
@@ -82,5 +82,6 @@ CreateDenseLibSVMParser(const std::string& path,
}
} // namespace data
DMLC_REGISTER_DATA_PARSER(uint32_t, dense_libsvm, data::CreateDenseLibSVMParser<uint32_t>);
DMLC_REGISTER_DATA_PARSER(uint32_t, real_t, dense_libsvm,
data::CreateDenseLibSVMParser<uint32_t __DMLC_COMMA real_t>);
} // namespace dmlc


@@ -36,21 +36,21 @@ class MyLogistic : public ObjFunction {
void GetGradient(HostDeviceVector<bst_float> *preds,
const MetaInfo &info,
int iter,
HostDeviceVector<bst_gpair> *out_gpair) override {
out_gpair->resize(preds->size());
std::vector<bst_float>& preds_h = preds->data_h();
std::vector<bst_gpair>& out_gpair_h = out_gpair->data_h();
HostDeviceVector<GradientPair> *out_gpair) override {
out_gpair->Resize(preds->Size());
std::vector<bst_float>& preds_h = preds->HostVector();
std::vector<GradientPair>& out_gpair_h = out_gpair->HostVector();
for (size_t i = 0; i < preds_h.size(); ++i) {
bst_float w = info.GetWeight(i);
// scale the negative examples!
if (info.labels[i] == 0.0f) w *= param_.scale_neg_weight;
if (info.labels_[i] == 0.0f) w *= param_.scale_neg_weight;
// logistic transformation
bst_float p = 1.0f / (1.0f + std::exp(-preds_h[i]));
// this is the gradient
bst_float grad = (p - info.labels[i]) * w;
bst_float grad = (p - info.labels_[i]) * w;
// this is the second order gradient
bst_float hess = p * (1.0f - p) * w;
out_gpair_h.at(i) = bst_gpair(grad, hess);
out_gpair_h.at(i) = GradientPair(grad, hess);
}
}
const char* DefaultEvalMetric() const override {
@@ -58,7 +58,7 @@ class MyLogistic : public ObjFunction {
}
void PredTransform(HostDeviceVector<bst_float> *io_preds) override {
// transform margin value to probability.
std::vector<bst_float> &preds = io_preds->data_h();
std::vector<bst_float> &preds = io_preds->HostVector();
for (size_t i = 0; i < preds.size(); ++i) {
preds[i] = 1.0f / (1.0f + std::exp(-preds[i]));
}


@@ -1 +1 @@
0.71
0.72


@@ -626,9 +626,8 @@ class DMatrix(object):
feature_names : list or None
"""
if self._feature_names is None:
return ['f{0}'.format(i) for i in range(self.num_col())]
else:
return self._feature_names
self._feature_names = ['f{0}'.format(i) for i in range(self.num_col())]
return self._feature_names
@property
def feature_types(self):
@@ -989,7 +988,8 @@ class Booster(object):
return self.eval_set([(data, name)], iteration)
def predict(self, data, output_margin=False, ntree_limit=0, pred_leaf=False,
pred_contribs=False, approx_contribs=False, pred_interactions=False):
pred_contribs=False, approx_contribs=False, pred_interactions=False,
validate_features=True):
"""
Predict with data.
@@ -1031,6 +1031,10 @@ class Booster(object):
pred_contribs), and the sum of the entire matrix equals the raw untransformed margin
value of the prediction. Note the last row and column correspond to the bias term.
validate_features : bool
When this is True, validate that the Booster's and data's feature_names are identical.
Otherwise, it is assumed that the feature_names are the same.
Returns
-------
prediction : numpy array
@@ -1047,7 +1051,8 @@ class Booster(object):
if pred_interactions:
option_mask |= 0x10
self._validate_features(data)
if validate_features:
self._validate_features(data)
length = c_bst_ulong()
preds = ctypes.POINTER(ctypes.c_float)()
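The new `validate_features` flag simply gates the existing `_validate_features` call. A simplified pure-Python sketch of what that gate does (hypothetical helper names; the real method also compares feature types):

```python
def check_feature_names(booster_names, data_names):
    """Simplified sketch of the feature-name check: names must match exactly."""
    if booster_names != data_names:
        raise ValueError("feature_names mismatch: %s vs %s"
                         % (booster_names, data_names))

def predict_stub(booster_names, data_names, validate_features=True):
    """Stand-in for Booster.predict, showing how the flag skips the check."""
    if validate_features:
        check_feature_names(booster_names, data_names)
    return "predicted"
```

Passing `validate_features=False` trades safety for speed when the caller already knows the names line up.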


@@ -34,7 +34,7 @@ def find_lib_path():
# hack for pip installation when copy all parent source directory here
dll_path.append(os.path.join(curr_path, './windows/Release/'))
dll_path = [os.path.join(p, 'xgboost.dll') for p in dll_path]
elif sys.platform.startswith('linux'):
elif sys.platform.startswith('linux') or sys.platform.startswith('freebsd'):
dll_path = [os.path.join(p, 'libxgboost.so') for p in dll_path]
elif sys.platform == 'darwin':
dll_path = [os.path.join(p, 'libxgboost.dylib') for p in dll_path]


@@ -215,7 +215,8 @@ class XGBModel(XGBModelBase):
return xgb_params
def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
early_stopping_rounds=None, verbose=True, xgb_model=None):
early_stopping_rounds=None, verbose=True, xgb_model=None,
sample_weight_eval_set=None):
# pylint: disable=missing-docstring,invalid-name,attribute-defined-outside-init
"""
Fit the gradient boosting model
@@ -231,6 +232,9 @@ class XGBModel(XGBModelBase):
eval_set : list, optional
A list of (X, y) tuple pairs to use as a validation set for
early-stopping
sample_weight_eval_set : list, optional
A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
instance weights on the i-th validation set.
eval_metric : str, callable, optional
If a str, should be a built-in evaluation metric to use. See
doc/parameter.md. If callable, a custom evaluation metric. The call
@@ -263,9 +267,14 @@ class XGBModel(XGBModelBase):
trainDmatrix = DMatrix(X, label=y, missing=self.missing, nthread=self.n_jobs)
evals_result = {}
if eval_set is not None:
evals = list(DMatrix(x[0], label=x[1], missing=self.missing,
nthread=self.n_jobs) for x in eval_set)
if sample_weight_eval_set is None:
sample_weight_eval_set = [None] * len(eval_set)
evals = list(
DMatrix(eval_set[i][0], label=eval_set[i][1], missing=self.missing,
weight=sample_weight_eval_set[i], nthread=self.n_jobs)
for i in range(len(eval_set)))
evals = list(zip(evals, ["validation_{}".format(i) for i in
range(len(evals))]))
else:
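The sklearn-wrapper change above defaults `sample_weight_eval_set` to one `None` per validation set, so each eval `DMatrix` is built with its own weight vector. A sketch of just that pairing logic (the real code constructs `DMatrix` objects; here plain tuples stand in):

```python
def pair_eval_weights(eval_set, sample_weight_eval_set=None):
    """Default to one None per validation set, then pair sets with their weights."""
    if sample_weight_eval_set is None:
        sample_weight_eval_set = [None] * len(eval_set)
    if len(sample_weight_eval_set) != len(eval_set):
        raise ValueError("need one weight list per validation set")
    return list(zip(eval_set, sample_weight_eval_set))
```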
@@ -408,7 +417,8 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
random_state, seed, missing, **kwargs)
def fit(self, X, y, sample_weight=None, eval_set=None, eval_metric=None,
early_stopping_rounds=None, verbose=True, xgb_model=None):
early_stopping_rounds=None, verbose=True, xgb_model=None,
sample_weight_eval_set=None):
# pylint: disable = attribute-defined-outside-init,arguments-differ
"""
Fit gradient boosting classifier
@@ -424,6 +434,9 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
eval_set : list, optional
A list of (X, y) pairs to use as a validation set for
early-stopping
sample_weight_eval_set : list, optional
A list of the form [L_1, L_2, ..., L_n], where each L_i is a list of
instance weights on the i-th validation set.
eval_metric : str, callable, optional
If a str, should be a built-in evaluation metric to use. See
doc/parameter.md. If callable, a custom evaluation metric. The call
@@ -478,11 +491,13 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
training_labels = self._le.transform(y)
if eval_set is not None:
# TODO: use sample_weight if given?
if sample_weight_eval_set is None:
sample_weight_eval_set = [None] * len(eval_set)
evals = list(
DMatrix(x[0], label=self._le.transform(x[1]),
missing=self.missing, nthread=self.n_jobs)
for x in eval_set
DMatrix(eval_set[i][0], label=self._le.transform(eval_set[i][1]),
missing=self.missing, weight=sample_weight_eval_set[i],
nthread=self.n_jobs)
for i in range(len(eval_set))
)
nevals = len(evals)
eval_names = ["validation_{}".format(i) for i in range(nevals)]
@@ -549,7 +564,7 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
column_indexes[class_probs > 0.5] = 1
return self._le.inverse_transform(column_indexes)
def predict_proba(self, data, output_margin=False, ntree_limit=0):
def predict_proba(self, data, ntree_limit=0):
"""
Predict the probability of each `data` example being of a given class.
NOTE: This function is not thread safe.
@@ -560,8 +575,6 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
----------
data : DMatrix
The dmatrix storing the input.
output_margin : bool
Whether to output the raw untransformed margin value.
ntree_limit : int
Limit number of trees in the prediction; defaults to 0 (use all trees).
Returns
@@ -571,7 +584,6 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
"""
test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs)
class_probs = self.get_booster().predict(test_dmatrix,
output_margin=output_margin,
ntree_limit=ntree_limit)
if self.objective == "multi:softprob":
return class_probs


Submodule rabit updated: a764d45cfb...fc5072b100


@@ -27,7 +27,7 @@ class Booster {
initialized_(false),
learner_(Learner::Create(cache_mats)) {}
inline Learner* learner() {
inline Learner* learner() { // NOLINT
return learner_.get();
}
@@ -40,7 +40,7 @@ class Booster {
return x.first == name;
});
if (it == cfg_.end()) {
cfg_.push_back(std::make_pair(name, val));
cfg_.emplace_back(name, val);
} else {
(*it).second = val;
}
@@ -193,11 +193,11 @@ struct XGBAPIThreadLocalEntry {
/*! \brief returning float vector. */
HostDeviceVector<bst_float> ret_vec_float;
/*! \brief temp variable of gradient pairs. */
HostDeviceVector<bst_gpair> tmp_gpair;
HostDeviceVector<GradientPair> tmp_gpair;
};
// define the threadlocal store.
typedef dmlc::ThreadLocalStore<XGBAPIThreadLocalEntry> XGBAPIThreadLocalStore;
using XGBAPIThreadLocalStore = dmlc::ThreadLocalStore<XGBAPIThreadLocalEntry>;
int XGDMatrixCreateFromFile(const char *fname,
int silent,
@@ -254,14 +254,14 @@ XGB_DLL int XGDMatrixCreateFromCSREx(const size_t* indptr,
mat.row_ptr_.push_back(mat.row_data_.size());
}
mat.info.num_col = num_column;
mat.info.num_col_ = num_column;
if (num_col > 0) {
CHECK_LE(mat.info.num_col, num_col)
<< "num_col=" << num_col << " vs " << mat.info.num_col;
mat.info.num_col = num_col;
CHECK_LE(mat.info.num_col_, num_col)
<< "num_col=" << num_col << " vs " << mat.info.num_col_;
mat.info.num_col_ = num_col;
}
mat.info.num_row = nindptr - 1;
mat.info.num_nonzero = mat.row_data_.size();
mat.info.num_row_ = nindptr - 1;
mat.info.num_nonzero_ = mat.row_data_.size();
*out = new std::shared_ptr<DMatrix>(DMatrix::Create(std::move(source)));
API_END();
}
@@ -317,13 +317,13 @@ XGB_DLL int XGDMatrixCreateFromCSCEx(const size_t* col_ptr,
}
}
}
mat.info.num_row = mat.row_ptr_.size() - 1;
mat.info.num_row_ = mat.row_ptr_.size() - 1;
if (num_row > 0) {
CHECK_LE(mat.info.num_row, num_row);
mat.info.num_row = num_row;
CHECK_LE(mat.info.num_row_, num_row);
mat.info.num_row_ = num_row;
}
mat.info.num_col = ncol;
mat.info.num_nonzero = nelem;
mat.info.num_col_ = ncol;
mat.info.num_nonzero_ = nelem;
*out = new std::shared_ptr<DMatrix>(DMatrix::Create(std::move(source)));
API_END();
}
@@ -353,8 +353,8 @@ XGB_DLL int XGDMatrixCreateFromMat(const bst_float* data,
data::SimpleCSRSource& mat = *source;
mat.row_ptr_.resize(1+nrow);
bool nan_missing = common::CheckNAN(missing);
mat.info.num_row = nrow;
mat.info.num_col = ncol;
mat.info.num_row_ = nrow;
mat.info.num_col_ = ncol;
const bst_float* data0 = data;
// count elements for sizing data
@@ -389,12 +389,12 @@ XGB_DLL int XGDMatrixCreateFromMat(const bst_float* data,
}
}
mat.info.num_nonzero = mat.row_data_.size();
mat.info.num_nonzero_ = mat.row_data_.size();
*out = new std::shared_ptr<DMatrix>(DMatrix::Create(std::move(source)));
API_END();
}
void prefixsum_inplace(size_t *x, size_t N) {
void PrefixSum(size_t *x, size_t N) {
size_t *suma;
#pragma omp parallel
{
@@ -425,12 +425,10 @@ void prefixsum_inplace(size_t *x, size_t N) {
delete[] suma;
}
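The renamed `PrefixSum` performs an in-place cumulative sum over the row-pointer array; the OpenMP version splits the array across threads and then adds per-thread offsets, but the result is the same as this serial inclusive scan, sketched in Python:

```python
def prefix_sum_inplace(x):
    """In-place inclusive scan: x[i] becomes x[0] + ... + x[i]."""
    for i in range(1, len(x)):
        x[i] += x[i - 1]
    return x
```

Applied to a row-pointer array of per-row counts (with a leading 0), this turns counts into offsets, so the last entry is the total number of entries.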
XGB_DLL int XGDMatrixCreateFromMat_omp(const bst_float* data,
XGB_DLL int XGDMatrixCreateFromMat_omp(const bst_float* data, // NOLINT
xgboost::bst_ulong nrow,
xgboost::bst_ulong ncol,
bst_float missing,
DMatrixHandle* out,
bst_float missing, DMatrixHandle* out,
int nthread) {
// avoid openmp unless enough data to be worth it to avoid overhead costs
if (nrow*ncol <= 10000*50) {
@@ -446,8 +444,8 @@ XGB_DLL int XGDMatrixCreateFromMat_omp(const bst_float* data,
std::unique_ptr<data::SimpleCSRSource> source(new data::SimpleCSRSource());
data::SimpleCSRSource& mat = *source;
mat.row_ptr_.resize(1+nrow);
mat.info.num_row = nrow;
mat.info.num_col = ncol;
mat.info.num_row_ = nrow;
mat.info.num_col_ = ncol;
// Check for errors in missing elements
// Count elements per row (to avoid otherwise need to copy)
@@ -480,7 +478,7 @@ XGB_DLL int XGDMatrixCreateFromMat_omp(const bst_float* data,
}
// do cumulative sum (to avoid otherwise need to copy)
prefixsum_inplace(&mat.row_ptr_[0], mat.row_ptr_.size());
PrefixSum(&mat.row_ptr_[0], mat.row_ptr_.size());
mat.row_data_.resize(mat.row_data_.size() + mat.row_ptr_.back());
// Fill data matrix (now that know size, no need for slow push_back())
@@ -500,7 +498,7 @@ XGB_DLL int XGDMatrixCreateFromMat_omp(const bst_float* data,
}
}
mat.info.num_nonzero = mat.row_data_.size();
mat.info.num_nonzero_ = mat.row_data_.size();
*out = new std::shared_ptr<DMatrix>(DMatrix::Create(std::move(source)));
API_END();
}
@@ -516,12 +514,12 @@ XGB_DLL int XGDMatrixSliceDMatrix(DMatrixHandle handle,
src.CopyFrom(static_cast<std::shared_ptr<DMatrix>*>(handle)->get());
data::SimpleCSRSource& ret = *source;
CHECK_EQ(src.info.group_ptr.size(), 0U)
CHECK_EQ(src.info.group_ptr_.size(), 0U)
<< "slice does not support group structure";
ret.Clear();
ret.info.num_row = len;
ret.info.num_col = src.info.num_col;
ret.info.num_row_ = len;
ret.info.num_col_ = src.info.num_col_;
dmlc::DataIter<RowBatch>* iter = &src;
iter->BeforeFirst();
@@ -532,23 +530,22 @@ XGB_DLL int XGDMatrixSliceDMatrix(DMatrixHandle handle,
const int ridx = idxset[i];
RowBatch::Inst inst = batch[ridx];
CHECK_LT(static_cast<xgboost::bst_ulong>(ridx), batch.size);
ret.row_data_.resize(ret.row_data_.size() + inst.length);
std::memcpy(dmlc::BeginPtr(ret.row_data_) + ret.row_ptr_.back(), inst.data,
sizeof(RowBatch::Entry) * inst.length);
ret.row_data_.insert(ret.row_data_.end(), inst.data,
inst.data + inst.length);
ret.row_ptr_.push_back(ret.row_ptr_.back() + inst.length);
ret.info.num_nonzero += inst.length;
ret.info.num_nonzero_ += inst.length;
if (src.info.labels.size() != 0) {
ret.info.labels.push_back(src.info.labels[ridx]);
if (src.info.labels_.size() != 0) {
ret.info.labels_.push_back(src.info.labels_[ridx]);
}
if (src.info.weights.size() != 0) {
ret.info.weights.push_back(src.info.weights[ridx]);
if (src.info.weights_.size() != 0) {
ret.info.weights_.push_back(src.info.weights_[ridx]);
}
if (src.info.base_margin.size() != 0) {
ret.info.base_margin.push_back(src.info.base_margin[ridx]);
if (src.info.base_margin_.size() != 0) {
ret.info.base_margin_.push_back(src.info.base_margin_[ridx]);
}
if (src.info.root_index.size() != 0) {
ret.info.root_index.push_back(src.info.root_index[ridx]);
if (src.info.root_index_.size() != 0) {
ret.info.root_index_.push_back(src.info.root_index_[ridx]);
}
}
*out = new std::shared_ptr<DMatrix>(DMatrix::Create(std::move(source)));
@@ -575,7 +572,7 @@ XGB_DLL int XGDMatrixSetFloatInfo(DMatrixHandle handle,
xgboost::bst_ulong len) {
API_BEGIN();
static_cast<std::shared_ptr<DMatrix>*>(handle)
->get()->info().SetInfo(field, info, kFloat32, len);
->get()->Info().SetInfo(field, info, kFloat32, len);
API_END();
}
@@ -585,7 +582,7 @@ XGB_DLL int XGDMatrixSetUIntInfo(DMatrixHandle handle,
xgboost::bst_ulong len) {
API_BEGIN();
static_cast<std::shared_ptr<DMatrix>*>(handle)
->get()->info().SetInfo(field, info, kUInt32, len);
->get()->Info().SetInfo(field, info, kUInt32, len);
API_END();
}
@@ -593,12 +590,12 @@ XGB_DLL int XGDMatrixSetGroup(DMatrixHandle handle,
const unsigned* group,
xgboost::bst_ulong len) {
API_BEGIN();
std::shared_ptr<DMatrix> *pmat = static_cast<std::shared_ptr<DMatrix>*>(handle);
MetaInfo& info = pmat->get()->info();
info.group_ptr.resize(len + 1);
info.group_ptr[0] = 0;
auto *pmat = static_cast<std::shared_ptr<DMatrix>*>(handle);
MetaInfo& info = pmat->get()->Info();
info.group_ptr_.resize(len + 1);
info.group_ptr_[0] = 0;
for (uint64_t i = 0; i < len; ++i) {
info.group_ptr[i + 1] = info.group_ptr[i] + group[i];
info.group_ptr_[i + 1] = info.group_ptr_[i] + group[i];
}
API_END();
}
@@ -608,18 +605,18 @@ XGB_DLL int XGDMatrixGetFloatInfo(const DMatrixHandle handle,
xgboost::bst_ulong* out_len,
const bst_float** out_dptr) {
API_BEGIN();
const MetaInfo& info = static_cast<std::shared_ptr<DMatrix>*>(handle)->get()->info();
const MetaInfo& info = static_cast<std::shared_ptr<DMatrix>*>(handle)->get()->Info();
const std::vector<bst_float>* vec = nullptr;
if (!std::strcmp(field, "label")) {
vec = &info.labels;
vec = &info.labels_;
} else if (!std::strcmp(field, "weight")) {
vec = &info.weights;
vec = &info.weights_;
} else if (!std::strcmp(field, "base_margin")) {
vec = &info.base_margin;
vec = &info.base_margin_;
} else {
LOG(FATAL) << "Unknown float field name " << field;
}
*out_len = static_cast<xgboost::bst_ulong>(vec->size());
*out_len = static_cast<xgboost::bst_ulong>(vec->size()); // NOLINT
*out_dptr = dmlc::BeginPtr(*vec);
API_END();
}
@@ -629,15 +626,15 @@ XGB_DLL int XGDMatrixGetUIntInfo(const DMatrixHandle handle,
xgboost::bst_ulong *out_len,
const unsigned **out_dptr) {
API_BEGIN();
const MetaInfo& info = static_cast<std::shared_ptr<DMatrix>*>(handle)->get()->info();
const MetaInfo& info = static_cast<std::shared_ptr<DMatrix>*>(handle)->get()->Info();
const std::vector<unsigned>* vec = nullptr;
if (!std::strcmp(field, "root_index")) {
vec = &info.root_index;
vec = &info.root_index_;
*out_len = static_cast<xgboost::bst_ulong>(vec->size());
*out_dptr = dmlc::BeginPtr(*vec);
} else {
LOG(FATAL) << "Unknown uint field name " << field;
}
*out_len = static_cast<xgboost::bst_ulong>(vec->size());
*out_dptr = dmlc::BeginPtr(*vec);
API_END();
}
@@ -645,7 +642,7 @@ XGB_DLL int XGDMatrixNumRow(const DMatrixHandle handle,
xgboost::bst_ulong *out) {
API_BEGIN();
*out = static_cast<xgboost::bst_ulong>(
static_cast<std::shared_ptr<DMatrix>*>(handle)->get()->info().num_row);
static_cast<std::shared_ptr<DMatrix>*>(handle)->get()->Info().num_row_);
API_END();
}
@@ -653,7 +650,7 @@ XGB_DLL int XGDMatrixNumCol(const DMatrixHandle handle,
xgboost::bst_ulong *out) {
API_BEGIN();
*out = static_cast<size_t>(
static_cast<std::shared_ptr<DMatrix>*>(handle)->get()->info().num_col);
static_cast<std::shared_ptr<DMatrix>*>(handle)->get()->Info().num_col_);
API_END();
}
@@ -688,8 +685,8 @@ XGB_DLL int XGBoosterUpdateOneIter(BoosterHandle handle,
int iter,
DMatrixHandle dtrain) {
API_BEGIN();
Booster* bst = static_cast<Booster*>(handle);
std::shared_ptr<DMatrix> *dtr =
auto* bst = static_cast<Booster*>(handle);
auto *dtr =
static_cast<std::shared_ptr<DMatrix>*>(dtrain);
bst->LazyInit();
@@ -702,15 +699,15 @@ XGB_DLL int XGBoosterBoostOneIter(BoosterHandle handle,
bst_float *grad,
bst_float *hess,
xgboost::bst_ulong len) {
HostDeviceVector<bst_gpair>& tmp_gpair = XGBAPIThreadLocalStore::Get()->tmp_gpair;
HostDeviceVector<GradientPair>& tmp_gpair = XGBAPIThreadLocalStore::Get()->tmp_gpair;
API_BEGIN();
Booster* bst = static_cast<Booster*>(handle);
std::shared_ptr<DMatrix>* dtr =
auto* bst = static_cast<Booster*>(handle);
auto* dtr =
static_cast<std::shared_ptr<DMatrix>*>(dtrain);
tmp_gpair.resize(len);
std::vector<bst_gpair>& tmp_gpair_h = tmp_gpair.data_h();
tmp_gpair.Resize(len);
std::vector<GradientPair>& tmp_gpair_h = tmp_gpair.HostVector();
for (xgboost::bst_ulong i = 0; i < len; ++i) {
tmp_gpair_h[i] = bst_gpair(grad[i], hess[i]);
tmp_gpair_h[i] = GradientPair(grad[i], hess[i]);
}
bst->LazyInit();
@@ -726,13 +723,13 @@ XGB_DLL int XGBoosterEvalOneIter(BoosterHandle handle,
const char** out_str) {
std::string& eval_str = XGBAPIThreadLocalStore::Get()->ret_str;
API_BEGIN();
Booster* bst = static_cast<Booster*>(handle);
auto* bst = static_cast<Booster*>(handle);
std::vector<DMatrix*> data_sets;
std::vector<std::string> data_names;
for (xgboost::bst_ulong i = 0; i < len; ++i) {
data_sets.push_back(static_cast<std::shared_ptr<DMatrix>*>(dmats[i])->get());
data_names.push_back(std::string(evnames[i]));
data_names.emplace_back(evnames[i]);
}
bst->LazyInit();
@@ -750,7 +747,7 @@ XGB_DLL int XGBoosterPredict(BoosterHandle handle,
HostDeviceVector<bst_float>& preds =
XGBAPIThreadLocalStore::Get()->ret_vec_float;
API_BEGIN();
Booster *bst = static_cast<Booster*>(handle);
auto *bst = static_cast<Booster*>(handle);
bst->LazyInit();
bst->learner()->Predict(
static_cast<std::shared_ptr<DMatrix>*>(dmat)->get(),
@@ -760,8 +757,8 @@ XGB_DLL int XGBoosterPredict(BoosterHandle handle,
(option_mask & 4) != 0,
(option_mask & 8) != 0,
(option_mask & 16) != 0);
*out_result = dmlc::BeginPtr(preds.data_h());
*len = static_cast<xgboost::bst_ulong>(preds.size());
*out_result = dmlc::BeginPtr(preds.HostVector());
*len = static_cast<xgboost::bst_ulong>(preds.Size());
API_END();
}
@@ -775,7 +772,7 @@ XGB_DLL int XGBoosterLoadModel(BoosterHandle handle, const char* fname) {
XGB_DLL int XGBoosterSaveModel(BoosterHandle handle, const char* fname) {
API_BEGIN();
std::unique_ptr<dmlc::Stream> fo(dmlc::Stream::Create(fname, "w"));
Booster *bst = static_cast<Booster*>(handle);
auto *bst = static_cast<Booster*>(handle);
bst->LazyInit();
bst->learner()->Save(fo.get());
API_END();
@@ -798,7 +795,7 @@ XGB_DLL int XGBoosterGetModelRaw(BoosterHandle handle,
API_BEGIN();
common::MemoryBufferStream fo(&raw_str);
Booster *bst = static_cast<Booster*>(handle);
auto *bst = static_cast<Booster*>(handle);
bst->LazyInit();
bst->learner()->Save(&fo);
*out_dptr = dmlc::BeginPtr(raw_str);
@@ -815,7 +812,7 @@ inline void XGBoostDumpModelImpl(
const char*** out_models) {
std::vector<std::string>& str_vecs = XGBAPIThreadLocalStore::Get()->ret_vec_str;
std::vector<const char*>& charp_vecs = XGBAPIThreadLocalStore::Get()->ret_vec_charp;
Booster *bst = static_cast<Booster*>(handle);
auto *bst = static_cast<Booster*>(handle);
bst->LazyInit();
str_vecs = bst->learner()->DumpModel(fmap, with_stats != 0, format);
charp_vecs.resize(str_vecs.size());
@@ -881,7 +878,7 @@ XGB_DLL int XGBoosterGetAttr(BoosterHandle handle,
const char* key,
const char** out,
int* success) {
Booster* bst = static_cast<Booster*>(handle);
auto* bst = static_cast<Booster*>(handle);
std::string& ret_str = XGBAPIThreadLocalStore::Get()->ret_str;
API_BEGIN();
if (bst->learner()->GetAttr(key, &ret_str)) {
@@ -897,7 +894,7 @@ XGB_DLL int XGBoosterGetAttr(BoosterHandle handle,
XGB_DLL int XGBoosterSetAttr(BoosterHandle handle,
const char* key,
const char* value) {
Booster* bst = static_cast<Booster*>(handle);
auto* bst = static_cast<Booster*>(handle);
API_BEGIN();
if (value == nullptr) {
bst->learner()->DelAttr(key);
@@ -912,7 +909,7 @@ XGB_DLL int XGBoosterGetAttrNames(BoosterHandle handle,
const char*** out) {
std::vector<std::string>& str_vecs = XGBAPIThreadLocalStore::Get()->ret_vec_str;
std::vector<const char*>& charp_vecs = XGBAPIThreadLocalStore::Get()->ret_vec_charp;
Booster *bst = static_cast<Booster*>(handle);
auto *bst = static_cast<Booster*>(handle);
API_BEGIN();
str_vecs = bst->learner()->GetAttrNames();
charp_vecs.resize(str_vecs.size());
@@ -927,7 +924,7 @@ XGB_DLL int XGBoosterGetAttrNames(BoosterHandle handle,
XGB_DLL int XGBoosterLoadRabitCheckpoint(BoosterHandle handle,
int* version) {
API_BEGIN();
Booster* bst = static_cast<Booster*>(handle);
auto* bst = static_cast<Booster*>(handle);
*version = rabit::LoadCheckPoint(bst->learner());
if (*version != 0) {
bst->initialized_ = true;
@@ -937,7 +934,7 @@ XGB_DLL int XGBoosterLoadRabitCheckpoint(BoosterHandle handle,
XGB_DLL int XGBoosterSaveRabitCheckpoint(BoosterHandle handle) {
API_BEGIN();
Booster* bst = static_cast<Booster*>(handle);
auto* bst = static_cast<Booster*>(handle);
if (bst->learner()->AllowLazyCheckPoint()) {
rabit::LazyCheckPoint(bst->learner());
} else {


@@ -10,7 +10,7 @@ struct XGBAPIErrorEntry {
std::string last_error;
};
typedef dmlc::ThreadLocalStore<XGBAPIErrorEntry> XGBAPIErrorStore;
using XGBAPIErrorStore = dmlc::ThreadLocalStore<XGBAPIErrorEntry>;
const char *XGBGetLastError() {
return XGBAPIErrorStore::Get()->last_error.c_str();


@@ -134,7 +134,7 @@ struct CLIParam : public dmlc::Parameter<CLIParam> {
char evname[256];
CHECK_EQ(sscanf(kv.first.c_str(), "eval[%[^]]", evname), 1)
<< "must specify evaluation name for display";
eval_data_names.push_back(std::string(evname));
eval_data_names.emplace_back(evname);
eval_data_paths.push_back(kv.second);
}
}
@@ -177,7 +177,7 @@ void CLITrain(const CLIParam& param) {
std::vector<std::string> eval_data_names = param.eval_data_names;
if (param.eval_train) {
eval_datasets.push_back(dtrain.get());
eval_data_names.push_back(std::string("train"));
eval_data_names.emplace_back("train");
}
// initialize the learner.
std::unique_ptr<Learner> learner(Learner::Create(cache_mats));
@@ -332,7 +332,7 @@ void CLIPredict(const CLIParam& param) {
std::unique_ptr<dmlc::Stream> fo(
dmlc::Stream::Create(param.name_pred.c_str(), "w"));
dmlc::ostream os(fo.get());
for (bst_float p : preds.data_h()) {
for (bst_float p : preds.HostVector()) {
os << p << '\n';
}
// force flush before fo destruct.
@@ -347,17 +347,17 @@ int CLIRunTask(int argc, char *argv[]) {
rabit::Init(argc, argv);
std::vector<std::pair<std::string, std::string> > cfg;
cfg.push_back(std::make_pair("seed", "0"));
cfg.emplace_back("seed", "0");
common::ConfigIterator itr(argv[1]);
while (itr.Next()) {
cfg.push_back(std::make_pair(std::string(itr.name()), std::string(itr.val())));
cfg.emplace_back(std::string(itr.Name()), std::string(itr.Val()));
}
for (int i = 2; i < argc; ++i) {
char name[256], val[256];
if (sscanf(argv[i], "%[^=]=%s", name, val) == 2) {
cfg.push_back(std::make_pair(std::string(name), std::string(val)));
cfg.emplace_back(std::string(name), std::string(val));
}
}
CLIParam param;


@@ -68,10 +68,10 @@ inline Float8 round(const Float8& x) {
// Overload std::max/min
namespace std {
inline avx::Float8 max(const avx::Float8& a, const avx::Float8& b) {
inline avx::Float8 max(const avx::Float8& a, const avx::Float8& b) { // NOLINT
return avx::Float8(_mm256_max_ps(a.x, b.x));
}
inline avx::Float8 min(const avx::Float8& a, const avx::Float8& b) {
inline avx::Float8 min(const avx::Float8& a, const avx::Float8& b) { // NOLINT
return avx::Float8(_mm256_min_ps(a.x, b.x));
}
} // namespace std
@@ -172,7 +172,7 @@ inline Float8 Sigmoid(Float8 x) {
}
// Store 8 gradient pairs given vectors containing gradient and Hessian
inline void StoreGpair(xgboost::bst_gpair* dst, const Float8& grad,
inline void StoreGpair(xgboost::GradientPair* dst, const Float8& grad,
const Float8& hess) {
float* ptr = reinterpret_cast<float*>(dst);
__m256 gpair_low = _mm256_unpacklo_ps(grad.x, hess.x);
@@ -190,11 +190,11 @@ namespace avx {
* \brief Fallback implementation not using AVX.
*/
struct Float8 {
struct Float8 { // NOLINT
float x[8];
explicit Float8(const float& val) {
for (int i = 0; i < 8; i++) {
x[i] = val;
for (float & i : x) {
i = val;
}
}
explicit Float8(const float* vec) {
@@ -202,7 +202,7 @@ struct Float8 {
x[i] = vec[i];
}
}
Float8() {}
Float8() = default;
Float8& operator+=(const Float8& rhs) {
for (int i = 0; i < 8; i++) {
x[i] += rhs.x[i];
@@ -228,7 +228,7 @@ struct Float8 {
return *this;
}
void Print() {
float* f = reinterpret_cast<float*>(&x);
auto* f = reinterpret_cast<float*>(&x);
printf("%f %f %f %f %f %f %f %f\n", f[0], f[1], f[2], f[3], f[4], f[5],
f[6], f[7]);
}
@@ -252,10 +252,10 @@ inline Float8 operator/(Float8 lhs, const Float8& rhs) {
}
// Store 8 gradient pairs given vectors containing gradient and Hessian
inline void StoreGpair(xgboost::bst_gpair* dst, const Float8& grad,
inline void StoreGpair(xgboost::GradientPair* dst, const Float8& grad,
const Float8& hess) {
for (int i = 0; i < 8; i++) {
dst[i] = xgboost::bst_gpair(grad.x[i], hess.x[i]);
dst[i] = xgboost::GradientPair(grad.x[i], hess.x[i]);
}
}
@@ -269,14 +269,14 @@ inline Float8 Sigmoid(Float8 x) {
} // namespace avx
namespace std {
inline avx::Float8 max(const avx::Float8& a, const avx::Float8& b) {
inline avx::Float8 max(const avx::Float8& a, const avx::Float8& b) { // NOLINT
avx::Float8 max;
for (int i = 0; i < 8; i++) {
max.x[i] = std::max(a.x[i], b.x[i]);
}
return max;
}
inline avx::Float8 min(const avx::Float8& a, const avx::Float8& b) {
inline avx::Float8 min(const avx::Float8& a, const avx::Float8& b) { // NOLINT
avx::Float8 min;
for (int i = 0; i < 8; i++) {
min.x[i] = std::min(a.x[i], b.x[i]);


@@ -42,7 +42,7 @@ struct BitMap {
inline void InitFromBool(const std::vector<int>& vec) {
this->Resize(vec.size());
// parallel over the full cases
bst_omp_uint nsize = static_cast<bst_omp_uint>(vec.size() / 32);
auto nsize = static_cast<bst_omp_uint>(vec.size() / 32);
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < nsize; ++i) {
uint32_t res = 0;


@@ -8,21 +8,27 @@
#ifndef XGBOOST_COMMON_COLUMN_MATRIX_H_
#define XGBOOST_COMMON_COLUMN_MATRIX_H_
#define XGBOOST_TYPE_SWITCH(dtype, OP) \
switch (dtype) { \
case xgboost::common::uint32 : { \
typedef uint32_t DType; \
OP; break; \
} \
case xgboost::common::uint16 : { \
typedef uint16_t DType; \
OP; break; \
} \
case xgboost::common::uint8 : { \
typedef uint8_t DType; \
OP; break; \
default: LOG(FATAL) << "don't recognize type flag" << dtype; \
} \
#define XGBOOST_TYPE_SWITCH(dtype, OP) \
\
switch(dtype) { \
case xgboost::common::uint32: { \
using DType = uint32_t; \
OP; \
break; \
} \
case xgboost::common::uint16: { \
using DType = uint16_t; \
OP; \
break; \
} \
case xgboost::common::uint8: { \
using DType = uint8_t; \
OP; \
break; \
default: \
LOG(FATAL) << "don't recognize type flag" << dtype; \
} \
\
}
#include <type_traits>
@@ -31,11 +37,12 @@ switch (dtype) { \
#include "hist_util.h"
#include "../tree/fast_hist_param.h"
using xgboost::tree::FastHistParam;
namespace xgboost {
namespace common {
using tree::FastHistParam;
/*! \brief indicator of data type used for storing bin id's in a column. */
enum DataType {
uint8 = 1,
@@ -78,7 +85,7 @@ class ColumnMatrix {
slot of internal buffer. */
packing_factor_ = sizeof(uint32_t) / static_cast<size_t>(this->dtype);
const bst_uint nfeature = static_cast<bst_uint>(gmat.cut->row_ptr.size() - 1);
const auto nfeature = static_cast<bst_uint>(gmat.cut->row_ptr.size() - 1);
const size_t nrow = gmat.row_ptr.size() - 1;
// identify type of each column


@@ -14,7 +14,7 @@ struct RandomThreadLocalEntry {
GlobalRandomEngine engine;
};
typedef dmlc::ThreadLocalStore<RandomThreadLocalEntry> RandomThreadLocalStore;
using RandomThreadLocalStore = dmlc::ThreadLocalStore<RandomThreadLocalEntry>;
GlobalRandomEngine& GlobalRandom() {
return RandomThreadLocalStore::Get()->engine;


@@ -11,20 +11,20 @@
namespace xgboost {
namespace common {
typedef unsigned char compressed_byte_t;
using CompressedByteT = unsigned char;
namespace detail {
inline void SetBit(compressed_byte_t *byte, int bit_idx) {
inline void SetBit(CompressedByteT *byte, int bit_idx) {
*byte |= 1 << bit_idx;
}
template <typename T>
inline T CheckBit(const T &byte, int bit_idx) {
return byte & (1 << bit_idx);
}
inline void ClearBit(compressed_byte_t *byte, int bit_idx) {
inline void ClearBit(CompressedByteT *byte, int bit_idx) {
*byte &= ~(1 << bit_idx);
}
static const int padding = 4; // Assign padding so we can read slightly off
static const int kPadding = 4; // Assign padding so we can read slightly off
// the beginning of the array
// The number of bits required to represent a given unsigned range
@@ -76,16 +76,16 @@ class CompressedBufferWriter {
size_t compressed_size = static_cast<size_t>(std::ceil(
static_cast<double>(detail::SymbolBits(num_symbols) * num_elements) /
bits_per_byte));
return compressed_size + detail::padding;
return compressed_size + detail::kPadding;
}
template <typename T>
void WriteSymbol(compressed_byte_t *buffer, T symbol, size_t offset) {
void WriteSymbol(CompressedByteT *buffer, T symbol, size_t offset) {
const int bits_per_byte = 8;
for (size_t i = 0; i < symbol_bits_; i++) {
size_t byte_idx = ((offset + 1) * symbol_bits_ - (i + 1)) / bits_per_byte;
byte_idx += detail::padding;
byte_idx += detail::kPadding;
size_t bit_idx =
((bits_per_byte + i) - ((offset + 1) * symbol_bits_)) % bits_per_byte;
@@ -96,20 +96,20 @@ class CompressedBufferWriter {
}
}
}
template <typename iter_t>
void Write(compressed_byte_t *buffer, iter_t input_begin, iter_t input_end) {
template <typename IterT>
void Write(CompressedByteT *buffer, IterT input_begin, IterT input_end) {
uint64_t tmp = 0;
size_t stored_bits = 0;
const size_t max_stored_bits = 64 - symbol_bits_;
size_t buffer_position = detail::padding;
size_t buffer_position = detail::kPadding;
const size_t num_symbols = input_end - input_begin;
for (size_t i = 0; i < num_symbols; i++) {
typename std::iterator_traits<iter_t>::value_type symbol = input_begin[i];
typename std::iterator_traits<IterT>::value_type symbol = input_begin[i];
if (stored_bits > max_stored_bits) {
// Eject only full bytes
size_t tmp_bytes = stored_bits / 8;
for (size_t j = 0; j < tmp_bytes; j++) {
buffer[buffer_position] = static_cast<compressed_byte_t>(
buffer[buffer_position] = static_cast<CompressedByteT>(
tmp >> (stored_bits - (j + 1) * 8));
buffer_position++;
}
@@ -129,10 +129,10 @@ class CompressedBufferWriter {
int shift_bits = static_cast<int>(stored_bits) - (j + 1) * 8;
if (shift_bits >= 0) {
buffer[buffer_position] =
static_cast<compressed_byte_t>(tmp >> shift_bits);
static_cast<CompressedByteT>(tmp >> shift_bits);
} else {
buffer[buffer_position] =
static_cast<compressed_byte_t>(tmp << std::abs(shift_bits));
static_cast<CompressedByteT>(tmp << std::abs(shift_bits));
}
buffer_position++;
}
@@ -153,23 +153,21 @@ template <typename T>
class CompressedIterator {
public:
typedef CompressedIterator<T> self_type; ///< My own type
typedef ptrdiff_t
difference_type; ///< Type to express the result of subtracting
/// one iterator from another
typedef T value_type; ///< The type of the element the iterator can point to
typedef value_type *pointer; ///< The type of a pointer to an element the
/// iterator can point to
typedef value_type reference; ///< The type of a reference to an element the
/// iterator can point to
// Type definitions for thrust
typedef CompressedIterator<T> self_type; // NOLINT
typedef ptrdiff_t difference_type; // NOLINT
typedef T value_type; // NOLINT
typedef value_type *pointer; // NOLINT
typedef value_type reference; // NOLINT
private:
compressed_byte_t *buffer_;
CompressedByteT *buffer_;
size_t symbol_bits_;
size_t offset_;
public:
CompressedIterator() : buffer_(nullptr), symbol_bits_(0), offset_(0) {}
CompressedIterator(compressed_byte_t *buffer, int num_symbols)
CompressedIterator(CompressedByteT *buffer, int num_symbols)
: buffer_(buffer), offset_(0) {
symbol_bits_ = detail::SymbolBits(num_symbols);
}
@@ -178,7 +176,7 @@ class CompressedIterator {
const int bits_per_byte = 8;
size_t start_bit_idx = ((offset_ + 1) * symbol_bits_ - 1);
size_t start_byte_idx = start_bit_idx / bits_per_byte;
start_byte_idx += detail::padding;
start_byte_idx += detail::kPadding;
// Read 5 bytes - the maximum we will need
uint64_t tmp = static_cast<uint64_t>(buffer_[start_byte_idx - 4]) << 32 |


@@ -24,33 +24,33 @@ class ConfigReaderBase {
* \brief get current name, called after Next returns true
* \return current parameter name
*/
inline const char *name(void) const {
return s_name.c_str();
inline const char *Name() const {
return s_name_.c_str();
}
/*!
* \brief get current value, called after Next returns true
* \return current parameter value
*/
inline const char *val(void) const {
return s_val.c_str();
inline const char *Val() const {
return s_val_.c_str();
}
/*!
* \brief move iterator to next position
* \return true if there is value in next position
*/
inline bool Next(void) {
inline bool Next() {
while (!this->IsEnd()) {
GetNextToken(&s_name);
if (s_name == "=") return false;
if (GetNextToken(&s_buf) || s_buf != "=") return false;
if (GetNextToken(&s_val) || s_val == "=") return false;
GetNextToken(&s_name_);
if (s_name_ == "=") return false;
if (GetNextToken(&s_buf_) || s_buf_ != "=") return false;
if (GetNextToken(&s_val_) || s_val_ == "=") return false;
return true;
}
return false;
}
// called before usage
inline void Init(void) {
ch_buf = this->GetChar();
inline void Init() {
ch_buf_ = this->GetChar();
}
protected:
@@ -58,38 +58,38 @@ class ConfigReaderBase {
* \brief to be implemented by subclass,
* get next token, return EOF if end of file
*/
virtual char GetChar(void) = 0;
virtual char GetChar() = 0;
/*! \brief to be implemented by child, check if end of stream */
virtual bool IsEnd(void) = 0;
virtual bool IsEnd() = 0;
private:
char ch_buf;
std::string s_name, s_val, s_buf;
char ch_buf_;
std::string s_name_, s_val_, s_buf_;
inline void SkipLine(void) {
inline void SkipLine() {
do {
ch_buf = this->GetChar();
} while (ch_buf != EOF && ch_buf != '\n' && ch_buf != '\r');
ch_buf_ = this->GetChar();
} while (ch_buf_ != EOF && ch_buf_ != '\n' && ch_buf_ != '\r');
}
inline void ParseStr(std::string *tok) {
while ((ch_buf = this->GetChar()) != EOF) {
switch (ch_buf) {
while ((ch_buf_ = this->GetChar()) != EOF) {
switch (ch_buf_) {
case '\\': *tok += this->GetChar(); break;
case '\"': return;
case '\r':
case '\n': LOG(FATAL)<< "ConfigReader: unterminated string";
default: *tok += ch_buf;
default: *tok += ch_buf_;
}
}
LOG(FATAL) << "ConfigReader: unterminated string";
}
inline void ParseStrML(std::string *tok) {
while ((ch_buf = this->GetChar()) != EOF) {
switch (ch_buf) {
while ((ch_buf_ = this->GetChar()) != EOF) {
switch (ch_buf_) {
case '\\': *tok += this->GetChar(); break;
case '\'': return;
default: *tok += ch_buf;
default: *tok += ch_buf_;
}
}
LOG(FATAL) << "unterminated string";
@@ -98,24 +98,24 @@ class ConfigReaderBase {
inline bool GetNextToken(std::string *tok) {
tok->clear();
bool new_line = false;
while (ch_buf != EOF) {
switch (ch_buf) {
while (ch_buf_ != EOF) {
switch (ch_buf_) {
case '#' : SkipLine(); new_line = true; break;
case '\"':
if (tok->length() == 0) {
ParseStr(tok); ch_buf = this->GetChar(); return new_line;
ParseStr(tok); ch_buf_ = this->GetChar(); return new_line;
} else {
LOG(FATAL) << "ConfigReader: token followed directly by string";
}
case '\'':
if (tok->length() == 0) {
ParseStrML(tok); ch_buf = this->GetChar(); return new_line;
ParseStrML(tok); ch_buf_ = this->GetChar(); return new_line;
} else {
LOG(FATAL) << "ConfigReader: token followed directly by string";
}
case '=':
if (tok->length() == 0) {
ch_buf = this->GetChar();
ch_buf_ = this->GetChar();
*tok = '=';
}
return new_line;
@@ -124,12 +124,12 @@ class ConfigReaderBase {
if (tok->length() == 0) new_line = true;
case '\t':
case ' ' :
ch_buf = this->GetChar();
ch_buf_ = this->GetChar();
if (tok->length() != 0) return new_line;
break;
default:
*tok += ch_buf;
ch_buf = this->GetChar();
*tok += ch_buf_;
ch_buf_ = this->GetChar();
break;
}
}
@@ -149,19 +149,19 @@ class ConfigStreamReader: public ConfigReaderBase {
* \brief constructor
* \param fin istream input stream
*/
explicit ConfigStreamReader(std::istream &fin) : fin(fin) {}
explicit ConfigStreamReader(std::istream &fin) : fin_(fin) {}
protected:
virtual char GetChar(void) {
return fin.get();
char GetChar() override {
return fin_.get();
}
/*! \brief to be implemented by child, check if end of stream */
virtual bool IsEnd(void) {
return fin.eof();
bool IsEnd() override {
return fin_.eof();
}
private:
std::istream &fin;
std::istream &fin_;
};
/*!
@@ -173,20 +173,20 @@ class ConfigIterator: public ConfigStreamReader {
* \brief constructor
* \param fname name of configure file
*/
explicit ConfigIterator(const char *fname) : ConfigStreamReader(fi) {
fi.open(fname);
if (fi.fail()) {
explicit ConfigIterator(const char *fname) : ConfigStreamReader(fi_) {
fi_.open(fname);
if (fi_.fail()) {
LOG(FATAL) << "cannot open file " << fname;
}
ConfigReaderBase::Init();
}
/*! \brief destructor */
~ConfigIterator(void) {
fi.close();
~ConfigIterator() {
fi_.close();
}
private:
std::ifstream fi;
std::ifstream fi_;
};
} // namespace common
} // namespace xgboost

(File diff suppressed because it is too large.)


@@ -29,12 +29,12 @@ struct ParallelGroupBuilder {
// parallel group builder of data
ParallelGroupBuilder(std::vector<SizeType> *p_rptr,
std::vector<ValueType> *p_data)
: rptr(*p_rptr), data(*p_data), thread_rptr(tmp_thread_rptr) {
: rptr_(*p_rptr), data_(*p_data), thread_rptr_(tmp_thread_rptr_) {
}
ParallelGroupBuilder(std::vector<SizeType> *p_rptr,
std::vector<ValueType> *p_data,
std::vector< std::vector<SizeType> > *p_thread_rptr)
: rptr(*p_rptr), data(*p_data), thread_rptr(*p_thread_rptr) {
: rptr_(*p_rptr), data_(*p_data), thread_rptr_(*p_thread_rptr) {
}
public:
@@ -45,10 +45,10 @@ struct ParallelGroupBuilder {
* \param nthread number of thread that will be used in construction
*/
inline void InitBudget(size_t nkeys, int nthread) {
thread_rptr.resize(nthread);
for (size_t i = 0; i < thread_rptr.size(); ++i) {
thread_rptr[i].resize(nkeys);
std::fill(thread_rptr[i].begin(), thread_rptr[i].end(), 0);
thread_rptr_.resize(nthread);
for (size_t i = 0; i < thread_rptr_.size(); ++i) {
thread_rptr_[i].resize(nkeys);
std::fill(thread_rptr_[i].begin(), thread_rptr_[i].end(), 0);
}
}
/*!
@@ -58,34 +58,34 @@ struct ParallelGroupBuilder {
* \param nelem number of element budget add to this row
*/
inline void AddBudget(size_t key, int threadid, SizeType nelem = 1) {
std::vector<SizeType> &trptr = thread_rptr[threadid];
std::vector<SizeType> &trptr = thread_rptr_[threadid];
if (trptr.size() < key + 1) {
trptr.resize(key + 1, 0);
}
trptr[key] += nelem;
}
/*! \brief step 3: initialize the necessary storage */
inline void InitStorage(void) {
inline void InitStorage() {
// set rptr to correct size
for (size_t tid = 0; tid < thread_rptr.size(); ++tid) {
if (rptr.size() <= thread_rptr[tid].size()) {
rptr.resize(thread_rptr[tid].size() + 1);
for (size_t tid = 0; tid < thread_rptr_.size(); ++tid) {
if (rptr_.size() <= thread_rptr_[tid].size()) {
rptr_.resize(thread_rptr_[tid].size() + 1);
}
}
// initialize rptr to be beginning of each segment
size_t start = 0;
for (size_t i = 0; i + 1 < rptr.size(); ++i) {
for (size_t tid = 0; tid < thread_rptr.size(); ++tid) {
std::vector<SizeType> &trptr = thread_rptr[tid];
for (size_t i = 0; i + 1 < rptr_.size(); ++i) {
for (size_t tid = 0; tid < thread_rptr_.size(); ++tid) {
std::vector<SizeType> &trptr = thread_rptr_[tid];
if (i < trptr.size()) {
size_t ncnt = trptr[i];
trptr[i] = start;
start += ncnt;
}
}
rptr[i + 1] = start;
rptr_[i + 1] = start;
}
data.resize(start);
data_.resize(start);
}
/*!
* \brief step 4: add data to the allocated space,
@@ -96,19 +96,19 @@ struct ParallelGroupBuilder {
* \param threadid the id of thread that calls this function
*/
inline void Push(size_t key, ValueType value, int threadid) {
SizeType &rp = thread_rptr[threadid][key];
data[rp++] = value;
SizeType &rp = thread_rptr_[threadid][key];
data_[rp++] = value;
}
private:
/*! \brief pointer to the beginning and end of each continuous key */
std::vector<SizeType> &rptr;
std::vector<SizeType> &rptr_;
/*! \brief index of nonzero entries in each row */
std::vector<ValueType> &data;
std::vector<ValueType> &data_;
/*! \brief thread local data structure */
std::vector<std::vector<SizeType> > &thread_rptr;
std::vector<std::vector<SizeType> > &thread_rptr_;
/*! \brief local temp thread ptr, use this if not specified by the constructor */
std::vector<std::vector<SizeType> > tmp_thread_rptr;
std::vector<std::vector<SizeType> > tmp_thread_rptr_;
};
} // namespace common
} // namespace xgboost


@@ -17,20 +17,20 @@ namespace xgboost {
namespace common {
void HistCutMatrix::Init(DMatrix* p_fmat, uint32_t max_num_bins) {
typedef common::WXQuantileSketch<bst_float, bst_float> WXQSketch;
const MetaInfo& info = p_fmat->info();
using WXQSketch = common::WXQuantileSketch<bst_float, bst_float>;
const MetaInfo& info = p_fmat->Info();
// safe factor for better accuracy
const int kFactor = 8;
constexpr int kFactor = 8;
std::vector<WXQSketch> sketchs;
const int nthread = omp_get_max_threads();
unsigned nstep = static_cast<unsigned>((info.num_col + nthread - 1) / nthread);
unsigned ncol = static_cast<unsigned>(info.num_col);
sketchs.resize(info.num_col);
auto nstep = static_cast<unsigned>((info.num_col_ + nthread - 1) / nthread);
auto ncol = static_cast<unsigned>(info.num_col_);
sketchs.resize(info.num_col_);
for (auto& s : sketchs) {
s.Init(info.num_row, 1.0 / (max_num_bins * kFactor));
s.Init(info.num_row_, 1.0 / (max_num_bins * kFactor));
}
dmlc::DataIter<RowBatch>* iter = p_fmat->RowIterator();
@@ -40,7 +40,7 @@ void HistCutMatrix::Init(DMatrix* p_fmat, uint32_t max_num_bins) {
#pragma omp parallel num_threads(nthread)
{
CHECK_EQ(nthread, omp_get_num_threads());
unsigned tid = static_cast<unsigned>(omp_get_thread_num());
auto tid = static_cast<unsigned>(omp_get_thread_num());
unsigned begin = std::min(nstep * tid, ncol);
unsigned end = std::min(nstep * (tid + 1), ncol);
for (size_t i = 0; i < batch.size; ++i) { // NOLINT(*)
@@ -68,7 +68,7 @@ void HistCutMatrix::Init(DMatrix* p_fmat, uint32_t max_num_bins) {
size_t nbytes = WXQSketch::SummaryContainer::CalcMemCost(max_num_bins * kFactor);
sreducer.Allreduce(dmlc::BeginPtr(summary_array), nbytes, summary_array.size());
this->min_val.resize(info.num_col);
this->min_val.resize(info.num_col_);
row_ptr.push_back(0);
for (size_t fid = 0; fid < summary_array.size(); ++fid) {
WXQSketch::SummaryContainer a;
@@ -105,7 +105,7 @@ void HistCutMatrix::Init(DMatrix* p_fmat, uint32_t max_num_bins) {
}
void GHistIndexMatrix::Init(DMatrix* p_fmat) {
CHECK(cut != nullptr);
CHECK(cut != nullptr); // NOLINT
dmlc::DataIter<RowBatch>* iter = p_fmat->RowIterator();
const int nthread = omp_get_max_threads();
@@ -126,7 +126,7 @@ void GHistIndexMatrix::Init(DMatrix* p_fmat) {
CHECK_GT(cut->cut.size(), 0U);
CHECK_EQ(cut->row_ptr.back(), cut->cut.size());
omp_ulong bsize = static_cast<omp_ulong>(batch.size);
auto bsize = static_cast<omp_ulong>(batch.size);
#pragma omp parallel for num_threads(nthread) schedule(static)
for (omp_ulong i = 0; i < bsize; ++i) { // NOLINT(*)
const int tid = omp_get_thread_num();
@@ -217,7 +217,7 @@ FindGroups_(const std::vector<unsigned>& feature_list,
std::vector<std::vector<bool>> conflict_marks;
std::vector<size_t> group_nnz;
std::vector<size_t> group_conflict_cnt;
const size_t max_conflict_cnt
const auto max_conflict_cnt
= static_cast<size_t>(param.max_conflict_rate * nrow);
for (auto fid : feature_list) {
@@ -336,14 +336,14 @@ FastFeatureGrouping(const GHistIndexMatrix& gmat,
void GHistIndexBlockMatrix::Init(const GHistIndexMatrix& gmat,
const ColumnMatrix& colmat,
const FastHistParam& param) {
cut = gmat.cut;
cut_ = gmat.cut;
const size_t nrow = gmat.row_ptr.size() - 1;
const uint32_t nbins = gmat.cut->row_ptr.back();
/* step 1: form feature groups */
auto groups = FastFeatureGrouping(gmat, colmat, param);
const uint32_t nblock = static_cast<uint32_t>(groups.size());
const auto nblock = static_cast<uint32_t>(groups.size());
/* step 2: build a new CSR matrix for each feature group */
std::vector<uint32_t> bin2block(nbins); // lookup table [bin id] => [block id]
@@ -380,24 +380,24 @@ void GHistIndexBlockMatrix::Init(const GHistIndexMatrix& gmat,
index_blk_ptr.push_back(0);
row_ptr_blk_ptr.push_back(0);
for (uint32_t block_id = 0; block_id < nblock; ++block_id) {
index.insert(index.end(), index_temp[block_id].begin(), index_temp[block_id].end());
row_ptr.insert(row_ptr.end(), row_ptr_temp[block_id].begin(), row_ptr_temp[block_id].end());
index_blk_ptr.push_back(index.size());
row_ptr_blk_ptr.push_back(row_ptr.size());
index_.insert(index_.end(), index_temp[block_id].begin(), index_temp[block_id].end());
row_ptr_.insert(row_ptr_.end(), row_ptr_temp[block_id].begin(), row_ptr_temp[block_id].end());
index_blk_ptr.push_back(index_.size());
row_ptr_blk_ptr.push_back(row_ptr_.size());
}
// save shortcut for each block
for (uint32_t block_id = 0; block_id < nblock; ++block_id) {
Block blk;
blk.index_begin = &index[index_blk_ptr[block_id]];
blk.row_ptr_begin = &row_ptr[row_ptr_blk_ptr[block_id]];
blk.index_end = &index[index_blk_ptr[block_id + 1]];
blk.row_ptr_end = &row_ptr[row_ptr_blk_ptr[block_id + 1]];
blocks.push_back(blk);
blk.index_begin = &index_[index_blk_ptr[block_id]];
blk.row_ptr_begin = &row_ptr_[row_ptr_blk_ptr[block_id]];
blk.index_end = &index_[index_blk_ptr[block_id + 1]];
blk.row_ptr_end = &row_ptr_[row_ptr_blk_ptr[block_id + 1]];
blocks_.push_back(blk);
}
}
void GHistBuilder::BuildHist(const std::vector<bst_gpair>& gpair,
void GHistBuilder::BuildHist(const std::vector<GradientPair>& gpair,
const RowSetCollection::Elem row_indices,
const GHistIndexMatrix& gmat,
const std::vector<bst_uint>& feat_set,
@@ -405,30 +405,30 @@ void GHistBuilder::BuildHist(const std::vector<bst_gpair>& gpair,
data_.resize(nbins_ * nthread_, GHistEntry());
std::fill(data_.begin(), data_.end(), GHistEntry());
-const int K = 8; // loop unrolling factor
-const bst_omp_uint nthread = static_cast<bst_omp_uint>(this->nthread_);
+constexpr int kUnroll = 8; // loop unrolling factor
+const auto nthread = static_cast<bst_omp_uint>(this->nthread_);
const size_t nrows = row_indices.end - row_indices.begin;
-const size_t rest = nrows % K;
+const size_t rest = nrows % kUnroll;
#pragma omp parallel for num_threads(nthread) schedule(guided)
-for (bst_omp_uint i = 0; i < nrows - rest; i += K) {
+for (bst_omp_uint i = 0; i < nrows - rest; i += kUnroll) {
const bst_omp_uint tid = omp_get_thread_num();
const size_t off = tid * nbins_;
-size_t rid[K];
-size_t ibegin[K];
-size_t iend[K];
-bst_gpair stat[K];
-for (int k = 0; k < K; ++k) {
+size_t rid[kUnroll];
+size_t ibegin[kUnroll];
+size_t iend[kUnroll];
+GradientPair stat[kUnroll];
+for (int k = 0; k < kUnroll; ++k) {
rid[k] = row_indices.begin[i + k];
}
-for (int k = 0; k < K; ++k) {
+for (int k = 0; k < kUnroll; ++k) {
ibegin[k] = gmat.row_ptr[rid[k]];
iend[k] = gmat.row_ptr[rid[k] + 1];
}
-for (int k = 0; k < K; ++k) {
+for (int k = 0; k < kUnroll; ++k) {
stat[k] = gpair[rid[k]];
}
-for (int k = 0; k < K; ++k) {
+for (int k = 0; k < kUnroll; ++k) {
for (size_t j = ibegin[k]; j < iend[k]; ++j) {
const uint32_t bin = gmat.index[j];
data_[off + bin].Add(stat[k]);
@@ -439,7 +439,7 @@ void GHistBuilder::BuildHist(const std::vector<bst_gpair>& gpair,
const size_t rid = row_indices.begin[i];
const size_t ibegin = gmat.row_ptr[rid];
const size_t iend = gmat.row_ptr[rid + 1];
-const bst_gpair stat = gpair[rid];
+const GradientPair stat = gpair[rid];
for (size_t j = ibegin; j < iend; ++j) {
const uint32_t bin = gmat.index[j];
data_[bin].Add(stat);
@@ -456,37 +456,40 @@ void GHistBuilder::BuildHist(const std::vector<bst_gpair>& gpair,
}
}
-void GHistBuilder::BuildBlockHist(const std::vector<bst_gpair>& gpair,
+void GHistBuilder::BuildBlockHist(const std::vector<GradientPair>& gpair,
const RowSetCollection::Elem row_indices,
const GHistIndexBlockMatrix& gmatb,
const std::vector<bst_uint>& feat_set,
GHistRow hist) {
-const int K = 8; // loop unrolling factor
-const bst_omp_uint nthread = static_cast<bst_omp_uint>(this->nthread_);
+constexpr int kUnroll = 8; // loop unrolling factor
const size_t nblock = gmatb.GetNumBlock();
const size_t nrows = row_indices.end - row_indices.begin;
-const size_t rest = nrows % K;
+const size_t rest = nrows % kUnroll;
+#if defined(_OPENMP)
+const auto nthread = static_cast<bst_omp_uint>(this->nthread_);
+#endif
#pragma omp parallel for num_threads(nthread) schedule(guided)
for (bst_omp_uint bid = 0; bid < nblock; ++bid) {
auto gmat = gmatb[bid];
-for (size_t i = 0; i < nrows - rest; i += K) {
-size_t rid[K];
-size_t ibegin[K];
-size_t iend[K];
-bst_gpair stat[K];
-for (int k = 0; k < K; ++k) {
+for (size_t i = 0; i < nrows - rest; i += kUnroll) {
+size_t rid[kUnroll];
+size_t ibegin[kUnroll];
+size_t iend[kUnroll];
+GradientPair stat[kUnroll];
+for (int k = 0; k < kUnroll; ++k) {
rid[k] = row_indices.begin[i + k];
}
-for (int k = 0; k < K; ++k) {
+for (int k = 0; k < kUnroll; ++k) {
ibegin[k] = gmat.row_ptr[rid[k]];
iend[k] = gmat.row_ptr[rid[k] + 1];
}
-for (int k = 0; k < K; ++k) {
+for (int k = 0; k < kUnroll; ++k) {
stat[k] = gpair[rid[k]];
}
-for (int k = 0; k < K; ++k) {
+for (int k = 0; k < kUnroll; ++k) {
for (size_t j = ibegin[k]; j < iend[k]; ++j) {
const uint32_t bin = gmat.index[j];
hist.begin[bin].Add(stat[k]);
@@ -497,7 +500,7 @@ void GHistBuilder::BuildBlockHist(const std::vector<bst_gpair>& gpair,
const size_t rid = row_indices.begin[i];
const size_t ibegin = gmat.row_ptr[rid];
const size_t iend = gmat.row_ptr[rid + 1];
-const bst_gpair stat = gpair[rid];
+const GradientPair stat = gpair[rid];
for (size_t j = ibegin; j < iend; ++j) {
const uint32_t bin = gmat.index[j];
hist.begin[bin].Add(stat);
@@ -507,21 +510,26 @@ void GHistBuilder::BuildBlockHist(const std::vector<bst_gpair>& gpair,
}
void GHistBuilder::SubtractionTrick(GHistRow self, GHistRow sibling, GHistRow parent) {
-const bst_omp_uint nthread = static_cast<bst_omp_uint>(this->nthread_);
const uint32_t nbins = static_cast<bst_omp_uint>(nbins_);
-const int K = 8; // loop unrolling factor
-const uint32_t rest = nbins % K;
+constexpr int kUnroll = 8; // loop unrolling factor
+const uint32_t rest = nbins % kUnroll;
+#if defined(_OPENMP)
+const auto nthread = static_cast<bst_omp_uint>(this->nthread_);
+#endif
#pragma omp parallel for num_threads(nthread) schedule(static)
-for (bst_omp_uint bin_id = 0; bin_id < static_cast<bst_omp_uint>(nbins - rest); bin_id += K) {
-GHistEntry pb[K];
-GHistEntry sb[K];
-for (int k = 0; k < K; ++k) {
+for (bst_omp_uint bin_id = 0;
+bin_id < static_cast<bst_omp_uint>(nbins - rest); bin_id += kUnroll) {
+GHistEntry pb[kUnroll];
+GHistEntry sb[kUnroll];
+for (int k = 0; k < kUnroll; ++k) {
pb[k] = parent.begin[bin_id + k];
}
-for (int k = 0; k < K; ++k) {
+for (int k = 0; k < kUnroll; ++k) {
sb[k] = sibling.begin[bin_id + k];
}
-for (int k = 0; k < K; ++k) {
+for (int k = 0; k < kUnroll; ++k) {
self.begin[bin_id + k].SetSubtract(pb[k], sb[k]);
}
}

@@ -13,26 +13,26 @@
#include "row_set.h"
#include "../tree/fast_hist_param.h"
-using xgboost::tree::FastHistParam;
namespace xgboost {
namespace common {
+using tree::FastHistParam;
/*! \brief sums of gradient statistics corresponding to a histogram bin */
struct GHistEntry {
/*! \brief sum of first-order gradient statistics */
-double sum_grad;
+double sum_grad{0};
/*! \brief sum of second-order gradient statistics */
-double sum_hess;
+double sum_hess{0};
-GHistEntry() : sum_grad(0), sum_hess(0) {}
+GHistEntry() = default;
inline void Clear() {
sum_grad = sum_hess = 0;
}
-/*! \brief add a bst_gpair to the sum */
-inline void Add(const bst_gpair& e) {
+/*! \brief add a GradientPair to the sum */
+inline void Add(const GradientPair& e) {
sum_grad += e.GetGrad();
sum_hess += e.GetHess();
}
@@ -58,7 +58,7 @@ struct HistCutUnit {
/*! \brief number of cutting point, containing the maximum point */
uint32_t size;
// default constructor
-HistCutUnit() {}
+HistCutUnit() = default;
// constructor
HistCutUnit(const bst_float* cut, uint32_t size)
: cut(cut), size(size) {}
@@ -74,8 +74,8 @@ struct HistCutMatrix {
std::vector<bst_float> cut;
/*! \brief Get histogram bound for fid */
inline HistCutUnit operator[](bst_uint fid) const {
-return HistCutUnit(dmlc::BeginPtr(cut) + row_ptr[fid],
-row_ptr[fid + 1] - row_ptr[fid]);
+return {dmlc::BeginPtr(cut) + row_ptr[fid],
+row_ptr[fid + 1] - row_ptr[fid]};
}
// create histogram cut matrix given statistics from data
// using approximate quantile sketch approach
@@ -92,7 +92,7 @@ struct GHistIndexRow {
const uint32_t* index;
/*! \brief The size of the histogram */
size_t size;
-GHistIndexRow() {}
+GHistIndexRow() = default;
GHistIndexRow(const uint32_t* index, size_t size)
: index(index), size(size) {}
};
@@ -115,7 +115,7 @@ struct GHistIndexMatrix {
void Init(DMatrix* p_fmat);
// get i-th row
inline GHistIndexRow operator[](size_t i) const {
-return GHistIndexRow(&index[0] + row_ptr[i], row_ptr[i + 1] - row_ptr[i]);
+return {&index[0] + row_ptr[i], row_ptr[i + 1] - row_ptr[i]};
}
inline void GetFeatureCounts(size_t* counts) const {
auto nfeature = cut->row_ptr.size() - 1;
@@ -141,7 +141,7 @@ struct GHistIndexBlock {
// get i-th row
inline GHistIndexRow operator[](size_t i) const {
-return GHistIndexRow(&index[0] + row_ptr[i], row_ptr[i + 1] - row_ptr[i]);
+return {&index[0] + row_ptr[i], row_ptr[i + 1] - row_ptr[i]};
}
};
@@ -154,24 +154,24 @@ class GHistIndexBlockMatrix {
const FastHistParam& param);
inline GHistIndexBlock operator[](size_t i) const {
-return GHistIndexBlock(blocks[i].row_ptr_begin, blocks[i].index_begin);
+return {blocks_[i].row_ptr_begin, blocks_[i].index_begin};
}
inline size_t GetNumBlock() const {
-return blocks.size();
+return blocks_.size();
}
private:
-std::vector<size_t> row_ptr;
-std::vector<uint32_t> index;
-const HistCutMatrix* cut;
+std::vector<size_t> row_ptr_;
+std::vector<uint32_t> index_;
+const HistCutMatrix* cut_;
struct Block {
const size_t* row_ptr_begin;
const size_t* row_ptr_end;
const uint32_t* index_begin;
const uint32_t* index_end;
};
-std::vector<Block> blocks;
+std::vector<Block> blocks_;
};
/*!
@@ -186,7 +186,7 @@ struct GHistRow {
/*! \brief number of entries */
uint32_t size;
-GHistRow() {}
+GHistRow() = default;
GHistRow(GHistEntry* begin, uint32_t size)
: begin(begin), size(size) {}
};
@@ -198,15 +198,15 @@ class HistCollection {
public:
// access histogram for i-th node
inline GHistRow operator[](bst_uint nid) const {
-const uint32_t kMax = std::numeric_limits<uint32_t>::max();
+constexpr uint32_t kMax = std::numeric_limits<uint32_t>::max();
CHECK_NE(row_ptr_[nid], kMax);
-return GHistRow(const_cast<GHistEntry*>(dmlc::BeginPtr(data_) + row_ptr_[nid]), nbins_);
+return {const_cast<GHistEntry*>(dmlc::BeginPtr(data_) + row_ptr_[nid]), nbins_};
}
// have we computed a histogram for i-th node?
inline bool RowExists(bst_uint nid) const {
-const uint32_t kMax = std::numeric_limits<uint32_t>::max();
-return (nid < row_ptr_.size() && row_ptr_[nid] != kMax);
+const uint32_t k_max = std::numeric_limits<uint32_t>::max();
+return (nid < row_ptr_.size() && row_ptr_[nid] != k_max);
}
// initialize histogram collection
@@ -218,7 +218,7 @@ class HistCollection {
// create an empty histogram for i-th node
inline void AddHistRow(bst_uint nid) {
-const uint32_t kMax = std::numeric_limits<uint32_t>::max();
+constexpr uint32_t kMax = std::numeric_limits<uint32_t>::max();
if (nid >= row_ptr_.size()) {
row_ptr_.resize(nid + 1, kMax);
}
@@ -250,13 +250,13 @@ class GHistBuilder {
}
// construct a histogram via histogram aggregation
-void BuildHist(const std::vector<bst_gpair>& gpair,
+void BuildHist(const std::vector<GradientPair>& gpair,
const RowSetCollection::Elem row_indices,
const GHistIndexMatrix& gmat,
const std::vector<bst_uint>& feat_set,
GHistRow hist);
// same, with feature grouping
-void BuildBlockHist(const std::vector<bst_gpair>& gpair,
+void BuildBlockHist(const std::vector<GradientPair>& gpair,
const RowSetCollection::Elem row_indices,
const GHistIndexBlockMatrix& gmatb,
const std::vector<bst_uint>& feat_set,

@@ -6,6 +6,8 @@
// dummy implementation of HostDeviceVector in case CUDA is not used
#include <xgboost/base.h>
#include <utility>
#include "./host_device_vector.h"
namespace xgboost {
@@ -13,24 +15,24 @@ namespace xgboost {
template <typename T>
struct HostDeviceVectorImpl {
explicit HostDeviceVectorImpl(size_t size, T v) : data_h_(size, v) {}
-explicit HostDeviceVectorImpl(std::initializer_list<T> init) : data_h_(init) {}
-explicit HostDeviceVectorImpl(const std::vector<T>& init) : data_h_(init) {}
+HostDeviceVectorImpl(std::initializer_list<T> init) : data_h_(init) {}
+explicit HostDeviceVectorImpl(std::vector<T> init) : data_h_(std::move(init)) {}
std::vector<T> data_h_;
};
template <typename T>
-HostDeviceVector<T>::HostDeviceVector(size_t size, T v, int device) : impl_(nullptr) {
+HostDeviceVector<T>::HostDeviceVector(size_t size, T v, GPUSet devices) : impl_(nullptr) {
impl_ = new HostDeviceVectorImpl<T>(size, v);
}
template <typename T>
-HostDeviceVector<T>::HostDeviceVector(std::initializer_list<T> init, int device)
+HostDeviceVector<T>::HostDeviceVector(std::initializer_list<T> init, GPUSet devices)
: impl_(nullptr) {
impl_ = new HostDeviceVectorImpl<T>(init);
}
template <typename T>
-HostDeviceVector<T>::HostDeviceVector(const std::vector<T>& init, int device)
+HostDeviceVector<T>::HostDeviceVector(const std::vector<T>& init, GPUSet devices)
: impl_(nullptr) {
impl_ = new HostDeviceVectorImpl<T>(init);
}
@@ -43,25 +45,58 @@ HostDeviceVector<T>::~HostDeviceVector() {
}
template <typename T>
-size_t HostDeviceVector<T>::size() const { return impl_->data_h_.size(); }
+size_t HostDeviceVector<T>::Size() const { return impl_->data_h_.size(); }
template <typename T>
-int HostDeviceVector<T>::device() const { return -1; }
+GPUSet HostDeviceVector<T>::Devices() const { return GPUSet::Empty(); }
template <typename T>
-T* HostDeviceVector<T>::ptr_d(int device) { return nullptr; }
+T* HostDeviceVector<T>::DevicePointer(int device) { return nullptr; }
template <typename T>
-std::vector<T>& HostDeviceVector<T>::data_h() { return impl_->data_h_; }
+std::vector<T>& HostDeviceVector<T>::HostVector() { return impl_->data_h_; }
template <typename T>
-void HostDeviceVector<T>::resize(size_t new_size, T v, int new_device) {
+void HostDeviceVector<T>::Resize(size_t new_size, T v) {
impl_->data_h_.resize(new_size, v);
}
template <typename T>
size_t HostDeviceVector<T>::DeviceStart(int device) { return 0; }
template <typename T>
size_t HostDeviceVector<T>::DeviceSize(int device) { return 0; }
template <typename T>
void HostDeviceVector<T>::Fill(T v) {
std::fill(HostVector().begin(), HostVector().end(), v);
}
template <typename T>
void HostDeviceVector<T>::Copy(HostDeviceVector<T>* other) {
CHECK_EQ(Size(), other->Size());
std::copy(other->HostVector().begin(), other->HostVector().end(), HostVector().begin());
}
template <typename T>
void HostDeviceVector<T>::Copy(const std::vector<T>& other) {
CHECK_EQ(Size(), other.size());
std::copy(other.begin(), other.end(), HostVector().begin());
}
template <typename T>
void HostDeviceVector<T>::Copy(std::initializer_list<T> other) {
CHECK_EQ(Size(), other.size());
std::copy(other.begin(), other.end(), HostVector().begin());
}
template <typename T>
void HostDeviceVector<T>::Reshard(GPUSet devices) { }
// explicit instantiations are required, as HostDeviceVector isn't header-only
template class HostDeviceVector<bst_float>;
-template class HostDeviceVector<bst_gpair>;
+template class HostDeviceVector<GradientPair>;
template class HostDeviceVector<unsigned int>;
} // namespace xgboost

@@ -2,122 +2,309 @@
* Copyright 2017 XGBoost contributors
*/
#include <thrust/fill.h>
#include "./host_device_vector.h"
#include "./device_helpers.cuh"
namespace xgboost {
template <typename T>
struct HostDeviceVectorImpl {
HostDeviceVectorImpl(size_t size, T v, int device)
: device_(device), on_d_(device >= 0) {
if (on_d_) {
struct DeviceShard {
DeviceShard() : index_(-1), device_(-1), start_(0), on_d_(false), vec_(nullptr) {}
static size_t ShardStart(size_t size, int ndevices, int index) {
size_t portion = dh::DivRoundUp(size, ndevices);
size_t begin = index * portion;
begin = begin > size ? size : begin;
return begin;
}
static size_t ShardSize(size_t size, int ndevices, int index) {
size_t portion = dh::DivRoundUp(size, ndevices);
size_t begin = index * portion, end = (index + 1) * portion;
begin = begin > size ? size : begin;
end = end > size ? size : end;
return end - begin;
}
void Init(HostDeviceVectorImpl<T>* vec, int device) {
if (vec_ == nullptr) { vec_ = vec; }
CHECK_EQ(vec, vec_);
device_ = device;
index_ = vec_->devices_.Index(device);
size_t size_h = vec_->Size();
int ndevices = vec_->devices_.Size();
start_ = ShardStart(size_h, ndevices, index_);
size_t size_d = ShardSize(size_h, ndevices, index_);
dh::safe_cuda(cudaSetDevice(device_));
data_d_.resize(size, v);
data_.resize(size_d);
on_d_ = !vec_->on_h_;
}
void ScatterFrom(const T* begin) {
// TODO(canonizer): avoid full copy of host data
LazySyncDevice();
dh::safe_cuda(cudaSetDevice(device_));
dh::safe_cuda(cudaMemcpy(data_.data().get(), begin + start_,
data_.size() * sizeof(T), cudaMemcpyDefault));
}
void GatherTo(thrust::device_ptr<T> begin) {
LazySyncDevice();
dh::safe_cuda(cudaSetDevice(device_));
dh::safe_cuda(cudaMemcpy(begin.get() + start_, data_.data().get(),
data_.size() * sizeof(T), cudaMemcpyDefault));
}
void Fill(T v) {
// TODO(canonizer): avoid full copy of host data
LazySyncDevice();
dh::safe_cuda(cudaSetDevice(device_));
thrust::fill(data_.begin(), data_.end(), v);
}
void Copy(DeviceShard* other) {
// TODO(canonizer): avoid full copy of host data for this (but not for other)
LazySyncDevice();
other->LazySyncDevice();
dh::safe_cuda(cudaSetDevice(device_));
dh::safe_cuda(cudaMemcpy(data_.data().get(), other->data_.data().get(),
data_.size() * sizeof(T), cudaMemcpyDefault));
}
void LazySyncHost() {
dh::safe_cuda(cudaSetDevice(device_));
thrust::copy(data_.begin(), data_.end(), vec_->data_h_.begin() + start_);
on_d_ = false;
}
void LazySyncDevice() {
if (on_d_) { return; }
// data is on the host
size_t size_h = vec_->data_h_.size();
int ndevices = vec_->devices_.Size();
start_ = ShardStart(size_h, ndevices, index_);
size_t size_d = ShardSize(size_h, ndevices, index_);
dh::safe_cuda(cudaSetDevice(device_));
data_.resize(size_d);
thrust::copy(vec_->data_h_.begin() + start_,
vec_->data_h_.begin() + start_ + size_d, data_.begin());
on_d_ = true;
// this may cause a race condition if LazySyncDevice() is called
// from multiple threads in parallel;
// however, the race condition is benign, and will not cause problems
vec_->on_h_ = false;
vec_->size_d_ = vec_->data_h_.size();
}
int index_;
int device_;
thrust::device_vector<T> data_;
size_t start_;
// true if there is an up-to-date copy of data on device, false otherwise
bool on_d_;
HostDeviceVectorImpl<T>* vec_;
};
HostDeviceVectorImpl(size_t size, T v, GPUSet devices)
: devices_(devices), on_h_(devices.IsEmpty()), size_d_(0) {
if (!devices.IsEmpty()) {
size_d_ = size;
InitShards();
Fill(v);
} else {
data_h_.resize(size, v);
}
}
// Init can be std::vector<T> or std::initializer_list<T>
template <class Init>
HostDeviceVectorImpl(const Init& init, int device)
: device_(device), on_d_(device >= 0) {
if (on_d_) {
dh::safe_cuda(cudaSetDevice(device_));
data_d_.resize(init.size());
thrust::copy(init.begin(), init.end(), data_d_.begin());
HostDeviceVectorImpl(const Init& init, GPUSet devices)
: devices_(devices), on_h_(devices.IsEmpty()), size_d_(0) {
if (!devices.IsEmpty()) {
size_d_ = init.size();
InitShards();
Copy(init);
} else {
data_h_ = init;
}
}
void InitShards() {
int ndevices = devices_.Size();
shards_.resize(ndevices);
dh::ExecuteIndexShards(&shards_, [&](int i, DeviceShard& shard) {
shard.Init(this, devices_[i]);
});
}
HostDeviceVectorImpl(const HostDeviceVectorImpl<T>&) = delete;
HostDeviceVectorImpl(HostDeviceVectorImpl<T>&&) = delete;
void operator=(const HostDeviceVectorImpl<T>&) = delete;
void operator=(HostDeviceVectorImpl<T>&&) = delete;
size_t size() const { return on_d_ ? data_d_.size() : data_h_.size(); }
size_t Size() const { return on_h_ ? data_h_.size() : size_d_; }
int device() const { return device_; }
GPUSet Devices() const { return devices_; }
T* ptr_d(int device) {
lazy_sync_device(device);
return data_d_.data().get();
T* DevicePointer(int device) {
CHECK(devices_.Contains(device));
LazySyncDevice(device);
return shards_[devices_.Index(device)].data_.data().get();
}
thrust::device_ptr<T> tbegin(int device) {
return thrust::device_ptr<T>(ptr_d(device));
size_t DeviceSize(int device) {
CHECK(devices_.Contains(device));
LazySyncDevice(device);
return shards_[devices_.Index(device)].data_.size();
}
thrust::device_ptr<T> tend(int device) {
auto begin = tbegin(device);
return begin + size();
size_t DeviceStart(int device) {
CHECK(devices_.Contains(device));
LazySyncDevice(device);
return shards_[devices_.Index(device)].start_;
}
std::vector<T>& data_h() {
lazy_sync_host();
thrust::device_ptr<T> tbegin(int device) { // NOLINT
return thrust::device_ptr<T>(DevicePointer(device));
}
thrust::device_ptr<T> tend(int device) { // NOLINT
return tbegin(device) + DeviceSize(device);
}
void ScatterFrom(thrust::device_ptr<T> begin, thrust::device_ptr<T> end) {
CHECK_EQ(end - begin, Size());
if (on_h_) {
thrust::copy(begin, end, data_h_.begin());
} else {
dh::ExecuteShards(&shards_, [&](DeviceShard& shard) {
shard.ScatterFrom(begin.get());
});
}
}
void GatherTo(thrust::device_ptr<T> begin, thrust::device_ptr<T> end) {
CHECK_EQ(end - begin, Size());
if (on_h_) {
thrust::copy(data_h_.begin(), data_h_.end(), begin);
} else {
dh::ExecuteShards(&shards_, [&](DeviceShard& shard) { shard.GatherTo(begin); });
}
}
void Fill(T v) {
if (on_h_) {
std::fill(data_h_.begin(), data_h_.end(), v);
} else {
dh::ExecuteShards(&shards_, [&](DeviceShard& shard) { shard.Fill(v); });
}
}
void Copy(HostDeviceVectorImpl<T>* other) {
CHECK_EQ(Size(), other->Size());
if (on_h_ && other->on_h_) {
std::copy(other->data_h_.begin(), other->data_h_.end(), data_h_.begin());
} else {
CHECK(devices_ == other->devices_);
dh::ExecuteIndexShards(&shards_, [&](int i, DeviceShard& shard) {
shard.Copy(&other->shards_[i]);
});
}
}
void Copy(const std::vector<T>& other) {
CHECK_EQ(Size(), other.size());
if (on_h_) {
std::copy(other.begin(), other.end(), data_h_.begin());
} else {
dh::ExecuteShards(&shards_, [&](DeviceShard& shard) {
shard.ScatterFrom(other.data());
});
}
}
void Copy(std::initializer_list<T> other) {
CHECK_EQ(Size(), other.size());
if (on_h_) {
std::copy(other.begin(), other.end(), data_h_.begin());
} else {
dh::ExecuteShards(&shards_, [&](DeviceShard& shard) {
shard.ScatterFrom(other.begin());
});
}
}
std::vector<T>& HostVector() {
LazySyncHost();
return data_h_;
}
void resize(size_t new_size, T v, int new_device) {
if (new_size == this->size() && new_device == device_)
void Reshard(GPUSet new_devices) {
if (devices_ == new_devices)
return;
if (new_device != -1)
device_ = new_device;
// if !on_d_, but the data size is 0 and the device is set,
// resize the data on device instead
if (!on_d_ && (data_h_.size() > 0 || device_ == -1)) {
data_h_.resize(new_size, v);
CHECK(devices_.IsEmpty());
devices_ = new_devices;
InitShards();
}
void Resize(size_t new_size, T v) {
if (new_size == Size())
return;
if (Size() == 0 && !devices_.IsEmpty()) {
// fast on-device resize
on_h_ = false;
size_d_ = new_size;
InitShards();
Fill(v);
} else {
dh::safe_cuda(cudaSetDevice(device_));
data_d_.resize(new_size, v);
on_d_ = true;
// resize on host
LazySyncHost();
data_h_.resize(new_size, v);
}
}
void lazy_sync_host() {
if (!on_d_)
void LazySyncHost() {
if (on_h_)
return;
if (data_h_.size() != this->size())
data_h_.resize(this->size());
dh::safe_cuda(cudaSetDevice(device_));
thrust::copy(data_d_.begin(), data_d_.end(), data_h_.begin());
on_d_ = false;
if (data_h_.size() != size_d_)
data_h_.resize(size_d_);
dh::ExecuteShards(&shards_, [&](DeviceShard& shard) { shard.LazySyncHost(); });
on_h_ = true;
}
void lazy_sync_device(int device) {
if (on_d_)
return;
if (device != device_) {
CHECK_EQ(device_, -1);
device_ = device;
}
if (data_d_.size() != this->size()) {
dh::safe_cuda(cudaSetDevice(device_));
data_d_.resize(this->size());
}
dh::safe_cuda(cudaSetDevice(device_));
thrust::copy(data_h_.begin(), data_h_.end(), data_d_.begin());
on_d_ = true;
void LazySyncDevice(int device) {
CHECK(devices_.Contains(device));
shards_[devices_.Index(device)].LazySyncDevice();
}
std::vector<T> data_h_;
thrust::device_vector<T> data_d_;
// true if there is an up-to-date copy of data on device, false otherwise
bool on_d_;
int device_;
bool on_h_;
// the total size of the data stored on the devices
size_t size_d_;
GPUSet devices_;
std::vector<DeviceShard> shards_;
};
template <typename T>
-HostDeviceVector<T>::HostDeviceVector(size_t size, T v, int device) : impl_(nullptr) {
-impl_ = new HostDeviceVectorImpl<T>(size, v, device);
+HostDeviceVector<T>::HostDeviceVector(size_t size, T v, GPUSet devices)
+: impl_(nullptr) {
+impl_ = new HostDeviceVectorImpl<T>(size, v, devices);
}
template <typename T>
-HostDeviceVector<T>::HostDeviceVector(std::initializer_list<T> init, int device)
+HostDeviceVector<T>::HostDeviceVector(std::initializer_list<T> init, GPUSet devices)
: impl_(nullptr) {
-impl_ = new HostDeviceVectorImpl<T>(init, device);
+impl_ = new HostDeviceVectorImpl<T>(init, devices);
}
template <typename T>
-HostDeviceVector<T>::HostDeviceVector(const std::vector<T>& init, int device)
+HostDeviceVector<T>::HostDeviceVector(const std::vector<T>& init, GPUSet devices)
: impl_(nullptr) {
-impl_ = new HostDeviceVectorImpl<T>(init, device);
+impl_ = new HostDeviceVectorImpl<T>(init, devices);
}
template <typename T>
@@ -128,34 +315,78 @@ HostDeviceVector<T>::~HostDeviceVector() {
}
template <typename T>
-size_t HostDeviceVector<T>::size() const { return impl_->size(); }
+size_t HostDeviceVector<T>::Size() const { return impl_->Size(); }
template <typename T>
-int HostDeviceVector<T>::device() const { return impl_->device(); }
+GPUSet HostDeviceVector<T>::Devices() const { return impl_->Devices(); }
template <typename T>
-T* HostDeviceVector<T>::ptr_d(int device) { return impl_->ptr_d(device); }
+T* HostDeviceVector<T>::DevicePointer(int device) { return impl_->DevicePointer(device); }
template <typename T>
thrust::device_ptr<T> HostDeviceVector<T>::tbegin(int device) {
size_t HostDeviceVector<T>::DeviceStart(int device) { return impl_->DeviceStart(device); }
template <typename T>
size_t HostDeviceVector<T>::DeviceSize(int device) { return impl_->DeviceSize(device); }
template <typename T>
thrust::device_ptr<T> HostDeviceVector<T>::tbegin(int device) { // NOLINT
return impl_->tbegin(device);
}
template <typename T>
-thrust::device_ptr<T> HostDeviceVector<T>::tend(int device) {
+thrust::device_ptr<T> HostDeviceVector<T>::tend(int device) { // NOLINT
return impl_->tend(device);
}
template <typename T>
std::vector<T>& HostDeviceVector<T>::data_h() { return impl_->data_h(); }
void HostDeviceVector<T>::ScatterFrom
(thrust::device_ptr<T> begin, thrust::device_ptr<T> end) {
impl_->ScatterFrom(begin, end);
}
template <typename T>
void HostDeviceVector<T>::resize(size_t new_size, T v, int new_device) {
impl_->resize(new_size, v, new_device);
void HostDeviceVector<T>::GatherTo
(thrust::device_ptr<T> begin, thrust::device_ptr<T> end) {
impl_->GatherTo(begin, end);
}
template <typename T>
void HostDeviceVector<T>::Fill(T v) {
impl_->Fill(v);
}
template <typename T>
void HostDeviceVector<T>::Copy(HostDeviceVector<T>* other) {
impl_->Copy(other->impl_);
}
template <typename T>
void HostDeviceVector<T>::Copy(const std::vector<T>& other) {
impl_->Copy(other);
}
template <typename T>
void HostDeviceVector<T>::Copy(std::initializer_list<T> other) {
impl_->Copy(other);
}
template <typename T>
std::vector<T>& HostDeviceVector<T>::HostVector() { return impl_->HostVector(); }
template <typename T>
void HostDeviceVector<T>::Reshard(GPUSet new_devices) {
impl_->Reshard(new_devices);
}
template <typename T>
void HostDeviceVector<T>::Resize(size_t new_size, T v) {
impl_->Resize(new_size, v);
}
// explicit instantiations are required, as HostDeviceVector isn't header-only
template class HostDeviceVector<bst_float>;
-template class HostDeviceVector<bst_gpair>;
+template class HostDeviceVector<GradientPair>;
template class HostDeviceVector<unsigned int>;
} // namespace xgboost

@@ -4,6 +4,9 @@
#ifndef XGBOOST_COMMON_HOST_DEVICE_VECTOR_H_
#define XGBOOST_COMMON_HOST_DEVICE_VECTOR_H_
#include <dmlc/logging.h>
#include <algorithm>
#include <cstdlib>
#include <initializer_list>
#include <vector>
@@ -18,6 +21,40 @@ namespace xgboost {
template <typename T> struct HostDeviceVectorImpl;
// set of devices across which HostDeviceVector can be distributed;
// currently implemented as a range, but can be changed later to something else,
// e.g. a bitset
class GPUSet {
public:
explicit GPUSet(int start = 0, int ndevices = 0)
: start_(start), ndevices_(ndevices) {}
static GPUSet Empty() { return GPUSet(); }
static GPUSet Range(int start, int ndevices) { return GPUSet(start, ndevices); }
int Size() const { return ndevices_; }
int operator[](int index) const {
CHECK(index >= 0 && index < ndevices_);
return start_ + index;
}
bool IsEmpty() const { return ndevices_ <= 0; }
int Index(int device) const {
CHECK(device >= start_ && device < start_ + ndevices_);
return device - start_;
}
bool Contains(int device) const {
return start_ <= device && device < start_ + ndevices_;
}
friend bool operator==(GPUSet a, GPUSet b) {
return a.start_ == b.start_ && a.ndevices_ == b.ndevices_;
}
friend bool operator!=(GPUSet a, GPUSet b) {
return a.start_ != b.start_ || a.ndevices_ != b.ndevices_;
}
private:
int start_, ndevices_;
};
/**
* @file host_device_vector.h
* @brief A device-and-host vector abstraction layer.
@@ -29,24 +66,26 @@ template <typename T> struct HostDeviceVectorImpl;
*
* Initialization/Allocation:<br/>
* One can choose to initialize the vector on CPU or GPU during constructor.
- * (use the 'device' argument) Or, can choose to use the 'resize' method to
- * allocate/resize memory explicitly.
+ * (use the 'devices' argument) Or, can choose to use the 'Resize' method to
+ * allocate/resize memory explicitly, and use the 'Reshard' method
+ * to specify the devices.
*
- * Accessing underling data:<br/>
- * Use 'data_h' method to explicitly query for the underlying std::vector.
- * If you need the raw device pointer, use the 'ptr_d' method. For perf
+ * Accessing underlying data:<br/>
+ * Use 'HostVector' method to explicitly query for the underlying std::vector.
+ * If you need the raw device pointer, use the 'DevicePointer' method. For perf
* implications of these calls, see below.
*
* Accessing underlying data and their perf implications:<br/>
* There are 4 scenarios to be considered here:
- * data_h and data on CPU --> no problems, std::vector returned immediately
- * data_h but data on GPU --> this causes a cudaMemcpy to be issued internally.
- * subsequent calls to data_h, will NOT incur this penalty.
- * (assuming 'ptr_d' is not called in between)
- * ptr_d but data on CPU --> this causes a cudaMemcpy to be issued internally.
- * subsequent calls to ptr_d, will NOT incur this penalty.
- * (assuming 'data_h' is not called in between)
- * ptr_d and data on GPU --> no problems, the device ptr will be returned immediately
+ * HostVector and data on CPU --> no problems, std::vector returned immediately
+ * HostVector but data on GPU --> this causes a cudaMemcpy to be issued internally.
+ * subsequent calls to HostVector, will NOT incur this penalty.
+ * (assuming 'DevicePointer' is not called in between)
+ * DevicePointer but data on CPU --> this causes a cudaMemcpy to be issued internally.
+ * subsequent calls to DevicePointer, will NOT incur this penalty.
+ * (assuming 'HostVector' is not called in between)
+ * DevicePointer and data on GPU --> no problems, the device ptr
+ * will be returned immediately.
*
* What if xgboost is compiled without CUDA?<br/>
* In that case, there's a special implementation which always falls-back to
@@ -57,35 +96,49 @@ template <typename T> struct HostDeviceVectorImpl;
* compiling with and without CUDA toolkit. It was easier to have
* 'HostDeviceVector' with a special-case implementation in host_device_vector.cc
*
- * @note: This is not thread-safe!
+ * @note: Size and Devices methods are thread-safe.
+ * DevicePointer, DeviceStart, DeviceSize, tbegin and tend methods are thread-safe
+ * if different threads call these methods with different values of the device argument.
+ * All other methods are not thread safe.
*/
template <typename T>
class HostDeviceVector {
public:
-explicit HostDeviceVector(size_t size = 0, T v = T(), int device = -1);
-HostDeviceVector(std::initializer_list<T> init, int device = -1);
-explicit HostDeviceVector(const std::vector<T>& init, int device = -1);
+explicit HostDeviceVector(size_t size = 0, T v = T(),
+GPUSet devices = GPUSet::Empty());
+HostDeviceVector(std::initializer_list<T> init, GPUSet devices = GPUSet::Empty());
+explicit HostDeviceVector(const std::vector<T>& init,
+GPUSet devices = GPUSet::Empty());
~HostDeviceVector();
HostDeviceVector(const HostDeviceVector<T>&) = delete;
HostDeviceVector(HostDeviceVector<T>&&) = delete;
void operator=(const HostDeviceVector<T>&) = delete;
void operator=(HostDeviceVector<T>&&) = delete;
-size_t size() const;
-int device() const;
-T* ptr_d(int device);
-T* ptr_h() { return data_h().data(); }
+size_t Size() const;
+GPUSet Devices() const;
+T* DevicePointer(int device);
+T* HostPointer() { return HostVector().data(); }
+size_t DeviceStart(int device);
+size_t DeviceSize(int device);
// only define functions returning device_ptr
// if HostDeviceVector.h is included from a .cu file
#ifdef __CUDACC__
-thrust::device_ptr<T> tbegin(int device);
-thrust::device_ptr<T> tend(int device);
+thrust::device_ptr<T> tbegin(int device); // NOLINT
+thrust::device_ptr<T> tend(int device); // NOLINT
+void ScatterFrom(thrust::device_ptr<T> begin, thrust::device_ptr<T> end);
+void GatherTo(thrust::device_ptr<T> begin, thrust::device_ptr<T> end);
#endif
-std::vector<T>& data_h();
+void Fill(T v);
+void Copy(HostDeviceVector<T>* other);
+void Copy(const std::vector<T>& other);
+void Copy(std::initializer_list<T> other);
-// passing in new_device == -1 keeps the device as is
-void resize(size_t new_size, T v = T(), int new_device = -1);
+std::vector<T>& HostVector();
+void Reshard(GPUSet devices);
+void Resize(size_t new_size, T v = T());
private:
HostDeviceVectorImpl<T>* impl_;

@@ -15,8 +15,8 @@
namespace xgboost {
namespace common {
typedef rabit::utils::MemoryFixSizeBuffer MemoryFixSizeBuffer;
typedef rabit::utils::MemoryBufferStream MemoryBufferStream;
using MemoryFixSizeBuffer = rabit::utils::MemoryFixSizeBuffer;
using MemoryBufferStream = rabit::utils::MemoryBufferStream;
/*!
 * \brief Input stream that supports additional PeekRead

@@ -39,12 +39,12 @@ inline void Softmax(std::vector<float>* p_rec) {
wmax = std::max(rec[i], wmax);
}
double wsum = 0.0f;
for (size_t i = 0; i < rec.size(); ++i) {
rec[i] = std::exp(rec[i] - wmax);
wsum += rec[i];
for (float & elem : rec) {
elem = std::exp(elem - wmax);
wsum += elem;
}
for (size_t i = 0; i < rec.size(); ++i) {
rec[i] /= static_cast<float>(wsum);
for (float & elem : rec) {
elem /= static_cast<float>(wsum);
}
}

@@ -35,7 +35,7 @@ struct WQSummary {
/*! \brief the value of data */
DType value;
// constructor
Entry() {}
Entry() = default;
// constructor
Entry(RType rmin, RType rmax, RType wmin, DType value)
: rmin(rmin), rmax(rmax), wmin(wmin), value(value) {}
@@ -48,11 +48,11 @@ struct WQSummary {
CHECK(rmax - rmin - wmin > -eps) << "relation constraint: min/max";
}
/*! \return rmin estimation for v strictly bigger than value */
inline RType rmin_next() const {
inline RType RMinNext() const {
return rmin + wmin;
}
/*! \return rmax estimation for v strictly smaller than value */
inline RType rmax_prev() const {
inline RType RMaxPrev() const {
return rmax - wmin;
}
};
@@ -65,7 +65,7 @@ struct WQSummary {
// weight of instance
RType weight;
// default constructor
QEntry() {}
QEntry() = default;
// constructor
QEntry(DType value, RType weight)
: value(value), weight(weight) {}
@@ -116,7 +116,7 @@ struct WQSummary {
inline RType MaxError() const {
RType res = data[0].rmax - data[0].rmin - data[0].wmin;
for (size_t i = 1; i < size; ++i) {
res = std::max(data[i].rmax_prev() - data[i - 1].rmin_next(), res);
res = std::max(data[i].RMaxPrev() - data[i - 1].RMinNext(), res);
res = std::max(data[i].rmax - data[i].rmin - data[i].wmin, res);
}
return res;
@@ -140,8 +140,8 @@ struct WQSummary {
if (istart == 0) {
return Entry(0.0f, 0.0f, 0.0f, qvalue);
} else {
return Entry(data[istart - 1].rmin_next(),
data[istart].rmax_prev(),
return Entry(data[istart - 1].RMinNext(),
data[istart].RMaxPrev(),
0.0f, qvalue);
}
}
@@ -197,7 +197,7 @@ struct WQSummary {
while (i < src.size - 1
&& dx2 >= src.data[i + 1].rmax + src.data[i + 1].rmin) ++i;
CHECK(i != src.size - 1);
if (dx2 < src.data[i].rmin_next() + src.data[i + 1].rmax_prev()) {
if (dx2 < src.data[i].RMinNext() + src.data[i + 1].RMaxPrev()) {
if (i != lastidx) {
data[size++] = src.data[i]; lastidx = i;
}
@@ -236,20 +236,20 @@ struct WQSummary {
*dst = Entry(a->rmin + b->rmin,
a->rmax + b->rmax,
a->wmin + b->wmin, a->value);
aprev_rmin = a->rmin_next();
bprev_rmin = b->rmin_next();
aprev_rmin = a->RMinNext();
bprev_rmin = b->RMinNext();
++dst; ++a; ++b;
} else if (a->value < b->value) {
*dst = Entry(a->rmin + bprev_rmin,
a->rmax + b->rmax_prev(),
a->rmax + b->RMaxPrev(),
a->wmin, a->value);
aprev_rmin = a->rmin_next();
aprev_rmin = a->RMinNext();
++dst; ++a;
} else {
*dst = Entry(b->rmin + aprev_rmin,
b->rmax + a->rmax_prev(),
b->rmax + a->RMaxPrev(),
b->wmin, b->value);
bprev_rmin = b->rmin_next();
bprev_rmin = b->RMinNext();
++dst; ++b;
}
}
@@ -307,7 +307,7 @@ struct WQSummary {
data[i].rmax = prev_rmax;
*err_maxgap = std::max(*err_maxgap, prev_rmax - data[i].rmax);
}
RType rmin_next = data[i].rmin_next();
RType rmin_next = data[i].RMinNext();
if (data[i].rmax < rmin_next) {
data[i].rmax = rmin_next;
*err_wgap = std::max(*err_wgap, data[i].rmax - rmin_next);
@@ -334,13 +334,13 @@ struct WQSummary {
template<typename DType, typename RType>
struct WXQSummary : public WQSummary<DType, RType> {
// redefine entry type
typedef typename WQSummary<DType, RType>::Entry Entry;
using Entry = typename WQSummary<DType, RType>::Entry;
// constructor
WXQSummary(Entry *data, size_t size)
: WQSummary<DType, RType>(data, size) {}
// check if the block is large chunk
inline static bool CheckLarge(const Entry &e, RType chunk) {
return e.rmin_next() > e.rmax_prev() + chunk;
return e.RMinNext() > e.RMaxPrev() + chunk;
}
// set prune
inline void SetPrune(const WQSummary<DType, RType> &src, size_t maxsize) {
@@ -377,13 +377,13 @@ struct WXQSummary : public WQSummary<DType, RType> {
if (CheckLarge(src.data[i], chunk)) {
if (bid != i - 1) {
// accumulate the range of the rest points
mrange += src.data[i].rmax_prev() - src.data[bid].rmin_next();
mrange += src.data[i].RMaxPrev() - src.data[bid].RMinNext();
}
bid = i; ++nbig;
}
}
if (bid != src.size - 2) {
mrange += src.data[src.size-1].rmax_prev() - src.data[bid].rmin_next();
mrange += src.data[src.size-1].RMaxPrev() - src.data[bid].RMinNext();
}
}
// assert: there cannot be more than n big data points
@@ -405,14 +405,14 @@ struct WXQSummary : public WQSummary<DType, RType> {
if (end == src.size - 1 || CheckLarge(src.data[end], chunk)) {
if (bid != end - 1) {
size_t i = bid;
RType maxdx2 = src.data[end].rmax_prev() * 2;
RType maxdx2 = src.data[end].RMaxPrev() * 2;
for (; k < n; ++k) {
RType dx2 = 2 * ((k * mrange) / n + begin);
if (dx2 >= maxdx2) break;
while (i < end &&
dx2 >= src.data[i + 1].rmax + src.data[i + 1].rmin) ++i;
if (i == end) break;
if (dx2 < src.data[i].rmin_next() + src.data[i + 1].rmax_prev()) {
if (dx2 < src.data[i].RMinNext() + src.data[i + 1].RMaxPrev()) {
if (i != lastidx) {
this->data[this->size++] = src.data[i]; lastidx = i;
}
@@ -429,7 +429,7 @@ struct WXQSummary : public WQSummary<DType, RType> {
}
bid = end;
// shift base by the gap
begin += src.data[bid].rmin_next() - src.data[bid].rmax_prev();
begin += src.data[bid].RMinNext() - src.data[bid].RMaxPrev();
}
}
}
@@ -448,7 +448,7 @@ struct GKSummary {
/*! \brief the value of data */
DType value;
// constructor
Entry() {}
Entry() = default;
// constructor
Entry(RType rmin, RType rmax, DType value)
: rmin(rmin), rmax(rmax), value(value) {}
@@ -591,17 +591,17 @@ template<typename DType, typename RType, class TSummary>
class QuantileSketchTemplate {
public:
/*! \brief type of summary type */
typedef TSummary Summary;
using Summary = TSummary;
/*! \brief the entry type */
typedef typename Summary::Entry Entry;
using Entry = typename Summary::Entry;
/*! \brief same as summary, but use STL to backup the space */
struct SummaryContainer : public Summary {
std::vector<Entry> space;
SummaryContainer(const SummaryContainer &src) : Summary(NULL, src.size) {
SummaryContainer(const SummaryContainer &src) : Summary(nullptr, src.size) {
this->space = src.space;
this->data = dmlc::BeginPtr(this->space);
}
SummaryContainer() : Summary(NULL, 0) {
SummaryContainer() : Summary(nullptr, 0) {
}
/*! \brief reserve space for summary */
inline void Reserve(size_t size) {
@@ -775,7 +775,7 @@ class QuantileSketchTemplate {
inline void InitLevel(size_t nlevel) {
if (level.size() >= nlevel) return;
data.resize(limit_size * nlevel);
level.resize(nlevel, Summary(NULL, 0));
level.resize(nlevel, Summary(nullptr, 0));
for (size_t l = 0; l < level.size(); ++l) {
level[l].data = dmlc::BeginPtr(data) + l * limit_size;
}

@@ -15,7 +15,7 @@ namespace common {
/*!
* \brief Define mt19937 as default type Random Engine.
*/
typedef std::mt19937 RandomEngine;
using RandomEngine = std::mt19937;
#if XGBOOST_CUSTOMIZE_GLOBAL_PRNG
/*!
@@ -56,7 +56,7 @@ typedef CustomGlobalRandomEngine GlobalRandomEngine;
/*!
* \brief global random engine
*/
typedef RandomEngine GlobalRandomEngine;
using GlobalRandomEngine = RandomEngine;
#endif
/*!

@@ -21,18 +21,18 @@ class RowSetCollection {
* rows (instances) associated with a particular node in a decision
* tree. */
struct Elem {
const size_t* begin;
const size_t* end;
int node_id;
const size_t* begin{nullptr};
const size_t* end{nullptr};
int node_id{-1};
// id of node associated with this instance set; -1 means uninitialized
Elem(void)
: begin(nullptr), end(nullptr), node_id(-1) {}
Elem()
= default;
Elem(const size_t* begin,
const size_t* end,
int node_id)
: begin(begin), end(end), node_id(node_id) {}
inline size_t size() const {
inline size_t Size() const {
return end - begin;
}
};
@@ -42,11 +42,11 @@ class RowSetCollection {
std::vector<size_t> right;
};
inline std::vector<Elem>::const_iterator begin() const {
inline std::vector<Elem>::const_iterator begin() const { // NOLINT
return elem_of_each_node_.begin();
}
inline std::vector<Elem>::const_iterator end() const {
inline std::vector<Elem>::const_iterator end() const { // NOLINT
return elem_of_each_node_.end();
}
@@ -88,7 +88,7 @@ class RowSetCollection {
unsigned left_node_id,
unsigned right_node_id) {
const Elem e = elem_of_each_node_[node_id];
const bst_omp_uint nthread = static_cast<bst_omp_uint>(row_split_tloc.size());
const auto nthread = static_cast<bst_omp_uint>(row_split_tloc.size());
CHECK(e.begin != nullptr);
size_t* all_begin = dmlc::BeginPtr(row_indices_);
size_t* begin = all_begin + (e.begin - all_begin);

@@ -12,10 +12,10 @@
namespace xgboost {
namespace common {
struct Timer {
typedef std::chrono::high_resolution_clock ClockT;
typedef std::chrono::high_resolution_clock::time_point TimePointT;
typedef std::chrono::high_resolution_clock::duration DurationT;
typedef std::chrono::duration<double> SecondsT;
using ClockT = std::chrono::high_resolution_clock;
using TimePointT = std::chrono::high_resolution_clock::time_point;
using DurationT = std::chrono::high_resolution_clock::duration;
using SecondsT = std::chrono::duration<double>;
TimePointT start;
DurationT elapsed;
@@ -70,7 +70,7 @@ struct Monitor {
if (debug_verbose) {
#ifdef __CUDACC__
#include "device_helpers.cuh"
dh::synchronize_n_devices(dList.size(), dList);
dh::SynchronizeNDevices(dList.size(), dList);
#endif
}
timer_map[name].Start();
@@ -80,7 +80,7 @@ struct Monitor {
if (debug_verbose) {
#ifdef __CUDACC__
#include "device_helpers.cuh"
dh::synchronize_n_devices(dList.size(), dList);
dh::SynchronizeNDevices(dList.size(), dList);
#endif
}
timer_map[name].Stop();

@@ -24,51 +24,51 @@ DMLC_REGISTRY_ENABLE(::xgboost::data::SparsePageFormatReg);
namespace xgboost {
// implementation of inline functions
void MetaInfo::Clear() {
num_row = num_col = num_nonzero = 0;
labels.clear();
root_index.clear();
group_ptr.clear();
weights.clear();
base_margin.clear();
num_row_ = num_col_ = num_nonzero_ = 0;
labels_.clear();
root_index_.clear();
group_ptr_.clear();
weights_.clear();
base_margin_.clear();
}
void MetaInfo::SaveBinary(dmlc::Stream *fo) const {
int32_t version = kVersion;
fo->Write(&version, sizeof(version));
fo->Write(&num_row, sizeof(num_row));
fo->Write(&num_col, sizeof(num_col));
fo->Write(&num_nonzero, sizeof(num_nonzero));
fo->Write(labels);
fo->Write(group_ptr);
fo->Write(weights);
fo->Write(root_index);
fo->Write(base_margin);
fo->Write(&num_row_, sizeof(num_row_));
fo->Write(&num_col_, sizeof(num_col_));
fo->Write(&num_nonzero_, sizeof(num_nonzero_));
fo->Write(labels_);
fo->Write(group_ptr_);
fo->Write(weights_);
fo->Write(root_index_);
fo->Write(base_margin_);
}
void MetaInfo::LoadBinary(dmlc::Stream *fi) {
int version;
CHECK(fi->Read(&version, sizeof(version)) == sizeof(version)) << "MetaInfo: invalid version";
CHECK_EQ(version, kVersion) << "MetaInfo: invalid format";
CHECK(fi->Read(&num_row, sizeof(num_row)) == sizeof(num_row)) << "MetaInfo: invalid format";
CHECK(fi->Read(&num_col, sizeof(num_col)) == sizeof(num_col)) << "MetaInfo: invalid format";
CHECK(fi->Read(&num_nonzero, sizeof(num_nonzero)) == sizeof(num_nonzero))
CHECK(fi->Read(&num_row_, sizeof(num_row_)) == sizeof(num_row_)) << "MetaInfo: invalid format";
CHECK(fi->Read(&num_col_, sizeof(num_col_)) == sizeof(num_col_)) << "MetaInfo: invalid format";
CHECK(fi->Read(&num_nonzero_, sizeof(num_nonzero_)) == sizeof(num_nonzero_))
<< "MetaInfo: invalid format";
CHECK(fi->Read(&labels)) << "MetaInfo: invalid format";
CHECK(fi->Read(&group_ptr)) << "MetaInfo: invalid format";
CHECK(fi->Read(&weights)) << "MetaInfo: invalid format";
CHECK(fi->Read(&root_index)) << "MetaInfo: invalid format";
CHECK(fi->Read(&base_margin)) << "MetaInfo: invalid format";
CHECK(fi->Read(&labels_)) << "MetaInfo: invalid format";
CHECK(fi->Read(&group_ptr_)) << "MetaInfo: invalid format";
CHECK(fi->Read(&weights_)) << "MetaInfo: invalid format";
CHECK(fi->Read(&root_index_)) << "MetaInfo: invalid format";
CHECK(fi->Read(&base_margin_)) << "MetaInfo: invalid format";
}
// try to load group information from file, if it exists
inline bool MetaTryLoadGroup(const std::string& fname,
std::vector<unsigned>* group) {
std::unique_ptr<dmlc::Stream> fi(dmlc::Stream::Create(fname.c_str(), "r", true));
if (fi.get() == nullptr) return false;
if (fi == nullptr) return false;
dmlc::istream is(fi.get());
group->clear();
group->push_back(0);
unsigned nline;
unsigned nline = 0;
while (is >> nline) {
group->push_back(group->back() + nline);
}
@@ -79,7 +79,7 @@ inline bool MetaTryLoadGroup(const std::string& fname,
inline bool MetaTryLoadFloatInfo(const std::string& fname,
std::vector<bst_float>* data) {
std::unique_ptr<dmlc::Stream> fi(dmlc::Stream::Create(fname.c_str(), "r", true));
if (fi.get() == nullptr) return false;
if (fi == nullptr) return false;
dmlc::istream is(fi.get());
data->clear();
bst_float value;
@@ -93,16 +93,16 @@ inline bool MetaTryLoadFloatInfo(const std::string& fname,
#define DISPATCH_CONST_PTR(dtype, old_ptr, cast_ptr, proc) \
switch (dtype) { \
case kFloat32: { \
const float* cast_ptr = reinterpret_cast<const float*>(old_ptr); proc; break; \
auto cast_ptr = reinterpret_cast<const float*>(old_ptr); proc; break; \
} \
case kDouble: { \
const double* cast_ptr = reinterpret_cast<const double*>(old_ptr); proc; break; \
auto cast_ptr = reinterpret_cast<const double*>(old_ptr); proc; break; \
} \
case kUInt32: { \
const uint32_t* cast_ptr = reinterpret_cast<const uint32_t*>(old_ptr); proc; break; \
auto cast_ptr = reinterpret_cast<const uint32_t*>(old_ptr); proc; break; \
} \
case kUInt64: { \
const uint64_t* cast_ptr = reinterpret_cast<const uint64_t*>(old_ptr); proc; break; \
auto cast_ptr = reinterpret_cast<const uint64_t*>(old_ptr); proc; break; \
} \
default: LOG(FATAL) << "Unknown data type" << dtype; \
} \
@@ -110,28 +110,28 @@ inline bool MetaTryLoadFloatInfo(const std::string& fname,
void MetaInfo::SetInfo(const char* key, const void* dptr, DataType dtype, size_t num) {
if (!std::strcmp(key, "root_index")) {
root_index.resize(num);
root_index_.resize(num);
DISPATCH_CONST_PTR(dtype, dptr, cast_dptr,
std::copy(cast_dptr, cast_dptr + num, root_index.begin()));
std::copy(cast_dptr, cast_dptr + num, root_index_.begin()));
} else if (!std::strcmp(key, "label")) {
labels.resize(num);
labels_.resize(num);
DISPATCH_CONST_PTR(dtype, dptr, cast_dptr,
std::copy(cast_dptr, cast_dptr + num, labels.begin()));
std::copy(cast_dptr, cast_dptr + num, labels_.begin()));
} else if (!std::strcmp(key, "weight")) {
weights.resize(num);
weights_.resize(num);
DISPATCH_CONST_PTR(dtype, dptr, cast_dptr,
std::copy(cast_dptr, cast_dptr + num, weights.begin()));
std::copy(cast_dptr, cast_dptr + num, weights_.begin()));
} else if (!std::strcmp(key, "base_margin")) {
base_margin.resize(num);
base_margin_.resize(num);
DISPATCH_CONST_PTR(dtype, dptr, cast_dptr,
std::copy(cast_dptr, cast_dptr + num, base_margin.begin()));
std::copy(cast_dptr, cast_dptr + num, base_margin_.begin()));
} else if (!std::strcmp(key, "group")) {
group_ptr.resize(num + 1);
group_ptr_.resize(num + 1);
DISPATCH_CONST_PTR(dtype, dptr, cast_dptr,
std::copy(cast_dptr, cast_dptr + num, group_ptr.begin() + 1));
group_ptr[0] = 0;
for (size_t i = 1; i < group_ptr.size(); ++i) {
group_ptr[i] = group_ptr[i - 1] + group_ptr[i];
std::copy(cast_dptr, cast_dptr + num, group_ptr_.begin() + 1));
group_ptr_[0] = 0;
for (size_t i = 1; i < group_ptr_.size(); ++i) {
group_ptr_[i] = group_ptr_[i - 1] + group_ptr_[i];
}
}
}
@@ -163,7 +163,9 @@ DMatrix* DMatrix::Load(const std::string& uri,
<< "-" << rabit::GetWorldSize()
<< cache_shards[i].substr(pos, cache_shards[i].length());
}
if (i + 1 != cache_shards.size()) os << ':';
if (i + 1 != cache_shards.size()) {
os << ':';
}
}
cache_file = os.str();
}
@@ -187,7 +189,7 @@ DMatrix* DMatrix::Load(const std::string& uri,
if (file_format == "auto" && npart == 1) {
int magic;
std::unique_ptr<dmlc::Stream> fi(dmlc::Stream::Create(fname.c_str(), "r", true));
if (fi.get() != nullptr) {
if (fi != nullptr) {
common::PeekableInStream is(fi.get());
if (is.PeekRead(&magic, sizeof(magic)) == sizeof(magic) &&
magic == data::SimpleCSRSource::kMagic) {
@@ -195,8 +197,8 @@ DMatrix* DMatrix::Load(const std::string& uri,
source->LoadBinary(&is);
DMatrix* dmat = DMatrix::Create(std::move(source), cache_file);
if (!silent) {
LOG(CONSOLE) << dmat->info().num_row << 'x' << dmat->info().num_col << " matrix with "
<< dmat->info().num_nonzero << " entries loaded from " << uri;
LOG(CONSOLE) << dmat->Info().num_row_ << 'x' << dmat->Info().num_col_ << " matrix with "
<< dmat->Info().num_nonzero_ << " entries loaded from " << uri;
}
return dmat;
}
@@ -207,26 +209,26 @@ DMatrix* DMatrix::Load(const std::string& uri,
dmlc::Parser<uint32_t>::Create(fname.c_str(), partid, npart, file_format.c_str()));
DMatrix* dmat = DMatrix::Create(parser.get(), cache_file);
if (!silent) {
LOG(CONSOLE) << dmat->info().num_row << 'x' << dmat->info().num_col << " matrix with "
<< dmat->info().num_nonzero << " entries loaded from " << uri;
LOG(CONSOLE) << dmat->Info().num_row_ << 'x' << dmat->Info().num_col_ << " matrix with "
<< dmat->Info().num_nonzero_ << " entries loaded from " << uri;
}
/* Sync up the number of features after the matrix is loaded.
 * Partitioned data would otherwise fail the train/val validation check,
 * since each partition does not know the real number of features. */
rabit::Allreduce<rabit::op::Max>(&dmat->info().num_col, 1);
rabit::Allreduce<rabit::op::Max>(&dmat->Info().num_col_, 1);
// backward compatibility code.
if (!load_row_split) {
MetaInfo& info = dmat->info();
if (MetaTryLoadGroup(fname + ".group", &info.group_ptr) && !silent) {
LOG(CONSOLE) << info.group_ptr.size() - 1
MetaInfo& info = dmat->Info();
if (MetaTryLoadGroup(fname + ".group", &info.group_ptr_) && !silent) {
LOG(CONSOLE) << info.group_ptr_.size() - 1
<< " groups are loaded from " << fname << ".group";
}
if (MetaTryLoadFloatInfo(fname + ".base_margin", &info.base_margin) && !silent) {
LOG(CONSOLE) << info.base_margin.size()
if (MetaTryLoadFloatInfo(fname + ".base_margin", &info.base_margin_) && !silent) {
LOG(CONSOLE) << info.base_margin_.size()
<< " base_margin are loaded from " << fname << ".base_margin";
}
if (MetaTryLoadFloatInfo(fname + ".weight", &info.weights) && !silent) {
LOG(CONSOLE) << info.weights.size()
if (MetaTryLoadFloatInfo(fname + ".weight", &info.weights_) && !silent) {
LOG(CONSOLE) << info.weights_.size()
<< " weights are loaded from " << fname << ".weight";
}
}

@@ -18,7 +18,7 @@ void SimpleCSRSource::Clear() {
void SimpleCSRSource::CopyFrom(DMatrix* src) {
this->Clear();
this->info = src->info();
this->info = src->Info();
dmlc::DataIter<RowBatch>* iter = src->RowIterator();
iter->BeforeFirst();
while (iter->Next()) {
@@ -36,10 +36,10 @@ void SimpleCSRSource::CopyFrom(dmlc::Parser<uint32_t>* parser) {
while (parser->Next()) {
const dmlc::RowBlock<uint32_t>& batch = parser->Value();
if (batch.label != nullptr) {
info.labels.insert(info.labels.end(), batch.label, batch.label + batch.size);
info.labels_.insert(info.labels_.end(), batch.label, batch.label + batch.size);
}
if (batch.weight != nullptr) {
info.weights.insert(info.weights.end(), batch.weight, batch.weight + batch.size);
info.weights_.insert(info.weights_.end(), batch.weight, batch.weight + batch.size);
}
// Remove the assertion on batch.index, which can be null in the case that the data in this
// batch is entirely sparse. Although it's true that this indicates a likely issue with the
@@ -48,13 +48,13 @@ void SimpleCSRSource::CopyFrom(dmlc::Parser<uint32_t>* parser) {
// CHECK(batch.index != nullptr);
// update information
this->info.num_row += batch.size;
this->info.num_row_ += batch.size;
// copy the data over
for (size_t i = batch.offset[0]; i < batch.offset[batch.size]; ++i) {
uint32_t index = batch.index[i];
bst_float fvalue = batch.value == nullptr ? 1.0f : batch.value[i];
row_data_.push_back(SparseBatch::Entry(index, fvalue));
this->info.num_col = std::max(this->info.num_col,
row_data_.emplace_back(index, fvalue);
this->info.num_col_ = std::max(this->info.num_col_,
static_cast<uint64_t>(index + 1));
}
size_t top = row_ptr_.size();
@@ -62,7 +62,7 @@ void SimpleCSRSource::CopyFrom(dmlc::Parser<uint32_t>* parser) {
row_ptr_.push_back(row_ptr_[top - 1] + batch.offset[i + 1] - batch.offset[0]);
}
}
this->info.num_nonzero = static_cast<uint64_t>(row_data_.size());
this->info.num_nonzero_ = static_cast<uint64_t>(row_data_.size());
}
void SimpleCSRSource::LoadBinary(dmlc::Stream* fi) {

@@ -35,9 +35,9 @@ class SimpleCSRSource : public DataSource {
std::vector<RowBatch::Entry> row_data_;
// functions
/*! \brief default constructor */
SimpleCSRSource() : row_ptr_(1, 0), at_first_(true) {}
SimpleCSRSource() : row_ptr_(1, 0) {}
/*! \brief destructor */
virtual ~SimpleCSRSource() {}
~SimpleCSRSource() override = default;
/*! \brief clear the data structure */
void Clear();
/*!
@@ -72,7 +72,7 @@ class SimpleCSRSource : public DataSource {
private:
/*! \brief internal variable, used to support iterator interface */
bool at_first_;
bool at_first_{true};
/*! \brief */
RowBatch batch_;
};

@@ -20,7 +20,7 @@ bool SimpleDMatrix::ColBatchIter::Next() {
data_ptr_ += 1;
SparsePage* pcol = cpages_[data_ptr_ - 1].get();
batch_.size = col_index_.size();
col_data_.resize(col_index_.size(), SparseBatch::Inst(NULL, 0));
col_data_.resize(col_index_.size(), SparseBatch::Inst(nullptr, 0));
for (size_t i = 0; i < col_data_.size(); ++i) {
const bst_uint ridx = col_index_[i];
col_data_[i] = SparseBatch::Inst
@@ -33,7 +33,7 @@ bool SimpleDMatrix::ColBatchIter::Next() {
}
dmlc::DataIter<ColBatch>* SimpleDMatrix::ColIterator() {
size_t ncol = this->info().num_col;
size_t ncol = this->Info().num_col_;
col_iter_.col_index_.resize(ncol);
for (size_t i = 0; i < ncol; ++i) {
col_iter_.col_index_[i] = static_cast<bst_uint>(i);
@@ -43,10 +43,10 @@ dmlc::DataIter<ColBatch>* SimpleDMatrix::ColIterator() {
}
dmlc::DataIter<ColBatch>* SimpleDMatrix::ColIterator(const std::vector<bst_uint>&fset) {
size_t ncol = this->info().num_col;
size_t ncol = this->Info().num_col_;
col_iter_.col_index_.resize(0);
for (size_t i = 0; i < fset.size(); ++i) {
if (fset[i] < ncol) col_iter_.col_index_.push_back(fset[i]);
for (auto fidx : fset) {
if (fidx < ncol) col_iter_.col_index_.push_back(fidx);
}
col_iter_.BeforeFirst();
return &col_iter_;
@@ -56,9 +56,9 @@ void SimpleDMatrix::InitColAccess(const std::vector<bool> &enabled,
float pkeep,
size_t max_row_perbatch, bool sorted) {
if (this->HaveColAccess(sorted)) return;
col_iter_.sorted = sorted;
col_iter_.sorted_ = sorted;
col_iter_.cpages_.clear();
if (info().num_row < max_row_perbatch) {
if (Info().num_row_ < max_row_perbatch) {
std::unique_ptr<SparsePage> page(new SparsePage());
this->MakeOneBatch(enabled, pkeep, page.get(), sorted);
col_iter_.cpages_.push_back(std::move(page));
@@ -66,10 +66,10 @@ void SimpleDMatrix::InitColAccess(const std::vector<bool> &enabled,
this->MakeManyBatch(enabled, pkeep, max_row_perbatch, sorted);
}
// setup col-size
col_size_.resize(info().num_col);
col_size_.resize(Info().num_col_);
std::fill(col_size_.begin(), col_size_.end(), 0);
for (size_t i = 0; i < col_iter_.cpages_.size(); ++i) {
SparsePage *pcol = col_iter_.cpages_[i].get();
for (auto & cpage : col_iter_.cpages_) {
SparsePage *pcol = cpage.get();
for (size_t j = 0; j < pcol->Size(); ++j) {
col_size_[j] += pcol->offset[j + 1] - pcol->offset[j];
}
@@ -80,14 +80,14 @@ void SimpleDMatrix::InitColAccess(const std::vector<bool> &enabled,
void SimpleDMatrix::MakeOneBatch(const std::vector<bool>& enabled, float pkeep,
SparsePage* pcol, bool sorted) {
// clear rowset
buffered_rowset_.clear();
buffered_rowset_.Clear();
// bit map
const int nthread = omp_get_max_threads();
std::vector<bool> bmap;
pcol->Clear();
common::ParallelGroupBuilder<SparseBatch::Entry>
builder(&pcol->offset, &pcol->data);
builder.InitBudget(info().num_col, nthread);
builder.InitBudget(Info().num_col_, nthread);
// start working
dmlc::DataIter<RowBatch>* iter = this->RowIterator();
iter->BeforeFirst();
@@ -99,9 +99,9 @@ void SimpleDMatrix::MakeOneBatch(const std::vector<bool>& enabled, float pkeep,
long batch_size = static_cast<long>(batch.size); // NOLINT(*)
for (long i = 0; i < batch_size; ++i) { // NOLINT(*)
bst_uint ridx = static_cast<bst_uint>(batch.base_rowid + i);
auto ridx = static_cast<bst_uint>(batch.base_rowid + i);
if (pkeep == 1.0f || coin_flip(rnd)) {
buffered_rowset_.push_back(ridx);
buffered_rowset_.PushBack(ridx);
} else {
bmap[i] = false;
}
@@ -109,7 +109,7 @@ void SimpleDMatrix::MakeOneBatch(const std::vector<bool>& enabled, float pkeep,
#pragma omp parallel for schedule(static)
for (long i = 0; i < batch_size; ++i) { // NOLINT(*)
int tid = omp_get_thread_num();
bst_uint ridx = static_cast<bst_uint>(batch.base_rowid + i);
auto ridx = static_cast<bst_uint>(batch.base_rowid + i);
if (bmap[ridx]) {
RowBatch::Inst inst = batch[i];
for (bst_uint j = 0; j < inst.length; ++j) {
@@ -128,13 +128,13 @@ void SimpleDMatrix::MakeOneBatch(const std::vector<bool>& enabled, float pkeep,
#pragma omp parallel for schedule(static)
for (long i = 0; i < static_cast<long>(batch.size); ++i) { // NOLINT(*)
int tid = omp_get_thread_num();
bst_uint ridx = static_cast<bst_uint>(batch.base_rowid + i);
auto ridx = static_cast<bst_uint>(batch.base_rowid + i);
if (bmap[ridx]) {
RowBatch::Inst inst = batch[i];
for (bst_uint j = 0; j < inst.length; ++j) {
if (enabled[inst[j].index]) {
builder.Push(inst[j].index,
SparseBatch::Entry((bst_uint)(batch.base_rowid+i),
SparseBatch::Entry(static_cast<bst_uint>(batch.base_rowid+i),
inst[j].fvalue), tid);
}
}
@@ -142,11 +142,11 @@ void SimpleDMatrix::MakeOneBatch(const std::vector<bool>& enabled, float pkeep,
}
}
CHECK_EQ(pcol->Size(), info().num_col);
CHECK_EQ(pcol->Size(), Info().num_col_);
if (sorted) {
// sort columns
bst_omp_uint ncol = static_cast<bst_omp_uint>(pcol->Size());
auto ncol = static_cast<bst_omp_uint>(pcol->Size());
#pragma omp parallel for schedule(dynamic, 1) num_threads(nthread)
for (bst_omp_uint i = 0; i < ncol; ++i) {
if (pcol->offset[i] < pcol->offset[i + 1]) {
@@ -164,7 +164,7 @@ void SimpleDMatrix::MakeManyBatch(const std::vector<bool>& enabled,
size_t btop = 0;
std::bernoulli_distribution coin_flip(pkeep);
auto& rnd = common::GlobalRandom();
buffered_rowset_.clear();
buffered_rowset_.Clear();
// internal temp cache
SparsePage tmp; tmp.Clear();
// start working
@@ -174,16 +174,16 @@ void SimpleDMatrix::MakeManyBatch(const std::vector<bool>& enabled,
while (iter->Next()) {
const RowBatch &batch = iter->Value();
for (size_t i = 0; i < batch.size; ++i) {
bst_uint ridx = static_cast<bst_uint>(batch.base_rowid + i);
auto ridx = static_cast<bst_uint>(batch.base_rowid + i);
if (pkeep == 1.0f || coin_flip(rnd)) {
buffered_rowset_.push_back(ridx);
buffered_rowset_.PushBack(ridx);
tmp.Push(batch[i]);
}
if (tmp.Size() >= max_row_perbatch) {
std::unique_ptr<SparsePage> page(new SparsePage());
this->MakeColPage(tmp.GetRowBatch(0), btop, enabled, page.get(), sorted);
col_iter_.cpages_.push_back(std::move(page));
btop = buffered_rowset_.size();
btop = buffered_rowset_.Size();
tmp.Clear();
}
}
@@ -205,7 +205,7 @@ void SimpleDMatrix::MakeColPage(const RowBatch& batch,
pcol->Clear();
common::ParallelGroupBuilder<SparseBatch::Entry>
builder(&pcol->offset, &pcol->data);
builder.InitBudget(info().num_col, nthread);
builder.InitBudget(Info().num_col_, nthread);
bst_omp_uint ndata = static_cast<bst_uint>(batch.size);
#pragma omp parallel for schedule(static) num_threads(nthread)
for (bst_omp_uint i = 0; i < ndata; ++i) {
@@ -231,10 +231,10 @@ void SimpleDMatrix::MakeColPage(const RowBatch& batch,
tid);
}
}
CHECK_EQ(pcol->Size(), info().num_col);
CHECK_EQ(pcol->Size(), Info().num_col_);
// sort columns
if (sorted) {
bst_omp_uint ncol = static_cast<bst_omp_uint>(pcol->Size());
auto ncol = static_cast<bst_omp_uint>(pcol->Size());
#pragma omp parallel for schedule(dynamic, 1) num_threads(nthread)
for (bst_omp_uint i = 0; i < ncol; ++i) {
if (pcol->offset[i] < pcol->offset[i + 1]) {

@@ -22,11 +22,11 @@ class SimpleDMatrix : public DMatrix {
explicit SimpleDMatrix(std::unique_ptr<DataSource>&& source)
: source_(std::move(source)) {}
MetaInfo& info() override {
MetaInfo& Info() override {
return source_->info;
}
const MetaInfo& info() const override {
const MetaInfo& Info() const override {
return source_->info;
}
@@ -37,10 +37,10 @@ class SimpleDMatrix : public DMatrix {
}
bool HaveColAccess(bool sorted) const override {
return col_size_.size() != 0 && col_iter_.sorted == sorted;
return col_size_.size() != 0 && col_iter_.sorted_ == sorted;
}
const RowSet& buffered_rowset() const override {
const RowSet& BufferedRowset() const override {
return buffered_rowset_;
}
@@ -49,8 +49,8 @@ class SimpleDMatrix : public DMatrix {
}
float GetColDensity(size_t cidx) const override {
size_t nmiss = buffered_rowset_.size() - col_size_[cidx];
return 1.0f - (static_cast<float>(nmiss)) / buffered_rowset_.size();
size_t nmiss = buffered_rowset_.Size() - col_size_[cidx];
return 1.0f - (static_cast<float>(nmiss)) / buffered_rowset_.Size();
}
dmlc::DataIter<ColBatch>* ColIterator() override;
@@ -67,7 +67,7 @@ class SimpleDMatrix : public DMatrix {
// in-memory column batch iterator.
struct ColBatchIter: dmlc::DataIter<ColBatch> {
public:
ColBatchIter() : data_ptr_(0), sorted(false) {}
ColBatchIter() = default;
void BeforeFirst() override {
data_ptr_ = 0;
}
@@ -86,11 +86,11 @@ class SimpleDMatrix : public DMatrix {
// column sparse pages
std::vector<std::unique_ptr<SparsePage> > cpages_;
// data pointer
size_t data_ptr_;
size_t data_ptr_{0};
// temporal space for batch
ColBatch batch_;
// Is column sorted?
bool sorted;
bool sorted_{false};
};
// source data pointer.

@@ -51,11 +51,11 @@ class SparsePage {
return offset.size() - 1;
}
/*! \return estimation of memory cost of this page */
inline size_t MemCostBytes(void) const {
inline size_t MemCostBytes() const {
return offset.size() * sizeof(size_t) + data.size() * sizeof(SparseBatch::Entry);
}
/*! \brief clear the page */
inline void Clear(void) {
inline void Clear() {
min_index = 0;
offset.clear();
offset.push_back(0);
@@ -92,7 +92,7 @@ class SparsePage {
for (size_t i = batch.offset[0]; i < batch.offset[batch.size]; ++i) {
uint32_t index = batch.index[i];
bst_float fvalue = batch.value == nullptr ? 1.0f : batch.value[i];
data.push_back(SparseBatch::Entry(index, fvalue));
data.emplace_back(index, fvalue);
}
CHECK_EQ(offset.back(), data.size());
}
@@ -145,7 +145,7 @@ class SparsePage {
class SparsePage::Format {
public:
/*! \brief virtual destructor */
virtual ~Format() {}
virtual ~Format() = default;
/*!
* \brief Load all the segments into page, advance fi to end of the block.
* \param page The data to read page into.

@@ -94,9 +94,9 @@ void SparsePageDMatrix::ColPageIter::Init(const std::vector<bst_uint>& index_set
}
dmlc::DataIter<ColBatch>* SparsePageDMatrix::ColIterator() {
CHECK(col_iter_.get() != nullptr);
CHECK(col_iter_ != nullptr);
std::vector<bst_uint> col_index;
size_t ncol = this->info().num_col;
size_t ncol = this->Info().num_col_;
for (size_t i = 0; i < ncol; ++i) {
col_index.push_back(static_cast<bst_uint>(i));
}
@@ -106,12 +106,12 @@ dmlc::DataIter<ColBatch>* SparsePageDMatrix::ColIterator() {
dmlc::DataIter<ColBatch>* SparsePageDMatrix::
ColIterator(const std::vector<bst_uint>& fset) {
CHECK(col_iter_.get() != nullptr);
CHECK(col_iter_ != nullptr);
std::vector<bst_uint> col_index;
size_t ncol = this->info().num_col;
for (size_t i = 0; i < fset.size(); ++i) {
if (fset[i] < ncol) {
col_index.push_back(fset[i]);
size_t ncol = this->Info().num_col_;
for (auto fidx : fset) {
if (fidx < ncol) {
col_index.push_back(fidx);
}
}
col_iter_->Init(col_index, false);
@@ -126,7 +126,7 @@ bool SparsePageDMatrix::TryInitColData(bool sorted) {
std::string col_meta_name = cache_shards[0] + ".col.meta";
std::unique_ptr<dmlc::Stream> fmeta(
dmlc::Stream::Create(col_meta_name.c_str(), "r", true));
if (fmeta.get() == nullptr) return false;
if (fmeta == nullptr) return false;
CHECK(fmeta->Read(&buffered_rowset_)) << "invalid col.meta file";
CHECK(fmeta->Read(&col_size_)) << "invalid col.meta file";
}
@@ -136,7 +136,7 @@ bool SparsePageDMatrix::TryInitColData(bool sorted) {
std::string col_data_name = prefix + ".col.page";
std::unique_ptr<dmlc::SeekStream> fdata(
dmlc::SeekStream::CreateForRead(col_data_name.c_str(), true));
if (fdata.get() == nullptr) return false;
if (fdata == nullptr) return false;
files.push_back(std::move(fdata));
}
col_iter_.reset(new ColPageIter(std::move(files)));
@@ -150,12 +150,12 @@ void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled,
size_t max_row_perbatch, bool sorted) {
if (HaveColAccess(sorted)) return;
if (TryInitColData(sorted)) return;
const MetaInfo& info = this->info();
const MetaInfo& info = this->Info();
if (max_row_perbatch == std::numeric_limits<size_t>::max()) {
max_row_perbatch = kMaxRowPerBatch;
}
buffered_rowset_.clear();
col_size_.resize(info.num_col);
buffered_rowset_.Clear();
col_size_.resize(info.num_col_);
std::fill(col_size_.begin(), col_size_.end(), 0);
dmlc::DataIter<RowBatch>* iter = this->RowIterator();
std::bernoulli_distribution coin_flip(pkeep);
@@ -173,7 +173,7 @@ void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled,
const int nthread = std::max(omp_get_max_threads(), std::max(omp_get_num_procs() / 2 - 1, 1));
common::ParallelGroupBuilder<SparseBatch::Entry>
builder(&pcol->offset, &pcol->data);
builder.InitBudget(info.num_col, nthread);
builder.InitBudget(info.num_col_, nthread);
bst_omp_uint ndata = static_cast<bst_uint>(prow.Size());
#pragma omp parallel for schedule(static) num_threads(nthread)
for (bst_omp_uint i = 0; i < ndata; ++i) {
@@ -196,10 +196,10 @@ void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled,
tid);
}
}
CHECK_EQ(pcol->Size(), info.num_col);
CHECK_EQ(pcol->Size(), info.num_col_);
// sort columns
if (sorted) {
bst_omp_uint ncol = static_cast<bst_omp_uint>(pcol->Size());
auto ncol = static_cast<bst_omp_uint>(pcol->Size());
#pragma omp parallel for schedule(dynamic, 1) num_threads(nthread)
for (bst_omp_uint i = 0; i < ncol; ++i) {
if (pcol->offset[i] < pcol->offset[i + 1]) {
@@ -213,16 +213,16 @@ void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled,
auto make_next_col = [&] (SparsePage* dptr) {
tmp.Clear();
size_t btop = buffered_rowset_.size();
size_t btop = buffered_rowset_.Size();
while (true) {
if (batch_ptr != batch_top) {
const RowBatch& batch = iter->Value();
CHECK_EQ(batch_top, batch.size);
for (size_t i = batch_ptr; i < batch_top; ++i) {
bst_uint ridx = static_cast<bst_uint>(batch.base_rowid + i);
auto ridx = static_cast<bst_uint>(batch.base_rowid + i);
if (pkeep == 1.0f || coin_flip(rnd)) {
buffered_rowset_.push_back(ridx);
buffered_rowset_.PushBack(ridx);
tmp.Push(batch[i]);
}
@@ -263,7 +263,7 @@ void SparsePageDMatrix::InitColAccess(const std::vector<bool>& enabled,
double tstart = dmlc::GetTime();
size_t bytes_write = 0;
// print every 4 sec.
const double kStep = 4.0;
constexpr double kStep = 4.0;
size_t tick_expected = kStep;
while (make_next_col(page.get())) {


@@ -10,6 +10,7 @@
#include <xgboost/base.h>
#include <xgboost/data.h>
#include <dmlc/threadediter.h>
#include <utility>
#include <vector>
#include <algorithm>
#include <string>
@@ -22,15 +23,15 @@ namespace data {
class SparsePageDMatrix : public DMatrix {
public:
explicit SparsePageDMatrix(std::unique_ptr<DataSource>&& source,
const std::string& cache_info)
: source_(std::move(source)), cache_info_(cache_info) {
std::string cache_info)
: source_(std::move(source)), cache_info_(std::move(cache_info)) {
}
MetaInfo& info() override {
MetaInfo& Info() override {
return source_->info;
}
const MetaInfo& info() const override {
const MetaInfo& Info() const override {
return source_->info;
}
@@ -41,10 +42,10 @@ class SparsePageDMatrix : public DMatrix {
}
bool HaveColAccess(bool sorted) const override {
return col_iter_.get() != nullptr && col_iter_->sorted == sorted;
return col_iter_ != nullptr && col_iter_->sorted == sorted;
}
const RowSet& buffered_rowset() const override {
const RowSet& BufferedRowset() const override {
return buffered_rowset_;
}
@@ -53,8 +54,8 @@ class SparsePageDMatrix : public DMatrix {
}
float GetColDensity(size_t cidx) const override {
size_t nmiss = buffered_rowset_.size() - col_size_[cidx];
return 1.0f - (static_cast<float>(nmiss)) / buffered_rowset_.size();
size_t nmiss = buffered_rowset_.Size() - col_size_[cidx];
return 1.0f - (static_cast<float>(nmiss)) / buffered_rowset_.Size();
}
bool SingleColBlock() const override {
@@ -79,7 +80,7 @@ class SparsePageDMatrix : public DMatrix {
class ColPageIter : public dmlc::DataIter<ColBatch> {
public:
explicit ColPageIter(std::vector<std::unique_ptr<dmlc::SeekStream> >&& files);
virtual ~ColPageIter();
~ColPageIter() override;
void BeforeFirst() override;
const ColBatch &Value() const override {
return out_;


@@ -34,8 +34,7 @@ class SparsePageRawFormat : public SparsePage::Format {
// setup the offset
page->offset.clear();
page->offset.push_back(0);
for (size_t i = 0; i < sorted_index_set.size(); ++i) {
bst_uint fid = sorted_index_set[i];
for (unsigned int fid : sorted_index_set) {
CHECK_LT(fid + 1, disk_offset_.size());
size_t size = disk_offset_[fid + 1] - disk_offset_[fid];
page->offset.push_back(page->offset.back() + size);


@@ -89,12 +89,12 @@ bool SparsePageSource::CacheExist(const std::string& cache_info) {
{
std::string name_info = cache_shards[0];
std::unique_ptr<dmlc::Stream> finfo(dmlc::Stream::Create(name_info.c_str(), "r", true));
if (finfo.get() == nullptr) return false;
if (finfo == nullptr) return false;
}
for (const std::string& prefix : cache_shards) {
std::string name_row = prefix + ".row.page";
std::unique_ptr<dmlc::Stream> frow(dmlc::Stream::Create(name_row.c_str(), "r", true));
if (frow.get() == nullptr) return false;
if (frow == nullptr) return false;
}
return true;
}
@@ -119,22 +119,22 @@ void SparsePageSource::Create(dmlc::Parser<uint32_t>* src,
size_t bytes_write = 0;
double tstart = dmlc::GetTime();
// print every 4 sec.
const double kStep = 4.0;
constexpr double kStep = 4.0;
size_t tick_expected = static_cast<double>(kStep);
while (src->Next()) {
const dmlc::RowBlock<uint32_t>& batch = src->Value();
if (batch.label != nullptr) {
info.labels.insert(info.labels.end(), batch.label, batch.label + batch.size);
info.labels_.insert(info.labels_.end(), batch.label, batch.label + batch.size);
}
if (batch.weight != nullptr) {
info.weights.insert(info.weights.end(), batch.weight, batch.weight + batch.size);
info.weights_.insert(info.weights_.end(), batch.weight, batch.weight + batch.size);
}
info.num_row += batch.size;
info.num_nonzero += batch.offset[batch.size] - batch.offset[0];
info.num_row_ += batch.size;
info.num_nonzero_ += batch.offset[batch.size] - batch.offset[0];
for (size_t i = batch.offset[0]; i < batch.offset[batch.size]; ++i) {
uint32_t index = batch.index[i];
info.num_col = std::max(info.num_col,
info.num_col_ = std::max(info.num_col_,
static_cast<uint64_t>(index + 1));
}
page->Push(batch);
@@ -183,7 +183,7 @@ void SparsePageSource::Create(DMatrix* src,
std::shared_ptr<SparsePage> page;
writer.Alloc(&page); page->Clear();
MetaInfo info = src->info();
MetaInfo info = src->Info();
size_t bytes_write = 0;
double tstart = dmlc::GetTime();
dmlc::DataIter<RowBatch>* iter = src->RowIterator();


@@ -33,7 +33,7 @@ class SparsePageSource : public DataSource {
*/
explicit SparsePageSource(const std::string& cache_prefix) noexcept(false);
/*! \brief destructor */
virtual ~SparsePageSource();
~SparsePageSource() override;
// implement Next
bool Next() override;
// implement BeforeFirst


@@ -34,7 +34,7 @@ SparsePage::Writer::Writer(
fo->Write(format_shard);
std::shared_ptr<SparsePage> page;
while (wqueue->Pop(&page)) {
if (page.get() == nullptr) break;
if (page == nullptr) break;
fmt->Write(*page, fo.get());
qrecycle_.Push(std::move(page));
}
@@ -61,7 +61,7 @@ void SparsePage::Writer::PushWrite(std::shared_ptr<SparsePage>&& page) {
}
void SparsePage::Writer::Alloc(std::shared_ptr<SparsePage>* out_page) {
CHECK(out_page->get() == nullptr);
CHECK(*out_page == nullptr);
if (num_free_buffer_ != 0) {
out_page->reset(new SparsePage());
--num_free_buffer_;


@@ -52,9 +52,9 @@ class GBLinear : public GradientBooster {
explicit GBLinear(const std::vector<std::shared_ptr<DMatrix> > &cache,
bst_float base_margin)
: base_margin_(base_margin),
sum_instance_weight(0),
sum_weight_complete(false),
is_converged(false) {
sum_instance_weight_(0),
sum_weight_complete_(false),
is_converged_(false) {
// Add matrices to the prediction cache
for (auto &d : cache) {
PredictionCacheEntry e;
@@ -63,46 +63,48 @@ class GBLinear : public GradientBooster {
}
}
void Configure(const std::vector<std::pair<std::string, std::string> >& cfg) override {
if (model.weight.size() == 0) {
model.param.InitAllowUnknown(cfg);
if (model_.weight.size() == 0) {
model_.param.InitAllowUnknown(cfg);
}
param.InitAllowUnknown(cfg);
updater.reset(LinearUpdater::Create(param.updater));
updater->Init(cfg);
monitor.Init("GBLinear ", param.debug_verbose);
param_.InitAllowUnknown(cfg);
updater_.reset(LinearUpdater::Create(param_.updater));
updater_->Init(cfg);
monitor_.Init("GBLinear ", param_.debug_verbose);
}
void Load(dmlc::Stream* fi) override {
model.Load(fi);
model_.Load(fi);
}
void Save(dmlc::Stream* fo) const override {
model.Save(fo);
model_.Save(fo);
}
void DoBoost(DMatrix *p_fmat,
HostDeviceVector<bst_gpair> *in_gpair,
HostDeviceVector<GradientPair> *in_gpair,
ObjFunction* obj) override {
monitor.Start("DoBoost");
monitor_.Start("DoBoost");
if (!p_fmat->HaveColAccess(false)) {
std::vector<bool> enabled(p_fmat->info().num_col, true);
p_fmat->InitColAccess(enabled, 1.0f, param.max_row_perbatch, false);
monitor_.Start("InitColAccess");
std::vector<bool> enabled(p_fmat->Info().num_col_, true);
p_fmat->InitColAccess(enabled, 1.0f, param_.max_row_perbatch, false);
monitor_.Stop("InitColAccess");
}
model.LazyInitModel();
model_.LazyInitModel();
this->LazySumWeights(p_fmat);
if (!this->CheckConvergence()) {
updater->Update(&in_gpair->data_h(), p_fmat, &model, sum_instance_weight);
updater_->Update(in_gpair, p_fmat, &model_, sum_instance_weight_);
}
this->UpdatePredictionCache();
monitor.Stop("DoBoost");
monitor_.Stop("DoBoost");
}
void PredictBatch(DMatrix *p_fmat,
HostDeviceVector<bst_float> *out_preds,
unsigned ntree_limit) override {
monitor.Start("PredictBatch");
monitor_.Start("PredictBatch");
CHECK_EQ(ntree_limit, 0U)
<< "GBLinear::Predict ntrees is only valid for gbtree predictor";
@@ -110,19 +112,19 @@ class GBLinear : public GradientBooster {
auto it = cache_.find(p_fmat);
if (it != cache_.end() && it->second.predictions.size() != 0) {
std::vector<bst_float> &y = it->second.predictions;
out_preds->resize(y.size());
std::copy(y.begin(), y.end(), out_preds->data_h().begin());
out_preds->Resize(y.size());
std::copy(y.begin(), y.end(), out_preds->HostVector().begin());
} else {
this->PredictBatchInternal(p_fmat, &out_preds->data_h());
this->PredictBatchInternal(p_fmat, &out_preds->HostVector());
}
monitor.Stop("PredictBatch");
monitor_.Stop("PredictBatch");
}
// add base margin
void PredictInstance(const SparseBatch::Inst &inst,
std::vector<bst_float> *out_preds,
unsigned ntree_limit,
unsigned root_index) override {
const int ngroup = model.param.num_output_group;
const int ngroup = model_.param.num_output_group;
for (int gid = 0; gid < ngroup; ++gid) {
this->Pred(inst, dmlc::BeginPtr(*out_preds), gid, base_margin_);
}
@@ -138,15 +140,15 @@ class GBLinear : public GradientBooster {
std::vector<bst_float>* out_contribs,
unsigned ntree_limit, bool approximate, int condition = 0,
unsigned condition_feature = 0) override {
model.LazyInitModel();
model_.LazyInitModel();
CHECK_EQ(ntree_limit, 0U)
<< "GBLinear::PredictContribution: ntrees is only valid for gbtree predictor";
const std::vector<bst_float>& base_margin = p_fmat->info().base_margin;
const int ngroup = model.param.num_output_group;
const size_t ncolumns = model.param.num_feature + 1;
const std::vector<bst_float>& base_margin = p_fmat->Info().base_margin_;
const int ngroup = model_.param.num_output_group;
const size_t ncolumns = model_.param.num_feature + 1;
// allocate space for (#features + bias) times #groups times #rows
std::vector<bst_float>& contribs = *out_contribs;
contribs.resize(p_fmat->info().num_row * ncolumns * ngroup);
contribs.resize(p_fmat->Info().num_row_ * ncolumns * ngroup);
// make sure contributions is zeroed, we could be reusing a previously allocated one
std::fill(contribs.begin(), contribs.end(), 0);
// start collecting the contributions
@@ -155,21 +157,21 @@ class GBLinear : public GradientBooster {
while (iter->Next()) {
const RowBatch& batch = iter->Value();
// parallel over local batch
const bst_omp_uint nsize = static_cast<bst_omp_uint>(batch.size);
const auto nsize = static_cast<bst_omp_uint>(batch.size);
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < nsize; ++i) {
const RowBatch::Inst &inst = batch[i];
size_t row_idx = static_cast<size_t>(batch.base_rowid + i);
auto row_idx = static_cast<size_t>(batch.base_rowid + i);
// loop over output groups
for (int gid = 0; gid < ngroup; ++gid) {
bst_float *p_contribs = &contribs[(row_idx * ngroup + gid) * ncolumns];
// calculate linear terms' contributions
for (bst_uint c = 0; c < inst.length; ++c) {
if (inst[c].index >= model.param.num_feature) continue;
p_contribs[inst[c].index] = inst[c].fvalue * model[inst[c].index][gid];
if (inst[c].index >= model_.param.num_feature) continue;
p_contribs[inst[c].index] = inst[c].fvalue * model_[inst[c].index][gid];
}
// add base margin to BIAS
p_contribs[ncolumns - 1] = model.bias()[gid] +
p_contribs[ncolumns - 1] = model_.bias()[gid] +
((base_margin.size() != 0) ? base_margin[row_idx * ngroup + gid] : base_margin_);
}
}
@@ -182,34 +184,34 @@ class GBLinear : public GradientBooster {
std::vector<bst_float>& contribs = *out_contribs;
// linear models have no interaction effects
const size_t nelements = model.param.num_feature*model.param.num_feature;
contribs.resize(p_fmat->info().num_row * nelements * model.param.num_output_group);
const size_t nelements = model_.param.num_feature*model_.param.num_feature;
contribs.resize(p_fmat->Info().num_row_ * nelements * model_.param.num_output_group);
std::fill(contribs.begin(), contribs.end(), 0);
}
std::vector<std::string> DumpModel(const FeatureMap& fmap,
bool with_stats,
std::string format) const override {
return model.DumpModel(fmap, with_stats, format);
return model_.DumpModel(fmap, with_stats, format);
}
protected:
void PredictBatchInternal(DMatrix *p_fmat,
std::vector<bst_float> *out_preds) {
monitor.Start("PredictBatchInternal");
model.LazyInitModel();
monitor_.Start("PredictBatchInternal");
model_.LazyInitModel();
std::vector<bst_float> &preds = *out_preds;
const std::vector<bst_float>& base_margin = p_fmat->info().base_margin;
const std::vector<bst_float>& base_margin = p_fmat->Info().base_margin_;
// start collecting the prediction
dmlc::DataIter<RowBatch> *iter = p_fmat->RowIterator();
const int ngroup = model.param.num_output_group;
preds.resize(p_fmat->info().num_row * ngroup);
const int ngroup = model_.param.num_output_group;
preds.resize(p_fmat->Info().num_row_ * ngroup);
while (iter->Next()) {
const RowBatch &batch = iter->Value();
// output convention: nrow * k, where nrow is number of rows
// k is number of group
// parallel over local batch
const omp_ulong nsize = static_cast<omp_ulong>(batch.size);
const auto nsize = static_cast<omp_ulong>(batch.size);
#pragma omp parallel for schedule(static)
for (omp_ulong i = 0; i < nsize; ++i) {
const size_t ridx = batch.base_rowid + i;
@@ -221,14 +223,14 @@ class GBLinear : public GradientBooster {
}
}
}
monitor.Stop("PredictBatchInternal");
monitor_.Stop("PredictBatchInternal");
}
void UpdatePredictionCache() {
// update cache entry
for (auto &kv : cache_) {
PredictionCacheEntry &e = kv.second;
if (e.predictions.size() == 0) {
size_t n = model.param.num_output_group * e.data->info().num_row;
size_t n = model_.param.num_output_group * e.data->Info().num_row_;
e.predictions.resize(n);
}
this->PredictBatchInternal(e.data.get(), &e.predictions);
@@ -236,53 +238,53 @@ class GBLinear : public GradientBooster {
}
bool CheckConvergence() {
if (param.tolerance == 0.0f) return false;
if (is_converged) return true;
if (previous_model.weight.size() != model.weight.size()) {
previous_model = model;
if (param_.tolerance == 0.0f) return false;
if (is_converged_) return true;
if (previous_model_.weight.size() != model_.weight.size()) {
previous_model_ = model_;
return false;
}
float largest_dw = 0.0;
for (size_t i = 0; i < model.weight.size(); i++) {
for (size_t i = 0; i < model_.weight.size(); i++) {
largest_dw = std::max(
largest_dw, std::abs(model.weight[i] - previous_model.weight[i]));
largest_dw, std::abs(model_.weight[i] - previous_model_.weight[i]));
}
previous_model = model;
previous_model_ = model_;
is_converged = largest_dw <= param.tolerance;
return is_converged;
is_converged_ = largest_dw <= param_.tolerance;
return is_converged_;
}
void LazySumWeights(DMatrix *p_fmat) {
if (!sum_weight_complete) {
auto &info = p_fmat->info();
for (size_t i = 0; i < info.num_row; i++) {
sum_instance_weight += info.GetWeight(i);
if (!sum_weight_complete_) {
auto &info = p_fmat->Info();
for (size_t i = 0; i < info.num_row_; i++) {
sum_instance_weight_ += info.GetWeight(i);
}
sum_weight_complete = true;
sum_weight_complete_ = true;
}
}
inline void Pred(const RowBatch::Inst &inst, bst_float *preds, int gid,
bst_float base) {
bst_float psum = model.bias()[gid] + base;
bst_float psum = model_.bias()[gid] + base;
for (bst_uint i = 0; i < inst.length; ++i) {
if (inst[i].index >= model.param.num_feature) continue;
psum += inst[i].fvalue * model[inst[i].index][gid];
if (inst[i].index >= model_.param.num_feature) continue;
psum += inst[i].fvalue * model_[inst[i].index][gid];
}
preds[gid] = psum;
}
// bias margin score
bst_float base_margin_;
// model field
GBLinearModel model;
GBLinearModel previous_model;
GBLinearTrainParam param;
std::unique_ptr<LinearUpdater> updater;
double sum_instance_weight;
bool sum_weight_complete;
common::Monitor monitor;
bool is_converged;
GBLinearModel model_;
GBLinearModel previous_model_;
GBLinearTrainParam param_;
std::unique_ptr<LinearUpdater> updater_;
double sum_instance_weight_;
bool sum_weight_complete_;
common::Monitor monitor_;
bool is_converged_;
/**
* \struct PredictionCacheEntry


@@ -40,7 +40,7 @@ class GBLinearModel {
// weight for each of feature, bias is the last one
std::vector<bst_float> weight;
// initialize the model parameter
inline void LazyInitModel(void) {
inline void LazyInitModel() {
if (!weight.empty()) return;
// bias is the last weight
weight.resize((param.num_feature + 1) * param.num_output_group);


@@ -143,32 +143,32 @@ class GBTree : public GradientBooster {
}
void Configure(const std::vector<std::pair<std::string, std::string> >& cfg) override {
this->cfg = cfg;
this->cfg_ = cfg;
model_.Configure(cfg);
// initialize the updaters only when needed.
std::string updater_seq = tparam.updater_seq;
tparam.InitAllowUnknown(cfg);
if (updater_seq != tparam.updater_seq) updaters.clear();
for (const auto& up : updaters) {
std::string updater_seq = tparam_.updater_seq;
tparam_.InitAllowUnknown(cfg);
if (updater_seq != tparam_.updater_seq) updaters_.clear();
for (const auto& up : updaters_) {
up->Init(cfg);
}
// for the 'update' process_type, move trees into trees_to_update
if (tparam.process_type == kUpdate) {
if (tparam_.process_type == kUpdate) {
model_.InitTreesToUpdate();
}
// configure predictor
predictor = std::unique_ptr<Predictor>(Predictor::Create(tparam.predictor));
predictor->Init(cfg, cache_);
monitor.Init("GBTree", tparam.debug_verbose);
predictor_ = std::unique_ptr<Predictor>(Predictor::Create(tparam_.predictor));
predictor_->Init(cfg, cache_);
monitor_.Init("GBTree", tparam_.debug_verbose);
}
void Load(dmlc::Stream* fi) override {
model_.Load(fi);
this->cfg.clear();
this->cfg.push_back(std::make_pair(std::string("num_feature"),
common::ToString(model_.param.num_feature)));
this->cfg_.clear();
this->cfg_.emplace_back(std::string("num_feature"),
common::ToString(model_.param.num_feature));
}
void Save(dmlc::Stream* fo) const override {
@@ -177,29 +177,29 @@ class GBTree : public GradientBooster {
bool AllowLazyCheckPoint() const override {
return model_.param.num_output_group == 1 ||
tparam.updater_seq.find("distcol") != std::string::npos;
tparam_.updater_seq.find("distcol") != std::string::npos;
}
void DoBoost(DMatrix* p_fmat,
HostDeviceVector<bst_gpair>* in_gpair,
HostDeviceVector<GradientPair>* in_gpair,
ObjFunction* obj) override {
std::vector<std::vector<std::unique_ptr<RegTree> > > new_trees;
const int ngroup = model_.param.num_output_group;
monitor.Start("BoostNewTrees");
monitor_.Start("BoostNewTrees");
if (ngroup == 1) {
std::vector<std::unique_ptr<RegTree> > ret;
BoostNewTrees(in_gpair, p_fmat, 0, &ret);
new_trees.push_back(std::move(ret));
} else {
CHECK_EQ(in_gpair->size() % ngroup, 0U)
CHECK_EQ(in_gpair->Size() % ngroup, 0U)
<< "must have exactly ngroup*nrow gpairs";
// TODO(canonizer): perform this on GPU if HostDeviceVector has device set.
HostDeviceVector<bst_gpair> tmp(in_gpair->size() / ngroup,
bst_gpair(), in_gpair->device());
std::vector<bst_gpair>& gpair_h = in_gpair->data_h();
bst_omp_uint nsize = static_cast<bst_omp_uint>(tmp.size());
HostDeviceVector<GradientPair> tmp(in_gpair->Size() / ngroup,
GradientPair(), in_gpair->Devices());
std::vector<GradientPair>& gpair_h = in_gpair->HostVector();
auto nsize = static_cast<bst_omp_uint>(tmp.Size());
for (int gid = 0; gid < ngroup; ++gid) {
std::vector<bst_gpair>& tmp_h = tmp.data_h();
std::vector<GradientPair>& tmp_h = tmp.HostVector();
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < nsize; ++i) {
tmp_h[i] = gpair_h[i * ngroup + gid];
@@ -209,43 +209,43 @@ class GBTree : public GradientBooster {
new_trees.push_back(std::move(ret));
}
}
monitor.Stop("BoostNewTrees");
monitor.Start("CommitModel");
monitor_.Stop("BoostNewTrees");
monitor_.Start("CommitModel");
this->CommitModel(std::move(new_trees));
monitor.Stop("CommitModel");
monitor_.Stop("CommitModel");
}
void PredictBatch(DMatrix* p_fmat,
HostDeviceVector<bst_float>* out_preds,
unsigned ntree_limit) override {
predictor->PredictBatch(p_fmat, out_preds, model_, 0, ntree_limit);
predictor_->PredictBatch(p_fmat, out_preds, model_, 0, ntree_limit);
}
void PredictInstance(const SparseBatch::Inst& inst,
std::vector<bst_float>* out_preds,
unsigned ntree_limit,
unsigned root_index) override {
predictor->PredictInstance(inst, out_preds, model_,
predictor_->PredictInstance(inst, out_preds, model_,
ntree_limit, root_index);
}
void PredictLeaf(DMatrix* p_fmat,
std::vector<bst_float>* out_preds,
unsigned ntree_limit) override {
predictor->PredictLeaf(p_fmat, out_preds, model_, ntree_limit);
predictor_->PredictLeaf(p_fmat, out_preds, model_, ntree_limit);
}
void PredictContribution(DMatrix* p_fmat,
std::vector<bst_float>* out_contribs,
unsigned ntree_limit, bool approximate, int condition,
unsigned condition_feature) override {
predictor->PredictContribution(p_fmat, out_contribs, model_, ntree_limit, approximate);
predictor_->PredictContribution(p_fmat, out_contribs, model_, ntree_limit, approximate);
}
void PredictInteractionContributions(DMatrix* p_fmat,
std::vector<bst_float>* out_contribs,
unsigned ntree_limit, bool approximate) override {
predictor->PredictInteractionContributions(p_fmat, out_contribs, model_,
predictor_->PredictInteractionContributions(p_fmat, out_contribs, model_,
ntree_limit, approximate);
}
@@ -258,18 +258,18 @@ class GBTree : public GradientBooster {
protected:
// initialize updater before using them
inline void InitUpdater() {
if (updaters.size() != 0) return;
std::string tval = tparam.updater_seq;
if (updaters_.size() != 0) return;
std::string tval = tparam_.updater_seq;
std::vector<std::string> ups = common::Split(tval, ',');
for (const std::string& pstr : ups) {
std::unique_ptr<TreeUpdater> up(TreeUpdater::Create(pstr.c_str()));
up->Init(this->cfg);
updaters.push_back(std::move(up));
up->Init(this->cfg_);
updaters_.push_back(std::move(up));
}
}
// do group-specific boosting
inline void BoostNewTrees(HostDeviceVector<bst_gpair>* gpair,
inline void BoostNewTrees(HostDeviceVector<GradientPair>* gpair,
DMatrix *p_fmat,
int bst_group,
std::vector<std::unique_ptr<RegTree> >* ret) {
@@ -277,26 +277,27 @@ class GBTree : public GradientBooster {
std::vector<RegTree*> new_trees;
ret->clear();
// create the trees
for (int i = 0; i < tparam.num_parallel_tree; ++i) {
if (tparam.process_type == kDefault) {
for (int i = 0; i < tparam_.num_parallel_tree; ++i) {
if (tparam_.process_type == kDefault) {
// create new tree
std::unique_ptr<RegTree> ptr(new RegTree());
ptr->param.InitAllowUnknown(this->cfg);
ptr->param.InitAllowUnknown(this->cfg_);
ptr->InitModel();
new_trees.push_back(ptr.get());
ret->push_back(std::move(ptr));
} else if (tparam.process_type == kUpdate) {
} else if (tparam_.process_type == kUpdate) {
CHECK_LT(model_.trees.size(), model_.trees_to_update.size());
// move an existing tree from trees_to_update
auto t = std::move(model_.trees_to_update[model_.trees.size() +
bst_group * tparam.num_parallel_tree + i]);
bst_group * tparam_.num_parallel_tree + i]);
new_trees.push_back(t.get());
ret->push_back(std::move(t));
}
}
// update the trees
for (auto& up : updaters)
for (auto& up : updaters_) {
up->Update(gpair, p_fmat, new_trees);
}
}
// commit new trees all at once
@@ -307,22 +308,22 @@ class GBTree : public GradientBooster {
num_new_trees += new_trees[gid].size();
model_.CommitModel(std::move(new_trees[gid]), gid);
}
predictor->UpdatePredictionCache(model_, &updaters, num_new_trees);
predictor_->UpdatePredictionCache(model_, &updaters_, num_new_trees);
}
// --- data structure ---
GBTreeModel model_;
// training parameter
GBTreeTrainParam tparam;
GBTreeTrainParam tparam_;
// ----training fields----
// configurations for tree
std::vector<std::pair<std::string, std::string> > cfg;
std::vector<std::pair<std::string, std::string> > cfg_;
// the updaters that can be applied to each of tree
std::vector<std::unique_ptr<TreeUpdater>> updaters;
std::vector<std::unique_ptr<TreeUpdater>> updaters_;
// Cached matrices
std::vector<std::shared_ptr<DMatrix>> cache_;
std::unique_ptr<Predictor> predictor;
common::Monitor monitor;
std::unique_ptr<Predictor> predictor_;
common::Monitor monitor_;
};
// dart
@@ -333,22 +334,22 @@ class Dart : public GBTree {
void Configure(const std::vector<std::pair<std::string, std::string> >& cfg) override {
GBTree::Configure(cfg);
if (model_.trees.size() == 0) {
dparam.InitAllowUnknown(cfg);
dparam_.InitAllowUnknown(cfg);
}
}
void Load(dmlc::Stream* fi) override {
GBTree::Load(fi);
weight_drop.resize(model_.param.num_trees);
weight_drop_.resize(model_.param.num_trees);
if (model_.param.num_trees != 0) {
fi->Read(&weight_drop);
fi->Read(&weight_drop_);
}
}
void Save(dmlc::Stream* fo) const override {
GBTree::Save(fo);
if (weight_drop.size() != 0) {
fo->Write(weight_drop);
if (weight_drop_.size() != 0) {
fo->Write(weight_drop_);
}
}
@@ -357,7 +358,7 @@ class Dart : public GBTree {
HostDeviceVector<bst_float>* out_preds,
unsigned ntree_limit) override {
DropTrees(ntree_limit);
PredLoopInternal<Dart>(p_fmat, &out_preds->data_h(), 0, ntree_limit, true);
PredLoopInternal<Dart>(p_fmat, &out_preds->HostVector(), 0, ntree_limit, true);
}
void PredictInstance(const SparseBatch::Inst& inst,
@@ -365,9 +366,9 @@ class Dart : public GBTree {
unsigned ntree_limit,
unsigned root_index) override {
DropTrees(1);
if (thread_temp.size() == 0) {
thread_temp.resize(1, RegTree::FVec());
thread_temp[0].Init(model_.param.num_feature);
if (thread_temp_.size() == 0) {
thread_temp_.resize(1, RegTree::FVec());
thread_temp_[0].Init(model_.param.num_feature);
}
out_preds->resize(model_.param.num_output_group);
ntree_limit *= model_.param.num_output_group;
@@ -378,7 +379,7 @@ class Dart : public GBTree {
for (int gid = 0; gid < model_.param.num_output_group; ++gid) {
(*out_preds)[gid]
= PredValue(inst, gid, root_index,
&thread_temp[0], 0, ntree_limit) + model_.base_margin;
&thread_temp_[0], 0, ntree_limit) + model_.base_margin;
}
}
@@ -400,8 +401,8 @@ class Dart : public GBTree {
}
if (init_out_preds) {
size_t n = num_group * p_fmat->info().num_row;
const std::vector<bst_float>& base_margin = p_fmat->info().base_margin;
size_t n = num_group * p_fmat->Info().num_row_;
const std::vector<bst_float>& base_margin = p_fmat->Info().base_margin_;
out_preds->resize(n);
if (base_margin.size() != 0) {
CHECK_EQ(out_preds->size(), n);
@@ -427,37 +428,37 @@ class Dart : public GBTree {
int num_group,
unsigned tree_begin,
unsigned tree_end) {
const MetaInfo& info = p_fmat->info();
const MetaInfo& info = p_fmat->Info();
const int nthread = omp_get_max_threads();
CHECK_EQ(num_group, model_.param.num_output_group);
InitThreadTemp(nthread);
std::vector<bst_float>& preds = *out_preds;
CHECK_EQ(model_.param.size_leaf_vector, 0)
<< "size_leaf_vector is enforced to 0 so far";
CHECK_EQ(preds.size(), p_fmat->info().num_row * num_group);
CHECK_EQ(preds.size(), p_fmat->Info().num_row_ * num_group);
// start collecting the prediction
dmlc::DataIter<RowBatch>* iter = p_fmat->RowIterator();
Derived* self = static_cast<Derived*>(this);
auto* self = static_cast<Derived*>(this);
iter->BeforeFirst();
while (iter->Next()) {
const RowBatch &batch = iter->Value();
// parallel over local batch
const int K = 8;
const bst_omp_uint nsize = static_cast<bst_omp_uint>(batch.size);
const bst_omp_uint rest = nsize % K;
constexpr int kUnroll = 8;
const auto nsize = static_cast<bst_omp_uint>(batch.size);
const bst_omp_uint rest = nsize % kUnroll;
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < nsize - rest; i += K) {
for (bst_omp_uint i = 0; i < nsize - rest; i += kUnroll) {
const int tid = omp_get_thread_num();
RegTree::FVec& feats = thread_temp[tid];
int64_t ridx[K];
RowBatch::Inst inst[K];
for (int k = 0; k < K; ++k) {
RegTree::FVec& feats = thread_temp_[tid];
int64_t ridx[kUnroll];
RowBatch::Inst inst[kUnroll];
for (int k = 0; k < kUnroll; ++k) {
ridx[k] = static_cast<int64_t>(batch.base_rowid + i + k);
}
for (int k = 0; k < K; ++k) {
for (int k = 0; k < kUnroll; ++k) {
inst[k] = batch[i + k];
}
for (int k = 0; k < K; ++k) {
for (int k = 0; k < kUnroll; ++k) {
for (int gid = 0; gid < num_group; ++gid) {
const size_t offset = ridx[k] * num_group + gid;
preds[offset] +=
@@ -467,8 +468,8 @@ class Dart : public GBTree {
}
}
for (bst_omp_uint i = nsize - rest; i < nsize; ++i) {
RegTree::FVec& feats = thread_temp[0];
const int64_t ridx = static_cast<int64_t>(batch.base_rowid + i);
RegTree::FVec& feats = thread_temp_[0];
const auto ridx = static_cast<int64_t>(batch.base_rowid + i);
const RowBatch::Inst inst = batch[i];
for (int gid = 0; gid < num_group; ++gid) {
const size_t offset = ridx * num_group + gid;
@@ -489,9 +490,9 @@ class Dart : public GBTree {
model_.CommitModel(std::move(new_trees[gid]), gid);
}
size_t num_drop = NormalizeTrees(num_new_trees);
if (dparam.silent != 1) {
if (dparam_.silent != 1) {
LOG(INFO) << "drop " << num_drop << " trees, "
<< "weight = " << weight_drop.back();
<< "weight = " << weight_drop_.back();
}
}
@@ -506,10 +507,10 @@ class Dart : public GBTree {
p_feats->Fill(inst);
for (size_t i = tree_begin; i < tree_end; ++i) {
if (model_.tree_info[i] == bst_group) {
bool drop = (std::binary_search(idx_drop.begin(), idx_drop.end(), i));
bool drop = (std::binary_search(idx_drop_.begin(), idx_drop_.end(), i));
if (!drop) {
int tid = model_.trees[i]->GetLeafIndex(*p_feats, root_index);
psum += weight_drop[i] * (*model_.trees[i])[tid].leaf_value();
psum += weight_drop_[i] * (*model_.trees[i])[tid].LeafValue();
}
}
}
@@ -519,45 +520,45 @@ class Dart : public GBTree {
// select which trees to drop
inline void DropTrees(unsigned ntree_limit_drop) {
idx_drop.clear();
idx_drop_.clear();
if (ntree_limit_drop > 0) return;
std::uniform_real_distribution<> runif(0.0, 1.0);
auto& rnd = common::GlobalRandom();
bool skip = false;
if (dparam.skip_drop > 0.0) skip = (runif(rnd) < dparam.skip_drop);
if (dparam_.skip_drop > 0.0) skip = (runif(rnd) < dparam_.skip_drop);
// sample some trees to drop
if (!skip) {
if (dparam.sample_type == 1) {
if (dparam_.sample_type == 1) {
bst_float sum_weight = 0.0;
for (size_t i = 0; i < weight_drop.size(); ++i) {
sum_weight += weight_drop[i];
for (auto elem : weight_drop_) {
sum_weight += elem;
}
for (size_t i = 0; i < weight_drop.size(); ++i) {
if (runif(rnd) < dparam.rate_drop * weight_drop.size() * weight_drop[i] / sum_weight) {
idx_drop.push_back(i);
for (size_t i = 0; i < weight_drop_.size(); ++i) {
if (runif(rnd) < dparam_.rate_drop * weight_drop_.size() * weight_drop_[i] / sum_weight) {
idx_drop_.push_back(i);
}
}
if (dparam.one_drop && idx_drop.empty() && !weight_drop.empty()) {
if (dparam_.one_drop && idx_drop_.empty() && !weight_drop_.empty()) {
// the expression below is an ugly but MSVC2013-friendly equivalent of
// size_t i = std::discrete_distribution<size_t>(weight_drop.begin(),
// weight_drop.end())(rnd);
size_t i = std::discrete_distribution<size_t>(
weight_drop.size(), 0., static_cast<double>(weight_drop.size()),
weight_drop_.size(), 0., static_cast<double>(weight_drop_.size()),
[this](double x) -> double {
return weight_drop[static_cast<size_t>(x)];
return weight_drop_[static_cast<size_t>(x)];
})(rnd);
idx_drop.push_back(i);
idx_drop_.push_back(i);
}
} else {
for (size_t i = 0; i < weight_drop.size(); ++i) {
if (runif(rnd) < dparam.rate_drop) {
idx_drop.push_back(i);
for (size_t i = 0; i < weight_drop_.size(); ++i) {
if (runif(rnd) < dparam_.rate_drop) {
idx_drop_.push_back(i);
}
}
if (dparam.one_drop && idx_drop.empty() && !weight_drop.empty()) {
size_t i = std::uniform_int_distribution<size_t>(0, weight_drop.size() - 1)(rnd);
idx_drop.push_back(i);
if (dparam_.one_drop && idx_drop_.empty() && !weight_drop_.empty()) {
size_t i = std::uniform_int_distribution<size_t>(0, weight_drop_.size() - 1)(rnd);
idx_drop_.push_back(i);
}
}
}
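The comment in the hunk above notes that the code is an "MSVC2013-friendly equivalent" of `std::discrete_distribution`. A minimal sketch of the weighted one-tree pick it stands in for (the helper name `PickOneWeighted` is illustrative, not part of XGBoost):

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Weighted sampling of a single index: index i is chosen with probability
// weights[i] / sum(weights), which is what the one_drop fallback above emulates.
std::size_t PickOneWeighted(const std::vector<double>& weights,
                            std::mt19937* rng) {
  std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
  return dist(*rng);
}
```

With the range constructor available, no lambda-based workaround is needed on modern compilers.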
@@ -565,58 +566,58 @@ class Dart : public GBTree {
// set normalization factors
inline size_t NormalizeTrees(size_t size_new_trees) {
float lr = 1.0 * dparam.learning_rate / size_new_trees;
size_t num_drop = idx_drop.size();
float lr = 1.0 * dparam_.learning_rate / size_new_trees;
size_t num_drop = idx_drop_.size();
if (num_drop == 0) {
for (size_t i = 0; i < size_new_trees; ++i) {
weight_drop.push_back(1.0);
weight_drop_.push_back(1.0);
}
} else {
if (dparam.normalize_type == 1) {
if (dparam_.normalize_type == 1) {
// normalize_type 1
float factor = 1.0 / (1.0 + lr);
for (size_t i = 0; i < idx_drop.size(); ++i) {
weight_drop[idx_drop[i]] *= factor;
for (auto i : idx_drop_) {
weight_drop_[i] *= factor;
}
for (size_t i = 0; i < size_new_trees; ++i) {
weight_drop.push_back(factor);
weight_drop_.push_back(factor);
}
} else {
// normalize_type 0
float factor = 1.0 * num_drop / (num_drop + lr);
for (size_t i = 0; i < idx_drop.size(); ++i) {
weight_drop[idx_drop[i]] *= factor;
for (auto i : idx_drop_) {
weight_drop_[i] *= factor;
}
for (size_t i = 0; i < size_new_trees; ++i) {
weight_drop.push_back(1.0 / (num_drop + lr));
weight_drop_.push_back(1.0 / (num_drop + lr));
}
}
}
// reset
idx_drop.clear();
idx_drop_.clear();
return num_drop;
}
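The two normalization modes in `NormalizeTrees` above reduce to closed-form factors: for `normalize_type == 1` both dropped and new trees are scaled by 1/(1+lr), while for `normalize_type == 0` dropped trees are scaled by k/(k+lr) and new trees get weight 1/(k+lr), where k is the number of dropped trees. A hedged sketch of just that arithmetic (helper names are illustrative):

```cpp
#include <cassert>
#include <cstddef>

// Scale factor applied to the weights of dropped trees, per NormalizeTrees.
double DroppedFactor(int normalize_type, double lr, std::size_t num_drop) {
  return normalize_type == 1
             ? 1.0 / (1.0 + lr)
             : static_cast<double>(num_drop) / (num_drop + lr);
}

// Initial weight assigned to each newly added tree, per NormalizeTrees.
double NewTreeWeight(int normalize_type, double lr, std::size_t num_drop) {
  return normalize_type == 1 ? 1.0 / (1.0 + lr) : 1.0 / (num_drop + lr);
}
```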
// init thread buffers
inline void InitThreadTemp(int nthread) {
int prev_thread_temp_size = thread_temp.size();
int prev_thread_temp_size = thread_temp_.size();
if (prev_thread_temp_size < nthread) {
thread_temp.resize(nthread, RegTree::FVec());
thread_temp_.resize(nthread, RegTree::FVec());
for (int i = prev_thread_temp_size; i < nthread; ++i) {
thread_temp[i].Init(model_.param.num_feature);
thread_temp_[i].Init(model_.param.num_feature);
}
}
}
// --- data structure ---
// training parameter
DartTrainParam dparam;
DartTrainParam dparam_;
/*! \brief prediction buffer */
std::vector<bst_float> weight_drop;
std::vector<bst_float> weight_drop_;
// indexes of dropped trees
std::vector<size_t> idx_drop;
std::vector<size_t> idx_drop_;
// temporal storage for per thread
std::vector<RegTree::FVec> thread_temp;
std::vector<RegTree::FVec> thread_temp_;
};
// register the objective functions
@@ -627,7 +628,7 @@ DMLC_REGISTER_PARAMETER(DartTrainParam);
XGBOOST_REGISTER_GBM(GBTree, "gbtree")
.describe("Tree booster, gradient boosted trees.")
.set_body([](const std::vector<std::shared_ptr<DMatrix> >& cached_mats, bst_float base_margin) {
GBTree* p = new GBTree(base_margin);
auto* p = new GBTree(base_margin);
p->InitCache(cached_mats);
return p;
});


@@ -70,8 +70,8 @@ struct GBTreeModel {
void InitTreesToUpdate() {
if (trees_to_update.size() == 0u) {
for (size_t i = 0; i < trees.size(); ++i) {
trees_to_update.push_back(std::move(trees[i]));
for (auto & tree : trees) {
trees_to_update.push_back(std::move(tree));
}
trees.clear();
param.num_trees = 0;
@@ -100,8 +100,8 @@ struct GBTreeModel {
void Save(dmlc::Stream* fo) const {
CHECK_EQ(param.num_trees, static_cast<int>(trees.size()));
fo->Write(&param, sizeof(param));
for (size_t i = 0; i < trees.size(); ++i) {
trees[i]->Save(fo);
for (const auto & tree : trees) {
tree->Save(fo);
}
if (tree_info.size() != 0) {
fo->Write(dmlc::BeginPtr(tree_info), sizeof(int) * tree_info.size());
@@ -111,15 +111,15 @@ struct GBTreeModel {
std::vector<std::string> DumpModel(const FeatureMap& fmap, bool with_stats,
std::string format) const {
std::vector<std::string> dump;
for (size_t i = 0; i < trees.size(); i++) {
dump.push_back(trees[i]->DumpModel(fmap, with_stats, format));
for (const auto & tree : trees) {
dump.push_back(tree->DumpModel(fmap, with_stats, format));
}
return dump;
}
void CommitModel(std::vector<std::unique_ptr<RegTree> >&& new_trees,
int bst_group) {
for (size_t i = 0; i < new_trees.size(); ++i) {
trees.push_back(std::move(new_trees[i]));
for (auto & new_tree : new_trees) {
trees.push_back(std::move(new_tree));
tree_info.push_back(bst_group);
}
param.num_trees += static_cast<int>(new_trees.size());
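Several hunks above replace index loops over `std::unique_ptr` vectors with range-for plus `std::move`, transferring ownership element by element. A minimal sketch of the pattern (names illustrative):

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Moves every unique_ptr out of `src` into `dst`, as in the
// InitTreesToUpdate/CommitModel refactors above. After the loop the source
// elements are null, so the source is cleared.
void MoveAll(std::vector<std::unique_ptr<int>>* src,
             std::vector<std::unique_ptr<int>>* dst) {
  for (auto& p : *src) {  // auto& is required: plain `auto` would try to copy
    dst->push_back(std::move(p));
  }
  src->clear();
}
```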


@@ -141,8 +141,8 @@ DMLC_REGISTER_PARAMETER(LearnerTrainParam);
*/
class LearnerImpl : public Learner {
public:
explicit LearnerImpl(const std::vector<std::shared_ptr<DMatrix> >& cache)
: cache_(cache) {
explicit LearnerImpl(std::vector<std::shared_ptr<DMatrix> > cache)
: cache_(std::move(cache)) {
// boosted tree
name_obj_ = "reg:linear";
name_gbm_ = "gbtree";
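The constructor change above swaps a `const&` parameter for pass-by-value plus `std::move`, the standard sink-argument idiom: callers passing an lvalue pay one copy, callers passing an rvalue pay only moves. A small sketch (the `Holder` class is illustrative, not from XGBoost):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sink-argument idiom: take by value, move into the member.
class Holder {
 public:
  explicit Holder(std::vector<std::string> items)
      : items_(std::move(items)) {}  // moves, never copies twice
  std::size_t size() const { return items_.size(); }

 private:
  std::vector<std::string> items_;
};
```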
@@ -155,25 +155,25 @@ class LearnerImpl : public Learner {
}
void ConfigureUpdaters() {
if (tparam.tree_method == 0 || tparam.tree_method == 1 ||
tparam.tree_method == 2) {
if (tparam_.tree_method == 0 || tparam_.tree_method == 1 ||
tparam_.tree_method == 2) {
if (cfg_.count("updater") == 0) {
if (tparam.dsplit == 1) {
if (tparam_.dsplit == 1) {
cfg_["updater"] = "distcol";
} else if (tparam.dsplit == 2) {
} else if (tparam_.dsplit == 2) {
cfg_["updater"] = "grow_histmaker,prune";
}
if (tparam.prob_buffer_row != 1.0f) {
if (tparam_.prob_buffer_row != 1.0f) {
cfg_["updater"] = "grow_histmaker,refresh,prune";
}
}
} else if (tparam.tree_method == 3) {
} else if (tparam_.tree_method == 3) {
/* histogram-based algorithm */
LOG(CONSOLE) << "Tree method is selected to be \'hist\', which uses a "
"single updater "
<< "grow_fast_histmaker.";
cfg_["updater"] = "grow_fast_histmaker";
} else if (tparam.tree_method == 4) {
} else if (tparam_.tree_method == 4) {
this->AssertGPUSupport();
if (cfg_.count("updater") == 0) {
cfg_["updater"] = "grow_gpu,prune";
@@ -181,7 +181,7 @@ class LearnerImpl : public Learner {
if (cfg_.count("predictor") == 0) {
cfg_["predictor"] = "gpu_predictor";
}
} else if (tparam.tree_method == 5) {
} else if (tparam_.tree_method == 5) {
this->AssertGPUSupport();
if (cfg_.count("updater") == 0) {
cfg_["updater"] = "grow_gpu_hist";
@@ -195,8 +195,8 @@ class LearnerImpl : public Learner {
void Configure(
const std::vector<std::pair<std::string, std::string> >& args) override {
// add to configurations
tparam.InitAllowUnknown(args);
monitor.Init("Learner", tparam.debug_verbose);
tparam_.InitAllowUnknown(args);
monitor_.Init("Learner", tparam_.debug_verbose);
cfg_.clear();
for (const auto& kv : args) {
if (kv.first == "eval_metric") {
@@ -206,20 +206,20 @@ class LearnerImpl : public Learner {
};
if (std::all_of(metrics_.begin(), metrics_.end(), dup_check)) {
metrics_.emplace_back(Metric::Create(kv.second));
mparam.contain_eval_metrics = 1;
mparam_.contain_eval_metrics = 1;
}
} else {
cfg_[kv.first] = kv.second;
}
}
if (tparam.nthread != 0) {
omp_set_num_threads(tparam.nthread);
if (tparam_.nthread != 0) {
omp_set_num_threads(tparam_.nthread);
}
// add additional parameters
// These are constraints that need to be satisfied.
if (tparam.dsplit == 0 && rabit::IsDistributed()) {
tparam.dsplit = 2;
if (tparam_.dsplit == 0 && rabit::IsDistributed()) {
tparam_.dsplit = 2;
}
if (cfg_.count("num_class") != 0) {
@@ -244,21 +244,21 @@ class LearnerImpl : public Learner {
}
if (!this->ModelInitialized()) {
mparam.InitAllowUnknown(args);
mparam_.InitAllowUnknown(args);
name_obj_ = cfg_["objective"];
name_gbm_ = cfg_["booster"];
// set seed only before the model is initialized
common::GlobalRandom().seed(tparam.seed);
common::GlobalRandom().seed(tparam_.seed);
}
// set number of features correctly.
cfg_["num_feature"] = common::ToString(mparam.num_feature);
cfg_["num_class"] = common::ToString(mparam.num_class);
cfg_["num_feature"] = common::ToString(mparam_.num_feature);
cfg_["num_class"] = common::ToString(mparam_.num_class);
if (gbm_.get() != nullptr) {
if (gbm_ != nullptr) {
gbm_->Configure(cfg_.begin(), cfg_.end());
}
if (obj_.get() != nullptr) {
if (obj_ != nullptr) {
obj_->Configure(cfg_.begin(), cfg_.end());
}
}
@@ -281,7 +281,7 @@ class LearnerImpl : public Learner {
// use the peekable reader.
fi = &fp;
// read parameter
CHECK_EQ(fi->Read(&mparam, sizeof(mparam)), sizeof(mparam))
CHECK_EQ(fi->Read(&mparam_, sizeof(mparam_)), sizeof(mparam_))
<< "BoostLearner: wrong model format";
{
// backward compatibility code for compatible with old model type
@@ -303,9 +303,9 @@ class LearnerImpl : public Learner {
CHECK(fi->Read(&name_gbm_)) << "BoostLearner: wrong model format";
// duplicated code with LazyInitModel
obj_.reset(ObjFunction::Create(name_obj_));
gbm_.reset(GradientBooster::Create(name_gbm_, cache_, mparam.base_score));
gbm_.reset(GradientBooster::Create(name_gbm_, cache_, mparam_.base_score));
gbm_->Load(fi);
if (mparam.contain_extra_attrs != 0) {
if (mparam_.contain_extra_attrs != 0) {
std::vector<std::pair<std::string, std::string> > attr;
fi->Read(&attr);
attributes_ =
@@ -316,35 +316,35 @@ class LearnerImpl : public Learner {
fi->Read(&max_delta_step);
cfg_["max_delta_step"] = max_delta_step;
}
if (mparam.contain_eval_metrics != 0) {
if (mparam_.contain_eval_metrics != 0) {
std::vector<std::string> metr;
fi->Read(&metr);
for (auto name : metr) {
metrics_.emplace_back(Metric::Create(name));
}
}
cfg_["num_class"] = common::ToString(mparam.num_class);
cfg_["num_feature"] = common::ToString(mparam.num_feature);
cfg_["num_class"] = common::ToString(mparam_.num_class);
cfg_["num_feature"] = common::ToString(mparam_.num_feature);
obj_->Configure(cfg_.begin(), cfg_.end());
}
// rabit save model to rabit checkpoint
void Save(dmlc::Stream* fo) const override {
fo->Write(&mparam, sizeof(LearnerModelParam));
fo->Write(&mparam_, sizeof(LearnerModelParam));
fo->Write(name_obj_);
fo->Write(name_gbm_);
gbm_->Save(fo);
if (mparam.contain_extra_attrs != 0) {
if (mparam_.contain_extra_attrs != 0) {
std::vector<std::pair<std::string, std::string> > attr(
attributes_.begin(), attributes_.end());
fo->Write(attr);
}
if (name_obj_ == "count:poisson") {
std::map<std::string, std::string>::const_iterator it =
auto it =
cfg_.find("max_delta_step");
if (it != cfg_.end()) fo->Write(it->second);
}
if (mparam.contain_eval_metrics != 0) {
if (mparam_.contain_eval_metrics != 0) {
std::vector<std::string> metr;
for (auto& ev : metrics_) {
metr.emplace_back(ev->Name());
@@ -354,37 +354,37 @@ class LearnerImpl : public Learner {
}
void UpdateOneIter(int iter, DMatrix* train) override {
monitor.Start("UpdateOneIter");
monitor_.Start("UpdateOneIter");
CHECK(ModelInitialized())
<< "Always call InitModel or LoadModel before update";
if (tparam.seed_per_iteration || rabit::IsDistributed()) {
common::GlobalRandom().seed(tparam.seed * kRandSeedMagic + iter);
if (tparam_.seed_per_iteration || rabit::IsDistributed()) {
common::GlobalRandom().seed(tparam_.seed * kRandSeedMagic + iter);
}
this->LazyInitDMatrix(train);
monitor.Start("PredictRaw");
monitor_.Start("PredictRaw");
this->PredictRaw(train, &preds_);
monitor.Stop("PredictRaw");
monitor.Start("GetGradient");
obj_->GetGradient(&preds_, train->info(), iter, &gpair_);
monitor.Stop("GetGradient");
monitor_.Stop("PredictRaw");
monitor_.Start("GetGradient");
obj_->GetGradient(&preds_, train->Info(), iter, &gpair_);
monitor_.Stop("GetGradient");
gbm_->DoBoost(train, &gpair_, obj_.get());
monitor.Stop("UpdateOneIter");
monitor_.Stop("UpdateOneIter");
}
void BoostOneIter(int iter, DMatrix* train,
HostDeviceVector<bst_gpair>* in_gpair) override {
monitor.Start("BoostOneIter");
if (tparam.seed_per_iteration || rabit::IsDistributed()) {
common::GlobalRandom().seed(tparam.seed * kRandSeedMagic + iter);
HostDeviceVector<GradientPair>* in_gpair) override {
monitor_.Start("BoostOneIter");
if (tparam_.seed_per_iteration || rabit::IsDistributed()) {
common::GlobalRandom().seed(tparam_.seed * kRandSeedMagic + iter);
}
this->LazyInitDMatrix(train);
gbm_->DoBoost(train, in_gpair);
monitor.Stop("BoostOneIter");
monitor_.Stop("BoostOneIter");
}
std::string EvalOneIter(int iter, const std::vector<DMatrix*>& data_sets,
const std::vector<std::string>& data_names) override {
monitor.Start("EvalOneIter");
monitor_.Start("EvalOneIter");
std::ostringstream os;
os << '[' << iter << ']' << std::setiosflags(std::ios::fixed);
if (metrics_.size() == 0) {
@@ -395,17 +395,17 @@ class LearnerImpl : public Learner {
obj_->EvalTransform(&preds_);
for (auto& ev : metrics_) {
os << '\t' << data_names[i] << '-' << ev->Name() << ':'
<< ev->Eval(preds_.data_h(), data_sets[i]->info(), tparam.dsplit == 2);
<< ev->Eval(preds_.HostVector(), data_sets[i]->Info(), tparam_.dsplit == 2);
}
}
monitor.Stop("EvalOneIter");
monitor_.Stop("EvalOneIter");
return os.str();
}
void SetAttr(const std::string& key, const std::string& value) override {
attributes_[key] = value;
mparam.contain_extra_attrs = 1;
mparam_.contain_extra_attrs = 1;
}
bool GetAttr(const std::string& key, std::string* out) const override {
@@ -438,7 +438,7 @@ class LearnerImpl : public Learner {
this->PredictRaw(data, &preds_);
obj_->EvalTransform(&preds_);
return std::make_pair(metric,
ev->Eval(preds_.data_h(), data->info(), tparam.dsplit == 2));
ev->Eval(preds_.HostVector(), data->Info(), tparam_.dsplit == 2));
}
void Predict(DMatrix* data, bool output_margin,
@@ -446,12 +446,12 @@ class LearnerImpl : public Learner {
bool pred_leaf, bool pred_contribs, bool approx_contribs,
bool pred_interactions) const override {
if (pred_contribs) {
gbm_->PredictContribution(data, &out_preds->data_h(), ntree_limit, approx_contribs);
gbm_->PredictContribution(data, &out_preds->HostVector(), ntree_limit, approx_contribs);
} else if (pred_interactions) {
gbm_->PredictInteractionContributions(data, &out_preds->data_h(), ntree_limit,
gbm_->PredictInteractionContributions(data, &out_preds->HostVector(), ntree_limit,
approx_contribs);
} else if (pred_leaf) {
gbm_->PredictLeaf(data, &out_preds->data_h(), ntree_limit);
gbm_->PredictLeaf(data, &out_preds->HostVector(), ntree_limit);
} else {
this->PredictRaw(data, out_preds, ntree_limit);
if (!output_margin) {
@@ -464,21 +464,21 @@ class LearnerImpl : public Learner {
// check if p_train is ready to used by training.
// if not, initialize the column access.
inline void LazyInitDMatrix(DMatrix* p_train) {
if (tparam.tree_method == 3 || tparam.tree_method == 4 ||
tparam.tree_method == 5 || name_gbm_ == "gblinear") {
if (tparam_.tree_method == 3 || tparam_.tree_method == 4 ||
tparam_.tree_method == 5 || name_gbm_ == "gblinear") {
return;
}
monitor.Start("LazyInitDMatrix");
monitor_.Start("LazyInitDMatrix");
if (!p_train->HaveColAccess(true)) {
int ncol = static_cast<int>(p_train->info().num_col);
auto ncol = static_cast<int>(p_train->Info().num_col_);
std::vector<bool> enabled(ncol, true);
// set max row per batch to limited value
// in distributed mode, use safe choice otherwise
size_t max_row_perbatch = tparam.max_row_perbatch;
const size_t safe_max_row = static_cast<size_t>(32ul << 10ul);
size_t max_row_perbatch = tparam_.max_row_perbatch;
const auto safe_max_row = static_cast<size_t>(32ul << 10ul);
if (tparam.tree_method == 0 && p_train->info().num_row >= (4UL << 20UL)) {
if (tparam_.tree_method == 0 && p_train->Info().num_row_ >= (4UL << 20UL)) {
LOG(CONSOLE)
<< "Tree method is automatically selected to be \'approx\'"
<< " for faster speed."
@@ -487,57 +487,57 @@ class LearnerImpl : public Learner {
max_row_perbatch = std::min(max_row_perbatch, safe_max_row);
}
if (tparam.tree_method == 1) {
if (tparam_.tree_method == 1) {
LOG(CONSOLE) << "Tree method is selected to be \'approx\'";
max_row_perbatch = std::min(max_row_perbatch, safe_max_row);
}
if (tparam.test_flag == "block" || tparam.dsplit == 2) {
if (tparam_.test_flag == "block" || tparam_.dsplit == 2) {
max_row_perbatch = std::min(max_row_perbatch, safe_max_row);
}
// initialize column access
p_train->InitColAccess(enabled, tparam.prob_buffer_row, max_row_perbatch, true);
p_train->InitColAccess(enabled, tparam_.prob_buffer_row, max_row_perbatch, true);
}
if (!p_train->SingleColBlock() && cfg_.count("updater") == 0) {
if (tparam.tree_method == 2) {
if (tparam_.tree_method == 2) {
LOG(CONSOLE) << "tree method is set to be 'exact',"
<< " but currently we are only able to proceed with "
"approximate algorithm";
}
cfg_["updater"] = "grow_histmaker,prune";
if (gbm_.get() != nullptr) {
if (gbm_ != nullptr) {
gbm_->Configure(cfg_.begin(), cfg_.end());
}
}
monitor.Stop("LazyInitDMatrix");
monitor_.Stop("LazyInitDMatrix");
}
// return whether model is already initialized.
inline bool ModelInitialized() const { return gbm_.get() != nullptr; }
inline bool ModelInitialized() const { return gbm_ != nullptr; }
// lazily initialize the model if it hasn't yet been initialized.
inline void LazyInitModel() {
if (this->ModelInitialized()) return;
// estimate feature bound
unsigned num_feature = 0;
for (size_t i = 0; i < cache_.size(); ++i) {
CHECK(cache_[i] != nullptr);
for (auto & matrix : cache_) {
CHECK(matrix != nullptr);
num_feature = std::max(num_feature,
static_cast<unsigned>(cache_[i]->info().num_col));
static_cast<unsigned>(matrix->Info().num_col_));
}
// run allreduce on num_feature to find the maximum value
rabit::Allreduce<rabit::op::Max>(&num_feature, 1);
if (num_feature > mparam.num_feature) {
mparam.num_feature = num_feature;
if (num_feature > mparam_.num_feature) {
mparam_.num_feature = num_feature;
}
// setup
cfg_["num_feature"] = common::ToString(mparam.num_feature);
CHECK(obj_.get() == nullptr && gbm_.get() == nullptr);
cfg_["num_feature"] = common::ToString(mparam_.num_feature);
CHECK(obj_ == nullptr && gbm_ == nullptr);
obj_.reset(ObjFunction::Create(name_obj_));
obj_->Configure(cfg_.begin(), cfg_.end());
// reset the base score
mparam.base_score = obj_->ProbToMargin(mparam.base_score);
gbm_.reset(GradientBooster::Create(name_gbm_, cache_, mparam.base_score));
mparam_.base_score = obj_->ProbToMargin(mparam_.base_score);
gbm_.reset(GradientBooster::Create(name_gbm_, cache_, mparam_.base_score));
gbm_->Configure(cfg_.begin(), cfg_.end());
}
/*!
@@ -549,15 +549,15 @@ class LearnerImpl : public Learner {
*/
inline void PredictRaw(DMatrix* data, HostDeviceVector<bst_float>* out_preds,
unsigned ntree_limit = 0) const {
CHECK(gbm_.get() != nullptr)
CHECK(gbm_ != nullptr)
<< "Predict must happen after Load or InitModel";
gbm_->PredictBatch(data, out_preds, ntree_limit);
}
// model parameter
LearnerModelParam mparam;
LearnerModelParam mparam_;
// training parameter
LearnerTrainParam tparam;
LearnerTrainParam tparam_;
// configurations
std::map<std::string, std::string> cfg_;
// attributes
@@ -569,7 +569,7 @@ class LearnerImpl : public Learner {
// temporal storages for prediction
HostDeviceVector<bst_float> preds_;
// gradient pairs
HostDeviceVector<bst_gpair> gpair_;
HostDeviceVector<GradientPair> gpair_;
private:
/*! \brief random number transformation seed. */
@@ -577,7 +577,7 @@ class LearnerImpl : public Learner {
// internal cached dmatrix
std::vector<std::shared_ptr<DMatrix> > cache_;
common::Monitor monitor;
common::Monitor monitor_;
};
Learner* Learner::Create(


@@ -62,14 +62,14 @@ inline double CoordinateDeltaBias(double sum_grad, double sum_hess) {
* \return The gradient and diagonal Hessian entry for a given feature.
*/
inline std::pair<double, double> GetGradient(int group_idx, int num_group, int fidx,
const std::vector<bst_gpair> &gpair,
const std::vector<GradientPair> &gpair,
DMatrix *p_fmat) {
double sum_grad = 0.0, sum_hess = 0.0;
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator({static_cast<bst_uint>(fidx)});
while (iter->Next()) {
const ColBatch &batch = iter->Value();
ColBatch::Inst col = batch[0];
const bst_omp_uint ndata = static_cast<bst_omp_uint>(col.length);
const auto ndata = static_cast<bst_omp_uint>(col.length);
for (bst_omp_uint j = 0; j < ndata; ++j) {
const bst_float v = col[j].fvalue;
auto &p = gpair[col[j].index * num_group + group_idx];
@@ -93,14 +93,14 @@ inline std::pair<double, double> GetGradient(int group_idx, int num_group, int f
* \return The gradient and diagonal Hessian entry for a given feature.
*/
inline std::pair<double, double> GetGradientParallel(int group_idx, int num_group, int fidx,
const std::vector<bst_gpair> &gpair,
const std::vector<GradientPair> &gpair,
DMatrix *p_fmat) {
double sum_grad = 0.0, sum_hess = 0.0;
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator({static_cast<bst_uint>(fidx)});
while (iter->Next()) {
const ColBatch &batch = iter->Value();
ColBatch::Inst col = batch[0];
const bst_omp_uint ndata = static_cast<bst_omp_uint>(col.length);
const auto ndata = static_cast<bst_omp_uint>(col.length);
#pragma omp parallel for schedule(static) reduction(+ : sum_grad, sum_hess)
for (bst_omp_uint j = 0; j < ndata; ++j) {
const bst_float v = col[j].fvalue;
@@ -124,11 +124,11 @@ inline std::pair<double, double> GetGradientParallel(int group_idx, int num_grou
* \return The gradient and diagonal Hessian entry for the bias.
*/
inline std::pair<double, double> GetBiasGradientParallel(int group_idx, int num_group,
const std::vector<bst_gpair> &gpair,
const std::vector<GradientPair> &gpair,
DMatrix *p_fmat) {
const RowSet &rowset = p_fmat->buffered_rowset();
const RowSet &rowset = p_fmat->BufferedRowset();
double sum_grad = 0.0, sum_hess = 0.0;
const bst_omp_uint ndata = static_cast<bst_omp_uint>(rowset.size());
const auto ndata = static_cast<bst_omp_uint>(rowset.Size());
#pragma omp parallel for schedule(static) reduction(+ : sum_grad, sum_hess)
for (bst_omp_uint i = 0; i < ndata; ++i) {
auto &p = gpair[rowset[i] * num_group + group_idx];
@@ -151,7 +151,7 @@ inline std::pair<double, double> GetBiasGradientParallel(int group_idx, int num_
* \param p_fmat The input feature matrix.
*/
inline void UpdateResidualParallel(int fidx, int group_idx, int num_group,
float dw, std::vector<bst_gpair> *in_gpair,
float dw, std::vector<GradientPair> *in_gpair,
DMatrix *p_fmat) {
if (dw == 0.0f) return;
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator({static_cast<bst_uint>(fidx)});
@@ -159,12 +159,12 @@ inline void UpdateResidualParallel(int fidx, int group_idx, int num_group,
const ColBatch &batch = iter->Value();
ColBatch::Inst col = batch[0];
// update grad value
const bst_omp_uint num_row = static_cast<bst_omp_uint>(col.length);
const auto num_row = static_cast<bst_omp_uint>(col.length);
#pragma omp parallel for schedule(static)
for (bst_omp_uint j = 0; j < num_row; ++j) {
bst_gpair &p = (*in_gpair)[col[j].index * num_group + group_idx];
GradientPair &p = (*in_gpair)[col[j].index * num_group + group_idx];
if (p.GetHess() < 0.0f) continue;
p += bst_gpair(p.GetHess() * col[j].fvalue * dw, 0);
p += GradientPair(p.GetHess() * col[j].fvalue * dw, 0);
}
}
}
@@ -179,16 +179,16 @@ inline void UpdateResidualParallel(int fidx, int group_idx, int num_group,
* \param p_fmat The input feature matrix.
*/
inline void UpdateBiasResidualParallel(int group_idx, int num_group, float dbias,
std::vector<bst_gpair> *in_gpair,
std::vector<GradientPair> *in_gpair,
DMatrix *p_fmat) {
if (dbias == 0.0f) return;
const RowSet &rowset = p_fmat->buffered_rowset();
const bst_omp_uint ndata = static_cast<bst_omp_uint>(p_fmat->info().num_row);
const RowSet &rowset = p_fmat->BufferedRowset();
const auto ndata = static_cast<bst_omp_uint>(p_fmat->Info().num_row_);
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < ndata; ++i) {
bst_gpair &g = (*in_gpair)[rowset[i] * num_group + group_idx];
GradientPair &g = (*in_gpair)[rowset[i] * num_group + group_idx];
if (g.GetHess() < 0.0f) continue;
g += bst_gpair(g.GetHess() * dbias, 0);
g += GradientPair(g.GetHess() * dbias, 0);
}
}
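The gradient-accumulation loops above use an OpenMP `reduction(+ : ...)` clause so each thread keeps a private partial sum that is combined at the end. A minimal sketch in the same shape (illustrative data; the pragma degrades gracefully to a serial loop when OpenMP is disabled):

```cpp
#include <cassert>
#include <vector>

// Parallel sum in the shape of GetGradientParallel above: each thread
// accumulates into a private sum_grad, combined by the reduction clause.
double SumGrad(const std::vector<double>& grad) {
  double sum_grad = 0.0;
  #pragma omp parallel for schedule(static) reduction(+ : sum_grad)
  for (int j = 0; j < static_cast<int>(grad.size()); ++j) {
    sum_grad += grad[j];
  }
  return sum_grad;
}
```

Without the reduction clause the concurrent `+=` would be a data race; the clause is what makes the parallel loop equivalent to the serial one.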
@@ -201,7 +201,7 @@ class FeatureSelector {
/*! \brief factory method */
static FeatureSelector *Create(int choice);
/*! \brief virtual destructor */
virtual ~FeatureSelector() {}
virtual ~FeatureSelector() = default;
/**
* \brief Setting up the selector state prior to looping through features.
*
@@ -213,7 +213,7 @@ class FeatureSelector {
* \param param A parameter with algorithm-dependent use.
*/
virtual void Setup(const gbm::GBLinearModel &model,
const std::vector<bst_gpair> &gpair,
const std::vector<GradientPair> &gpair,
DMatrix *p_fmat,
float alpha, float lambda, int param) {}
/**
@@ -232,7 +232,7 @@ class FeatureSelector {
virtual int NextFeature(int iteration,
const gbm::GBLinearModel &model,
int group_idx,
const std::vector<bst_gpair> &gpair,
const std::vector<GradientPair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) = 0;
};
@@ -242,7 +242,7 @@ class FeatureSelector {
class CyclicFeatureSelector : public FeatureSelector {
public:
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
int group_idx, const std::vector<GradientPair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
return iteration % model.param.num_feature;
}
@@ -255,23 +255,23 @@ class CyclicFeatureSelector : public FeatureSelector {
class ShuffleFeatureSelector : public FeatureSelector {
public:
void Setup(const gbm::GBLinearModel &model,
const std::vector<bst_gpair> &gpair,
const std::vector<GradientPair> &gpair,
DMatrix *p_fmat, float alpha, float lambda, int param) override {
if (feat_index.size() == 0) {
feat_index.resize(model.param.num_feature);
std::iota(feat_index.begin(), feat_index.end(), 0);
if (feat_index_.size() == 0) {
feat_index_.resize(model.param.num_feature);
std::iota(feat_index_.begin(), feat_index_.end(), 0);
}
std::shuffle(feat_index.begin(), feat_index.end(), common::GlobalRandom());
std::shuffle(feat_index_.begin(), feat_index_.end(), common::GlobalRandom());
}
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
int group_idx, const std::vector<GradientPair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
return feat_index[iteration % model.param.num_feature];
return feat_index_[iteration % model.param.num_feature];
}
protected:
std::vector<bst_uint> feat_index;
std::vector<bst_uint> feat_index_;
};
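`ShuffleFeatureSelector` above cycles through a permutation rebuilt with `std::iota` and re-shuffled each pass, so every feature is visited exactly once per cycle in random order. A standalone sketch of one pass (the helper name `ShuffledPass` is illustrative):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// One selection pass: a fresh permutation of [0, num_feature), as built in
// ShuffleFeatureSelector::Setup above.
std::vector<unsigned> ShuffledPass(unsigned num_feature, unsigned seed) {
  std::vector<unsigned> feat_index(num_feature);
  std::iota(feat_index.begin(), feat_index.end(), 0u);  // 0, 1, ..., n-1
  std::mt19937 rng(seed);
  std::shuffle(feat_index.begin(), feat_index.end(), rng);
  return feat_index;
}
```

`NextFeature` then indexes this vector with `iteration % num_feature`, which is why each pass covers all features before any repeats.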
/**
@@ -281,7 +281,7 @@ class ShuffleFeatureSelector : public FeatureSelector {
class RandomFeatureSelector : public FeatureSelector {
public:
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
int group_idx, const std::vector<GradientPair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
return common::GlobalRandom()() % model.param.num_feature;
}
@@ -299,32 +299,32 @@ class RandomFeatureSelector : public FeatureSelector {
class GreedyFeatureSelector : public FeatureSelector {
public:
void Setup(const gbm::GBLinearModel &model,
const std::vector<bst_gpair> &gpair,
const std::vector<GradientPair> &gpair,
DMatrix *p_fmat, float alpha, float lambda, int param) override {
top_k = static_cast<bst_uint>(param);
top_k_ = static_cast<bst_uint>(param);
const bst_uint ngroup = model.param.num_output_group;
if (param <= 0) top_k = std::numeric_limits<bst_uint>::max();
if (counter.size() == 0) {
counter.resize(ngroup);
gpair_sums.resize(model.param.num_feature * ngroup);
if (param <= 0) top_k_ = std::numeric_limits<bst_uint>::max();
if (counter_.size() == 0) {
counter_.resize(ngroup);
gpair_sums_.resize(model.param.num_feature * ngroup);
}
for (bst_uint gid = 0u; gid < ngroup; ++gid) {
counter[gid] = 0u;
counter_[gid] = 0u;
}
}
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
int group_idx, const std::vector<GradientPair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
// k-th selected feature for a group
auto k = counter[group_idx]++;
auto k = counter_[group_idx]++;
// stop after either reaching top-K or going through all the features in a group
if (k >= top_k || counter[group_idx] == model.param.num_feature) return -1;
if (k >= top_k_ || counter_[group_idx] == model.param.num_feature) return -1;
const int ngroup = model.param.num_output_group;
const bst_omp_uint nfeat = model.param.num_feature;
// Calculate univariate gradient sums
std::fill(gpair_sums.begin(), gpair_sums.end(), std::make_pair(0., 0.));
std::fill(gpair_sums_.begin(), gpair_sums_.end(), std::make_pair(0., 0.));
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator();
while (iter->Next()) {
const ColBatch &batch = iter->Value();
@@ -332,7 +332,7 @@ class GreedyFeatureSelector : public FeatureSelector {
for (bst_omp_uint i = 0; i < nfeat; ++i) {
const ColBatch::Inst col = batch[i];
const bst_uint ndata = col.length;
auto &sums = gpair_sums[group_idx * nfeat + i];
auto &sums = gpair_sums_[group_idx * nfeat + i];
for (bst_uint j = 0u; j < ndata; ++j) {
const bst_float v = col[j].fvalue;
auto &p = gpair[col[j].index * ngroup + group_idx];
@@ -346,7 +346,7 @@ class GreedyFeatureSelector : public FeatureSelector {
int best_fidx = 0;
double best_weight_update = 0.0f;
for (bst_omp_uint fidx = 0; fidx < nfeat; ++fidx) {
auto &s = gpair_sums[group_idx * nfeat + fidx];
auto &s = gpair_sums_[group_idx * nfeat + fidx];
float dw = std::abs(static_cast<bst_float>(
CoordinateDelta(s.first, s.second, model[fidx][group_idx], alpha, lambda)));
if (dw > best_weight_update) {
@@ -358,9 +358,9 @@ class GreedyFeatureSelector : public FeatureSelector {
}
protected:
bst_uint top_k;
std::vector<bst_uint> counter;
std::vector<std::pair<double, double>> gpair_sums;
bst_uint top_k_;
std::vector<bst_uint> counter_;
std::vector<std::pair<double, double>> gpair_sums_;
};
/**
@@ -377,21 +377,21 @@ class GreedyFeatureSelector : public FeatureSelector {
class ThriftyFeatureSelector : public FeatureSelector {
public:
void Setup(const gbm::GBLinearModel &model,
const std::vector<bst_gpair> &gpair,
const std::vector<GradientPair> &gpair,
DMatrix *p_fmat, float alpha, float lambda, int param) override {
top_k = static_cast<bst_uint>(param);
if (param <= 0) top_k = std::numeric_limits<bst_uint>::max();
top_k_ = static_cast<bst_uint>(param);
if (param <= 0) top_k_ = std::numeric_limits<bst_uint>::max();
const bst_uint ngroup = model.param.num_output_group;
const bst_omp_uint nfeat = model.param.num_feature;
if (deltaw.size() == 0) {
deltaw.resize(nfeat * ngroup);
sorted_idx.resize(nfeat * ngroup);
counter.resize(ngroup);
gpair_sums.resize(nfeat * ngroup);
if (deltaw_.size() == 0) {
deltaw_.resize(nfeat * ngroup);
sorted_idx_.resize(nfeat * ngroup);
counter_.resize(ngroup);
gpair_sums_.resize(nfeat * ngroup);
}
// Calculate univariate gradient sums
std::fill(gpair_sums.begin(), gpair_sums.end(), std::make_pair(0., 0.));
std::fill(gpair_sums_.begin(), gpair_sums_.end(), std::make_pair(0., 0.));
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator();
while (iter->Next()) {
const ColBatch &batch = iter->Value();
@@ -401,7 +401,7 @@ class ThriftyFeatureSelector : public FeatureSelector {
const ColBatch::Inst col = batch[i];
const bst_uint ndata = col.length;
for (bst_uint gid = 0u; gid < ngroup; ++gid) {
auto &sums = gpair_sums[gid * nfeat + i];
auto &sums = gpair_sums_[gid * nfeat + i];
for (bst_uint j = 0u; j < ndata; ++j) {
const bst_float v = col[j].fvalue;
auto &p = gpair[col[j].index * ngroup + gid];
@@ -413,45 +413,45 @@ class ThriftyFeatureSelector : public FeatureSelector {
}
}
// rank by descending weight magnitude within the groups
std::fill(deltaw.begin(), deltaw.end(), 0.f);
std::iota(sorted_idx.begin(), sorted_idx.end(), 0);
bst_float *pdeltaw = &deltaw[0];
std::fill(deltaw_.begin(), deltaw_.end(), 0.f);
std::iota(sorted_idx_.begin(), sorted_idx_.end(), 0);
bst_float *pdeltaw = &deltaw_[0];
for (bst_uint gid = 0u; gid < ngroup; ++gid) {
// Calculate univariate weight changes
for (bst_omp_uint i = 0; i < nfeat; ++i) {
auto ii = gid * nfeat + i;
auto &s = gpair_sums[ii];
deltaw[ii] = static_cast<bst_float>(CoordinateDelta(
auto &s = gpair_sums_[ii];
deltaw_[ii] = static_cast<bst_float>(CoordinateDelta(
s.first, s.second, model[i][gid], alpha, lambda));
}
// sort in descending order of deltaw abs values
auto start = sorted_idx.begin() + gid * nfeat;
auto start = sorted_idx_.begin() + gid * nfeat;
std::sort(start, start + nfeat,
[pdeltaw](size_t i, size_t j) {
return std::abs(*(pdeltaw + i)) > std::abs(*(pdeltaw + j));
});
counter[gid] = 0u;
counter_[gid] = 0u;
}
}
int NextFeature(int iteration, const gbm::GBLinearModel &model,
int group_idx, const std::vector<bst_gpair> &gpair,
int group_idx, const std::vector<GradientPair> &gpair,
DMatrix *p_fmat, float alpha, float lambda) override {
// k-th selected feature for a group
auto k = counter[group_idx]++;
auto k = counter_[group_idx]++;
// stop after either reaching top-N or going through all the features in a group
if (k >= top_k || counter[group_idx] == model.param.num_feature) return -1;
if (k >= top_k_ || counter_[group_idx] == model.param.num_feature) return -1;
// note that sorted_idx stores the "long" indices
const size_t grp_offset = group_idx * model.param.num_feature;
return static_cast<int>(sorted_idx[grp_offset + k] - grp_offset);
return static_cast<int>(sorted_idx_[grp_offset + k] - grp_offset);
}
protected:
bst_uint top_k;
std::vector<bst_float> deltaw;
std::vector<size_t> sorted_idx;
std::vector<bst_uint> counter;
std::vector<std::pair<double, double>> gpair_sums;
bst_uint top_k_;
std::vector<bst_float> deltaw_;
std::vector<size_t> sorted_idx_;
std::vector<bst_uint> counter_;
std::vector<std::pair<double, double>> gpair_sums_;
};
/**

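Both selectors above rank features by the magnitude of `CoordinateDelta`, the elastic-net coordinate-descent step. A minimal standalone sketch of that soft-thresholded update (hypothetical simplified version, not the exact xgboost source; `sum_grad`/`sum_hess` are the accumulated first- and second-order gradient sums for one feature):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Elastic-net coordinate-descent delta for a single weight w.
// alpha/lambda are the denormalized L1/L2 penalties.
double CoordinateDelta(double sum_grad, double sum_hess, double w,
                       double alpha, double lambda) {
  if (sum_hess < 1e-5) return 0.0;            // feature carries no signal
  const double grad_l2 = sum_grad + lambda * w;
  const double hess_l2 = sum_hess + lambda;
  const double tmp = w - grad_l2 / hess_l2;   // L2-only Newton target
  if (tmp >= 0.0) {
    // target is non-negative: shrink by alpha, keep w + delta >= 0
    return std::max(-(grad_l2 + alpha) / hess_l2, -w);
  } else {
    // target is negative: shrink by alpha from the other side
    return std::min(-(grad_l2 - alpha) / hess_l2, -w);
  }
}
```

With `alpha = lambda = 0` this reduces to the plain Newton step `-sum_grad / sum_hess`, which is what the greedy selector's `dw` ranking measures.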

@@ -25,5 +25,8 @@ namespace linear {
// List of files that will be force linked in static links.
DMLC_REGISTRY_LINK_TAG(updater_shotgun);
DMLC_REGISTRY_LINK_TAG(updater_coordinate);
#ifdef XGBOOST_USE_CUDA
DMLC_REGISTRY_LINK_TAG(updater_gpu_coordinate);
#endif
} // namespace linear
} // namespace xgboost


@@ -27,7 +27,7 @@ struct CoordinateTrainParam : public dmlc::Parameter<CoordinateTrainParam> {
DMLC_DECLARE_PARAMETER(CoordinateTrainParam) {
DMLC_DECLARE_FIELD(learning_rate)
.set_lower_bound(0.0f)
.set_default(1.0f)
.set_default(0.5f)
.describe("Learning rate of each update.");
DMLC_DECLARE_FIELD(reg_lambda)
.set_lower_bound(0.0f)
@@ -84,48 +84,46 @@ class CoordinateUpdater : public LinearUpdater {
selector.reset(FeatureSelector::Create(param.feature_selector));
monitor.Init("CoordinateUpdater", param.debug_verbose);
}
void Update(std::vector<bst_gpair> *in_gpair, DMatrix *p_fmat,
void Update(HostDeviceVector<GradientPair> *in_gpair, DMatrix *p_fmat,
gbm::GBLinearModel *model, double sum_instance_weight) override {
param.DenormalizePenalties(sum_instance_weight);
const int ngroup = model->param.num_output_group;
// update bias
for (int group_idx = 0; group_idx < ngroup; ++group_idx) {
auto grad = GetBiasGradientParallel(group_idx, ngroup, *in_gpair, p_fmat);
auto grad = GetBiasGradientParallel(group_idx, ngroup, in_gpair->HostVector(), p_fmat);
auto dbias = static_cast<float>(param.learning_rate *
CoordinateDeltaBias(grad.first, grad.second));
model->bias()[group_idx] += dbias;
UpdateBiasResidualParallel(group_idx, ngroup, dbias, in_gpair, p_fmat);
UpdateBiasResidualParallel(group_idx, ngroup,
dbias, &in_gpair->HostVector(), p_fmat);
}
// prepare for updating the weights
selector->Setup(*model, *in_gpair, p_fmat, param.reg_alpha_denorm,
selector->Setup(*model, in_gpair->HostVector(), p_fmat, param.reg_alpha_denorm,
param.reg_lambda_denorm, param.top_k);
// update weights
for (int group_idx = 0; group_idx < ngroup; ++group_idx) {
for (unsigned i = 0U; i < model->param.num_feature; i++) {
int fidx = selector->NextFeature(i, *model, group_idx, *in_gpair, p_fmat,
int fidx = selector->NextFeature(i, *model, group_idx, in_gpair->HostVector(), p_fmat,
param.reg_alpha_denorm, param.reg_lambda_denorm);
if (fidx < 0) break;
this->UpdateFeature(fidx, group_idx, in_gpair, p_fmat, model);
this->UpdateFeature(fidx, group_idx, &in_gpair->HostVector(), p_fmat, model);
}
}
monitor.Stop("UpdateFeature");
}
inline void UpdateFeature(int fidx, int group_idx, std::vector<bst_gpair> *in_gpair,
inline void UpdateFeature(int fidx, int group_idx, std::vector<GradientPair> *in_gpair,
DMatrix *p_fmat, gbm::GBLinearModel *model) {
const int ngroup = model->param.num_output_group;
bst_float &w = (*model)[fidx][group_idx];
monitor.Start("GetGradientParallel");
auto gradient = GetGradientParallel(group_idx, ngroup, fidx, *in_gpair, p_fmat);
monitor.Stop("GetGradientParallel");
auto gradient =
GetGradientParallel(group_idx, ngroup, fidx, *in_gpair, p_fmat);
auto dw = static_cast<float>(
param.learning_rate *
CoordinateDelta(gradient.first, gradient.second, w, param.reg_alpha_denorm,
param.reg_lambda_denorm));
w += dw;
monitor.Start("UpdateResidualParallel");
UpdateResidualParallel(fidx, group_idx, ngroup, dw, in_gpair, p_fmat);
monitor.Stop("UpdateResidualParallel");
}
// training parameter
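The bias step in `Update` above is an unregularized Newton step followed by folding the change back into every row's gradient, so later feature updates see the new residuals. A hedged single-group sketch of that pair of operations (simplified stand-ins, not the xgboost helpers themselves):

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// (grad, hess) per training row, standing in for GradientPair.
using Pair = std::pair<double, double>;

// Unregularized Newton step for the bias: -sum_grad / sum_hess.
double CoordinateDeltaBias(double sum_grad, double sum_hess) {
  return sum_hess > 0.0 ? -sum_grad / sum_hess : 0.0;
}

// After moving the bias by dbias, shift each row's gradient
// (first-order term) by hess * dbias to update the residuals.
void UpdateBiasResidual(double dbias, std::vector<Pair> *gpair) {
  for (auto &p : *gpair) p.first += p.second * dbias;
}
```

After this update the gradient sum over the shifted pairs is zero, which is why the bias can be updated once per group before the per-feature loop.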


@@ -0,0 +1,346 @@
/*!
* Copyright 2018 by Contributors
* \author Rory Mitchell
*/
#include <thrust/execution_policy.h>
#include <thrust/inner_product.h>
#include <xgboost/linear_updater.h>
#include "../common/device_helpers.cuh"
#include "../common/timer.h"
#include "coordinate_common.h"
namespace xgboost {
namespace linear {
DMLC_REGISTRY_FILE_TAG(updater_gpu_coordinate);
// training parameter
struct GPUCoordinateTrainParam
: public dmlc::Parameter<GPUCoordinateTrainParam> {
/*! \brief learning_rate */
float learning_rate;
/*! \brief regularization weight for L2 norm */
float reg_lambda;
/*! \brief regularization weight for L1 norm */
float reg_alpha;
int feature_selector;
int top_k;
int debug_verbose;
int n_gpus;
int gpu_id;
bool silent;
// declare parameters
DMLC_DECLARE_PARAMETER(GPUCoordinateTrainParam) {
DMLC_DECLARE_FIELD(learning_rate)
.set_lower_bound(0.0f)
.set_default(1.0f)
.describe("Learning rate of each update.");
DMLC_DECLARE_FIELD(reg_lambda)
.set_lower_bound(0.0f)
.set_default(0.0f)
.describe("L2 regularization on weights.");
DMLC_DECLARE_FIELD(reg_alpha)
.set_lower_bound(0.0f)
.set_default(0.0f)
.describe("L1 regularization on weights.");
DMLC_DECLARE_FIELD(feature_selector)
.set_default(kCyclic)
.add_enum("cyclic", kCyclic)
.add_enum("shuffle", kShuffle)
.add_enum("thrifty", kThrifty)
.add_enum("greedy", kGreedy)
.add_enum("random", kRandom)
.describe("Feature selection or ordering method.");
DMLC_DECLARE_FIELD(top_k).set_lower_bound(0).set_default(0).describe(
"The number of top features to select in 'thrifty' feature_selector. "
"The value of zero means using all the features.");
DMLC_DECLARE_FIELD(debug_verbose)
.set_lower_bound(0)
.set_default(0)
.describe("flag to print out detailed breakdown of runtime");
DMLC_DECLARE_FIELD(n_gpus).set_default(1).describe(
"Number of devices to use.");
DMLC_DECLARE_FIELD(gpu_id).set_default(0).describe(
"Primary device ordinal.");
DMLC_DECLARE_FIELD(silent).set_default(false).describe(
        "Do not print information during training.");
// alias of parameters
DMLC_DECLARE_ALIAS(learning_rate, eta);
DMLC_DECLARE_ALIAS(reg_lambda, lambda);
DMLC_DECLARE_ALIAS(reg_alpha, alpha);
}
/*! \brief Denormalizes the regularization penalties - to be called at each
* update */
void DenormalizePenalties(double sum_instance_weight) {
reg_lambda_denorm = reg_lambda * sum_instance_weight;
reg_alpha_denorm = reg_alpha * sum_instance_weight;
}
// denormalizated regularization penalties
float reg_lambda_denorm;
float reg_alpha_denorm;
};
void RescaleIndices(size_t ridx_begin, dh::DVec<SparseBatch::Entry> *data) {
auto d_data = data->Data();
dh::LaunchN(data->DeviceIdx(), data->Size(),
[=] __device__(size_t idx) { d_data[idx].index -= ridx_begin; });
}
class DeviceShard {
int device_idx_;
int normalised_device_idx_; // Device index counting from param.gpu_id
dh::BulkAllocator<dh::MemoryType::kDevice> ba_;
std::vector<size_t> row_ptr_;
dh::DVec<SparseBatch::Entry> data_;
dh::DVec<GradientPair> gpair_;
dh::CubMemory temp_;
size_t ridx_begin_;
size_t ridx_end_;
public:
DeviceShard(int device_idx, int normalised_device_idx, const ColBatch &batch,
bst_uint row_begin, bst_uint row_end,
const GPUCoordinateTrainParam &param,
const gbm::GBLinearModelParam &model_param)
: device_idx_(device_idx),
normalised_device_idx_(normalised_device_idx),
ridx_begin_(row_begin),
ridx_end_(row_end) {
dh::safe_cuda(cudaSetDevice(device_idx));
// The begin and end indices for the section of each column associated with
// this shard
std::vector<std::pair<bst_uint, bst_uint>> column_segments;
row_ptr_ = {0};
for (auto fidx = 0; fidx < batch.size; fidx++) {
auto col = batch[fidx];
auto cmp = [](SparseBatch::Entry e1, SparseBatch::Entry e2) {
return e1.index < e2.index;
};
auto column_begin =
std::lower_bound(col.data, col.data + col.length,
SparseBatch::Entry(row_begin, 0.0f), cmp);
auto column_end =
std::upper_bound(col.data, col.data + col.length,
SparseBatch::Entry(row_end, 0.0f), cmp);
column_segments.push_back(
std::make_pair(column_begin - col.data, column_end - col.data));
row_ptr_.push_back(row_ptr_.back() + column_end - column_begin);
}
ba_.Allocate(device_idx, param.silent, &data_, row_ptr_.back(), &gpair_,
(row_end - row_begin) * model_param.num_output_group);
for (int fidx = 0; fidx < batch.size; fidx++) {
ColBatch::Inst col = batch[fidx];
thrust::copy(col.data + column_segments[fidx].first,
col.data + column_segments[fidx].second,
data_.tbegin() + row_ptr_[fidx]);
}
// Rescale indices with respect to current shard
RescaleIndices(ridx_begin_, &data_);
}
void UpdateGpair(const std::vector<GradientPair> &host_gpair,
const gbm::GBLinearModelParam &model_param) {
gpair_.copy(host_gpair.begin() + ridx_begin_ * model_param.num_output_group,
host_gpair.begin() + ridx_end_ * model_param.num_output_group);
}
GradientPair GetBiasGradient(int group_idx, int num_group) {
auto counting = thrust::make_counting_iterator(0ull);
auto f = [=] __device__(size_t idx) {
return idx * num_group + group_idx;
}; // NOLINT
thrust::transform_iterator<decltype(f), decltype(counting), size_t> skip(
counting, f);
auto perm = thrust::make_permutation_iterator(gpair_.tbegin(), skip);
return dh::SumReduction(temp_, perm, ridx_end_ - ridx_begin_);
}
void UpdateBiasResidual(float dbias, int group_idx, int num_groups) {
if (dbias == 0.0f) return;
auto d_gpair = gpair_.Data();
dh::LaunchN(device_idx_, ridx_end_ - ridx_begin_, [=] __device__(size_t idx) {
auto &g = d_gpair[idx * num_groups + group_idx];
g += GradientPair(g.GetHess() * dbias, 0);
});
}
GradientPair GetGradient(int group_idx, int num_group, int fidx) {
auto d_col = data_.Data() + row_ptr_[fidx];
size_t col_size = row_ptr_[fidx + 1] - row_ptr_[fidx];
auto d_gpair = gpair_.Data();
auto counting = thrust::make_counting_iterator(0ull);
auto f = [=] __device__(size_t idx) {
auto entry = d_col[idx];
auto g = d_gpair[entry.index * num_group + group_idx];
return GradientPair(g.GetGrad() * entry.fvalue,
g.GetHess() * entry.fvalue * entry.fvalue);
}; // NOLINT
thrust::transform_iterator<decltype(f), decltype(counting), GradientPair>
multiply_iterator(counting, f);
return dh::SumReduction(temp_, multiply_iterator, col_size);
}
void UpdateResidual(float dw, int group_idx, int num_groups, int fidx) {
auto d_gpair = gpair_.Data();
auto d_col = data_.Data() + row_ptr_[fidx];
size_t col_size = row_ptr_[fidx + 1] - row_ptr_[fidx];
dh::LaunchN(device_idx_, col_size, [=] __device__(size_t idx) {
auto entry = d_col[idx];
auto &g = d_gpair[entry.index * num_groups + group_idx];
g += GradientPair(g.GetHess() * dw * entry.fvalue, 0);
});
}
};
/**
* \class GPUCoordinateUpdater
*
* \brief Coordinate descent algorithm that updates one feature per iteration
*/
class GPUCoordinateUpdater : public LinearUpdater {
public:
// set training parameter
void Init(
const std::vector<std::pair<std::string, std::string>> &args) override {
param.InitAllowUnknown(args);
selector.reset(FeatureSelector::Create(param.feature_selector));
monitor.Init("GPUCoordinateUpdater", param.debug_verbose);
}
void LazyInitShards(DMatrix *p_fmat,
const gbm::GBLinearModelParam &model_param) {
if (!shards.empty()) return;
int n_devices = dh::NDevices(param.n_gpus, p_fmat->Info().num_row_);
bst_uint row_begin = 0;
bst_uint shard_size =
std::ceil(static_cast<double>(p_fmat->Info().num_row_) / n_devices);
device_list.resize(n_devices);
for (int d_idx = 0; d_idx < n_devices; ++d_idx) {
int device_idx = (param.gpu_id + d_idx) % dh::NVisibleDevices();
device_list[d_idx] = device_idx;
}
// Partition input matrix into row segments
std::vector<size_t> row_segments;
row_segments.push_back(0);
for (int d_idx = 0; d_idx < n_devices; ++d_idx) {
bst_uint row_end = std::min(static_cast<size_t>(row_begin + shard_size),
p_fmat->Info().num_row_);
row_segments.push_back(row_end);
row_begin = row_end;
}
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator();
CHECK(p_fmat->SingleColBlock());
iter->Next();
auto batch = iter->Value();
shards.resize(n_devices);
// Create device shards
dh::ExecuteShards(&shards, [&](std::unique_ptr<DeviceShard> &shard) {
auto idx = &shard - &shards[0];
shard = std::unique_ptr<DeviceShard>(
new DeviceShard(device_list[idx], idx, batch, row_segments[idx],
row_segments[idx + 1], param, model_param));
});
}
void Update(HostDeviceVector<GradientPair> *in_gpair, DMatrix *p_fmat,
gbm::GBLinearModel *model, double sum_instance_weight) override {
param.DenormalizePenalties(sum_instance_weight);
monitor.Start("LazyInitShards");
this->LazyInitShards(p_fmat, model->param);
monitor.Stop("LazyInitShards");
monitor.Start("UpdateGpair");
// Update gpair
dh::ExecuteShards(&shards, [&](std::unique_ptr<DeviceShard> &shard) {
shard->UpdateGpair(in_gpair->HostVector(), model->param);
});
monitor.Stop("UpdateGpair");
monitor.Start("UpdateBias");
this->UpdateBias(p_fmat, model);
monitor.Stop("UpdateBias");
// prepare for updating the weights
selector->Setup(*model, in_gpair->HostVector(), p_fmat,
param.reg_alpha_denorm, param.reg_lambda_denorm,
param.top_k);
monitor.Start("UpdateFeature");
for (auto group_idx = 0; group_idx < model->param.num_output_group;
++group_idx) {
for (auto i = 0U; i < model->param.num_feature; i++) {
auto fidx = selector->NextFeature(
i, *model, group_idx, in_gpair->HostVector(), p_fmat,
param.reg_alpha_denorm, param.reg_lambda_denorm);
if (fidx < 0) break;
this->UpdateFeature(fidx, group_idx, &in_gpair->HostVector(), model);
}
}
monitor.Stop("UpdateFeature");
}
void UpdateBias(DMatrix *p_fmat, gbm::GBLinearModel *model) {
for (int group_idx = 0; group_idx < model->param.num_output_group;
++group_idx) {
// Get gradient
auto grad = dh::ReduceShards<GradientPair>(
&shards, [&](std::unique_ptr<DeviceShard> &shard) {
return shard->GetBiasGradient(group_idx,
model->param.num_output_group);
});
auto dbias = static_cast<float>(
param.learning_rate *
CoordinateDeltaBias(grad.GetGrad(), grad.GetHess()));
model->bias()[group_idx] += dbias;
// Update residual
dh::ExecuteShards(&shards, [&](std::unique_ptr<DeviceShard> &shard) {
shard->UpdateBiasResidual(dbias, group_idx,
model->param.num_output_group);
});
}
}
void UpdateFeature(int fidx, int group_idx,
std::vector<GradientPair> *in_gpair,
gbm::GBLinearModel *model) {
bst_float &w = (*model)[fidx][group_idx];
// Get gradient
auto grad = dh::ReduceShards<GradientPair>(
&shards, [&](std::unique_ptr<DeviceShard> &shard) {
return shard->GetGradient(group_idx, model->param.num_output_group,
fidx);
});
auto dw = static_cast<float>(param.learning_rate *
CoordinateDelta(grad.GetGrad(), grad.GetHess(),
w, param.reg_alpha_denorm,
param.reg_lambda_denorm));
w += dw;
dh::ExecuteShards(&shards, [&](std::unique_ptr<DeviceShard> &shard) {
shard->UpdateResidual(dw, group_idx, model->param.num_output_group, fidx);
});
}
// training parameter
GPUCoordinateTrainParam param;
std::unique_ptr<FeatureSelector> selector;
common::Monitor monitor;
std::vector<std::unique_ptr<DeviceShard>> shards;
std::vector<int> device_list;
};
DMLC_REGISTER_PARAMETER(GPUCoordinateTrainParam);
XGBOOST_REGISTER_LINEAR_UPDATER(GPUCoordinateUpdater, "gpu_coord_descent")
.describe(
"Update linear model according to coordinate descent algorithm. GPU "
"accelerated.")
.set_body([]() { return new GPUCoordinateUpdater(); });
} // namespace linear
} // namespace xgboost
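`DeviceShard::GetGradient` above is a fused transform-reduce over one sparse column: each entry contributes `(g * v, h * v * v)` and the device sums the pairs. A CPU analogue of the same reduction (illustrative only; `Entry` and `Pair` are simplified stand-ins for the xgboost types):

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct Entry { size_t index; double fvalue; };  // sparse column entry
using Pair = std::pair<double, double>;         // (sum_grad, sum_hess)

// Accumulate univariate gradient sums for one feature column,
// mirroring the thrust::transform_iterator + SumReduction on device.
Pair GetGradient(const std::vector<Entry> &col,
                 const std::vector<Pair> &gpair) {
  Pair sum{0.0, 0.0};
  for (const auto &e : col) {
    const auto &g = gpair[e.index];
    sum.first  += g.first  * e.fvalue;             // sum_i g_i * x_i
    sum.second += g.second * e.fvalue * e.fvalue;  // sum_i h_i * x_i^2
  }
  return sum;
}
```

On the GPU the per-shard results of this reduction are then combined across devices by `dh::ReduceShards` before `CoordinateDelta` is applied.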


@@ -58,59 +58,59 @@ class ShotgunUpdater : public LinearUpdater {
public:
// set training parameter
void Init(const std::vector<std::pair<std::string, std::string> > &args) override {
param.InitAllowUnknown(args);
selector.reset(FeatureSelector::Create(param.feature_selector));
param_.InitAllowUnknown(args);
selector_.reset(FeatureSelector::Create(param_.feature_selector));
}
void Update(std::vector<bst_gpair> *in_gpair, DMatrix *p_fmat,
void Update(HostDeviceVector<GradientPair> *in_gpair, DMatrix *p_fmat,
gbm::GBLinearModel *model, double sum_instance_weight) override {
param.DenormalizePenalties(sum_instance_weight);
std::vector<bst_gpair> &gpair = *in_gpair;
std::vector<GradientPair> &gpair = in_gpair->HostVector();
param_.DenormalizePenalties(sum_instance_weight);
const int ngroup = model->param.num_output_group;
// update bias
for (int gid = 0; gid < ngroup; ++gid) {
auto grad = GetBiasGradientParallel(gid, ngroup, *in_gpair, p_fmat);
auto dbias = static_cast<bst_float>(param.learning_rate *
auto grad = GetBiasGradientParallel(gid, ngroup, in_gpair->HostVector(), p_fmat);
auto dbias = static_cast<bst_float>(param_.learning_rate *
CoordinateDeltaBias(grad.first, grad.second));
model->bias()[gid] += dbias;
UpdateBiasResidualParallel(gid, ngroup, dbias, in_gpair, p_fmat);
UpdateBiasResidualParallel(gid, ngroup, dbias, &in_gpair->HostVector(), p_fmat);
}
// lock-free parallel updates of weights
selector->Setup(*model, *in_gpair, p_fmat, param.reg_alpha_denorm, param.reg_lambda_denorm, 0);
selector_->Setup(*model, in_gpair->HostVector(), p_fmat,
param_.reg_alpha_denorm, param_.reg_lambda_denorm, 0);
dmlc::DataIter<ColBatch> *iter = p_fmat->ColIterator();
while (iter->Next()) {
const ColBatch &batch = iter->Value();
const bst_omp_uint nfeat = static_cast<bst_omp_uint>(batch.size);
const auto nfeat = static_cast<bst_omp_uint>(batch.size);
#pragma omp parallel for schedule(static)
for (bst_omp_uint i = 0; i < nfeat; ++i) {
int ii = selector->NextFeature(i, *model, 0, *in_gpair, p_fmat,
param.reg_alpha_denorm, param.reg_lambda_denorm);
int ii = selector_->NextFeature(i, *model, 0, in_gpair->HostVector(), p_fmat,
param_.reg_alpha_denorm, param_.reg_lambda_denorm);
if (ii < 0) continue;
const bst_uint fid = batch.col_index[ii];
ColBatch::Inst col = batch[ii];
for (int gid = 0; gid < ngroup; ++gid) {
double sum_grad = 0.0, sum_hess = 0.0;
for (bst_uint j = 0; j < col.length; ++j) {
bst_gpair &p = gpair[col[j].index * ngroup + gid];
GradientPair &p = gpair[col[j].index * ngroup + gid];
if (p.GetHess() < 0.0f) continue;
const bst_float v = col[j].fvalue;
sum_grad += p.GetGrad() * v;
sum_hess += p.GetHess() * v * v;
}
bst_float &w = (*model)[fid][gid];
bst_float dw = static_cast<bst_float>(
param.learning_rate *
CoordinateDelta(sum_grad, sum_hess, w, param.reg_alpha_denorm,
param.reg_lambda_denorm));
auto dw = static_cast<bst_float>(
param_.learning_rate *
CoordinateDelta(sum_grad, sum_hess, w, param_.reg_alpha_denorm,
param_.reg_lambda_denorm));
if (dw == 0.f) continue;
w += dw;
// update grad values
for (bst_uint j = 0; j < col.length; ++j) {
bst_gpair &p = gpair[col[j].index * ngroup + gid];
GradientPair &p = gpair[col[j].index * ngroup + gid];
if (p.GetHess() < 0.0f) continue;
p += bst_gpair(p.GetHess() * col[j].fvalue * dw, 0);
p += GradientPair(p.GetHess() * col[j].fvalue * dw, 0);
}
}
}
@@ -119,9 +119,9 @@ class ShotgunUpdater : public LinearUpdater {
protected:
// training parameters
ShotgunTrainParam param;
ShotgunTrainParam param_;
std::unique_ptr<FeatureSelector> selector;
std::unique_ptr<FeatureSelector> selector_;
};
DMLC_REGISTER_PARAMETER(ShotgunTrainParam);


@@ -24,16 +24,16 @@ struct EvalEWiseBase : public Metric {
bst_float Eval(const std::vector<bst_float>& preds,
const MetaInfo& info,
bool distributed) const override {
CHECK_NE(info.labels.size(), 0U) << "label set cannot be empty";
CHECK_EQ(preds.size(), info.labels.size())
CHECK_NE(info.labels_.size(), 0U) << "label set cannot be empty";
CHECK_EQ(preds.size(), info.labels_.size())
<< "label and prediction size not match, "
<< "hint: use merror or mlogloss for multi-class classification";
const omp_ulong ndata = static_cast<omp_ulong>(info.labels.size());
const auto ndata = static_cast<omp_ulong>(info.labels_.size());
double sum = 0.0, wsum = 0.0;
#pragma omp parallel for reduction(+: sum, wsum) schedule(static)
for (omp_ulong i = 0; i < ndata; ++i) {
const bst_float wt = info.GetWeight(i);
sum += static_cast<const Derived*>(this)->EvalRow(info.labels[i], preds[i]) * wt;
sum += static_cast<const Derived*>(this)->EvalRow(info.labels_[i], preds[i]) * wt;
wsum += wt;
}
double dat[2]; dat[0] = sum, dat[1] = wsum;
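The hunk above threads instance weights through the element-wise metric: each row's error is scaled by its weight and the final value is the weighted mean `sum / wsum` (the two accumulators in `dat` are what gets all-reduced in the distributed case). A minimal single-process sketch, with squared error standing in for the `Derived::EvalRow` hook:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Weighted element-wise metric in the shape of EvalEWiseBase::Eval.
// Squared error is a hypothetical stand-in for Derived::EvalRow.
double WeightedEval(const std::vector<double> &labels,
                    const std::vector<double> &preds,
                    const std::vector<double> &weights) {
  double sum = 0.0, wsum = 0.0;
  for (size_t i = 0; i < labels.size(); ++i) {
    const double err = (labels[i] - preds[i]) * (labels[i] - preds[i]);
    sum += err * weights[i];   // weighted row error
    wsum += weights[i];        // normalizer
  }
  return sum / wsum;
}
```

With all weights equal to 1 this degenerates to the plain mean, which is why the change is backward compatible for unweighted evaluation sets.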

Some files were not shown because too many files have changed in this diff.