Tests and documents for new JSON routines. (#5120)

2019-12-18 08:44:27 +08:00 · 2019-12-18 08:44:27 +08:00 · 27b3646d29
commit 27b3646d29
parent 63ffd2f686
5 changed files with 866 additions and 3 deletions
--- a/doc/tutorials/index.rst
+++ b/doc/tutorials/index.rst
@@ -10,6 +10,7 @@ See `Awesome XGBoost <https://github.com/dmlc/xgboost/tree/master/demo>`_ for mo
  :caption: Contents:
  model
  saving_model
  Distributed XGBoost with AWS YARN <aws_yarn>
  kubernetes
  Distributed XGBoost with XGBoost4J-Spark <https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html>
--- a/doc/tutorials/saving_model.rst
+++ b/doc/tutorials/saving_model.rst
@@ -0,0 +1,195 @@
 ########################
 Introduction to Model IO
 ########################
 In XGBoost 1.0.0, we introduced experimental support for using `JSON
 <https://www.json.org/json-en.html>`_ to save and load XGBoost models and the related
 training hyper-parameters, aiming to replace the old internal binary format with an
 open format that can be easily reused.  Support for the binary format will continue
 until the JSON format is no longer experimental and has satisfactory performance.
 This tutorial aims to share some basic insights into the JSON serialisation method used in
 XGBoost.  Unless explicitly stated otherwise, the following sections assume you are using the
 experimental JSON format, which can be enabled by passing
 ``enable_experimental_json_serialization=True`` as a training parameter, or by providing a
 file name with ``.json`` as the file extension when saving/loading the model:
 ``booster.save_model('model.json')``.  More details below.
 Before we get started: XGBoost is a gradient boosting library with a focus on tree models,
 which means that inside XGBoost there are 2 distinct parts: the model, consisting of trees,
 and the algorithms used to build it.  If you come from the Deep Learning community, the
 difference should be familiar: a neural network structure composed of weights with fixed
 tensor operations is distinct from the optimizer (like RMSprop) used to train it.
 So when one calls ``booster.save_model``, XGBoost saves the trees, some model parameters
 like the number of input columns in the trained trees, and the objective function, which
 together represent the concept of "model" in XGBoost.  The objective is saved as part of
 the model because it controls the transformation of the global bias (called
 ``base_score`` in XGBoost).  Users can share this model with others for prediction or
 evaluation, or continue the training with a different set of hyper-parameters.
 However, this is not the end of the story.  There are cases where we need to save something
 more than just the model itself.  For example, in distributed training, XGBoost performs
 checkpointing operations.  Or, for some reason, your favorite distributed computing
 framework decides to copy the model from one worker to another and continue the training
 there.  In such cases, the serialisation output is required to contain enough information
 to continue the previous training without the user providing any parameters again.  We
 consider such a scenario a memory snapshot (or memory based serialisation method) and
 distinguish it from the normal model IO operation.  In Python, this can be invoked by
 pickling the ``Booster`` object.  Other language bindings are still a work in progress.
 .. note::
  The old binary format doesn't distinguish between the model and the raw memory
  serialisation format; it's a mix of everything, which is part of the reason why we want
  to replace it with a more robust serialisation method.  The JVM package has its own memory
  based serialisation methods.
 To enable JSON format support for model IO (saving only the trees and objective), provide
 a filename with ``.json`` as the file extension:
 .. code-block:: python
  bst.save_model('model_file_name.json')
 To enable JSON as the memory based serialisation format, pass
 ``enable_experimental_json_serialization`` as a training parameter.  In Python this can be
 done by:
 .. code-block:: python
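  import pickle
  import xgboost
  # dtrain is assumed to be an existing xgboost.DMatrix holding the training data
  bst = xgboost.train({'enable_experimental_json_serialization': True}, dtrain)
  with open('filename', 'wb') as fd:
      pickle.dump(bst, fd)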
 Notice that ``filename`` is passed to Python's built-in ``open`` function, not to XGBoost, so
 XGBoost cannot infer the format from a file extension.  Hence the parameter
 ``enable_experimental_json_serialization`` is required to enable the JSON format.
 As the name suggests, memory based serialisation captures many things internal to
 XGBoost, so it's only suitable for checkpoints, which don't require a stable
 output format.  Consequently, loading a pickled booster (memory snapshot) in a different
 XGBoost version may lead to errors or undefined behavior.  We do, however, promise a stable
 output format for the binary model and the JSON model (once it's no longer experimental), as
 they are designed to be reusable.  This scheme fits, as Python itself doesn't guarantee that
 pickled bytecode can be used in a different Python version.
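One way to guard against loading a memory snapshot written by a different version is to store a version tag next to the pickled payload and check it before unpickling.  The sketch below uses only the Python standard library; ``save_snapshot``, ``load_snapshot`` and the dictionary standing in for a trained booster are hypothetical names for illustration, not part of XGBoost.

```python
import pickle


def save_snapshot(obj, version):
    """Pickle `obj` and wrap it with the version of the library that wrote it."""
    # The payload is pickled separately so the wrapper holds only simple types.
    wrapper = {"version": tuple(version), "payload": pickle.dumps(obj)}
    return pickle.dumps(wrapper)


def load_snapshot(data, expected_version):
    """Check the recorded version first, then unpickle the actual payload."""
    wrapper = pickle.loads(data)
    if wrapper["version"] != tuple(expected_version):
        raise RuntimeError(
            "snapshot written by version {}, but version {} is running".format(
                wrapper["version"], tuple(expected_version)))
    return pickle.loads(wrapper["payload"])


# A plain dict stands in for the booster here (assumption for illustration).
blob = save_snapshot({"max_depth": "2", "tree_method": "hist"}, (1, 0, 0))
restored = load_snapshot(blob, (1, 0, 0))
```

Because the payload is only unpickled after the version check passes, a mismatch is reported as a clear error instead of surfacing later as undefined behavior.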
 ***************************
 Custom objective and metric
 ***************************
 XGBoost accepts user provided objective and metric functions as an extension.  These
 functions are not saved in the model file, as they are a language dependent feature.  With
 Python, users can pickle the model to include these functions in the saved binary.  One
 drawback is that the output from pickle is not a stable serialization format and doesn't work
 across different Python or XGBoost versions, not to mention different language
 environments.  Another way to work around this limitation is to provide these functions
 again after the model is loaded.  If the customized function is useful, please consider
 making a PR to implement it inside XGBoost; this way your functions can
 work with the different language bindings.
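Since a custom objective is ordinary code in the host language, providing it again after loading amounts to defining the same function once more.  Below is a minimal sketch of a logistic objective that returns the per-row gradient and hessian, written with plain Python lists so it runs without dependencies.  The name ``logistic_obj`` and its ``(preds, labels)`` signature are assumptions for illustration; in the Python package the callback receives the raw predictions and the training ``DMatrix`` instead.

```python
import math


def logistic_obj(preds, labels):
    """Gradient and hessian of the log loss with respect to raw margin scores."""
    grad, hess = [], []
    for margin, y in zip(preds, labels):
        p = 1.0 / (1.0 + math.exp(-margin))  # sigmoid of the raw score
        grad.append(p - y)                   # first derivative of the loss
        hess.append(p * (1.0 - p))           # second derivative of the loss
    return grad, hess


grad, hess = logistic_obj([0.0, 2.0], [1.0, 0.0])
```

Keeping such functions in a small importable module makes it easy to supply the same definition both when training and after loading the model.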
 ********************************************************
 Saving and Loading the internal parameters configuration
 ********************************************************
 XGBoost's ``C API`` and ``Python API`` support saving and loading the internal
 configuration directly as a JSON string.  In the Python package:
 .. code-block:: python
  bst = xgboost.train(...)
  config = bst.save_config()
  print(config)
 This will print something similar to the following (not the actual output, which is too long to show here):
 .. code-block:: json
    {
      "Learner": {
        "generic_parameter": {
          "enable_experimental_json_serialization": "0",
          "gpu_id": "0",
          "gpu_page_size": "0",
          "n_jobs": "0",
          "random_state": "0",
          "seed": "0",
          "seed_per_iteration": "0"
        },
        "gradient_booster": {
          "gbtree_train_param": {
            "num_parallel_tree": "1",
            "predictor": "gpu_predictor",
            "process_type": "default",
            "tree_method": "gpu_hist",
            "updater": "grow_gpu_hist",
            "updater_seq": "grow_gpu_hist"
          },
          "name": "gbtree",
          "updater": {
            "grow_gpu_hist": {
              "gpu_hist_train_param": {
                "debug_synchronize": "0",
                "gpu_batch_nrows": "0",
                "single_precision_histogram": "0"
              },
              "train_param": {
                "alpha": "0",
                "cache_opt": "1",
                "colsample_bylevel": "1",
                "colsample_bynode": "1",
                "colsample_bytree": "1",
                "default_direction": "learn",
                "enable_feature_grouping": "0",
                "eta": "0.300000012",
                "gamma": "0",
                "grow_policy": "depthwise",
                "interaction_constraints": "",
                "lambda": "1",
                "learning_rate": "0.300000012",
                "max_bin": "256",
                "max_conflict_rate": "0",
                "max_delta_step": "0",
                "max_depth": "6",
                "max_leaves": "0",
                "max_search_group": "100",
                "refresh_leaf": "1",
                "sketch_eps": "0.0299999993",
                "sketch_ratio": "2",
                "subsample": "1"
              }
            }
          }
        },
        "learner_train_param": {
          "booster": "gbtree",
          "disable_default_eval_metric": "0",
          "dsplit": "auto",
          "objective": "reg:squarederror"
        },
        "metrics": [],
        "objective": {
          "name": "reg:squarederror",
          "reg_loss_param": {
            "scale_pos_weight": "1"
          }
        }
      },
      "version": [1, 0, 0]
    }
 You can load it back into a model generated by the same version of XGBoost with:
 .. code-block:: python
  bst.load_config(config)
 This way users can study the internal representation more closely.
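Because the configuration is plain JSON, it can be inspected with the standard ``json`` module.  The sketch below parses an abridged, hand-written string shaped like the output above and flattens it into dotted paths; the ``flatten`` helper is a hypothetical name introduced for illustration.  Note that parameter values are stored as strings, so numeric ones need converting before use.

```python
import json

# Abridged configuration string shaped like the ``save_config`` output above.
config = """
{
  "Learner": {
    "gradient_booster": {
      "gbtree_train_param": {"tree_method": "gpu_hist", "num_parallel_tree": "1"},
      "name": "gbtree"
    },
    "learner_train_param": {"booster": "gbtree", "objective": "reg:squarederror"}
  },
  "version": [1, 0, 0]
}
"""


def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested JSON object."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, prefix + "." + key if prefix else key)
    else:
        # Lists and scalars are treated as leaves.
        yield prefix, node


parsed = json.loads(config)
flat = dict(flatten(parsed))
n_trees = int(flat["Learner.gradient_booster.gbtree_train_param.num_parallel_tree"])
for path in sorted(flat):
    print(path, "=", flat[path])
```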
 ************
 Future Plans
 ************
 Right now, using the JSON format incurs a longer serialisation time; we have been working on
 optimizing the JSON implementation to close the gap between the binary and JSON formats.
 You can track the progress in `#5046 <https://github.com/dmlc/xgboost/pull/5046>`_.
 Another important item for JSON format support is a stable and documented `schema
 <https://json-schema.org/>`_, based on which one can easily reuse the saved model.
--- a/include/xgboost/c_api.h
+++ b/include/xgboost/c_api.h
@@ -426,6 +426,24 @@ XGB_DLL int XGBoosterPredict(BoosterHandle handle,
                             unsigned ntree_limit,
                             bst_ulong *out_len,
                             const float **out_result);
 /*
 * Short note for serialization APIs.  There are 3 different sets of serialization API.
 *
 * - Functions with the term "Model" handle saving/loading the XGBoost model, like trees or
 *   linear weights.  Stripping out parameter configuration like training algorithms or
 *   CUDA device ID helps users reuse the trained model for different tasks; examples
 *   are prediction, training continuation or interpretation.
 *
 * - Functions with the term "Config" handle saving/loading configuration.  They help users
 *   study the internals of XGBoost.  Users can also use the load method to specify
 *   parameters in a structured way.  These functions were introduced in 1.0.0, and are not
 *   yet stable.
 *
 * - Functions with the term "Serialization" combine the above two.  They are used in
 *   situations like check-pointing, or continuing a training task in a distributed
 *   environment.  In these cases the task must be carried out without any user
 *   intervention.
 */
 /*!
 * \brief Load model from existing file
@@ -506,7 +524,10 @@ XGB_DLL int XGBoosterSaveRabitCheckpoint(BoosterHandle handle);
 /*!
- * \brief Save XGBoost's internal configuration into a JSON document.
+ * \brief Save XGBoost's internal configuration into a JSON document.  Currently the
 *        support is experimental; the function signature may change in the future without
 *        notice.
 *
 * \param handle handle to Booster object.
 * \param out_str A valid pointer to array of characters.  The characters array is
 *                allocated and managed by XGBoost, while pointer to that array needs to
@@ -516,7 +537,10 @@ XGB_DLL int XGBoosterSaveRabitCheckpoint(BoosterHandle handle);
 XGB_DLL int XGBoosterSaveJsonConfig(BoosterHandle handle, bst_ulong *out_len,
                                    char const **out_str);
 /*!
- * \brief Load XGBoost's internal configuration from a JSON document.
+ * \brief Load XGBoost's internal configuration from a JSON document.  Currently the
 *        support is experimental; the function signature may change in the future without
 *        notice.
 *
 * \param handle handle to Booster object.
 * \param json_parameters string representation of a JSON document.
 * \return 0 when success, -1 when failure happens
--- a/src/learner.cc
+++ b/src/learner.cc
@@ -472,7 +472,10 @@ class LearnerImpl : public Learner {
  }
  // Save model into binary format.  The code is about to be deprecated by more robust
-  // JSON serialization format.
+  // JSON serialization format.  This function is unaffected by
  // `enable_experimental_json_serialization`, as a user might enable this flag for pickle
  // while still wanting binary output.  As we are progressing towards replacing the binary
  // format, there's no need to put too much effort into it.
  void SaveModel(dmlc::Stream* fo) const override {
    LearnerModelParamLegacy mparam = mparam_;  // make a copy to potentially modify
    std::vector<std::pair<std::string, std::string> > extra_attr;
--- a/tests/cpp/test_serialization.cc
+++ b/tests/cpp/test_serialization.cc
@@ -0,0 +1,640 @@
 #include <gtest/gtest.h>
 #include <dmlc/filesystem.h>
 #include <string>
 #include <xgboost/learner.h>
 #include <xgboost/data.h>
 #include <xgboost/base.h>
 #include "helpers.h"
 #include "../../src/common/io.h"
 #include "../../src/common/random.h"
 namespace xgboost {
 void TestLearnerSerialization(Args args, FeatureMap const& fmap, std::shared_ptr<DMatrix> p_dmat) {
  for (auto& batch : p_dmat->GetBatches<SparsePage>()) {
    batch.data.HostVector();
    batch.offset.HostVector();
  }
  int32_t constexpr kIters = 2;
  dmlc::TemporaryDirectory tempdir;
  std::string const fname = tempdir.path + "/model";
  std::vector<std::string> dumped_0;
  std::string model_at_kiter;
  {
    std::unique_ptr<dmlc::Stream> fo(dmlc::Stream::Create(fname.c_str(), "w"));
    std::unique_ptr<Learner> learner {Learner::Create({p_dmat})};
    learner->SetParams(args);
    for (int32_t iter = 0; iter < kIters; ++iter) {
      learner->UpdateOneIter(iter, p_dmat.get());
    }
    dumped_0 = learner->DumpModel(fmap, true, "json");
    learner->Save(fo.get());
    common::MemoryBufferStream mem_out(&model_at_kiter);
    learner->Save(&mem_out);
  }
  std::vector<std::string> dumped_1;
  {
    std::unique_ptr<dmlc::Stream> fi(dmlc::Stream::Create(fname.c_str(), "r"));
    std::unique_ptr<Learner> learner {Learner::Create({p_dmat})};
    learner->Load(fi.get());
    learner->Configure();
    dumped_1 = learner->DumpModel(fmap, true, "json");
  }
  ASSERT_EQ(dumped_0, dumped_1);
  std::string model_at_2kiter;
  // Test training continuation with data from host
  {
    std::string continued_model;
    {
      // Continue the previous training with another kIters
      std::unique_ptr<dmlc::Stream> fi(
          dmlc::Stream::Create(fname.c_str(), "r"));
      std::unique_ptr<Learner> learner{Learner::Create({p_dmat})};
      learner->Load(fi.get());
      learner->Configure();
      // verify the loaded model doesn't change.
      std::string serialised_model_tmp;
      common::MemoryBufferStream mem_out(&serialised_model_tmp);
      learner->Save(&mem_out);
      ASSERT_EQ(model_at_kiter, serialised_model_tmp);
      for (auto &batch : p_dmat->GetBatches<SparsePage>()) {
        batch.data.HostVector();
        batch.offset.HostVector();
      }
      for (int32_t iter = kIters; iter < 2 * kIters; ++iter) {
        learner->UpdateOneIter(iter, p_dmat.get());
      }
      common::MemoryBufferStream fo(&continued_model);
      learner->Save(&fo);
    }
    {
      // Train 2 * kIters in one go
      std::unique_ptr<Learner> learner{Learner::Create({p_dmat})};
      learner->SetParams(args);
      for (int32_t iter = 0; iter < 2 * kIters; ++iter) {
        learner->UpdateOneIter(iter, p_dmat.get());
        // Verify model is same at the same iteration during two training
        // sessions.
        if (iter == kIters - 1) {
          std::string reproduced_model;
          common::MemoryBufferStream fo(&reproduced_model);
          learner->Save(&fo);
          ASSERT_EQ(model_at_kiter, reproduced_model);
        }
      }
      common::MemoryBufferStream fo(&model_at_2kiter);
      learner->Save(&fo);
    }
    Json m_0 = Json::Load(StringView{continued_model.c_str(), continued_model.size()});
    Json m_1 = Json::Load(StringView{model_at_2kiter.c_str(), model_at_2kiter.size()});
    ASSERT_EQ(m_0, m_1);
  }
  // Test training continuation with data from device.
  {
    // Continue the previous training but on data from device.
    std::unique_ptr<dmlc::Stream> fi(dmlc::Stream::Create(fname.c_str(), "r"));
    std::unique_ptr<Learner> learner{Learner::Create({p_dmat})};
    learner->Load(fi.get());
    learner->Configure();
    // verify the loaded model doesn't change.
    std::string serialised_model_tmp;
    common::MemoryBufferStream mem_out(&serialised_model_tmp);
    learner->Save(&mem_out);
    ASSERT_EQ(model_at_kiter, serialised_model_tmp);
    learner->SetParam("gpu_id", "0");
    // Pull data to device
    for (auto &batch : p_dmat->GetBatches<SparsePage>()) {
      batch.data.SetDevice(0);
      batch.data.DeviceSpan();
      batch.offset.SetDevice(0);
      batch.offset.DeviceSpan();
    }
    for (int32_t iter = kIters; iter < 2 * kIters; ++iter) {
      learner->UpdateOneIter(iter, p_dmat.get());
    }
    serialised_model_tmp = std::string{};
    common::MemoryBufferStream fo(&serialised_model_tmp);
    learner->Save(&fo);
    Json m_0 = Json::Load(StringView{model_at_2kiter.c_str(), model_at_2kiter.size()});
    Json m_1 = Json::Load(StringView{serialised_model_tmp.c_str(), serialised_model_tmp.size()});
    // GPU ID is changed as data is coming from device.
    ASSERT_EQ(get<Object>(m_0["Config"]["learner"]["generic_param"]).erase("gpu_id"),
              get<Object>(m_1["Config"]["learner"]["generic_param"]).erase("gpu_id"));
  }
 }
 // Binary is not tested, as it is NOT reproducible.
 class SerializationTest : public ::testing::Test {
 protected:
  size_t constexpr static kRows = 10;
  size_t constexpr static kCols = 10;
  std::shared_ptr<DMatrix>* pp_dmat_;
  FeatureMap fmap_;
 protected:
  ~SerializationTest() override {
    delete pp_dmat_;
  }
  void SetUp() override {
    pp_dmat_ = CreateDMatrix(kRows, kCols, .5f);
    std::shared_ptr<DMatrix> p_dmat{*pp_dmat_};
    p_dmat->Info().labels_.Resize(kRows);
    auto &h_labels = p_dmat->Info().labels_.HostVector();
    xgboost::SimpleLCG gen(0);
    SimpleRealUniformDistribution<float> dis(0.0f, 1.0f);
    for (auto& v : h_labels) { v = dis(&gen); }
    for (size_t i = 0; i < kCols; ++i) {
      std::string name = "feat_" + std::to_string(i);
      fmap_.PushBack(i, name.c_str(), "q");
    }
  }
 };
 TEST_F(SerializationTest, Exact) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "exact"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"num_parallel_tree", "4"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "exact"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "exact"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(SerializationTest, Approx) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "approx"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"num_parallel_tree", "4"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "approx"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "approx"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(SerializationTest, Hist) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"num_parallel_tree", "4"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "hist"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(SerializationTest, CPU_CoordDescent) {
  TestLearnerSerialization({{"booster", "gblinear"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"enable_experimental_json_serialization", "1"},
                            {"updater", "coord_descent"}},
                           fmap_, *pp_dmat_);
 }
 #if defined(XGBOOST_USE_CUDA)
 TEST_F(SerializationTest, GPU_Hist) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"seed", "0"},
                            {"enable_experimental_json_serialization", "1"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"tree_method", "gpu_hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"seed", "0"},
                            {"enable_experimental_json_serialization", "1"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"num_parallel_tree", "4"},
                            {"tree_method", "gpu_hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"seed", "0"},
                            {"enable_experimental_json_serialization", "1"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"tree_method", "gpu_hist"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(SerializationTest, ConfigurationCount) {
  auto& p_dmat = *pp_dmat_;
  std::vector<std::shared_ptr<xgboost::DMatrix>> mat = {p_dmat};
  xgboost::ConsoleLogger::Configure({{"verbosity", "3"}});
  testing::internal::CaptureStderr();
  std::string model_str;
  {
    auto learner = std::unique_ptr<Learner>(Learner::Create(mat));
    learner->SetParam("tree_method", "gpu_hist");
    learner->SetParam("enable_experimental_json_serialization", "1");
    for (size_t i = 0; i < 10; ++i) {
      learner->UpdateOneIter(i, p_dmat.get());
    }
    common::MemoryBufferStream fo(&model_str);
    learner->Save(&fo);
  }
  {
    common::MemoryBufferStream fi(&model_str);
    auto learner = std::unique_ptr<Learner>(Learner::Create(mat));
    learner->Load(&fi);
    for (size_t i = 0; i < 10; ++i) {
      learner->UpdateOneIter(i, p_dmat.get());
    }
  }
  std::string output = testing::internal::GetCapturedStderr();
  std::string target = "[GPU Hist]: Configure";
  ASSERT_NE(output.find(target), std::string::npos);
  size_t occurrences = 0;
  size_t pos = 0;
  // Should run configuration exactly 2 times, one for each learner.
  while ((pos = output.find(target, pos)) != std::string::npos) {
    ++occurrences;
    pos += target.size();
  }
  ASSERT_EQ(occurrences, 2);
  xgboost::ConsoleLogger::Configure({{"verbosity", "1"}});
 }
 TEST_F(SerializationTest, GPU_CoordDescent) {
  TestLearnerSerialization({{"booster", "gblinear"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"enable_experimental_json_serialization", "1"},
                            {"updater", "gpu_coord_descent"}},
                           fmap_, *pp_dmat_);
 }
 #endif  // defined(XGBOOST_USE_CUDA)
 class LogitSerializationTest : public SerializationTest {
 protected:
  void SetUp() override {
    pp_dmat_ = CreateDMatrix(kRows, kCols, .5f);
    std::shared_ptr<DMatrix> p_dmat{*pp_dmat_};
    p_dmat->Info().labels_.Resize(kRows);
    auto &h_labels = p_dmat->Info().labels_.HostVector();
    std::bernoulli_distribution flip(0.5);
    auto& rnd = common::GlobalRandom();
    rnd.seed(0);
    for (auto& v : h_labels) { v = flip(rnd); }
    for (size_t i = 0; i < kCols; ++i) {
      std::string name = "feat_" + std::to_string(i);
      fmap_.PushBack(i, name.c_str(), "q");
    }
  }
 };
 TEST_F(LogitSerializationTest, Exact) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "exact"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "exact"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(LogitSerializationTest, Approx) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "approx"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "approx"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(LogitSerializationTest, Hist) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "hist"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(LogitSerializationTest, CPU_CoordDescent) {
  TestLearnerSerialization({{"booster", "gblinear"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"enable_experimental_json_serialization", "1"},
                            {"updater", "coord_descent"}},
                           fmap_, *pp_dmat_);
 }
 #if defined(XGBOOST_USE_CUDA)
 TEST_F(LogitSerializationTest, GPU_Hist) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "gpu_hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"enable_experimental_json_serialization", "1"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"num_parallel_tree", "4"},
                            {"tree_method", "gpu_hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", "2"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "gpu_hist"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(LogitSerializationTest, GPU_CoordDescent) {
  TestLearnerSerialization({{"booster", "gblinear"},
                            {"objective", "binary:logistic"},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"enable_experimental_json_serialization", "1"},
                            {"updater", "gpu_coord_descent"}},
                           fmap_, *pp_dmat_);
 }
 #endif  // defined(XGBOOST_USE_CUDA)
 class MultiClassesSerializationTest : public SerializationTest {
 protected:
  size_t constexpr static kClasses = 4;
  void SetUp() override {
    pp_dmat_ = CreateDMatrix(kRows, kCols, .5f);
    std::shared_ptr<DMatrix> p_dmat{*pp_dmat_};
    p_dmat->Info().labels_.Resize(kRows);
    auto &h_labels = p_dmat->Info().labels_.HostVector();
    std::uniform_int_distribution<size_t> categorical(0, kClasses - 1);
    auto& rnd = common::GlobalRandom();
    rnd.seed(0);
    for (auto& v : h_labels) { v = categorical(rnd); }
    for (size_t i = 0; i < kCols; ++i) {
      std::string name = "feat_" + std::to_string(i);
      fmap_.PushBack(i, name.c_str(), "q");
    }
  }
 };
 TEST_F(MultiClassesSerializationTest, Exact) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "exact"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"num_parallel_tree", "4"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "exact"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "exact"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(MultiClassesSerializationTest, Approx) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "approx"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "approx"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(MultiClassesSerializationTest, Hist) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"enable_experimental_json_serialization", "1"},
                            {"num_parallel_tree", "4"},
                            {"tree_method", "hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "hist"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(MultiClassesSerializationTest, CPU_CoordDescent) {
  TestLearnerSerialization({{"booster", "gblinear"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"enable_experimental_json_serialization", "1"},
                            {"updater", "coord_descent"}},
                           fmap_, *pp_dmat_);
 }
 #if defined(XGBOOST_USE_CUDA)
 TEST_F(MultiClassesSerializationTest, GPU_Hist) {
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "gpu_hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "gbtree"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            // GPU_Hist has higher floating-point error: the 1e-6
                            // tolerance fails once num_parallel_tree reaches 4,
                            // so 3 is used here.
                            {"num_parallel_tree", "3"},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "gpu_hist"}},
                           fmap_, *pp_dmat_);
  TestLearnerSerialization({{"booster", "dart"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"max_depth", std::to_string(kClasses)},
                            {"enable_experimental_json_serialization", "1"},
                            {"tree_method", "gpu_hist"}},
                           fmap_, *pp_dmat_);
 }
 TEST_F(MultiClassesSerializationTest, GPU_CoordDescent) {
  TestLearnerSerialization({{"booster", "gblinear"},
                            {"num_class", std::to_string(kClasses)},
                            {"seed", "0"},
                            {"nthread", "1"},
                            {"enable_experimental_json_serialization", "1"},
                            {"updater", "gpu_coord_descent"}},
                           fmap_, *pp_dmat_);
 }
 #endif  // defined(XGBOOST_USE_CUDA)
 }  // namespace xgboost