[jvm-packages] Implemented early stopping (#2710)
* Allowed subsampling test from the training data frame/RDD

  The implementation requires storing 1 - trainTestRatio points in memory
  to make the sampling work. An alternative approach would be to construct
  the full DMatrix and then slice it deterministically into train/test.
  The peak memory consumption of such a scenario, however, is twice the
  dataset size.

* Removed duplication from 'XGBoost.train'

  Scala callers can (and should) use names to supply a subset of
  parameters. Method overloading is not required.

* Reuse XGBoost seed parameter to stabilize train/test splitting

* Added early stopping support to non-distributed XGBoost

  Closes #1544

* Added early-stopping to distributed XGBoost

* Moved construction of 'watches' into a separate method

  This commit also fixes the handling of 'baseMargin', which previously
  was not added to the validation matrix.

* Addressed review comments
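
The seeded train/test split described in the first and third bullets can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `SplitSketch`, `split`, and the use of `scala.util.Random` are assumptions made for the example. Training points stream through lazily, while the held-out points (about `1 - trainTestRatio` of the data) are buffered in memory.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical helper, not the implementation from this commit.
object SplitSketch {
  // Returns a lazy iterator over the training points and a buffer that
  // fills with test points as the training iterator is consumed.
  def split[T](points: Iterator[T], trainTestRatio: Double,
               seed: Long): (Iterator[T], ArrayBuffer[T]) = {
    val rng = new Random(seed) // same seed => same split on every run
    val testBuffer = ArrayBuffer.empty[T]
    val trainIter = points.filter { p =>
      val isTrain = rng.nextDouble() < trainTestRatio
      if (!isTrain) testBuffer += p // held-out points stay in memory
      isTrain
    }
    (trainIter, testBuffer)
  }
}
```

Note that in this sketch the test buffer is only complete once the training iterator has been fully consumed, because `Iterator.filter` is lazy.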
@@ -55,7 +55,10 @@ object XGBoost {
     val trainMat = new DMatrix(dataIter, null)
     val watches = List("train" -> trainMat).toMap
     val round = 2
-    val booster = XGBoostScala.train(trainMat, paramMap, round, watches, null, null)
+    val numEarlyStoppingRounds = paramMap.get("numEarlyStoppingRounds")
+        .map(_.toString.toInt).getOrElse(0)
+    val booster = XGBoostScala.train(trainMat, paramMap, round, watches,
+      earlyStoppingRound = numEarlyStoppingRounds)
     Rabit.shutdown()
     collector.collect(new XGBoostModel(booster))
   }
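
The parameter lookup in the hunk above can be exercised in isolation. The sketch below assumes `paramMap` is a `Map[String, Any]`; the wrapper object `EarlyStoppingParam` is hypothetical and exists only to make the snippet self-contained. It also assumes that the default of 0 means early stopping is off.

```scala
// Hypothetical wrapper around the lookup shown in the diff.
object EarlyStoppingParam {
  // Values may arrive as Int or String, so they are normalised through
  // toString before parsing; a missing key falls back to 0.
  def numEarlyStoppingRounds(paramMap: Map[String, Any]): Int =
    paramMap.get("numEarlyStoppingRounds")
      .map(_.toString.toInt)
      .getOrElse(0)
}
```

One consequence of this design is that a non-integer value such as `"3.0"` would throw a `NumberFormatException` rather than being silently ignored.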