* Allow a non-zero missing value when training.
* Fix wrong method names.
* Add a unit test
* Move the getter/setter unit test to MissingValueHandlingSuite
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
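A minimal sketch of what the non-zero missing value from the entry above looks like on the estimator side, assuming the usual xgboost4j-spark XGBoostClassifier API; the column names and the -999.0f sentinel are placeholders, not values taken from the change itself.

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

// Illustrative only: treat a non-zero sentinel (here -999.0f) as the missing value
// instead of the default NaN. The feature vectors must have been built with the
// same sentinel for this to be valid.
val classifier = new XGBoostClassifier(Map(
  "objective" -> "binary:logistic",
  "num_round" -> 10
))
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setMissing(-999.0f)
```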
* [jvm-packages] add gpu_hist tree method
* change updater from hist to grow_quantile_histmaker
* add gpu scheduling
* pass correct parameters to xgboost library
* remove debug info
* add use.cuda for pom
* add CI for gpu_hist for jvm
* add gpu unit tests
* use gpu node to build jvm
* use nvidia-docker
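As a rough illustration of the gpu_hist entries above: with a CUDA-enabled build (the use.cuda profile and nvidia-docker CI mentioned here), the tree method is selected through the ordinary parameter map. Everything except "tree_method" -> "gpu_hist" is a placeholder.

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

// Sketch only: select the GPU histogram algorithm from the JVM package.
// Requires xgboost4j built with CUDA support and a GPU visible to the executors.
val gpuClassifier = new XGBoostClassifier(Map(
  "objective"   -> "binary:logistic",
  "tree_method" -> "gpu_hist",
  "num_round"   -> 100,
  "num_workers" -> 1
))
```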
* Add CLI interface to create_jni.py using argparse
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
* fix type error
* Validate number of features.
* resolve comments
* add feature size for LabeledPoint and DataBatch
* pass the feature size to native
* move feature size validating tests into a separate suite
* resolve comments
Co-authored-by: fis <jm.yuan@outlook.com>
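The feature-size entries above boil down to a sanity check before rows reach native code. The helper below is hypothetical, not the actual LabeledPoint/DataBatch change; it only shows the invariant being enforced.

```scala
// Hypothetical helper: every row must carry the expected number of features,
// otherwise fail fast instead of handing mismatched data to native code.
def validateFeatureSize(rows: Iterator[Array[Float]], expected: Int): Iterator[Array[Float]] =
  rows.map { row =>
    require(row.length == expected,
      s"Expected $expected features but the row has ${row.length}")
    row
  }
```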
* fix the NaN and non-zero missing value handling
* fix the NaN handling part
* add missing value
* Update MissingValueHandlingSuite.scala
* stylistic fix
* [jvm-packages][hot-fix] fix column mismatch caused by zip actions at XGBoostModel.transformInternal
* apply minibatch in prediction
* an iterator-compatible minibatch prediction
* regressor impl
* continued work on mini-batch prediction in xgboost4j-spark
* Update Booster.java
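The mini-batch prediction entries describe scoring a partition in bounded chunks instead of materializing it. This is a sketch under that reading; batchSize and the helper name are assumptions, not the actual transformInternal code.

```scala
import ml.dmlc.xgboost4j.LabeledPoint
import ml.dmlc.xgboost4j.scala.{Booster, DMatrix}

// Sketch: iterator-compatible mini-batch prediction. Each chunk becomes a small
// DMatrix, is scored, and is freed before the next chunk is read, so memory use
// stays proportional to batchSize rather than to the partition size.
def predictInBatches(booster: Booster,
                     rows: Iterator[LabeledPoint],
                     batchSize: Int = 32 * 1024): Iterator[Array[Float]] =
  rows.grouped(batchSize).flatMap { batch =>
    val dm = new DMatrix(batch.iterator)
    try booster.predict(dm).iterator
    finally dm.delete()
  }
```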
* add back train method but mark as deprecated
* fix scalastyle error
* wrap iterators
* enable co-partitioning of training and validation sets
* add parameters
* converge code paths and add an initial unit test
* enable multi evals for ranking
* unit test and doc
* update example
* fix early stopping
* address the offline comments
* update doc
* test eval metrics
* fix compilation issue
* fix example
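The co-partitioned validation set and multi-eval entries above are normally driven through named evaluation sets plus an early-stopping round count. A hedged sketch, assuming the xgboost4j-spark setters setEvalSets and setNumEarlyStoppingRounds (check them against the version in use) and placeholder DataFrames.

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
import org.apache.spark.sql.DataFrame

// Hedged sketch: train with a named validation set and stop early when the
// eval metric stops improving. trainDF/validDF are placeholders.
def fitWithValidation(trainDF: DataFrame, validDF: DataFrame) = {
  val classifier = new XGBoostClassifier(Map(
    "objective"   -> "binary:logistic",
    "eval_metric" -> "auc",
    "num_round"   -> 200
  ))
    .setEvalSets(Map("validation" -> validDF))
    .setNumEarlyStoppingRounds(10)
  classifier.fit(trainDF)
}
```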
* add back train method but mark as deprecated
* fix scalastyle error
* wrap iterators
* remove unused code
* refactor
* fix typo
* add back train method but mark as deprecated
* fix scalastyle error
* remove copy paste error
* added test, commented out right now
* reinstated test
* added fix for checking encryption settings
* fix by using RDD conf
* fix compilation
* renamed conf
* use SparkSession if available
* fix message
* nop
* code review fixes
* add back train method but mark as deprecated
* fix scalastyle error
* partial finish
* no test
* add test cases
* address comments
* add test for regressor
* fix typo
* add back train method but mark as deprecated
* fix scalastyle error
* consider spark.task.cpus when controlling parallelism
* fix bug
* fix conf setup
* calculate requestedCores within ParallelismController
* enforce spark.task.cpus = 1
* unify unit test case framework
* enable spark ui
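The spark.task.cpus entries amount to bookkeeping like the following; the helper and the requestedCores name are illustrative, not the ParallelismController code itself.

```scala
import org.apache.spark.SparkContext

// Illustrative only: when spark.task.cpus > 1, each training task occupies that many
// cores, so fewer parallel XGBoost workers fit in the same allocation.
// The series above ultimately enforces spark.task.cpus = 1 for training.
def maxParallelWorkers(sc: SparkContext, requestedCores: Int): Int = {
  val taskCpus = sc.getConf.getInt("spark.task.cpus", 1)
  requestedCores / taskCpus
}
```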
* add back train method but mark as deprecated
* fix scalastyle error
* [jvm-packages] XGBoost Spark integration refactor. (#3313)
* XGBoost Spark integration refactor.
* Make corresponding update for xgboost4j-example
* Address comments.
* [jvm-packages] Refactor XGBoost-Spark params to make it compatible with both XGBoost and Spark MLLib (#3326)
* Refactor XGBoost-Spark params to make it compatible with both XGBoost and Spark MLLib
* Fix extra space.
* [jvm-packages] XGBoost Spark supports ranking with group data. (#3369)
* XGBoost Spark supports ranking with group data.
* Use Iterator.duplicate to prevent OOM.
* Update CheckpointManagerSuite.scala
* Resolve conflicts
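On the "Iterator.duplicate to prevent OOM" point above: duplicate hands back two iterators over a single pass, and only the elements one copy has consumed ahead of the other are buffered, so consuming both roughly in step avoids collecting the whole partition. A tiny self-contained illustration of that memory behavior, not the xgboost4j-spark ranking code.

```scala
// Two views over one pass: zipping them keeps consumption in lockstep, so the
// internal buffer stays at one element instead of the whole sequence.
val rows: Iterator[(Int, Array[Float])] =
  Iterator.tabulate(1000000)(i => (i / 100, Array.fill(16)(0.0f))) // (queryId, features)

val (byQuery, byFeatures) = rows.duplicate
val paired = byQuery.map(_._1).zip(byFeatures.map(_._2)) // rebuilt pairs from one pass
println(paired.size) // 1000000, without materializing rows into a collection
```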
* [jvm-packages] Train Booster from an existing model
* Align Scala API with Java API
* Existing model should not load rabit checkpoint
* Address minor comments
* Implement saving temporary boosters and loading previous booster
* Add more unit tests for loadPrevBooster
* Add params to XGBoostEstimator
* (1) Move repartition out of the temp model saving loop (2) Address CR comments
* Catch a corner case of training next model with fewer rounds
* Address comments
* Refactor newly added methods into TmpBoosterManager
* Add two files that were missing from the previous commit
* Rename TmpBooster to checkpoint
* [jvm-packages] Fixed test/train persistence
Prior to this patch both data sets were persisted in the same directory,
i.e. the test data replaced the training data, which led to
* training on less data (since usually test < train) and
* test loss being exactly equal to the training loss.
Closes #2945.
* Cleanup file cache after the training
* Addressed review comments
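The persistence fix above comes down to keeping the two external-memory caches from sharing a path. A sketch of that idea using the DMatrix cache prefix; the directory layout is an assumption, and the actual change also cleans the cache up after training.

```scala
import ml.dmlc.xgboost4j.LabeledPoint
import ml.dmlc.xgboost4j.scala.DMatrix

// Sketch: give the train and test caches distinct prefixes so persisting the test
// matrix can no longer overwrite the training one.
def buildMatrices(train: Iterator[LabeledPoint],
                  test: Iterator[LabeledPoint],
                  cacheDir: String): (DMatrix, DMatrix) = {
  val dtrain = new DMatrix(train, s"$cacheDir/train.cache")
  val dtest  = new DMatrix(test, s"$cacheDir/test.cache")
  (dtrain, dtest)
}
```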
* [jvm-packages] Exposed train-time evaluation metrics
They are accessible via 'XGBoostModel.summary'. The summary is not
serialized with the model and is only available after training.
* Addressed review comments
* Extracted model-related tests into 'XGBoostModelSuite'
* Added tests for copying the 'XGBoostModel'
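A hedged usage sketch for the exposed metrics: the summary itself is confirmed above, but the trainObjectiveHistory field name is an assumption and should be checked against XGBoostTrainingSummary in the version in use; since the summary is not serialized, it is only readable on the model instance returned by training.

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostModel

// Hedged sketch: walk the per-round training objective recorded during fit().
def printTrainingHistory(model: XGBoostModel): Unit =
  model.summary.trainObjectiveHistory.zipWithIndex.foreach {
    case (objective, round) => println(s"round $round: train objective = $objective")
  }
```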
* [jvm-packages] Fixed a subtle bug in train/test split
Iterator.partition (naturally) assumes that the predicate is deterministic,
but this is not the case for
r.nextDouble() <= trainTestRatio
so the DMatrix(...) call sometimes got a NoSuchElementException
and crashed the JVM due to the lack of exception handling in
XGBoost4jCallbackDataIterNext.
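A self-contained demonstration of the hazard with the Scala collections of that era, plus the deterministic alternative (draw the random number once per element and split on the stored result); this illustrates the failure mode, it is not the code from the fix.

```scala
import scala.util.Random

// Iterator.partition may evaluate the predicate more than once per element
// (in hasNext and again in next), so a random predicate can make hasNext report
// true while next() then throws NoSuchElementException, which is the crash above.
val r = new Random()
val (train, test) = Iterator.range(0, 100000).partition(_ => r.nextDouble() <= 0.7)
try println(s"train=${train.size} test=${test.size}")
catch { case e: NoSuchElementException => println(s"non-deterministic predicate: $e") }

// Deterministic alternative: tag each element once, then split on the stored tag.
val r2 = new Random()
val tagged = (0 until 100000).map(i => (i, r2.nextDouble() <= 0.7))
val trainRows = tagged.collect { case (i, true)  => i }
val testRows  = tagged.collect { case (i, false) => i }
assert(trainRows.size + testRows.size == 100000)
```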
* Make sure train/test objectives are different
In the refactor to add base margins, #2532, all of the labels were lost
when creating the DMatrix. This became obvious because metrics like NDCG
always returned 1.0 regardless of the predictions.
Change-Id: I88be047e1c108afba4784bd3d892bfc9edeabe55
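To make that failure concrete: if the matrix is rebuilt to attach base margins but the labels are not carried over, every query looks perfectly ranked and NDCG degenerates to 1.0. The helper below is a sketch of the invariant, with a made-up name, not the refactored code itself.

```scala
import ml.dmlc.xgboost4j.LabeledPoint
import ml.dmlc.xgboost4j.scala.DMatrix

// Sketch: when re-creating the matrix to set base margins, keep the labels with it.
def buildTrainMatrix(points: Seq[LabeledPoint], baseMargin: Array[Float]): DMatrix = {
  val dm = new DMatrix(points.iterator)
  dm.setLabel(points.map(_.label).toArray)
  dm.setBaseMargin(baseMargin)
  dm
}
```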