348 Commits

Author SHA1 Message Date
Stephanie Yang
0fc7dcfe6c Add public group getter for java and scala (#4838)
* Add public group getter for java and scala

* Remove unnecessary param from javadoc

* Fix typo

* Fix another typo

* Add semicolon

* Fix javadoc return statement

* Fix missing return statement

* Add a unit test
2019-09-09 10:07:48 -07:00
Nan Zhu
0184eb5d02
[jvm-packages] Refactor XGBoost.scala to put all params processing in one place (#4815)
* cleaning checkpoint file after a successful file

* address comments

* refactor xgboost.scala to avoid multiple changes when adding params

* consolidate params

* fix compilation issue

* fix failed test

* fix wrong name

* tyep conversion
2019-08-28 22:41:05 -07:00
Nan Zhu
7b5cbcc846
[jvm-packages] cleaning checkpoint file after a successful training (#4754)
* cleaning checkpoint file after a successful file

* address comments
2019-08-14 10:57:47 -07:00
Oleksandr Pryimak
b68de018b8 [jvm-packages] jvm test should clean up after themselfs (#4706) 2019-08-04 14:09:11 -07:00
Nan Zhu
1595e3f57b
upgrade version num (#4670)
* upgrade version num

* missign changes

* fix version script

* change versions

* rm files

* Update CMakeLists.txt
2019-07-17 15:25:35 -07:00
Nan Zhu
01b0c9047c
[jvm-packages] allowing chaining prediction (#4667)
* add test for chaining prediction

* update rabit

* Update XGBoostGeneralSuite.scala
2019-07-17 08:50:27 -07:00
koertkuipers
3c506b076e [jvm-packages] upgrade to Scala 2.12 (#4574)
* bump scala to 2.12 which requires java 8 and also newer flink and akka

* put scala version in artifactId

* fix appveyor

* fix for scaladoc issue that looks like https://github.com/scala/bug/issues/10509

* fix ci_build

* update versions in generate_pom.py

* fix generate_pom.py

* apache does not have a download for spark 2.4.3 distro using scala 2.12 yet, so for now i use a tgz i put on s3

* Upload spark-2.4.3-bin-scala2.12-hadoop2.7.tgz to our own S3

* Update Dockerfile.jvm_cross

* Update Dockerfile.jvm_cross
2019-07-16 08:43:34 -07:00
Mathew Wicks
6323ef94ad [jvm-packages] update local dev build process (#4640) 2019-07-15 21:23:06 -07:00
Oleksandr Pryimak
2973416f2e [jvm-packages] Fix maven warnings (#4664)
* exec plugin was missing a version
* reportPlugins has been deprecated:
  see https://maven.apache.org/plugins/maven-site-plugin/maven-3.html#Classic_configuration_Maven_2__3
2019-07-15 20:25:43 -07:00
Rong Ou
30204b50fe fix spark tests on machines with many cores (#4634) 2019-07-07 16:02:56 -07:00
Philip Hyunsu Cho
d333918f5e [jvm-packages] Expose setMissing method in XGBoostClassificationModel / XGBoostRegressionModel (#4643) 2019-07-07 16:02:44 -07:00
Nan Zhu
abffbe014e
[jvm-packages] delete all constraints from spark layer about obj and eval metrics and handle error in jvm layer (#4560)
* temp

* prediction part

* remove supported*

* add for test

* fix param name

* add rabit

* update rabit

* return value of rabit init

* eliminate compilation warnings

* update rabit

* shutdown

* update rabit again

* check sparkcontext shutdown

* fix logic

* sleep

* fix tests

* test with relaxed threshold

* create new thread each time

* stop for job quitting

* udpate rabit

* update rabit

* update rabit

* update git modules
2019-06-27 08:47:37 -07:00
Nan Zhu
fe2de6f415
[jvm-packages]fix silly bug in feature scoring (#4604) 2019-06-25 20:49:01 -07:00
Daniel Stahl
7ae11c9284 [jvm-packages] updated kryo dependency to 2.22 (#4575) 2019-06-18 09:26:54 -07:00
Jiaming Yuan
2f1319f273
Add rmsle metric and reg:squaredlogerror objective (#4541) 2019-06-11 05:48:27 +08:00
Jiaming Yuan
0ce300e73a [jvm-packages] Add back reg:linear for scala. (#4490)
* Add back reg:linear for scala.

* Fix linter.
2019-05-23 15:02:08 -07:00
Bryan Woods
278562db13 Add support for cross-validation using query ID (#4474)
* adding support for matrix slicing with query ID for cross-validation

* hail mary test of unrar installation for windows tests

* trying to modify tests to run in Github CI

* Remove dependency on wget and unrar

* Save error log from R test

* Relax assertion in test_training

* Use int instead of bool in C function interface

* Revise R interface

* Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility
2019-05-23 10:45:02 -07:00
Philip Hyunsu Cho
515f5f5c47
[RFC] Version 0.90 release candidate (#4475)
* Release 0.90

* Add script to automatically generate acknowledgment

* Update NEWS.md
2019-05-20 01:02:44 -07:00
Philip Hyunsu Cho
6ff994126a [BLOCKING][CI] Upgrade to Spark 2.4.3 (#4414)
* [CI] Upgrade to Spark 2.4.2

* Pass Spark version to build script

* Allow multiple --build-arg in ci_build.sh

* Fix syntax

* Fix container name

* Update pom.xml

* Fix container name

* Update Jenkinsfile

* Update pom.xml

* Update Dockerfile.jvm_cross
2019-05-09 21:36:59 -07:00
Shaochen Shi
18e4fc3690 [jvm-packages] Automatically set maximize_evaluation_metrics if not explicitly given in XGBoost4J-Spark (#4446)
* Automatically set maximize_evaluation_metrics if not explicitly given.

* When custom_eval is set, require maximize_evaluation_metrics.

* Update documents on early stop in XGBoost4J-Spark.

* Fix code error.
2019-05-09 12:49:44 -07:00
Xu Xiao
797ba8e72d [jvm-packages] fix compatibility problem of spark version (#4411)
* fix compatibility problem of spark version on MissingValueHandlingSuite.scala

* call setHandleInvalid by runtime reflection
2019-04-30 09:13:05 -07:00
Nan Zhu
253fdd8a42
[jvm-packages] fix the split of input (#4417) 2019-04-29 18:52:40 -07:00
Nan Zhu
37dc82c3ff
[jvm-packages] allow partial evaluation of dataframe before prediction (#4407)
* allow partial evaluation of dataframe before prediction

* resume spark test

* comments

* Run unit tests after building JVM packages
2019-04-26 21:02:40 -07:00
Philip Hyunsu Cho
ea850ecd20
[CI] Refactor Jenkins CI pipeline + migrate all Linux tests to Jenkins (#4401)
* All Linux tests are now in Jenkins CI
* Tests are now de-coupled from builds. We can now build XGBoost with one version of CUDA/JDK and test it with another version of CUDA/JDK
* Builds (compilation) are significantly faster because 1) They use C5 instances with faster CPU cores; and 2) build environment setup is cached using Docker containers
2019-04-26 18:39:12 -07:00
Nan Zhu
995698b0cb
[BREAKING][jvm-packages] fix the non-zero missing value handling (#4349)
* fix the nan and non-zero missing value handling

* fix nan handling part

* add missing value

* Update MissingValueHandlingSuite.scala

* Update MissingValueHandlingSuite.scala

* stylistic fix
2019-04-26 11:10:33 -07:00
Xu Xiao
2d875ec019 [BLOCKING][jvm-packages] fix non-deterministic order within a partition (in the case of an upstream shuffle) on prediction (#4388)
* [jvm-packages][hot-fix] fix column mismatch caused by zip actions at XGBooostModel.transformInternal

* apply minibatch in prediction

* an iterator-compatible minibatch prediction

* regressor impl

* continuous working on mini-batch prediction of xgboost4j-spark

* Update Booster.java
2019-04-26 11:09:20 -07:00
Nan Zhu
65db8d0626
[jvm-packages] support spark 2.4 and compatibility test with previous xgboost version (#4377)
* bump spark version

* keep float.nan

* handle brokenly changed name/value

* add test

* add model files

* add model files

* update doc
2019-04-17 11:33:13 -07:00
Jiaming Yuan
207f058711 Refactor CMake scripts. (#4323)
* Refactor CMake scripts.

* Remove CMake CUDA wrapper.
* Bump CMake version for CUDA.
* Use CMake to handle Doxygen.
* Split up CMakeList.
* Export install target.
* Use modern CMake.
* Remove build.sh
* Workaround for gpu_hist test.
* Use cmake 3.12.

* Revert machine.conf.

* Move CLI test to gpu.

* Small cleanup.

* Support using XGBoost as submodule.

* Fix windows

* Fix cpp tests on Windows

* Remove duplicated find_package.
2019-04-15 10:08:12 -07:00
Adam Pocock
a448a8320c [jvm-packages] Fixing the NativeLibLoader on Java 9+ (#4351)
The old NativeLibLoader had a short-circuit load path which modified
java.library.path and attempted to load the xgboost library from outside
the jar first, falling back to loading the library from inside the jar.
This path is a no-op every time when using XGBoost outside of it's
source tree. Additionally it triggers an illegal reflective access
warning in the module system in 9, 10, and 11.

On Java 12 the ClassLoader fields are not accessible via reflection
(separately from the illegal reflective acces warning), and so it fails
in a way that isn't caught by the code which falls back to loading the
library from inside the jar.

This commit removes that code path and always loads the xgboost library
from inside the jar file as it's a valid technique across multiple JVM
implementations and works with all versions of Java.
2019-04-10 12:41:44 -07:00
Xu Xiao
60a9af567c [jvm-packages] Add methods operating attributes of booster in jvm package, which follow API design in python package. (#4336) 2019-04-08 11:00:35 -07:00
Nan Zhu
ad4de0d718
[jvm-packages] handle NaN as missing value explicitly (#4309)
* handle nan

* handle nan explicitly

* make code better and handle sparse vector in spark

* Update XGBoostGeneralSuite.scala
2019-03-30 19:34:26 +08:00
Rong Ou
7ea5b772fb do not filter shared library files (#4303) 2019-03-28 19:40:54 +08:00
Rong Ou
8c8021dfa7 use all cores to build on linux (#4304) 2019-03-27 19:51:08 -07:00
Harry Braviner
b374e0a7ab [jvm-packages] Allow supression of Rabit output in Booster::train in xgboost4j (#4262)
* Make train in xgboost4j respect print params

Previously no setting in params argument of Booster::train would prevent
the Rabit.trackerPrint call. This can fill up a lot of screen space in
the case that many folds are being trained.
* Setting "silent" in this map to "true", "True", a non-zero integer, or
  a string that can be parsed to such an int will prevent printing.
* Setting "verbose_eval" to "False" or "false" will prevent printing.
* Setting "verbose_eval" to an int (or a String parseable to an int) n
  will result in printing every n steps, or no printing is n is zero.

This is to match the python behaviour described here:
https://www.kaggle.com/c/rossmann-store-sales/discussion/17499

* Fixed 'slient' typo in xgboost4j test

* private access on two methods
2019-03-21 18:25:12 +08:00
Nan Zhu
45c89a6792
[jvm-packages] logging version number (#4271)
* print version number

* add property file
2019-03-21 18:24:29 +08:00
Nan Zhu
359ed9c5bc
[jvm-packages] add configuration flag to control whether to cache transformed training set (#4268)
* control whether to cache data

* uncache
2019-03-18 10:13:28 +08:00
Jiaming Yuan
29a1356669
Deprecate reg:linear' in favor of reg:squarederror'. (#4267)
* Deprecate `reg:linear' in favor of `reg:squarederror'.
* Replace the use of `reg:linear'.
* Replace the use of `silent`.
2019-03-17 17:55:04 +08:00
Shaochen Shi
224786f67f [xgboost4j-spark] Allow set the parameter "maxLeaves". (#4226)
* Allow set the parameter "maxLeaves".

* Add "setMaxLeaves" to XGBoostRegressor.
2019-03-07 18:36:47 -08:00
Christopher Suchanek
ac3d03089b [jvm-packages] remove shutdown of handler shutdown (#4224) 2019-03-06 19:32:43 -08:00
Nan Zhu
5f34078fba
[jvm-packages] bump version for master (#4209)
* update version

* bump version
2019-03-04 23:12:24 -08:00
Yanbo Liang
9fefa2128d [jvm-packages] Fix early stop with xgboost4j-spark (#4176)
* Fix early stop with xgboost4j-spark

* Update XGBoost.java

* Update XGBoost.java

* Update XGBoost.java

To use -Float.MAX_VALUE as the lower bound, in case there is positive metric.

* Only update best score if the current score is better (no update when equal)

* Update xgboost-spark tutorial to fix early stopping docs.
2019-03-01 13:02:57 -08:00
Nan Zhu
1b7405f688
[jvm-packages] fix comments in objectiveTrait (#4174) 2019-02-22 00:32:13 -08:00
Nan Zhu
dc2add96c5
[jvm-packages] upgrade spark version (#4170) 2019-02-21 11:51:36 -08:00
Rong Ou
d506a8bc63 [jvm-packages] add verbosity param (#4138) 2019-02-13 20:57:17 -08:00
Nan Zhu
c18a3660fa
Separate Depthwidth and Lossguide growing policy in fast histogram (#4102)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* init

* more changes

* temp

* update

* udpate rabit

* change the histogram

* update kfactor

* sync per node stats

* temp

* update

* final

* code clean

* update rabit

* more cleanup

* fix errors

* fix failed tests

* enforce c++11

* broadcast subsampled feature correctly

* init col

* temp

* col sampling

* fix histmastrix init

* fix col sampling

* remove cout

* fix out of bound access

* fix core dump

remove core dump file

* disbale test temporarily

* update

* add fid

* print perf data

* update

* revert some changes

* temp

* temp

* pass all tests

* bring back some tests

* recover some changes

* fix lint issue

* enable monotone and interaction constraints

* don't specify default for monotone and interactions

* recover column init part

* more recovery

* fix core dumps

* code clean

* revert some changes

* fix test compilation issue

* fix lint issue

* resolve compilation issue

* fix issues of lint caused by rebase

* fix stylistic changes and change variable names

* use regtree internal function

* modularize depth width

* address the comments

* fix failed tests

* wrap perf timers with class

* fix lint

* fix num_leaves count

* fix indention

* Update src/tree/updater_quantile_hist.cc

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.h

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.cc

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.cc

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.cc

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* Update src/tree/updater_quantile_hist.h

Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>

* merge

* fix compilation
2019-02-13 12:56:19 -08:00
Rong Ou
9b917cda4f [jvm-packages] fix simple logic error :) (#4128)
@CodingCat
2019-02-11 21:47:30 -08:00
Nan Zhu
3320a52192 [jvm-packages] force use per-group weights in spark layer (#4118) 2019-02-10 05:38:03 +08:00
Rong Ou
2a9b085bc8 [jvm-packages] minor fix of params (#4114) 2019-02-08 00:21:59 -08:00
Nan Zhu
05243642bb
[jvm-packages] better fix for shutdown applications (#4108)
* intentionally failed task

* throw exception

* more

* stop sparkcontext directly

* stop from another thread

* new scope

* use a new thread

* daemon threads

* don't join the killer thread

* remove injected errors

* add comments
2019-02-07 09:02:17 -08:00
Nan Zhu
325b16bccd
[jvm-packages] fix return type of setEvalSets (#4105) 2019-02-06 11:00:29 -08:00