520 Commits

Author SHA1 Message Date
Liang-Chi Hsieh
397d8f0ee7
[jvm-packages] XGBoost Spark should deal with NaN when parsing evaluation output (#5546) 2020-04-19 23:10:30 -07:00
Philip Hyunsu Cho
1b1969f20d
[jvm-packages] [CI] Create a Maven repository to host SNAPSHOT JARs (#5533) 2020-04-14 19:33:32 -07:00
Liang-Chi Hsieh
449ab79e0c
[CI] Use devtoolset-6 because devtoolset-4 is EOL and no longer available (#5506)
* Use devtoolset-6.

* [CI] Use devtoolset-6 because devtoolset-4 is EOL and no longer available

* CUDA 9.0 doesn't work with devtoolset-6; use devtoolset-4 for GPU build only

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-11 19:49:06 -07:00
Bobby Wang
ad826e913f
[jvm-packages]add feature size for LabelPoint and DataBatch (#5303)
* fix type error

* Validate number of features.

* resolve comments

* add feature size for LabelPoint and DataBatch

* pass the feature size to native

* move feature size validating tests into a separate suite

* resolve comments

Co-authored-by: fis <jm.yuan@outlook.com>
2020-04-07 16:49:52 -07:00
Jiaming Yuan
f2b8cd2922
Add number of columns to native data iterator. (#5202)
* Change native data iter into an adapter.
2020-02-25 23:42:01 +08:00
Philip Hyunsu Cho
7ac7e8778f
Port patches from 1.0.0 branch (#5336)
* Remove f-string, since it's not supported by Python 3.5 (#5330)

* Remove f-string, since it's not supported by Python 3.5

* Add Python 3.5 to CI, to ensure compatibility

* Remove duplicated matplotlib

* Show deprecation notice for Python 3.5

* Fix lint

* Fix lint

* Fix a unit test that mistook MINOR ver for PATCH ver

* Enforce only major version in JSON model schema

* Bump version to 1.1.0-SNAPSHOT
2020-02-21 13:13:21 -08:00
Jiaming Yuan
9f77c18b0d
Add JVM_CHECK_CALL. (#5199)
* Added a check call macro in jvm package, prevents executing other functions
from jvm when error occurred in XGBoost. For example, when prediction fails jvm
should not try to allocate memory based on the output prediction size.
2020-02-18 11:10:55 +08:00
Nan Zhu
d7b45fbcaf
[jvm-packages] do not use multiple jobs to make checkpoints (#5082)
* temp

* temp

* tep

* address the comments

* fix stylistic issues

* fix

* external checkpoint
2020-02-01 19:36:39 -08:00
Kodi Arfer
f100b8d878 [Breaking] Don't drop trees during DART prediction by default (#5115)
* Simplify DropTrees calling logic

* Add `training` parameter for prediction method.

* [Breaking]: Add `training` to C API.

* Change for R and Python custom objective.

* Correct comment.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-13 21:48:30 +08:00
Jiaming Yuan
7b65698187
Enforce correct data shape. (#5191)
* Fix syncing DMatrix columns.
* notes for tree method.
* Enable feature validation for all interfaces except for jvm.
* Better tests for boosting from predictions.
* Disable validation on JVM.
2020-01-13 15:48:17 +08:00
Philip Hyunsu Cho
74f545bde3 [CI] Repair download URL for Maven 3.6.1 (#5139) 2019-12-20 10:07:40 +08:00
Philip Hyunsu Cho
37fdfa03f8
[jvm-packages] Comply with scala style convention + fix broken unit test (#5134)
* Fix scala style check

* fix messed unit test
2019-12-18 17:26:58 -08:00
cpfarrell
bc9d88259f [jvm-packages] Allow for bypassing spark missing value check (#4805)
* Allow for bypassing spark missing value check

* Update documentation for dealing with missing values in spark xgboost
2019-12-18 10:48:20 -08:00
Chen Qin
b29b8c2f34 [jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4966)
* [phase 1] expose sets of rabit configurations to spark layer

* add back mutable import

* disable ring_mincount till https://github.com/dmlc/rabit/pull/106d

* Revert "disable ring_mincount till https://github.com/dmlc/rabit/pull/106d"

This reverts commit 65e95a98e24f5eb53c6ba9ef9b2379524258984d.

* apply latest rabit

* fix build error

* apply https://github.com/dmlc/xgboost/pull/4880

* downgrade cmake in rabit

* point to rabit with DMLC_ROOT fix

* relative path of rabit install prefix

* split rabit parameters to another trait

* misc

* misc

* Delete .classpath

* Delete .classpath

* Delete .classpath

* Update XGBoostClassifier.scala

* Update XGBoostRegressor.scala

* Update GeneralParams.scala

* Update GeneralParams.scala

* Update GeneralParams.scala

* Update GeneralParams.scala

* Delete .classpath

* Update RabitParams.scala

* Update .gitignore

* Update .gitignore

* apply rabitParams to training

* use string as rabit parameter value type

* cleanup

* add rabitEnv check

* point to dmlc/rabit

* per feedback

* update private scope

* misc

* update rabit

* add rabit_timtout, fix failing test.

* split tests

* allow build jvm with rabit mock

* pass mock failures to rabit with test

* add mock error and graceful handle rabit assertion error test

* split mvn test

* remove sign for test

* update rabit

* build jvm_packages with rabit mock

* point back to dmlc/rabit

* per feedback, update scala header

* cleanup pom

* per feedback

* try fix lint

* fix lint

* per feedback, remove bootstrap_cache

* per feedback 2

* try replace dev profile with passing mvn property

* fix build error

* remove mvn property and replace with env setting to build test jar

* per feedback

* revert copyright headlines, point to dmlc/rabit

* revert python lint

* remove multiple failure test case as retry is not enabled in spark

* Update core.py

* Update core.py

* per feedback, style fix
2019-11-01 14:21:19 -07:00
Jiaming Yuan
010b8f1428 Revert "[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876)" (#4965)
This reverts commit 86ed01c4bbecef66e1bc4d02fb13116bd6130fae.
2019-10-18 14:02:35 -07:00
Chen Qin
86ed01c4bb [jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876)
* Expose sets of rabit configurations to spark layer
2019-10-18 15:07:31 -04:00
Jiaming Yuan
31030a8d3a
Set correct file permission. (#4964) 2019-10-18 12:54:29 -04:00
Liangcai Li
82ee2317e8 Add case for LongParam. (#4885)
To support specifying long parameter as String, the same as other basic
type, such as Int, Double ...
2019-09-25 05:41:53 -07:00
Nan Zhu
fc8c9b0521
[jvm-packages] enable deterministic repartitioning when checkpoint is enabled (#4807)
* do reparititoning in DataUtil

* keep previous behavior of partitioning without checkpoint

* deterministic repartitioning

* change
2019-09-19 15:21:05 -07:00
Xu Xiao
277e25797b [jvm-packages] refine numAliveCores method of SparkParallelismTracker (#4858)
* refine numAliveCores

* refine XGBoostToMLlibParams

* fix waitForCondition

* resolve conflicts

* Update SparkParallelismTracker.scala
2019-09-19 15:18:29 -07:00
Honza Sterba
22209b7b95 [jvm-packages] Add BigDenseMatrix (#4383)
* Add BigDenseMatrix

* ability to create DMatrix with bigger than Integer.MAX_VALUE size arrays
* uses sun.misc.Unsafe

* make DMatrix test work from a jar as well
2019-09-18 20:46:14 -07:00
Jiaming Yuan
d669ea1eaa
Deprecate set group (#4864)
* Convert jvm package and R package.

* Restore for compatibility.
2019-09-17 21:26:54 -04:00
Stephanie Yang
0fc7dcfe6c Add public group getter for java and scala (#4838)
* Add public group getter for java and scala

* Remove unnecessary param from javadoc

* Fix typo

* Fix another typo

* Add semicolon

* Fix javadoc return statement

* Fix missing return statement

* Add a unit test
2019-09-09 10:07:48 -07:00
Nan Zhu
0184eb5d02
[jvm-packages] Refactor XGBoost.scala to put all params processing in one place (#4815)
* cleaning checkpoint file after a successful file

* address comments

* refactor xgboost.scala to avoid multiple changes when adding params

* consolidate params

* fix compilation issue

* fix failed test

* fix wrong name

* tyep conversion
2019-08-28 22:41:05 -07:00
Nan Zhu
7b5cbcc846
[jvm-packages] cleaning checkpoint file after a successful training (#4754)
* cleaning checkpoint file after a successful file

* address comments
2019-08-14 10:57:47 -07:00
Oleksandr Pryimak
b68de018b8 [jvm-packages] jvm test should clean up after themselfs (#4706) 2019-08-04 14:09:11 -07:00
Nan Zhu
1595e3f57b
upgrade version num (#4670)
* upgrade version num

* missign changes

* fix version script

* change versions

* rm files

* Update CMakeLists.txt
2019-07-17 15:25:35 -07:00
Nan Zhu
01b0c9047c
[jvm-packages] allowing chaining prediction (#4667)
* add test for chaining prediction

* update rabit

* Update XGBoostGeneralSuite.scala
2019-07-17 08:50:27 -07:00
koertkuipers
3c506b076e [jvm-packages] upgrade to Scala 2.12 (#4574)
* bump scala to 2.12 which requires java 8 and also newer flink and akka

* put scala version in artifactId

* fix appveyor

* fix for scaladoc issue that looks like https://github.com/scala/bug/issues/10509

* fix ci_build

* update versions in generate_pom.py

* fix generate_pom.py

* apache does not have a download for spark 2.4.3 distro using scala 2.12 yet, so for now i use a tgz i put on s3

* Upload spark-2.4.3-bin-scala2.12-hadoop2.7.tgz to our own S3

* Update Dockerfile.jvm_cross

* Update Dockerfile.jvm_cross
2019-07-16 08:43:34 -07:00
Mathew Wicks
6323ef94ad [jvm-packages] update local dev build process (#4640) 2019-07-15 21:23:06 -07:00
Oleksandr Pryimak
2973416f2e [jvm-packages] Fix maven warnings (#4664)
* exec plugin was missing a version
* reportPlugins has been deprecated:
  see https://maven.apache.org/plugins/maven-site-plugin/maven-3.html#Classic_configuration_Maven_2__3
2019-07-15 20:25:43 -07:00
Rong Ou
30204b50fe fix spark tests on machines with many cores (#4634) 2019-07-07 16:02:56 -07:00
Philip Hyunsu Cho
d333918f5e [jvm-packages] Expose setMissing method in XGBoostClassificationModel / XGBoostRegressionModel (#4643) 2019-07-07 16:02:44 -07:00
Nan Zhu
abffbe014e
[jvm-packages] delete all constraints from spark layer about obj and eval metrics and handle error in jvm layer (#4560)
* temp

* prediction part

* remove supported*

* add for test

* fix param name

* add rabit

* update rabit

* return value of rabit init

* eliminate compilation warnings

* update rabit

* shutdown

* update rabit again

* check sparkcontext shutdown

* fix logic

* sleep

* fix tests

* test with relaxed threshold

* create new thread each time

* stop for job quitting

* udpate rabit

* update rabit

* update rabit

* update git modules
2019-06-27 08:47:37 -07:00
Nan Zhu
fe2de6f415
[jvm-packages]fix silly bug in feature scoring (#4604) 2019-06-25 20:49:01 -07:00
Daniel Stahl
7ae11c9284 [jvm-packages] updated kryo dependency to 2.22 (#4575) 2019-06-18 09:26:54 -07:00
Jiaming Yuan
2f1319f273
Add rmsle metric and reg:squaredlogerror objective (#4541) 2019-06-11 05:48:27 +08:00
Jiaming Yuan
0ce300e73a [jvm-packages] Add back reg:linear for scala. (#4490)
* Add back reg:linear for scala.

* Fix linter.
2019-05-23 15:02:08 -07:00
Bryan Woods
278562db13 Add support for cross-validation using query ID (#4474)
* adding support for matrix slicing with query ID for cross-validation

* hail mary test of unrar installation for windows tests

* trying to modify tests to run in Github CI

* Remove dependency on wget and unrar

* Save error log from R test

* Relax assertion in test_training

* Use int instead of bool in C function interface

* Revise R interface

* Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility
2019-05-23 10:45:02 -07:00
Philip Hyunsu Cho
515f5f5c47
[RFC] Version 0.90 release candidate (#4475)
* Release 0.90

* Add script to automatically generate acknowledgment

* Update NEWS.md
2019-05-20 01:02:44 -07:00
Philip Hyunsu Cho
6ff994126a [BLOCKING][CI] Upgrade to Spark 2.4.3 (#4414)
* [CI] Upgrade to Spark 2.4.2

* Pass Spark version to build script

* Allow multiple --build-arg in ci_build.sh

* Fix syntax

* Fix container name

* Update pom.xml

* Fix container name

* Update Jenkinsfile

* Update pom.xml

* Update Dockerfile.jvm_cross
2019-05-09 21:36:59 -07:00
Shaochen Shi
18e4fc3690 [jvm-packages] Automatically set maximize_evaluation_metrics if not explicitly given in XGBoost4J-Spark (#4446)
* Automatically set maximize_evaluation_metrics if not explicitly given.

* When custom_eval is set, require maximize_evaluation_metrics.

* Update documents on early stop in XGBoost4J-Spark.

* Fix code error.
2019-05-09 12:49:44 -07:00
Xu Xiao
797ba8e72d [jvm-packages] fix compatibility problem of spark version (#4411)
* fix compatibility problem of spark version on MissingValueHandlingSuite.scala

* call setHandleInvalid by runtime reflection
2019-04-30 09:13:05 -07:00
Nan Zhu
253fdd8a42
[jvm-packages] fix the split of input (#4417) 2019-04-29 18:52:40 -07:00
Nan Zhu
37dc82c3ff
[jvm-packages] allow partial evaluation of dataframe before prediction (#4407)
* allow partial evaluation of dataframe before prediction

* resume spark test

* comments

* Run unit tests after building JVM packages
2019-04-26 21:02:40 -07:00
Philip Hyunsu Cho
ea850ecd20
[CI] Refactor Jenkins CI pipeline + migrate all Linux tests to Jenkins (#4401)
* All Linux tests are now in Jenkins CI
* Tests are now de-coupled from builds. We can now build XGBoost with one version of CUDA/JDK and test it with another version of CUDA/JDK
* Builds (compilation) are significantly faster because 1) They use C5 instances with faster CPU cores; and 2) build environment setup is cached using Docker containers
2019-04-26 18:39:12 -07:00
Nan Zhu
995698b0cb
[BREAKING][jvm-packages] fix the non-zero missing value handling (#4349)
* fix the nan and non-zero missing value handling

* fix nan handling part

* add missing value

* Update MissingValueHandlingSuite.scala

* Update MissingValueHandlingSuite.scala

* stylistic fix
2019-04-26 11:10:33 -07:00
Xu Xiao
2d875ec019 [BLOCKING][jvm-packages] fix non-deterministic order within a partition (in the case of an upstream shuffle) on prediction (#4388)
* [jvm-packages][hot-fix] fix column mismatch caused by zip actions at XGBooostModel.transformInternal

* apply minibatch in prediction

* an iterator-compatible minibatch prediction

* regressor impl

* continuous working on mini-batch prediction of xgboost4j-spark

* Update Booster.java
2019-04-26 11:09:20 -07:00
Nan Zhu
65db8d0626
[jvm-packages] support spark 2.4 and compatibility test with previous xgboost version (#4377)
* bump spark version

* keep float.nan

* handle brokenly changed name/value

* add test

* add model files

* add model files

* update doc
2019-04-17 11:33:13 -07:00
Jiaming Yuan
207f058711 Refactor CMake scripts. (#4323)
* Refactor CMake scripts.

* Remove CMake CUDA wrapper.
* Bump CMake version for CUDA.
* Use CMake to handle Doxygen.
* Split up CMakeList.
* Export install target.
* Use modern CMake.
* Remove build.sh
* Workaround for gpu_hist test.
* Use cmake 3.12.

* Revert machine.conf.

* Move CLI test to gpu.

* Small cleanup.

* Support using XGBoost as submodule.

* Fix windows

* Fix cpp tests on Windows

* Remove duplicated find_package.
2019-04-15 10:08:12 -07:00