Philip Hyunsu Cho
7ac7e8778f
Port patches from 1.0.0 branch ( #5336 )
...
* Remove f-string, since it's not supported by Python 3.5 (#5330 )
* Remove f-string, since it's not supported by Python 3.5
* Add Python 3.5 to CI, to ensure compatibility
* Remove duplicated matplotlib
* Show deprecation notice for Python 3.5
* Fix lint
* Fix lint
* Fix a unit test that mistook MINOR ver for PATCH ver
* Enforce only major version in JSON model schema
* Bump version to 1.1.0-SNAPSHOT
2020-02-21 13:13:21 -08:00
Nan Zhu
d7b45fbcaf
[jvm-packages] do not use multiple jobs to make checkpoints ( #5082 )
...
* temp
* temp
* temp
* address the comments
* fix stylistic issues
* fix
* external checkpoint
2020-02-01 19:36:39 -08:00
Philip Hyunsu Cho
37fdfa03f8
[jvm-packages] Comply with scala style convention + fix broken unit test ( #5134 )
...
* Fix scala style check
* fix messed unit test
2019-12-18 17:26:58 -08:00
cpfarrell
bc9d88259f
[jvm-packages] Allow for bypassing spark missing value check ( #4805 )
...
* Allow for bypassing spark missing value check
* Update documentation for dealing with missing values in spark xgboost
2019-12-18 10:48:20 -08:00
Chen Qin
b29b8c2f34
[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests ( #4966 )
...
* [phase 1] expose sets of rabit configurations to spark layer
* add back mutable import
* disable ring_mincount till https://github.com/dmlc/rabit/pull/106d
* Revert "disable ring_mincount till https://github.com/dmlc/rabit/pull/106d "
This reverts commit 65e95a98e24f5eb53c6ba9ef9b2379524258984d.
* apply latest rabit
* fix build error
* apply https://github.com/dmlc/xgboost/pull/4880
* downgrade cmake in rabit
* point to rabit with DMLC_ROOT fix
* relative path of rabit install prefix
* split rabit parameters to another trait
* misc
* misc
* Delete .classpath
* Delete .classpath
* Delete .classpath
* Update XGBoostClassifier.scala
* Update XGBoostRegressor.scala
* Update GeneralParams.scala
* Update GeneralParams.scala
* Update GeneralParams.scala
* Update GeneralParams.scala
* Delete .classpath
* Update RabitParams.scala
* Update .gitignore
* Update .gitignore
* apply rabitParams to training
* use string as rabit parameter value type
* cleanup
* add rabitEnv check
* point to dmlc/rabit
* per feedback
* update private scope
* misc
* update rabit
* add rabit_timeout, fix failing test.
* split tests
* allow build jvm with rabit mock
* pass mock failures to rabit with test
* add mock error and graceful handle rabit assertion error test
* split mvn test
* remove sign for test
* update rabit
* build jvm_packages with rabit mock
* point back to dmlc/rabit
* per feedback, update scala header
* cleanup pom
* per feedback
* try fix lint
* fix lint
* per feedback, remove bootstrap_cache
* per feedback 2
* try replace dev profile with passing mvn property
* fix build error
* remove mvn property and replace with env setting to build test jar
* per feedback
* revert copyright headlines, point to dmlc/rabit
* revert python lint
* remove multiple failure test case as retry is not enabled in spark
* Update core.py
* Update core.py
* per feedback, style fix
2019-11-01 14:21:19 -07:00
Jiaming Yuan
010b8f1428
Revert "[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests ( #4876 )" ( #4965 )
...
This reverts commit 86ed01c4bbecef66e1bc4d02fb13116bd6130fae.
2019-10-18 14:02:35 -07:00
Chen Qin
86ed01c4bb
[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests ( #4876 )
...
* Expose sets of rabit configurations to spark layer
2019-10-18 15:07:31 -04:00
Liangcai Li
82ee2317e8
Add case for LongParam. ( #4885 )
...
To support specifying a long parameter as a String, the same as other basic
types, such as Int, Double ...
2019-09-25 05:41:53 -07:00
Nan Zhu
fc8c9b0521
[jvm-packages] enable deterministic repartitioning when checkpoint is enabled ( #4807 )
...
* do repartitioning in DataUtil
* keep previous behavior of partitioning without checkpoint
* deterministic repartitioning
* change
2019-09-19 15:21:05 -07:00
Xu Xiao
277e25797b
[jvm-packages] refine numAliveCores method of SparkParallelismTracker ( #4858 )
...
* refine numAliveCores
* refine XGBoostToMLlibParams
* fix waitForCondition
* resolve conflicts
* Update SparkParallelismTracker.scala
2019-09-19 15:18:29 -07:00
Nan Zhu
0184eb5d02
[jvm-packages] Refactor XGBoost.scala to put all params processing in one place ( #4815 )
...
* cleaning checkpoint file after a successful training
* address comments
* refactor xgboost.scala to avoid multiple changes when adding params
* consolidate params
* fix compilation issue
* fix failed test
* fix wrong name
* type conversion
2019-08-28 22:41:05 -07:00
Nan Zhu
7b5cbcc846
[jvm-packages] cleaning checkpoint file after a successful training ( #4754 )
...
* cleaning checkpoint file after a successful training
* address comments
2019-08-14 10:57:47 -07:00
Oleksandr Pryimak
b68de018b8
[jvm-packages] jvm tests should clean up after themselves ( #4706 )
2019-08-04 14:09:11 -07:00
Nan Zhu
1595e3f57b
upgrade version num ( #4670 )
...
* upgrade version num
* missing changes
* fix version script
* change versions
* rm files
* Update CMakeLists.txt
2019-07-17 15:25:35 -07:00
Nan Zhu
01b0c9047c
[jvm-packages] allowing chaining prediction ( #4667 )
...
* add test for chaining prediction
* update rabit
* Update XGBoostGeneralSuite.scala
2019-07-17 08:50:27 -07:00
koertkuipers
3c506b076e
[jvm-packages] upgrade to Scala 2.12 ( #4574 )
...
* bump scala to 2.12 which requires java 8 and also newer flink and akka
* put scala version in artifactId
* fix appveyor
* fix for scaladoc issue that looks like https://github.com/scala/bug/issues/10509
* fix ci_build
* update versions in generate_pom.py
* fix generate_pom.py
* apache does not have a download for spark 2.4.3 distro using scala 2.12 yet, so for now i use a tgz i put on s3
* Upload spark-2.4.3-bin-scala2.12-hadoop2.7.tgz to our own S3
* Update Dockerfile.jvm_cross
* Update Dockerfile.jvm_cross
2019-07-16 08:43:34 -07:00
Rong Ou
30204b50fe
fix spark tests on machines with many cores ( #4634 )
2019-07-07 16:02:56 -07:00
Philip Hyunsu Cho
d333918f5e
[jvm-packages] Expose setMissing method in XGBoostClassificationModel / XGBoostRegressionModel ( #4643 )
2019-07-07 16:02:44 -07:00
Nan Zhu
abffbe014e
[jvm-packages] delete all constraints from spark layer about obj and eval metrics and handle error in jvm layer ( #4560 )
...
* temp
* prediction part
* remove supported*
* add for test
* fix param name
* add rabit
* update rabit
* return value of rabit init
* eliminate compilation warnings
* update rabit
* shutdown
* update rabit again
* check sparkcontext shutdown
* fix logic
* sleep
* fix tests
* test with relaxed threshold
* create new thread each time
* stop for job quitting
* update rabit
* update rabit
* update rabit
* update git modules
2019-06-27 08:47:37 -07:00
Jiaming Yuan
2f1319f273
Add rmsle metric and reg:squaredlogerror objective ( #4541 )
2019-06-11 05:48:27 +08:00
Jiaming Yuan
0ce300e73a
[jvm-packages] Add back reg:linear for scala. ( #4490 )
...
* Add back reg:linear for scala.
* Fix linter.
2019-05-23 15:02:08 -07:00
Philip Hyunsu Cho
515f5f5c47
[RFC] Version 0.90 release candidate ( #4475 )
...
* Release 0.90
* Add script to automatically generate acknowledgment
* Update NEWS.md
2019-05-20 01:02:44 -07:00
Shaochen Shi
18e4fc3690
[jvm-packages] Automatically set maximize_evaluation_metrics if not explicitly given in XGBoost4J-Spark ( #4446 )
...
* Automatically set maximize_evaluation_metrics if not explicitly given.
* When custom_eval is set, require maximize_evaluation_metrics.
* Update documents on early stop in XGBoost4J-Spark.
* Fix code error.
2019-05-09 12:49:44 -07:00
Xu Xiao
797ba8e72d
[jvm-packages] fix compatibility problem of spark version ( #4411 )
...
* fix compatibility problem of spark version on MissingValueHandlingSuite.scala
* call setHandleInvalid by runtime reflection
2019-04-30 09:13:05 -07:00
Nan Zhu
253fdd8a42
[jvm-packages] fix the split of input ( #4417 )
2019-04-29 18:52:40 -07:00
Nan Zhu
37dc82c3ff
[jvm-packages] allow partial evaluation of dataframe before prediction ( #4407 )
...
* allow partial evaluation of dataframe before prediction
* resume spark test
* comments
* Run unit tests after building JVM packages
2019-04-26 21:02:40 -07:00
Nan Zhu
995698b0cb
[BREAKING][jvm-packages] fix the non-zero missing value handling ( #4349 )
...
* fix the nan and non-zero missing value handling
* fix nan handling part
* add missing value
* Update MissingValueHandlingSuite.scala
* Update MissingValueHandlingSuite.scala
* stylistic fix
2019-04-26 11:10:33 -07:00
Xu Xiao
2d875ec019
[BLOCKING][jvm-packages] fix non-deterministic order within a partition (in the case of an upstream shuffle) on prediction ( #4388 )
...
* [jvm-packages][hot-fix] fix column mismatch caused by zip actions at XGBoostModel.transformInternal
* apply minibatch in prediction
* an iterator-compatible minibatch prediction
* regressor impl
* continuous working on mini-batch prediction of xgboost4j-spark
* Update Booster.java
2019-04-26 11:09:20 -07:00
Nan Zhu
65db8d0626
[jvm-packages] support spark 2.4 and compatibility test with previous xgboost version ( #4377 )
...
* bump spark version
* keep float.nan
* handle brokenly changed name/value
* add test
* add model files
* add model files
* update doc
2019-04-17 11:33:13 -07:00
Nan Zhu
ad4de0d718
[jvm-packages] handle NaN as missing value explicitly ( #4309 )
...
* handle nan
* handle nan explicitly
* make code better and handle sparse vector in spark
* Update XGBoostGeneralSuite.scala
2019-03-30 19:34:26 +08:00
Nan Zhu
45c89a6792
[jvm-packages] logging version number ( #4271 )
...
* print version number
* add property file
2019-03-21 18:24:29 +08:00
Nan Zhu
359ed9c5bc
[jvm-packages] add configuration flag to control whether to cache transformed training set ( #4268 )
...
* control whether to cache data
* uncache
2019-03-18 10:13:28 +08:00
Jiaming Yuan
29a1356669
Deprecate `reg:linear' in favor of `reg:squarederror'. ( #4267 )
...
* Deprecate `reg:linear' in favor of `reg:squarederror'.
* Replace the use of `reg:linear'.
* Replace the use of `silent`.
2019-03-17 17:55:04 +08:00
Shaochen Shi
224786f67f
[xgboost4j-spark] Allow setting the parameter "maxLeaves". ( #4226 )
...
* Allow setting the parameter "maxLeaves".
* Add "setMaxLeaves" to XGBoostRegressor.
2019-03-07 18:36:47 -08:00
Nan Zhu
5f34078fba
[jvm-packages] bump version for master ( #4209 )
...
* update version
* bump version
2019-03-04 23:12:24 -08:00
Rong Ou
d506a8bc63
[jvm-packages] add verbosity param ( #4138 )
2019-02-13 20:57:17 -08:00
Nan Zhu
c18a3660fa
Separate Depthwise and Lossguide growing policy in fast histogram ( #4102 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* init
* more changes
* temp
* update
* update rabit
* change the histogram
* update kfactor
* sync per node stats
* temp
* update
* final
* code clean
* update rabit
* more cleanup
* fix errors
* fix failed tests
* enforce c++11
* broadcast subsampled feature correctly
* init col
* temp
* col sampling
* fix histmatrix init
* fix col sampling
* remove cout
* fix out of bound access
* fix core dump
remove core dump file
* disable test temporarily
* update
* add fid
* print perf data
* update
* revert some changes
* temp
* temp
* pass all tests
* bring back some tests
* recover some changes
* fix lint issue
* enable monotone and interaction constraints
* don't specify default for monotone and interactions
* recover column init part
* more recovery
* fix core dumps
* code clean
* revert some changes
* fix test compilation issue
* fix lint issue
* resolve compilation issue
* fix issues of lint caused by rebase
* fix stylistic changes and change variable names
* use regtree internal function
* modularize depth width
* address the comments
* fix failed tests
* wrap perf timers with class
* fix lint
* fix num_leaves count
* fix indentation
* Update src/tree/updater_quantile_hist.cc
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.h
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.cc
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.cc
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.cc
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.h
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* merge
* fix compilation
2019-02-13 12:56:19 -08:00
Rong Ou
9b917cda4f
[jvm-packages] fix simple logic error :) ( #4128 )
...
@CodingCat
2019-02-11 21:47:30 -08:00
Nan Zhu
3320a52192
[jvm-packages] force use per-group weights in spark layer ( #4118 )
2019-02-10 05:38:03 +08:00
Rong Ou
2a9b085bc8
[jvm-packages] minor fix of params ( #4114 )
2019-02-08 00:21:59 -08:00
Nan Zhu
05243642bb
[jvm-packages] better fix for shutdown applications ( #4108 )
...
* intentionally failed task
* throw exception
* more
* stop sparkcontext directly
* stop from another thread
* new scope
* use a new thread
* daemon threads
* don't join the killer thread
* remove injected errors
* add comments
2019-02-07 09:02:17 -08:00
Nan Zhu
325b16bccd
[jvm-packages] fix return type of setEvalSets ( #4105 )
2019-02-06 11:00:29 -08:00
Nan Zhu
ae3bb9c2d5
Distributed Fast Histogram Algorithm ( #4011 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* init
* allow hist algo
* more changes
* temp
* update
* remove hist sync
* update rabit
* change hist size
* change the histogram
* update kfactor
* sync per node stats
* temp
* update
* final
* code clean
* update rabit
* more cleanup
* fix errors
* fix failed tests
* enforce c++11
* fix lint issue
* broadcast subsampled feature correctly
* revert some changes
* fix lint issue
* enable monotone and interaction constraints
* don't specify default for monotone and interactions
* update docs
2019-02-05 05:12:53 -08:00
Nan Zhu
0d0ce32908
[jvm-packages] adding logs for parameters ( #4091 )
2019-01-30 21:50:55 -08:00
KyleLi1985
dade7c3aff
[jvm-packages] Performance consideration and Alignment input parameter of repartition function ( #4049 )
2019-01-07 08:38:05 -08:00
Nan Zhu
773ddbcfcb
[BLOCKING] fix the issue with infrequent feature ( #4045 )
...
* fix the issue with infrequent feature
* handle exception
* use only 2 workers
* address the comments
2019-01-06 16:01:03 -08:00
Nan Zhu
e290ec9a80
[jvm-packages] fix safe execution ( #4046 )
2019-01-05 19:45:37 -08:00
Nan Zhu
f368d0de2b
[jvm-packages] fix the scalability issue of prediction ( #4033 )
2018-12-29 20:46:30 -08:00
Nan Zhu
c055a32609
[jvm-packages] support multiple validation datasets in Spark ( #3910 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* wrap iterators
* enable copartition training and validationset
* add parameters
* converge code path and have init unit test
* enable multi evals for ranking
* unit test and doc
* update example
* fix early stopping
* address the offline comments
* update doc
* test eval metrics
* fix compilation issue
* fix example
2018-12-17 21:03:57 -08:00
Huafeng Wang
42cac4a30b
[jvm-packages] Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel ( #3932 )
...
* Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel
* Fix UT
2018-11-23 21:09:43 -08:00