Chen Qin
b29b8c2f34
[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests ( #4966 )
...
* [phase 1] expose sets of rabit configurations to spark layer
* add back mutable import
* disable ring_mincount till https://github.com/dmlc/rabit/pull/106d
* Revert "disable ring_mincount till https://github.com/dmlc/rabit/pull/106d "
This reverts commit 65e95a98e24f5eb53c6ba9ef9b2379524258984d.
* apply latest rabit
* fix build error
* apply https://github.com/dmlc/xgboost/pull/4880
* downgrade cmake in rabit
* point to rabit with DMLC_ROOT fix
* relative path of rabit install prefix
* split rabit parameters to another trait
* misc
* misc
* Delete .classpath
* Delete .classpath
* Delete .classpath
* Update XGBoostClassifier.scala
* Update XGBoostRegressor.scala
* Update GeneralParams.scala
* Update GeneralParams.scala
* Update GeneralParams.scala
* Update GeneralParams.scala
* Delete .classpath
* Update RabitParams.scala
* Update .gitignore
* Update .gitignore
* apply rabitParams to training
* use string as rabit parameter value type
* cleanup
* add rabitEnv check
* point to dmlc/rabit
* per feedback
* update private scope
* misc
* update rabit
* add rabit_timeout, fix failing test.
* split tests
* allow build jvm with rabit mock
* pass mock failures to rabit with test
* add mock error and graceful handle rabit assertion error test
* split mvn test
* remove sign for test
* update rabit
* build jvm_packages with rabit mock
* point back to dmlc/rabit
* per feedback, update scala header
* cleanup pom
* per feedback
* try fix lint
* fix lint
* per feedback, remove bootstrap_cache
* per feedback 2
* try replace dev profile with passing mvn property
* fix build error
* remove mvn property and replace with env setting to build test jar
* per feedback
* revert copyright headlines, point to dmlc/rabit
* revert python lint
* remove multiple failure test case as retry is not enabled in spark
* Update core.py
* Update core.py
* per feedback, style fix
2019-11-01 14:21:19 -07:00
Jiaming Yuan
010b8f1428
Revert "[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests ( #4876 )" ( #4965 )
...
This reverts commit 86ed01c4bbecef66e1bc4d02fb13116bd6130fae.
2019-10-18 14:02:35 -07:00
Chen Qin
86ed01c4bb
[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests ( #4876 )
...
* Expose sets of rabit configurations to spark layer
2019-10-18 15:07:31 -04:00
Liangcai Li
82ee2317e8
Add case for LongParam. ( #4885 )
...
Support specifying a Long parameter as a String, the same as other basic
types such as Int and Double.
2019-09-25 05:41:53 -07:00
Nan Zhu
fc8c9b0521
[jvm-packages] enable deterministic repartitioning when checkpoint is enabled ( #4807 )
...
* do repartitioning in DataUtil
* keep previous behavior of partitioning without checkpoint
* deterministic repartitioning
* change
2019-09-19 15:21:05 -07:00
Xu Xiao
277e25797b
[jvm-packages] refine numAliveCores method of SparkParallelismTracker ( #4858 )
...
* refine numAliveCores
* refine XGBoostToMLlibParams
* fix waitForCondition
* resolve conflicts
* Update SparkParallelismTracker.scala
2019-09-19 15:18:29 -07:00
Nan Zhu
0184eb5d02
[jvm-packages] Refactor XGBoost.scala to put all params processing in one place ( #4815 )
...
* cleaning checkpoint file after a successful training
* address comments
* refactor xgboost.scala to avoid multiple changes when adding params
* consolidate params
* fix compilation issue
* fix failed test
* fix wrong name
* type conversion
2019-08-28 22:41:05 -07:00
Nan Zhu
7b5cbcc846
[jvm-packages] cleaning checkpoint file after a successful training ( #4754 )
...
* cleaning checkpoint file after a successful training
* address comments
2019-08-14 10:57:47 -07:00
Philip Hyunsu Cho
d333918f5e
[jvm-packages] Expose setMissing method in XGBoostClassificationModel / XGBoostRegressionModel ( #4643 )
2019-07-07 16:02:44 -07:00
Nan Zhu
abffbe014e
[jvm-packages] delete all constraints from spark layer about obj and eval metrics and handle error in jvm layer ( #4560 )
...
* temp
* prediction part
* remove supported*
* add for test
* fix param name
* add rabit
* update rabit
* return value of rabit init
* eliminate compilation warnings
* update rabit
* shutdown
* update rabit again
* check sparkcontext shutdown
* fix logic
* sleep
* fix tests
* test with relaxed threshold
* create new thread each time
* stop for job quitting
* update rabit
* update rabit
* update rabit
* update git modules
2019-06-27 08:47:37 -07:00
Jiaming Yuan
2f1319f273
Add rmsle metric and reg:squaredlogerror objective ( #4541 )
2019-06-11 05:48:27 +08:00
Jiaming Yuan
0ce300e73a
[jvm-packages] Add back reg:linear for scala. ( #4490 )
...
* Add back reg:linear for scala.
* Fix linter.
2019-05-23 15:02:08 -07:00
Shaochen Shi
18e4fc3690
[jvm-packages] Automatically set maximize_evaluation_metrics if not explicitly given in XGBoost4J-Spark ( #4446 )
...
* Automatically set maximize_evaluation_metrics if not explicitly given.
* When custom_eval is set, require maximize_evaluation_metrics.
* Update documents on early stop in XGBoost4J-Spark.
* Fix code error.
2019-05-09 12:49:44 -07:00
Nan Zhu
995698b0cb
[BREAKING][jvm-packages] fix the non-zero missing value handling ( #4349 )
...
* fix the nan and non-zero missing value handling
* fix nan handling part
* add missing value
* Update MissingValueHandlingSuite.scala
* Update MissingValueHandlingSuite.scala
* stylistic fix
2019-04-26 11:10:33 -07:00
Xu Xiao
2d875ec019
[BLOCKING][jvm-packages] fix non-deterministic order within a partition (in the case of an upstream shuffle) on prediction ( #4388 )
...
* [jvm-packages][hot-fix] fix column mismatch caused by zip actions at XGBoostModel.transformInternal
* apply minibatch in prediction
* an iterator-compatible minibatch prediction
* regressor impl
* continuous working on mini-batch prediction of xgboost4j-spark
* Update Booster.java
2019-04-26 11:09:20 -07:00
Nan Zhu
65db8d0626
[jvm-packages] support spark 2.4 and compatibility test with previous xgboost version ( #4377 )
...
* bump spark version
* keep float.nan
* handle brokenly changed name/value
* add test
* add model files
* add model files
* update doc
2019-04-17 11:33:13 -07:00
Nan Zhu
ad4de0d718
[jvm-packages] handle NaN as missing value explicitly ( #4309 )
...
* handle nan
* handle nan explicitly
* make code better and handle sparse vector in spark
* Update XGBoostGeneralSuite.scala
2019-03-30 19:34:26 +08:00
Nan Zhu
45c89a6792
[jvm-packages] logging version number ( #4271 )
...
* print version number
* add property file
2019-03-21 18:24:29 +08:00
Nan Zhu
359ed9c5bc
[jvm-packages] add configuration flag to control whether to cache transformed training set ( #4268 )
...
* control whether to cache data
* uncache
2019-03-18 10:13:28 +08:00
Jiaming Yuan
29a1356669
Deprecate `reg:linear` in favor of `reg:squarederror`. ( #4267 )
...
* Deprecate `reg:linear` in favor of `reg:squarederror`.
* Replace the use of `reg:linear`.
* Replace the use of `silent`.
2019-03-17 17:55:04 +08:00
Shaochen Shi
224786f67f
[xgboost4j-spark] Allow setting the parameter "maxLeaves". ( #4226 )
...
* Allow setting the parameter "maxLeaves".
* Add "setMaxLeaves" to XGBoostRegressor.
2019-03-07 18:36:47 -08:00
Rong Ou
d506a8bc63
[jvm-packages] add verbosity param ( #4138 )
2019-02-13 20:57:17 -08:00
Rong Ou
9b917cda4f
[jvm-packages] fix simple logic error :) ( #4128 )
...
@CodingCat
2019-02-11 21:47:30 -08:00
Nan Zhu
3320a52192
[jvm-packages] force use per-group weights in spark layer ( #4118 )
2019-02-10 05:38:03 +08:00
Rong Ou
2a9b085bc8
[jvm-packages] minor fix of params ( #4114 )
2019-02-08 00:21:59 -08:00
Nan Zhu
05243642bb
[jvm-packages] better fix for shutdown applications ( #4108 )
...
* intentionally failed task
* throw exception
* more
* stop sparkcontext directly
* stop from another thread
* new scope
* use a new thread
* daemon threads
* don't join the killer thread
* remove injected errors
* add comments
2019-02-07 09:02:17 -08:00
Nan Zhu
325b16bccd
[jvm-packages] fix return type of setEvalSets ( #4105 )
2019-02-06 11:00:29 -08:00
Nan Zhu
ae3bb9c2d5
Distributed Fast Histogram Algorithm ( #4011 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* init
* allow hist algo
* more changes
* temp
* update
* remove hist sync
* update rabit
* change hist size
* change the histogram
* update kfactor
* sync per node stats
* temp
* update
* final
* code clean
* update rabit
* more cleanup
* fix errors
* fix failed tests
* enforce c++11
* fix lint issue
* broadcast subsampled feature correctly
* revert some changes
* fix lint issue
* enable monotone and interaction constraints
* don't specify default for monotone and interactions
* update docs
2019-02-05 05:12:53 -08:00
Nan Zhu
0d0ce32908
[jvm-packages] adding logs for parameters ( #4091 )
2019-01-30 21:50:55 -08:00
KyleLi1985
dade7c3aff
[jvm-packages] Performance consideration and Alignment input parameter of repartition function ( #4049 )
2019-01-07 08:38:05 -08:00
Nan Zhu
e290ec9a80
[jvm-packages] fix safe execution ( #4046 )
2019-01-05 19:45:37 -08:00
Nan Zhu
f368d0de2b
[jvm-packages] fix the scalability issue of prediction ( #4033 )
2018-12-29 20:46:30 -08:00
Nan Zhu
c055a32609
[jvm-packages]support multiple validation datasets in Spark ( #3910 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* wrap iterators
* enable copartition training and validationset
* add parameters
* converge code path and have init unit test
* enable multi evals for ranking
* unit test and doc
* update example
* fix early stopping
* address the offline comments
* update doc
* test eval metrics
* fix compilation issue
* fix example
2018-12-17 21:03:57 -08:00
Huafeng Wang
42cac4a30b
[jvm-packages] Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel ( #3932 )
...
* Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel
* Fix UT
2018-11-23 21:09:43 -08:00
Nan Zhu
aa48b7e903
[jvm-packages][refactor] refactor XGBoost.scala (spark) ( #3904 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* wrap iterators
* remove unused code
* refactor
* fix typo
2018-11-15 20:38:28 -08:00
Nan Zhu
4ae225a08d
[Blocking][jvm-packages] fix the early stopping feature ( #3808 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* temp
* add method for classifier and regressor
* update tutorial
* address the comments
* update
2018-10-23 14:53:13 -07:00
weitian
9504f411c1
[jvm-packages] For training data with group, empty RDD partition threw exception ( #3749 ) ( #3750 )
2018-10-09 09:03:22 -07:00
Nan Zhu
785094db53
[jvm-packages] fix issue when spark job execution thread cannot return before we execute first() ( #3758 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* sparkJobThread
* update
* fix issue when spark job execution thread cannot return before we execute first()
2018-10-05 22:20:50 -07:00
weitian
efc4f85505
[jvm-packages] Fix #3489 : Spark repartitionForData can potentially shuffle all data and lose ordering required for ranking objectives ( #3654 )
2018-10-03 08:43:55 -07:00
Sergei Lebedev
87aca8c244
[jvm-packages] Fixed the distributed updater check ( #3739 )
...
The updater used in distributed training is grow_histmaker and not
grow_colmaker as the error message stated prior to this commit.
2018-10-01 11:22:01 -07:00
Michael Mui
20a9e716bd
[jvm-packages] Fix "obj_type" error to enable custom objectives and evaluations ( #3646 )
...
credits to @mmui
2018-09-14 12:06:33 -07:00
Jerry Lin
9acd549dc7
[jvm-packages] Add rank:ndcg and rank:map to Spark supported objectives ( #3697 )
2018-09-13 09:51:24 -07:00
Joseph Bradley
14a8b96476
[jvm-packages] xgboost-spark warning when Spark encryption is turned on ( #3667 )
...
* added test, commented out right now
* reinstated test
* added fix for checking encryption settings
* fix by using RDD conf
* fix compilation
* renamed conf
* use SparkSession if available
* fix message
* nop
* code review fixes
2018-09-10 14:21:01 -07:00
Matthew Tovbin
beab6e08dd
Remove println in jsonDecode ( #3665 )
...
Following issue #3578
2018-09-07 15:47:26 -07:00
Nan Zhu
3261002099
[jvm-packages] throw ControlThrowable instead of InterruptedException ( #3632 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* interrupted exception is not rethrown
2018-08-25 20:30:21 -07:00
Nan Zhu
4912c1f9c6
[jvm-packages] fix checkpoint save/load ( #3614 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix update checkpoint func
2018-08-21 12:34:24 -07:00
Matthew Tovbin
b53a5a262c
[jvm-packages] getTreeLimit return type should be Int
2018-08-17 09:36:00 -07:00
Matthew Tovbin
2b7a1c5780
[jvm-packages] Avoid losing precision when computing probabilities by converting to Double early ( #3576 )
2018-08-13 14:05:07 -07:00
Matthew Tovbin
ce0f0568a6
Make sure 'thresholds' are considered when executing predict method ( #3577 )
2018-08-13 14:04:47 -07:00
Mathew
06ef4db4cc
Fix Spark 2.2 Support (Amending #3062 ) ( #3325 )
...
This pull request amends the broken #3062 to allow Spark 2.2 to work.
Please note this won't work in Spark <=2.1, as sc.removeSparkListener was implemented in Spark 2.2. (So perhaps a more general method is better, although that is what was attempted in #3062.)
This PR fixes #3208, #3151 and the discussion in #1927.
I do find it strange that #3062 does not work in Spark 2.2; it's probably due to some sort of public/private issue in the org.apache.spark.scheduler.LiveListenerBus class inheritance (in Spark itself). The error is: `java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.removeListener(Ljava/lang/Object;)V`
2018-08-12 18:35:20 -07:00