* [R] Fix empty empty tests and a test warnings
* [R] Remove stringi dependency (fix#5905)
* Fix R lint check
* [R] Fix automatic conversion to factor in R < 4.0.0 in xgb.model.dt.tree
* Add `R` Makefile variable
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Workaround a compiler bug in MacOS AppleClang
* [CI] Run C++ test with MacOS Catalina + AppleClang 11.0.3
* [CI] Migrate cmake_test on MacOS from Travis CI to GitHub Actions
* Install OpenMP runtime
* [CI] Use CMake to locate lz4 lib
* Add getNumFeature to the Java API
* Add getNumFeature to the Scala API
* Add unit tests for getNumFeature
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Fix CMake build with BUILD_STATIC_LIB option
* Disable BUILD_STATIC_LIB option when R/JVM pkg is enabled
* Add objxgboost to install target only when BUILD_STATIC_LIB=ON
* cancel job instead of killing SparkContext
This PR changes the default behavior that kills SparkContext. Instead, This PR
cancels jobs when coming across task failed. That means the SparkContext is
still alive even some exceptions happen.
* add a parameter to control if killing SparkContext
* cancel the jobs the failed task belongs to
* remove the jobId from the map when one job failed.
* resolve comments
We propose to only use the rowHashCode to compute the partitionKey, adding the FeatureValue hashCode does not bring more value and would make the computation slower. Even though a collision would appear at 0.2% with MurmurHash3 this is bearable for partitioning, this won't have any impact on the data balancing.
* Modin DF support
* mode change
* tests were added, ci env was extended
* mode change
* Remove redundant installation of modin
* Add a pytest skip marker for modin
* Install Modin[ray] from PyPI
* fix interfering
* avoid extra conversion
* delete cv test for modin
* revert cv function
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
* Update GPUTreeShap
* Update src/CMakeLists.txt
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* [CI] Improve JVM test in GitHub Actions
* Use env var for Wagon options [skip ci]
* Move the retry flag to pom.xml [skip ci]
* Export env var RABIT_MOCK to run Spark tests [skip ci]
* Correct location of env var
* Re-try up to 5 times [skip ci]
* Don't run distributed training test on Windows
* Fix typo
* Update main.yml
* Fix a unit test on CLI, to handle RC versions
* [CI] Use mgpu machine to run gpu hist unit tests
* [CI] Build GPU-enabled JAR artifact and deploy to xgboost-maven-repo
* [CI] Move lint to GitHub Actions
* [CI] Move Doxygen to GitHub Actions
* [CI] Move Sphinx build test to GitHub Actions
* [CI] Reduce workload for Windows R tests
* [CI] Move clang-tidy to Build stage
The functions featureValueOfSparseVector or featureValueOfDenseVector could return a Float.NaN if the input vectore was containing any missing values. This would make fail the partition key computation and most of the vectors would end up in the same partition. We fix this by avoid returning a NaN and simply use the row HashCode in this case.
We added a test to ensure that the repartition is indeed now uniform on input dataset containing values by checking that the partitions size variance is below a certain threshold.
Signed-off-by: Anthony D'Amato <anthony.damato@hotmail.fr>