387 Commits

Author SHA1 Message Date
Bobby Wang
24e25802a7
[jvm-packages] Add Rapids plugin support (#7491)
* Add GPU pre-processing pipeline.
2021-12-17 13:11:12 +08:00
Bobby Wang
24be04e848
[jvm-packages] Add DeviceQuantileDMatrix to Scala binding (#7459) 2021-11-24 20:23:18 +08:00
Bobby Wang
7cfb310eb4
Rework transform (#7440)
extract the common part of transform code from XGBoostClassifier
and XGBoostRegressor
2021-11-18 15:48:57 +08:00
Jiaming Yuan
55ee272ea8
Extend array interface to handle ndarray. (#7434)
* Extend array interface to handle ndarray.

The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this
class handles only 2-dim (vector is also matrix).  This PR specifies the expected
dimension at compile-time and the array interface can perform various checks automatically
for input data. Also, adapters like CSR are more rigorous about their input.  Lastly, row
vector and column vector are handled without intervention from the caller.
2021-11-16 09:52:15 +08:00
Bobby Wang
cb685607b2
[jvm-packages] Rework the train pipeline (#7401)
1. Add PreXGBoost to build RDD[Watches] from Dataset
2. Feed RDD[Watches] built from PreXGBoost to XGBoost to train
2021-11-10 17:51:38 +08:00
Bobby Wang
b81ebbef62
[jvm-packages] Fix json4s binary compatibility issue (#7376)
Spark 3.2 depends on 3.7.0-M11 which has changed some implicited functions'
signatures. And it will result the xgboost4j built against spark 3.0/3.1
failed when saving the model.
2021-10-30 03:20:57 +08:00
nicovdijk
a6bcd54b47
[jvm-packages] Fix for space in sys.executable path in create_jni.py (#7358) 2021-10-25 13:45:11 +08:00
nicovdijk
31a307cf6b
[XGBoost4J-Spark] Serialization for custom objective and eval (#7274)
* added type hints to custom_obj and custom_eval for Spark persistence


Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2021-10-21 16:22:23 +08:00
nicovdijk
74bab6e504
Control logging for early stopping using shouldPrint() (#7326) 2021-10-21 12:12:06 +08:00
Bobby Wang
4fd149b3a2
[jvm-packages] update checkstyle (#7335)
* [jvm-packages] update scalastyle

1. bump scalastyle-maven-plugin and maven-checkstyle-plugin to latest
2. remove unused imports

* fix code style check
2021-10-18 18:42:01 +08:00
Jiaming Yuan
f7caac2563
Bump version to 1.6.0 in master. (#7259) 2021-10-07 16:09:26 +08:00
Jiaming Yuan
fbd58bf190
[jvm-packages] Create demo and test for xgboost4j early stopping. (#7252) 2021-09-25 03:29:27 +08:00
Bobby Wang
0ee11dac77
[jvm-packages][xgboost4j-gpu] Support GPU dataframe and DeviceQuantileDMatrix (#7195)
Following classes are added to support dataframe in java binding:

- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.

- `CuDFColumn` is an implementaiton of `Column` that consume cuDF column
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes cuDF dataframe.

- `DeviceQuantileDMatrix` is the interface for quantized data.

The Java implementation mimics the Python interface and uses `__cuda_array_interface__` protocol for memory indexing.  One difference is on JVM package, the data batch is staged on the host as java iterators cannot be reset.

Co-authored-by: jiamingy <jm.yuan@outlook.com>
2021-09-24 14:25:00 +08:00
Jiaming Yuan
9f63d6fead
[jvm-packages] Deprecate constructors with implicit missing value. (#7225) 2021-09-17 04:35:04 +08:00
Martin Petříček
46c46829ce
Fix model loading from stream (#7067)
Fix bug introduced in 17913713b554d820a8ce94226d854b4a5f1d8bbc (allow loading from byte array)

When loading model from stream, only last buffer read from the input stream is used to construct the model.

This may work for models smaller than 1 MiB (if you are lucky enough to read the whole model at once), but will always fail if the model is larger.
2021-08-15 21:04:33 +08:00
Jiaming Yuan
7017dd5a26
[JVM-Packages] Use Python tracker in XGBoost for JVM package. (#7132) 2021-07-27 16:20:42 +08:00
naveenkb
9f7f8b976d
[XGBoost4J-Spark] bestIteration and bestScore for early stopping (#7095) 2021-07-19 18:46:49 +08:00
Jiaming Yuan
663136aa08
Implement feature score for linear model. (#7048)
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-06-25 14:34:02 +08:00
ShvetsKS
57c732655e
Merge lossgude and depthwise strategies for CPU hist (#7007)
* fix java/scala test: max depth is also valid parameter for lossguide

Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 01:49:43 +08:00
Adam Pocock
2320aa0da2
Making the Java library loader emit helpful error messages on missing dependencies. (#6926) 2021-05-19 14:53:56 +08:00
Andrew Ziem
3e7e426b36
Fix spelling in documents (#6948)
* Update roxygen2 doc.

Co-authored-by: fis <jm.yuan@outlook.com>
2021-05-11 20:44:36 +08:00
Philip Hyunsu Cho
ec6ce08cd0
[jvm-packages] Make it easier to release GPU/CPU code artifacts to Maven Central (#6940) 2021-05-04 14:00:03 -07:00
Jiaming Yuan
74b41637de
Revert "[jvm-packages] Add XGBOOST_RABIT_TRACKER_IP_FOR_TEST to set rabit tracker IP. (#6869)" (#6886)
This reverts commit 2828da3c4c951baa45d1bb6f85c7b3a6657cd607.
2021-04-21 11:20:10 -07:00
Bobby Wang
2828da3c4c
[jvm-packages] Add XGBOOST_RABIT_TRACKER_IP_FOR_TEST to set rabit tracker IP. (#6869)
* Add `XGBOOST_RABIT_TRACKER_IP_FOR_TEST` to set rabit tracker IP

* change spark and rabit tracker IP to 127.0.0.1on GitHub Action.

Co-authored-by: fis <jm.yuan@outlook.com>
2021-04-22 02:00:22 +08:00
Jiaming Yuan
146549260a
Bump version to 1.5.0 snapshot in master. (#6875) 2021-04-22 01:53:44 +08:00
Bobby Wang
2c684ffd32
[jvm-packages] fix "key not found: train" issue (#6842)
* [jvm-packages] fix "key not found: train" issue

* fix bug
2021-04-18 23:28:39 -07:00
Viktor Szathmáry
b65e3c4444
[jvm] reduce scala-compiler, scalatest dependency scopes (#6730)
* [jvm] reduce scala-compiler, scalatest dependency scopes

* [jvm] workaround for GpuTestSuite scalatest dependency

* scalatest scope tweak
2021-04-07 15:22:08 -07:00
Bobby Wang
49c22c23b4
[jvm-packages] fix early stopping doesn't work even without custom_eval setting (#6738)
* [jvm-packages] fix early stopping doesn't work even without custom_eval setting

* remove debug info

* resolve comment
2021-03-06 20:19:40 -08:00
Honza Sterba
17913713b5
[jvm] Add ability to load booster direct from byte array (#6655)
* Add ability to load booster direct from byte array

* fix compiler error

* move InputStream to byte-buffer conversion

- move it from Booster to XGBoost facade class
2021-02-23 11:28:27 -08:00
Adam Pocock
fec66d033a
[jvm-packages] JVM library loader extensions (#6630)
* [java] extending the library loader to use both OS and CPU architecture.

* Simplifying create_jni.py's architecture detection.

* Tidying up the architecture detection in create_jni.py
2021-01-25 15:51:39 +08:00
Bobby Wang
9d2832a3a3
fix potential TaskFailedListener's callback won't be called (#6612)
there is possibility that onJobStart of TaskFailedListener won't be called, if
the job is submitted before the other thread adds addSparkListener.

detail can be found at https://github.com/dmlc/xgboost/pull/6019#issuecomment-760937628
2021-01-21 14:20:32 +08:00
Philip Hyunsu Cho
0d483cb7c1
Bump version to 1.4.0 snapshot in master (#6486) 2020-12-10 07:38:08 -08:00
zhang_jf
cc581b3b6b
Misleading exception information: no such param of "allow_non_zero_missing" (#6418) 2020-11-20 19:33:34 +08:00
Nan Zhu
4d1d5d4010
[jvm-packages] fix potential unit test suites aborted issue (#6373)
* fix race conditio

* code cleaning

rm pom.xml-e

* clean again

* fix compilation issue

* recover

* avoid using getOrCreate

* interrupt zombie threads

* safe guard

* fix deadlock

* Update SparkParallelismTracker.scala
2020-11-17 10:59:26 -08:00
Jiaming Yuan
dfac5f89e9
Group CLI demo into subdirectory. (#6258)
CLI is not most developed interface. Putting them into correct directory can help new users to avoid it as most of the use cases are from a language binding.
2020-10-28 14:40:44 -07:00
Jiaming Yuan
b180223d18
Cleanup RABIT. (#6290)
* Remove recovery and MPI speed tests.
* Remove readme.
* Remove Python binding.
* Add checks in C API.
2020-10-27 08:48:22 +08:00
Jiaming Yuan
d61b628bf5
Remove RABIT CMake targets. (#6275)
* Now it's built as part of libxgboost.
* Set correct C API error in RABIT initialization and finalization.
* Remove redundant message.
* Guard the tracker print C API.
2020-10-27 01:30:20 +08:00
Jiaming Yuan
b5c2a47b20
Drop single point model recovery (#6262)
* Pass rabit params in JVM package.
* Implement timeout using poll timeout parameter.
* Remove OOB data check.
2020-10-21 15:27:03 +08:00
dependabot[bot]
06e453ddf4
Bump junit from 4.11 to 4.13.1 in /jvm-packages/xgboost4j (#6230)
Bumps [junit](https://github.com/junit-team/junit4) from 4.11 to 4.13.1.
- [Release notes](https://github.com/junit-team/junit4/releases)
- [Changelog](https://github.com/junit-team/junit4/blob/main/doc/ReleaseNotes4.11.md)
- [Commits](https://github.com/junit-team/junit4/compare/r4.11...r4.13.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2020-10-13 19:46:19 -07:00
dependabot[bot]
b51a717deb
Bump junit from 4.11 to 4.13.1 in /jvm-packages/xgboost4j-gpu (#6233)
Bumps [junit](https://github.com/junit-team/junit4) from 4.11 to 4.13.1.
- [Release notes](https://github.com/junit-team/junit4/releases)
- [Changelog](https://github.com/junit-team/junit4/blob/main/doc/ReleaseNotes4.11.md)
- [Commits](https://github.com/junit-team/junit4/compare/r4.11...r4.13.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2020-10-13 19:44:56 -07:00
Philip Hyunsu Cho
c991eb612d
[jvm-packages] Fix up build for xgboost4j-gpu, xgboost4j-spark-gpu (#6216)
* [CI] Clean up build for JVM packages

* Use correct path for saving native lib

* Fix groupId of maven-surefire-plugin

* Fix stashing of xgboost4j_jar_gpu

* [CI] Don't run xgboost4j-tester with GPU, since it doesn't use gpu_hist
2020-10-09 14:08:15 -07:00
Christian Lorentzen
cf4f019ed6
[Breaking] Change default evaluation metric for classification to logloss / mlogloss (#6183)
* Change DefaultEvalMetric of classification from error to logloss

* Change default binary metric in plugin/example/custom_obj.cc

* Set old error metric in python tests

* Set old error metric in R tests

* Fix missed eval metrics and typos in R tests

* Fix setting eval_metric twice in R tests

* Add warning for empty eval_metric for classification

* Fix Dask tests

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-02 12:06:47 -07:00
Nan Zhu
c932fb50a1
[jvm-packages]add xgboost4j-gpu/xgboost4j-spark-gpu module to facilitate release (#6136)
* add xgboost4j-gpu/xgboost4j-spark-gpu module to facilitate release

* Update pom.xml
2020-09-20 09:20:38 -07:00
Philip Hyunsu Cho
33577ef5d3
Add MAPE metric (#6119) 2020-09-14 18:45:27 -07:00
Hristo Iliev
da61d9460b
[jvm-packages] Add getNumFeature method (#6075)
* Add getNumFeature to the Java API
* Add getNumFeature to the Scala API
* Add unit tests for getNumFeature

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-09-07 20:57:46 -07:00
Bobby Wang
0e2d5669f6
[jvm-packages] cancel job instead of killing SparkContext (#6019)
* cancel job instead of killing SparkContext

This PR changes the default behavior that kills SparkContext. Instead, This PR
cancels jobs when coming across task failed. That means the SparkContext is
still alive even some exceptions happen.

* add a parameter to control if killing SparkContext

* cancel the jobs the failed task belongs to

* remove the jobId from the map when one job failed.

* resolve comments
2020-09-02 14:20:59 -07:00
Anthony D'Amato
ada964f16e
Clean the way deterministic paritioning is computed (#6033)
We propose to only use the rowHashCode to compute the partitionKey, adding the FeatureValue hashCode does not bring more value and would make the computation slower. Even though a collision would appear at 0.2% with MurmurHash3 this is bearable for partitioning, this won't have any impact on the data balancing.
2020-08-30 14:38:23 -07:00
FelixYBW
3a990433f9
set maxBins to 256. Align with c code in src/tree/param.h (#6066) 2020-08-28 15:06:11 +03:00
Philip Hyunsu Cho
9c14e430af
[CI] Improve JVM test in GitHub Actions (#5930)
* [CI] Improve JVM test in GitHub Actions

* Use env var for Wagon options [skip ci]

* Move the retry flag to pom.xml [skip ci]

* Export env var RABIT_MOCK to run Spark tests [skip ci]

* Correct location of env var

* Re-try up to 5 times [skip ci]

* Don't run distributed training test on Windows

* Fix typo

* Update main.yml
2020-08-25 10:14:46 -07:00
Philip Hyunsu Cho
b3193052b3
Bump version to 1.3.0 snapshot in master (#6052) 2020-08-23 17:13:46 -07:00