567 Commits

Author SHA1 Message Date
Bobby Wang
6275cdc486
[jvm-packages] add format option when saving a model (#7940) 2022-05-30 15:49:59 +08:00
Bobby Wang
fbc3d861bb
[jvm-packages] remove default parameters (#7938) 2022-05-28 10:31:19 +08:00
Daniel Clausen
755d9d4609
[JVM-Packages] Auto-detection of MUSL is replaced by system properties (#7921)
This PR removes auto-detection of MUSL-based Linux systems in favor of system properties the user can set to configure a specific path for a native library.
2022-05-26 10:53:15 +08:00
Bobby Wang
5ef33adf68
[jvm-packages] set the correct objective if user doesn't explicitly set it (#7781) 2022-05-18 14:05:18 +08:00
Bobby Wang
b41cf92dc2
[jvm-packages] move dmatrix building into rabit context for cpu pipeline (#7908) 2022-05-17 14:52:25 +08:00
Bobby Wang
11e46e4bc0
[Breaking][jvm-packages] make classification model be xgboost-compatible (#7896) 2022-05-14 15:43:05 +08:00
Bobby Wang
9fa7ed1743
[Breaking][jvm-packages] remove timeoutRequestWorkers parameter (#7839) 2022-05-13 16:26:25 +08:00
Michael Allman
f7db16add1
Ignore all Java exceptions when looking for Linux musl support (#7844) 2022-04-28 15:44:30 +08:00
Bobby Wang
a94e1b172e
[jvm-packages] Fix model compatibility (#7845) 2022-04-28 02:05:38 +08:00
Bobby Wang
686caad40c
[jvm-package] remove the coalesce in barrier mode (#7846) 2022-04-27 23:34:22 +08:00
Bobby Wang
dc2e699656
[Breaking][jvm-packages] Use barrier execution mode (#7836)
With the introduction of barrier execution mode, we no longer need to kill the SparkContext when some xgboost tasks fail. Instead, Spark handles the errors for us, so this PR deletes the `killSparkContextOnWorkerFailure` parameter.
2022-04-25 17:09:52 +08:00
Bobby Wang
c45665a55a
[jvm-packages] move the dmatrix building into rabit context (#7823)
This fixes DeviceQuantileDMatrix in a distributed environment.
2022-04-23 00:06:50 +08:00
Bobby Wang
2d83b2ad8f
[jvm-packages] add hostIp and python exec for rabit tracker (#7808) 2022-04-15 16:28:43 +08:00
dependabot[bot]
1bb1913811
Bump hadoop-common from 2.10.1 to 3.2.3 in /jvm-packages/xgboost4j-flink (#7801)
Bumps hadoop-common from 2.10.1 to 3.2.3.

---
updated-dependencies:
- dependency-name: org.apache.hadoop:hadoop-common
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-13 22:24:44 +08:00
Bobby Wang
3f536b5308
[jvm-packages] fix evaluation when featuresCols is used (#7798) 2022-04-13 12:52:50 +08:00
Bobby Wang
118192f116
[jvm-packages] xgboost4j-spark should work when featuresCols is specified (#7789) 2022-04-08 13:21:04 +08:00
Bobby Wang
729d227b89
[jvm-packages] remove the dep of com.fasterxml.jackson (#7791) 2022-04-08 13:04:34 +08:00
Bobby Wang
2454407f3a
[jvm-packages] unify setFeaturesCol API for XGBoostRegressor (#7784) 2022-04-05 13:35:33 +08:00
Jiaming Yuan
522636cb52
Bump version. (#7769) 2022-03-31 06:33:22 +08:00
Oleksandr Pryimak
f5b20286e2
[jvm-packages] Launch dev jvm image under my user (#4676)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-03-23 10:39:51 -07:00
Aging
f20ffa8db3
Update JVM dev build Dockerfile and shell script (#6792)
Co-authored-by: Zhuo Yuzhen <yuzhuo@paypal.com>
2022-03-22 16:39:10 -07:00
Daniel Clausen
4dafb5fac8
[JVM-Packages] Add support for detecting musl-based Linux (#7624)
Co-authored-by: Marc Philipp <marc@gradle.com>
2022-03-14 00:37:27 +08:00
Bobby Wang
89aa8ddf52
[jvm-packages] fix the prediction issue for multi:softmax (#7694) 2022-02-24 01:09:45 +08:00
Bobby Wang
e3e6de5ed9
[jvm-packages] unify the set features API (#7692)
xgboost4j-spark provides two sets of APIs for setting features, one for CPU and another for GPU, which may cause confusion.

This PR removes the GPU API and adds an overloaded CPU function `setFeaturesCol` that accepts an `Array[String]` parameter.
2022-02-23 03:37:25 +08:00
Bobby Wang
131858e7cb
[jvm-packages] Do not repartition when nWorker = 1 (#7676) 2022-02-19 21:45:54 +08:00
dependabot[bot]
87c01f49d8
Bump hadoop-common from 2.7.3 to 2.10.1 in /jvm-packages/xgboost4j-flink (#7641)
Bumps hadoop-common from 2.7.3 to 2.10.1.

---
updated-dependencies:
- dependency-name: org.apache.hadoop:hadoop-common
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-09 17:07:35 -08:00
Jiaming Yuan
ac7a36367c
[jvm-packages] Implement new save_raw in jvm-packages. (#7570)
* New `toByteArray` that accepts a parameter for format.
2022-01-19 16:00:14 +08:00
Jiaming Yuan
001503186c
Rewrite approx (#7214)
This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing.

The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for monotonic constraints.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance on most datasets (many times faster).
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
2022-01-10 21:15:05 +08:00
Jiaming Yuan
ed95e77752
[jvm-packages] Update JNI header. (#7550) 2022-01-10 14:59:40 +08:00
Bobby Wang
e8c1eb99e4
[jvm-package] Clean up the legacy gpu support tests (#7523) 2021-12-21 09:15:51 +08:00
Bobby Wang
24e25802a7
[jvm-packages] Add Rapids plugin support (#7491)
* Add GPU pre-processing pipeline.
2021-12-17 13:11:12 +08:00
Bobby Wang
24be04e848
[jvm-packages] Add DeviceQuantileDMatrix to Scala binding (#7459) 2021-11-24 20:23:18 +08:00
Bobby Wang
7cfb310eb4
Rework transform (#7440)
Extract the common part of the transform code from XGBoostClassifier
and XGBoostRegressor.
2021-11-18 15:48:57 +08:00
Jiaming Yuan
55ee272ea8
Extend array interface to handle ndarray. (#7434)
* Extend array interface to handle ndarray.

The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this
class handled only 2-dim inputs (a vector is also treated as a matrix). This PR specifies the
expected dimension at compile-time so the array interface can perform various checks
automatically for input data. Also, adapters like CSR are more rigorous about their input.
Lastly, row vectors and column vectors are handled without intervention from the caller.
2021-11-16 09:52:15 +08:00
Bobby Wang
cb685607b2
[jvm-packages] Rework the train pipeline (#7401)
1. Add PreXGBoost to build RDD[Watches] from Dataset
2. Feed RDD[Watches] built from PreXGBoost to XGBoost to train
2021-11-10 17:51:38 +08:00
Bobby Wang
b81ebbef62
[jvm-packages] Fix json4s binary compatibility issue (#7376)
Spark 3.2 depends on json4s 3.7.0-M11, which changed the signatures of some
implicit functions. As a result, xgboost4j built against Spark 3.0/3.1
fails when saving the model.
2021-10-30 03:20:57 +08:00
nicovdijk
a6bcd54b47
[jvm-packages] Fix for space in sys.executable path in create_jni.py (#7358) 2021-10-25 13:45:11 +08:00
nicovdijk
31a307cf6b
[XGBoost4J-Spark] Serialization for custom objective and eval (#7274)
* added type hints to custom_obj and custom_eval for Spark persistence


Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2021-10-21 16:22:23 +08:00
nicovdijk
74bab6e504
Control logging for early stopping using shouldPrint() (#7326) 2021-10-21 12:12:06 +08:00
Bobby Wang
4fd149b3a2
[jvm-packages] update checkstyle (#7335)
* [jvm-packages] update scalastyle

1. bump scalastyle-maven-plugin and maven-checkstyle-plugin to latest
2. remove unused imports

* fix code style check
2021-10-18 18:42:01 +08:00
Jiaming Yuan
f7caac2563
Bump version to 1.6.0 in master. (#7259) 2021-10-07 16:09:26 +08:00
Jiaming Yuan
fbd58bf190
[jvm-packages] Create demo and test for xgboost4j early stopping. (#7252) 2021-09-25 03:29:27 +08:00
Bobby Wang
0ee11dac77
[jvm-packages][xgboost4j-gpu] Support GPU dataframe and DeviceQuantileDMatrix (#7195)
The following classes are added to support dataframes in the Java binding:

- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.

- `CuDFColumn` is an implementation of `Column` that consumes a cuDF column.
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes a cuDF dataframe.

- `DeviceQuantileDMatrix` is the interface for quantized data.

The Java implementation mimics the Python interface and uses the `__cuda_array_interface__` protocol for memory indexing. One difference is that in the JVM package the data batch is staged on the host, as Java iterators cannot be reset.

Co-authored-by: jiamingy <jm.yuan@outlook.com>
2021-09-24 14:25:00 +08:00
Jiaming Yuan
9f63d6fead
[jvm-packages] Deprecate constructors with implicit missing value. (#7225) 2021-09-17 04:35:04 +08:00
Martin Petříček
46c46829ce
Fix model loading from stream (#7067)
Fix bug introduced in 17913713b554d820a8ce94226d854b4a5f1d8bbc (allow loading from byte array)

When loading a model from a stream, only the last buffer read from the input stream is used to construct the model.

This may work for models smaller than 1 MiB (if you are lucky enough to read the whole model at once), but will always fail if the model is larger.
2021-08-15 21:04:33 +08:00
Jiaming Yuan
7017dd5a26
[JVM-Packages] Use Python tracker in XGBoost for JVM package. (#7132) 2021-07-27 16:20:42 +08:00
naveenkb
9f7f8b976d
[XGBoost4J-Spark] bestIteration and bestScore for early stopping (#7095) 2021-07-19 18:46:49 +08:00
Jiaming Yuan
663136aa08
Implement feature score for linear model. (#7048)
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-06-25 14:34:02 +08:00
ShvetsKS
57c732655e
Merge lossguide and depthwise strategies for CPU hist (#7007)
* fix java/scala test: max depth is also a valid parameter for lossguide

Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 01:49:43 +08:00
Adam Pocock
2320aa0da2
Making the Java library loader emit helpful error messages on missing dependencies. (#6926) 2021-05-19 14:53:56 +08:00