Remove unmaintained jvm readme and dev scripts. (#9395)
parent
e082718c66
commit
0897477af0
@@ -145,7 +145,7 @@ Send a PR to add a one sentence description:)
 ## Tools using XGBoost

 - [BayesBoost](https://github.com/mpearmain/BayesBoost) - Bayesian Optimization using xgboost and sklearn API
 - [FLAML](https://github.com/microsoft/FLAML) - An open source AutoML library
 designed to automatically produce accurate machine learning models with low computational cost. FLAML includes [XGBoost as one of the default learners](https://github.com/microsoft/FLAML/blob/main/flaml/model.py) and can also be used as a fast hyperparameter tuning tool for XGBoost ([code example](https://microsoft.github.io/FLAML/docs/Examples/AutoML-for-XGBoost)).
 - [gp_xgboost_gridsearch](https://github.com/vatsan/gp_xgboost_gridsearch) - In-database parallel grid-search for XGBoost on [Greenplum](https://github.com/greenplum-db/gpdb) using PL/Python
 - [tpot](https://github.com/rhiever/tpot) - A Python tool that automatically creates and optimizes machine learning pipelines using genetic programming.
@@ -3,161 +3,15 @@
 [](https://xgboost.readthedocs.org/en/latest/jvm/index.html)
 [](../LICENSE)

-[Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) |
+[Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) |
 [Resources](../demo/README.md) |
 [Release Notes](../NEWS.md)

-XGBoost4J is the JVM package of xgboost. It brings all the optimizations
-and power xgboost into JVM ecosystem.
+XGBoost4J is the JVM package of xgboost. It brings all the optimizations and power xgboost
+into JVM ecosystem.

-- Train XGBoost models in scala and java with easy customizations.
-- Run distributed xgboost natively on jvm frameworks such as
-Apache Flink and Apache Spark.
+- Train XGBoost models in scala and java with easy customization.
+- Run distributed xgboost natively on jvm frameworks such as Apache Flink and Apache
+Spark.

-You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) and [Resource Page](../demo/README.md).
+You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) and [Resource Page](../demo/README.md).
-
-## Add Maven Dependency
-
-XGBoost4J, XGBoost4J-Spark, etc. in maven repository is compiled with g++-4.8.5.
-
-### Access release version
-
-<b>Maven</b>
-
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.12</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.12</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-```
-or
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.13</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.13</artifactId>
-    <version>latest_version_num</version>
-</dependency>
-```
-
-<b>sbt</b>
-```sbt
-libraryDependencies ++= Seq(
-  "ml.dmlc" %% "xgboost4j" % "latest_version_num",
-  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num"
-)
-```
-
-For the latest release version number, please check [here](https://github.com/dmlc/xgboost/releases).
-
-
-### Access SNAPSHOT version
-
-First add the following Maven repository hosted by the XGBoost project:
-
-<b>Maven</b>:
-
-```xml
-<repository>
-  <id>XGBoost4J Snapshot Repo</id>
-  <name>XGBoost4J Snapshot Repo</name>
-  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
-</repository>
-```
-
-<b>sbt</b>:
-
-```sbt
-resolvers += "XGBoost4J Snapshot Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/"
-```
-
-Then add XGBoost4J as a dependency:
-
-<b>Maven</b>
-
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.12</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.12</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-```
-or with scala 2.13
-```
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j_2.13</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-<dependency>
-    <groupId>ml.dmlc</groupId>
-    <artifactId>xgboost4j-spark_2.13</artifactId>
-    <version>latest_version_num-SNAPSHOT</version>
-</dependency>
-```
-
-<b>sbt</b>
-```sbt
-libraryDependencies ++= Seq(
-  "ml.dmlc" %% "xgboost4j" % "latest_version_num-SNAPSHOT",
-  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num-SNAPSHOT"
-)
-```
-
-For the latest release version number, please check [the repository listing](https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html).
-
-### GPU algorithm
-To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` and `xgboost4j-spark-gpu_2.12` instead.
-Note that scala 2.13 is not supported by the [NVIDIA/spark-rapids#1525](https://github.com/NVIDIA/spark-rapids/issues/1525) yet, so the GPU algorithm can only be used with scala 2.12.
-
-## Examples
-
-Full code examples for Scala, Java, Apache Spark, and Apache Flink can
-be found in the [examples package](https://github.com/dmlc/xgboost/tree/master/jvm-packages/xgboost4j-example).
-
-**NOTE on LIBSVM Format**:
-
-There is an inconsistent issue between XGBoost4J-Spark and other language bindings of XGBoost.
-
-When users use Spark to load trainingset/testset in LIBSVM format with the following code snippet:
-
-```scala
-spark.read.format("libsvm").load("trainingset_libsvm")
-```
-
-Spark assumes that the dataset is 1-based indexed. However, when you do prediction with other bindings of XGBoost (e.g. Python API of XGBoost), XGBoost assumes that the dataset is 0-based indexed. It creates a pitfall for the users who train model with Spark but predict with the dataset in the same format in other bindings of XGBoost.
-
||||||
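The indexing pitfall described in the removed LIBSVM note can be illustrated with a small standalone sketch. It uses neither Spark nor XGBoost; `parse_libsvm_line` and its `one_based` flag are hypothetical names introduced here only to show how the same text line maps to different feature columns under the two conventions.

```python
def parse_libsvm_line(line, one_based):
    """Parse 'label idx:val idx:val ...' into (label, {column: value}).

    `one_based` selects the convention: True treats index 1 as column 0
    (the Spark-style reading described in the note), False takes the
    indices as 0-based columns as written.
    """
    label, *pairs = line.split()
    offset = 1 if one_based else 0
    features = {}
    for pair in pairs:
        idx, val = pair.split(":")
        features[int(idx) - offset] = float(val)
    return float(label), features


line = "1 1:0.5 3:2.0"

# The same line read under the two conventions.
spark_view = parse_libsvm_line(line, one_based=True)
other_view = parse_libsvm_line(line, one_based=False)

print(spark_view)  # (1.0, {0: 0.5, 2: 2.0})
print(other_view)  # (1.0, {1: 0.5, 3: 2.0})
```

Every feature lands one column further right under the 0-based reading, so a model trained on one view and scored on the other would be looking at misaligned features.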
-## Development
-
-You can build/package xgboost4j locally with the following steps:
-
-**Linux:**
-1. Ensure [Docker for Linux](https://docs.docker.com/install/) is installed.
-2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
-3. Run the following command:
-    - With Tests: `./xgboost/jvm-packages/dev/build-linux.sh`
-    - Skip Tests: `./xgboost/jvm-packages/dev/build-linux.sh --skip-tests`
-
-**Windows:**
-1. Ensure [Docker for Windows](https://docs.docker.com/docker-for-windows/install/) is installed.
-2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
-3. Run the following command:
-    - With Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd`
-    - Skip Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd --skip-tests`
-
-*Note: this will create jars for deployment on Linux machines.*
3
jvm-packages/dev/.gitattributes
vendored
@@ -1,3 +0,0 @@
-# Set line endings to LF, even on Windows. Otherwise, execution within Docker fails.
-# See https://help.github.com/articles/dealing-with-line-endings/
-*.sh text eol=lf
1
jvm-packages/dev/.gitignore
vendored
@@ -1 +0,0 @@
-.m2
@@ -1,58 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-FROM centos:7
-
-# Install all basic requirements
-RUN \
-    yum -y update && \
-    yum install -y bzip2 make tar unzip wget xz git centos-release-scl yum-utils java-1.8.0-openjdk-devel && \
-    yum-config-manager --enable centos-sclo-rh-testing && \
-    yum -y update && \
-    yum install -y devtoolset-7-gcc devtoolset-7-binutils devtoolset-7-gcc-c++ && \
-    # Python
-    wget https://repo.continuum.io/miniconda/Miniconda3-4.5.12-Linux-x86_64.sh && \
-    bash Miniconda3-4.5.12-Linux-x86_64.sh -b -p /opt/python && \
-    # CMake
-    wget -nv -nc https://cmake.org/files/v3.18/cmake-3.18.3-Linux-x86_64.sh --no-check-certificate && \
-    bash cmake-3.18.3-Linux-x86_64.sh --skip-license --prefix=/usr && \
-    # Maven
-    wget https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz && \
-    tar xvf apache-maven-3.6.1-bin.tar.gz -C /opt && \
-    ln -s /opt/apache-maven-3.6.1/ /opt/maven
-
-# Set the required environment variables
-ENV PATH=/opt/python/bin:/opt/maven/bin:$PATH
-ENV CC=/opt/rh/devtoolset-7/root/usr/bin/gcc
-ENV CXX=/opt/rh/devtoolset-7/root/usr/bin/c++
-ENV CPP=/opt/rh/devtoolset-7/root/usr/bin/cpp
-ENV JAVA_HOME=/usr/lib/jvm/java
-
-# Install Python packages
-RUN \
-    pip install numpy pytest scipy scikit-learn wheel kubernetes urllib3==1.22 awscli
-
-ENV GOSU_VERSION 1.10
-
-# Install lightweight sudo (not bound to TTY)
-RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
-    chmod +x /usr/local/bin/gosu && \
-    gosu nobody true
-
-WORKDIR /xgboost
@@ -1,44 +0,0 @@
-@echo off
-
-rem
-rem Licensed to the Apache Software Foundation (ASF) under one
-rem or more contributor license agreements. See the NOTICE file
-rem distributed with this work for additional information
-rem regarding copyright ownership. The ASF licenses this file
-rem to you under the Apache License, Version 2.0 (the
-rem "License"); you may not use this file except in compliance
-rem with the License. You may obtain a copy of the License at
-rem
-rem http://www.apache.org/licenses/LICENSE-2.0
-rem
-rem Unless required by applicable law or agreed to in writing,
-rem software distributed under the License is distributed on an
-rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-rem KIND, either express or implied. See the License for the
-rem specific language governing permissions and limitations
-rem under the License.
-rem
-
-rem The the local path of this file
-set "BASEDIR=%~dp0"
-
-rem The local path of .m2 directory for maven
-set "M2DIR=%BASEDIR%\.m2\"
-
-rem Create a local .m2 directory if needed
-if not exist "%M2DIR%" mkdir "%M2DIR%"
-
-rem Build and tag the Dockerfile
-docker build -t dmlc/xgboost4j-build %BASEDIR%
-
-docker run^
-  -it^
-  --rm^
-  --memory 12g^
-  --env JAVA_OPTS="-Xmx9g"^
-  --env MAVEN_OPTS="-Xmx3g"^
-  --ulimit core=-1^
-  --volume %BASEDIR%\..\..:/xgboost^
-  --volume %M2DIR%:/root/.m2^
-  dmlc/xgboost4j-build^
-  /xgboost/jvm-packages/dev/package-linux.sh "%*"
@@ -1,41 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-BASEDIR="$( cd "$( dirname "$0" )" && pwd )" # the directory of this file
-
-docker build -t dmlc/xgboost4j-build "${BASEDIR}" # build and tag the Dockerfile
-
-exec docker run \
-  -it \
-  --rm \
-  --memory 12g \
-  --env JAVA_OPTS="-Xmx9g" \
-  --env MAVEN_OPTS="-Xmx3g -Dmaven.repo.local=/xgboost/jvm-packages/dev/.m2" \
-  --env CI_BUILD_UID=`id -u` \
-  --env CI_BUILD_GID=`id -g` \
-  --env CI_BUILD_USER=`id -un` \
-  --env CI_BUILD_GROUP=`id -gn` \
-  --ulimit core=-1 \
-  --volume "${BASEDIR}/../..":/xgboost \
-  dmlc/xgboost4j-build \
-  /xgboost/tests/ci_build/entrypoint.sh jvm-packages/dev/package-linux.sh "$@"
-
-# CI_BUILD_UID, CI_BUILD_GID, CI_BUILD_USER, CI_BUILD_GROUP
-# are used by entrypoint.sh to create the user with the same uid in a container
-# so all produced artifacts would be owned by your host user
@@ -1,36 +0,0 @@
-#!/usr/bin/env bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-cd jvm-packages
-
-case "$1" in
-  --skip-tests) SKIP_TESTS=true ;;
-  "") SKIP_TESTS=false ;;
-esac
-
-if [[ -n ${SKIP_TESTS} ]]; then
-  if [[ ${SKIP_TESTS} == "true" ]]; then
-    mvn --batch-mode clean package -DskipTests
-  elif [[ ${SKIP_TESTS} == "false" ]]; then
-    mvn --batch-mode clean package
-  fi
-else
-  echo "Usage: $0 [--skip-tests]"
-  exit 1
-fi