Remove unmaintained jvm readme and dev scripts. (#9395)

Jiaming Yuan 2023-07-18 18:23:43 +08:00 committed by GitHub
parent e082718c66
commit 0897477af0
8 changed files with 8 additions and 337 deletions

@ -3,161 +3,15 @@
[![Documentation Status](https://readthedocs.org/projects/xgboost/badge/?version=latest)](https://xgboost.readthedocs.org/en/latest/jvm/index.html)
[![GitHub license](http://dmlc.github.io/img/apache2.svg)](../LICENSE)
[Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) |
[Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) |
[Resources](../demo/README.md) |
[Release Notes](../NEWS.md)
XGBoost4J is the JVM package of xgboost. It brings all the optimizations
and power xgboost into JVM ecosystem.
XGBoost4J is the JVM package of xgboost. It brings all the optimizations and power xgboost
into JVM ecosystem.
- Train XGBoost models in scala and java with easy customizations.
- Run distributed xgboost natively on jvm frameworks such as
Apache Flink and Apache Spark.
- Train XGBoost models in scala and java with easy customization.
- Run distributed xgboost natively on jvm frameworks such as Apache Flink and Apache
Spark.
You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/latest/jvm/index.html) and [Resource Page](../demo/README.md).
## Add Maven Dependency
XGBoost4J, XGBoost4J-Spark, etc. in maven repository is compiled with g++-4.8.5.
### Access release version
<b>Maven</b>
```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.12</artifactId>
<version>latest_version_num</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.12</artifactId>
<version>latest_version_num</version>
</dependency>
```
or
```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.13</artifactId>
<version>latest_version_num</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.13</artifactId>
<version>latest_version_num</version>
</dependency>
```
<b>sbt</b>
```sbt
libraryDependencies ++= Seq(
"ml.dmlc" %% "xgboost4j" % "latest_version_num",
"ml.dmlc" %% "xgboost4j-spark" % "latest_version_num"
)
```
For the latest release version number, please check [here](https://github.com/dmlc/xgboost/releases).
### Access SNAPSHOT version
First add the following Maven repository hosted by the XGBoost project:
<b>Maven</b>:
```xml
<repository>
<id>XGBoost4J Snapshot Repo</id>
<name>XGBoost4J Snapshot Repo</name>
<url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
</repository>
```
<b>sbt</b>:
```sbt
resolvers += "XGBoost4J Snapshot Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/"
```
Then add XGBoost4J as a dependency:
<b>Maven</b>
```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.12</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.12</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
```
or with scala 2.13
```
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.13</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j-spark_2.13</artifactId>
<version>latest_version_num-SNAPSHOT</version>
</dependency>
```
<b>sbt</b>
```sbt
libraryDependencies ++= Seq(
"ml.dmlc" %% "xgboost4j" % "latest_version_num-SNAPSHOT",
"ml.dmlc" %% "xgboost4j-spark" % "latest_version_num-SNAPSHOT"
)
```
For the latest release version number, please check [the repository listing](https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html).
### GPU algorithm
To enable the GPU algorithm (`tree_method='gpu_hist'`), use artifacts `xgboost4j-gpu_2.12` and `xgboost4j-spark-gpu_2.12` instead.
Note that Scala 2.13 is not yet supported by spark-rapids (see [NVIDIA/spark-rapids#1525](https://github.com/NVIDIA/spark-rapids/issues/1525)), so the GPU algorithm can only be used with Scala 2.12.
## Examples
Full code examples for Scala, Java, Apache Spark, and Apache Flink can
be found in the [examples package](https://github.com/dmlc/xgboost/tree/master/jvm-packages/xgboost4j-example).
**NOTE on LIBSVM Format**:
There is an inconsistency between XGBoost4J-Spark and the other language bindings of XGBoost.
When users load a training or test set in LIBSVM format with Spark using the following code snippet:
```scala
spark.read.format("libsvm").load("trainingset_libsvm")
```
Spark assumes that the dataset uses 1-based feature indices. However, when you run prediction with other bindings of XGBoost (e.g. the Python API), XGBoost assumes that the dataset uses 0-based feature indices. This is a pitfall for users who train a model with Spark and then predict on a dataset in the same format with another XGBoost binding.
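One possible way to work around the mismatch is to shift the feature indices of a LIBSVM file before handing it to a binding that reads it with the other convention. The snippet below is only a minimal sketch in plain Python: `shift_libsvm_line` is a hypothetical helper, not part of XGBoost or Spark, and it assumes the file was written with 1-based indices and must be read as 0-based.

```python
def shift_libsvm_line(line: str, offset: int = -1) -> str:
    """Shift every feature index in one LIBSVM line by `offset`.

    Hypothetical helper for illustration only; it is not an XGBoost API.
    The default offset of -1 turns 1-based indices into 0-based ones.
    """
    # Drop a trailing "# comment", if any, and surrounding whitespace.
    body = line.split("#", 1)[0].strip()
    if not body:
        return ""

    label, *features = body.split()
    shifted = []
    for feature in features:
        key, value = feature.split(":", 1)
        if key == "qid":
            # Query-id tokens are not feature indices; keep them unchanged.
            shifted.append(feature)
            continue
        new_index = int(key) + offset
        if new_index < 0:
            raise ValueError(f"index {key} becomes negative after shifting")
        shifted.append(f"{new_index}:{value}")
    return " ".join([label, *shifted])


if __name__ == "__main__":
    # A line as Spark would interpret it (1-based indices).
    print(shift_libsvm_line("1 1:0.5 3:2.0"))
```

Whether a shift is needed at all depends on how the file was produced, so it is worth checking the smallest index in the data first. The helper raises an error if an index would become negative, which usually signals that the file was already 0-based.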
## Development
You can build/package xgboost4j locally with the following steps:
**Linux:**
1. Ensure [Docker for Linux](https://docs.docker.com/install/) is installed.
2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
3. Run the following command:
- With Tests: `./xgboost/jvm-packages/dev/build-linux.sh`
- Skip Tests: `./xgboost/jvm-packages/dev/build-linux.sh --skip-tests`
**Windows:**
1. Ensure [Docker for Windows](https://docs.docker.com/docker-for-windows/install/) is installed.
2. Clone this repo: `git clone --recursive https://github.com/dmlc/xgboost.git`
3. Run the following command:
- With Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd`
- Skip Tests: `.\xgboost\jvm-packages\dev\build-linux.cmd --skip-tests`
*Note: this will create jars for deployment on Linux machines.*
You can find more about XGBoost on [Documentation](https://xgboost.readthedocs.org/en/stable/jvm/index.html) and [Resource Page](../demo/README.md).

@ -1,3 +0,0 @@
# Set line endings to LF, even on Windows. Otherwise, execution within Docker fails.
# See https://help.github.com/articles/dealing-with-line-endings/
*.sh text eol=lf

@ -1 +0,0 @@
.m2

@ -1,58 +0,0 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
FROM centos:7
# Install all basic requirements
RUN \
yum -y update && \
yum install -y bzip2 make tar unzip wget xz git centos-release-scl yum-utils java-1.8.0-openjdk-devel && \
yum-config-manager --enable centos-sclo-rh-testing && \
yum -y update && \
yum install -y devtoolset-7-gcc devtoolset-7-binutils devtoolset-7-gcc-c++ && \
# Python
wget https://repo.continuum.io/miniconda/Miniconda3-4.5.12-Linux-x86_64.sh && \
bash Miniconda3-4.5.12-Linux-x86_64.sh -b -p /opt/python && \
# CMake
wget -nv -nc https://cmake.org/files/v3.18/cmake-3.18.3-Linux-x86_64.sh --no-check-certificate && \
bash cmake-3.18.3-Linux-x86_64.sh --skip-license --prefix=/usr && \
# Maven
wget https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz && \
tar xvf apache-maven-3.6.1-bin.tar.gz -C /opt && \
ln -s /opt/apache-maven-3.6.1/ /opt/maven
# Set the required environment variables
ENV PATH=/opt/python/bin:/opt/maven/bin:$PATH
ENV CC=/opt/rh/devtoolset-7/root/usr/bin/gcc
ENV CXX=/opt/rh/devtoolset-7/root/usr/bin/c++
ENV CPP=/opt/rh/devtoolset-7/root/usr/bin/cpp
ENV JAVA_HOME=/usr/lib/jvm/java
# Install Python packages
RUN \
pip install numpy pytest scipy scikit-learn wheel kubernetes urllib3==1.22 awscli
ENV GOSU_VERSION 1.10
# Install lightweight sudo (not bound to TTY)
RUN set -ex; \
wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
chmod +x /usr/local/bin/gosu && \
gosu nobody true
WORKDIR /xgboost

@ -1,44 +0,0 @@
@echo off
rem
rem Licensed to the Apache Software Foundation (ASF) under one
rem or more contributor license agreements. See the NOTICE file
rem distributed with this work for additional information
rem regarding copyright ownership. The ASF licenses this file
rem to you under the Apache License, Version 2.0 (the
rem "License"); you may not use this file except in compliance
rem with the License. You may obtain a copy of the License at
rem
rem http://www.apache.org/licenses/LICENSE-2.0
rem
rem Unless required by applicable law or agreed to in writing,
rem software distributed under the License is distributed on an
rem "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
rem KIND, either express or implied. See the License for the
rem specific language governing permissions and limitations
rem under the License.
rem
rem The local path of this file
set "BASEDIR=%~dp0"
rem The local path of .m2 directory for maven
set "M2DIR=%BASEDIR%\.m2\"
rem Create a local .m2 directory if needed
if not exist "%M2DIR%" mkdir "%M2DIR%"
rem Build and tag the Dockerfile
docker build -t dmlc/xgboost4j-build %BASEDIR%
docker run^
-it^
--rm^
--memory 12g^
--env JAVA_OPTS="-Xmx9g"^
--env MAVEN_OPTS="-Xmx3g"^
--ulimit core=-1^
--volume %BASEDIR%\..\..:/xgboost^
--volume %M2DIR%:/root/.m2^
dmlc/xgboost4j-build^
/xgboost/jvm-packages/dev/package-linux.sh "%*"

@ -1,41 +0,0 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
BASEDIR="$( cd "$( dirname "$0" )" && pwd )" # the directory of this file
docker build -t dmlc/xgboost4j-build "${BASEDIR}" # build and tag the Dockerfile
exec docker run \
-it \
--rm \
--memory 12g \
--env JAVA_OPTS="-Xmx9g" \
--env MAVEN_OPTS="-Xmx3g -Dmaven.repo.local=/xgboost/jvm-packages/dev/.m2" \
--env CI_BUILD_UID=`id -u` \
--env CI_BUILD_GID=`id -g` \
--env CI_BUILD_USER=`id -un` \
--env CI_BUILD_GROUP=`id -gn` \
--ulimit core=-1 \
--volume "${BASEDIR}/../..":/xgboost \
dmlc/xgboost4j-build \
/xgboost/tests/ci_build/entrypoint.sh jvm-packages/dev/package-linux.sh "$@"
# CI_BUILD_UID, CI_BUILD_GID, CI_BUILD_USER, CI_BUILD_GROUP
# are used by entrypoint.sh to create the user with the same uid in a container
# so all produced artifacts would be owned by your host user

@ -1,36 +0,0 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
cd jvm-packages
case "$1" in
--skip-tests) SKIP_TESTS=true ;;
"") SKIP_TESTS=false ;;
esac
if [[ -n ${SKIP_TESTS} ]]; then
if [[ ${SKIP_TESTS} == "true" ]]; then
mvn --batch-mode clean package -DskipTests
elif [[ ${SKIP_TESTS} == "false" ]]; then
mvn --batch-mode clean package
fi
else
echo "Usage: $0 [--skip-tests]"
exit 1
fi