Copy CMake parameter from dmlc-core. (#4948)

Jiaming Yuan 2019-10-17 23:46:32 -04:00, committed by GitHub
parent a78d4e7aa8
commit 9fc681001a
4 changed files with 14 additions and 6 deletions


@@ -37,6 +37,12 @@ option(RABIT_MOCK "Build rabit with mock" OFF)
option(USE_CUDA "Build with GPU acceleration" OFF)
option(USE_NCCL "Build with NCCL to enable distributed GPU support." OFF)
option(BUILD_WITH_SHARED_NCCL "Build with shared NCCL library." OFF)
+## Copied From dmlc
+option(USE_HDFS "Build with HDFS support" OFF)
+option(USE_AZURE "Build with AZURE support" OFF)
+option(USE_S3 "Build with S3 support" OFF)
+set(GPU_COMPUTE_VER "" CACHE STRING
+  "Semicolon separated list of compute versions to be built against, e.g. '35;61'")
if (BUILD_WITH_SHARED_NCCL AND (NOT USE_NCCL))


@@ -72,8 +72,10 @@ Our goal is to build the shared library:
The minimal building requirements are
-- A recent C++ compiler supporting C++11 (g++-4.8 or higher)
-- CMake 3.2 or higher
+- A recent C++ compiler supporting C++11 (g++-5.0 or higher)
+- CMake 3.3 or higher (3.12 for building with CUDA)
For a list of CMake options, see ``#-- Options`` in CMakeLists.txt at the top of the source tree.
Building on Ubuntu/Debian
=========================


@@ -20,7 +20,7 @@ Installation
Installation from source
========================
-Building XGBoost4J using Maven requires Maven 3 or newer, Java 7+ and CMake 3.2+ for compiling the JNI bindings.
+Building XGBoost4J using Maven requires Maven 3 or newer, Java 7+ and CMake 3.3+ for compiling the JNI bindings.
Before you install XGBoost4J, you need to define the environment variable ``JAVA_HOME`` as your JDK directory to ensure that the compiler can find ``jni.h``, since XGBoost4J relies on JNI to implement the interaction between the JVM and native libraries.


@@ -158,7 +158,7 @@ Dealing with missing values
Strategies to handle missing values (and therefore overcome issues as above):
-In the case that a feature column contains missing values for any reason (could be related to business logic / wrong data ingestion process / etc.), the user should decide on a strategy of how to handle it.
+If a feature column contains missing values for any reason (business logic, a faulty data ingestion process, etc.), the user should decide on a strategy for handling them.
The choice of approach depends on the value representing 'missing', which falls into four different categories:
1. 0
@@ -171,7 +171,7 @@ We introduce the following approaches dealing with missing value and their fitti
1. Skip VectorAssembler (using setHandleInvalid = "skip") directly. Used in (2), (3).
2. Keep it (using setHandleInvalid = "keep"), and set the "missing" parameter in XGBClassifier/XGBRegressor as the value representing missing. Used in (2) and (4).
3. Keep it (using setHandleInvalid = "keep") and transform to other irregular values. Used in (3).
-4. Nothing to be done, used in (1).
+4. Nothing to be done; used in (1).
Then XGBoost will automatically learn the ideal direction to take when a value is missing, based on that value and the chosen strategy.
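As a hedged illustration of strategy 2 above (not part of this commit), here is a minimal XGBoost4J-Spark sketch that keeps rows with invalid entries in ``VectorAssembler`` and points the ``missing`` parameter at a sentinel value; the column names (``f1``, ``f2``, ``label``) and the sentinel ``-999.0f`` are illustrative assumptions.

.. code-block:: scala

  import org.apache.spark.ml.feature.VectorAssembler
  import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

  // Keep rows whose feature columns contain invalid/missing entries
  // instead of dropping them (strategy 2).
  val assembler = new VectorAssembler()
    .setInputCols(Array("f1", "f2"))   // hypothetical feature columns
    .setOutputCol("features")
    .setHandleInvalid("keep")

  // Tell XGBoost which value represents "missing" in the assembled
  // vectors; -999.0f is a hypothetical sentinel, not a library default.
  val classifier = new XGBoostClassifier(Map(
    "objective" -> "binary:logistic",
    "missing" -> -999.0f
  )).setFeaturesCol("features").setLabelCol("label")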
@@ -241,7 +241,7 @@ Early stopping is a feature to prevent the unnecessary training iterations. By s
When it comes to custom eval metrics, in addition to ``num_early_stopping_rounds``, you also need to define ``maximize_evaluation_metrics`` or call ``setMaximizeEvaluationMetrics`` to specify whether you want to maximize or minimize the metric during training. For built-in eval metrics, XGBoost4J-Spark will automatically select the direction.
-For example, we need to maximize the evaluation metrics (set ``maximize_evaluation_metrics`` with true), and set ``num_early_stopping_rounds`` with 5. The evaluation metric of 10th iteration is the maximum one until now. In the following iterations, if there is no evaluation metric greater than the 10th iteration's (best one), the traning would be early stopped at 15th iteration.
+For example, suppose we want to maximize the evaluation metric (set ``maximize_evaluation_metrics`` to true) and set ``num_early_stopping_rounds`` to 5. If the metric of the 10th iteration is the best so far and none of the following iterations beats it, training is stopped early at the 15th iteration.
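A minimal sketch of the early stopping setup described above (assumed parameter values, not part of this commit):

.. code-block:: scala

  import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

  // Maximize the evaluation metric and stop once it has not improved
  // for 5 consecutive rounds; all values here are illustrative.
  val classifier = new XGBoostClassifier(Map(
    "objective" -> "binary:logistic",
    "num_round" -> 100,
    "num_early_stopping_rounds" -> 5,
    "maximize_evaluation_metrics" -> true
  )).setFeaturesCol("features").setLabelCol("label")

The same configuration can be expressed with the setters mentioned above, e.g. ``setNumEarlyStoppingRounds(5)`` and ``setMaximizeEvaluationMetrics(true)``.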
Training with Evaluation Sets
-----------------------------