98 Commits

Author SHA1 Message Date
Chen Qin
86ed01c4bb [jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876)
* Expose sets of rabit configurations to spark layer
2019-10-18 15:07:31 -04:00
Philip Hyunsu Cho
f7487e4c2a [CI] Run cuDF tests in Jenkins CI server (#4927) 2019-10-13 00:04:54 -04:00
Oleksandr Pryimak
80977182c5 Use bundled gtest (#4900)
* Suggest to use gtest bundled with dmlc

* Use dmlc bundled gtest in all CI scripts

* Make clang-tidy use dmlc embedded gtest
2019-10-09 16:26:19 -07:00
Chen Qin
512f037e55 [rabit_bootstrap_cache] failed xgb worker recover from other workers (#4808)
* Better recovery support.  Restarting only the failed workers.
2019-09-16 23:31:52 -04:00
Jiaming Yuan
b9b57f2289 Use long key id. (#4783) 2019-08-16 11:19:22 -07:00
Rong Ou
6edddd7966 Refactor DMatrix to return batches of different page types (#4686)
* Use explicit template parameter for specifying page type.
2019-08-03 15:10:34 -04:00
Philip Hyunsu Cho
166def9f75
[CI] Fix broken installation of Pandas (#4722)
* [CI] Fix broken installation of Pandas

* Update Dockerfile.gpu
2019-07-30 22:03:11 -07:00
Philip Hyunsu Cho
2758c5acea
[CI] Fix broken installation of Pandas (#4704) 2019-07-24 19:03:35 -07:00
koertkuipers
3c506b076e [jvm-packages] upgrade to Scala 2.12 (#4574)
* bump scala to 2.12 which requires java 8 and also newer flink and akka

* put scala version in artifactId

* fix appveyor

* fix for scaladoc issue that looks like https://github.com/scala/bug/issues/10509

* fix ci_build

* update versions in generate_pom.py

* fix generate_pom.py

* apache does not have a download for spark 2.4.3 distro using scala 2.12 yet, so for now i use a tgz i put on s3

* Upload spark-2.4.3-bin-scala2.12-hadoop2.7.tgz to our own S3

* Update Dockerfile.jvm_cross

* Update Dockerfile.jvm_cross
2019-07-16 08:43:34 -07:00
Philip Hyunsu Cho
a30176907f
Support Dask 2.0 (#4617) 2019-06-27 20:42:35 -07:00
Oleksandr Pryimak
923e6c86ba Add to documentation how to run tests locally (#4610)
* Add to documentation how to build native unit tests

* Add instructions to run Python tests and to use Docker container [skip ci]

* Fix link to pytest chapter

* Add link to Google Test [skip ci]

* Set PYTHONPATH [skip ci]

* Revise test_python.sh for running tests locally

* Update test_python.sh

* Place Docker recommendation notice in a prominent place [skip ci]
2019-06-27 19:02:04 -07:00
Philip Hyunsu Cho
0c50f8417a
[CI] Specify account ID when logging into ECR Docker registry (#4584)
* [CI] Specify account ID when logging into ECR Docker registry

* Do not display awscli login command
2019-06-19 15:08:42 -07:00
Jiaming Yuan
afa99e6d9d
Use yaml.safe_load. (#4537) 2019-06-07 03:39:25 +08:00
Rory Mitchell
09b90d9329
Add native support for Dask (#4473)
* Add native support for Dask

* Add multi-GPU demo

* Add sklearn example
2019-05-27 13:29:28 +12:00
Philip Hyunsu Cho
515f5f5c47
[RFC] Version 0.90 release candidate (#4475)
* Release 0.90

* Add script to automatically generate acknowledgment

* Update NEWS.md
2019-05-20 01:02:44 -07:00
Philip Hyunsu Cho
cf2400036e
[CI] Add Python and C++ tests for Windows GPU target (#4469)
* Add CMake option to use bundled gtest from dmlc-core, so that it is easy to build XGBoost with gtest on Windows

* Consistently apply OpenMP flag to all targets. Force enable OpenMP when USE_CUDA is turned on.

* Insert vcomp140.dll into Windows wheels

* Add C++ and Python tests for CPU and GPU targets (CUDA 9.0, 10.0, 10.1)

* Prevent spurious msbuild failure

* Add GPU tests

* Upgrade dmlc-core
2019-05-16 01:06:46 +00:00
Rong Ou
df2cdaca50 add cuda 10.1 support (#4468) 2019-05-14 18:30:58 +00:00
Philip Hyunsu Cho
b5f7cbfadf
[CI] Cache two R build Docker containers (#4458) 2019-05-11 10:54:00 -07:00
Philip Hyunsu Cho
6ff994126a [BLOCKING][CI] Upgrade to Spark 2.4.3 (#4414)
* [CI] Upgrade to Spark 2.4.2

* Pass Spark version to build script

* Allow multiple --build-arg in ci_build.sh

* Fix syntax

* Fix container name

* Update pom.xml

* Fix container name

* Update Jenkinsfile

* Update pom.xml

* Update Dockerfile.jvm_cross
2019-05-09 21:36:59 -07:00
Philip Hyunsu Cho
bfddc2c42c Make CMakeLists.txt compatible with CMake 3.3 (#4420)
* Make CMakeLists.txt compatible with CMake 3.3; require CMake 3.11 for MSVC

* Use CMake 3.12 when sanitizer is enabled

* Disable funroll-loops for MSVC

* Use cmake version in container name

* Add missing arg

* Fix egrep use in ci_build.sh

* Display CMake version

* Do not set OpenMP_CXX_LIBRARIES for MSVC

* Use cmake_minimum_required()
2019-05-02 11:49:32 +08:00
Nan Zhu
37dc82c3ff
[jvm-packages] allow partial evaluation of dataframe before prediction (#4407)
* allow partial evaluation of dataframe before prediction

* resume spark test

* comments

* Run unit tests after building JVM packages
2019-04-26 21:02:40 -07:00
Philip Hyunsu Cho
ea850ecd20
[CI] Refactor Jenkins CI pipeline + migrate all Linux tests to Jenkins (#4401)
* All Linux tests are now in Jenkins CI
* Tests are now de-coupled from builds. We can now build XGBoost with one version of CUDA/JDK and test it with another version of CUDA/JDK
* Builds (compilation) are significantly faster because 1) They use C5 instances with faster CPU cores; and 2) build environment setup is cached using Docker containers
2019-04-26 18:39:12 -07:00
Jiaming Yuan
207f058711 Refactor CMake scripts. (#4323)
* Refactor CMake scripts.

* Remove CMake CUDA wrapper.
* Bump CMake version for CUDA.
* Use CMake to handle Doxygen.
* Split up CMakeList.
* Export install target.
* Use modern CMake.
* Remove build.sh
* Workaround for gpu_hist test.
* Use cmake 3.12.

* Revert machine.conf.

* Move CLI test to gpu.

* Small cleanup.

* Support using XGBoost as submodule.

* Fix windows

* Fix cpp tests on Windows

* Remove duplicated find_package.
2019-04-15 10:08:12 -07:00
Philip Hyunsu Cho
70be1e38c2
[CI] Optimize external Docker build cache (#4334)
* When building pull requests, use Docker cache for master branch

Docker build caches are per-branch, so new pull requests will initially
have no build cache, causing the Docker containers to be built from
scratch. New pull requests should use the cache associated with the
master branch. This makes sense, since most pull requests do not modify
the Dockerfile.

* Add comments
2019-04-04 15:59:07 -07:00
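The cache fallback described in #4334 can be sketched roughly as follows. This is a minimal illustration, not the project's CI script: the function name `cache_source_branch` and the assumption that pull-request builds are named `PR-<number>` are hypothetical.

```python
def cache_source_branch(branch_name: str, default_branch: str = "master") -> str:
    """Pick the branch whose Docker build cache a build should reuse.

    Assumption (hypothetical): pull-request builds are named "PR-<number>".
    Such builds start with no cache of their own, so they borrow the cache
    of the default branch; every other branch uses its own cache.
    """
    if branch_name.startswith("PR-"):
        return default_branch
    return branch_name


def cache_key(container: str, branch_name: str) -> str:
    """Build a per-container, per-branch cache key (illustrative format)."""
    return f"{container}/{cache_source_branch(branch_name)}"
```

The fallback pays off because most pull requests leave the Dockerfile untouched, so the master cache is almost always valid for them.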
Philip Hyunsu Cho
37c75aac41
[CI] Add external Docker build cache (#4331) 2019-04-04 13:36:39 -07:00
sriramch
2f7087eba1 Improve HostDeviceVector exception safety (#4301)
* make the assignments of HostDeviceVector exception safe.
* storing a dummy GPUDistribution instance in HDV for CPU based code.
* change testxgboost binary location to build directory.
2019-03-31 22:48:58 +08:00
Philip Hyunsu Cho
7aed8f3d48
[CI] Upgrade to GCC 5.3.1, CMake 3.6.0 (#4306)
* Upgrade to GCC 5.3.1, CMake 3.6.0

* <regex> is now okay
2019-03-28 00:21:21 -07:00
Rong Ou
5aa42b5f11 jenkins build for cuda 10.0 (#4281)
* jenkins build for cuda 10.0

* yum install nccl2 for cuda 10.0
2019-03-22 22:35:18 -07:00
Andy Adinets
b833b642ec Improved multi-node multi-GPU random forests. (#4238)
* Improved multi-node multi-GPU random forests.

- removed rabit::Broadcast() from each invocation of column sampling
- instead, syncing the PRNG seed when a ColumnSampler() object is constructed
- this makes non-trivial column sampling significantly faster in the distributed case
- refactored distributed GPU tests
- added distributed random forests tests
2019-03-13 12:36:28 +13:00
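The idea in #4238 of syncing the PRNG seed once, rather than broadcasting every column sample, can be illustrated with a toy sketch. The `ColumnSampler` class below is a hypothetical stand-in written with Python's standard `random` module; it is not XGBoost's implementation, and the shared seed is assumed to have been agreed on by all workers beforehand.

```python
import random


class ColumnSampler:
    """Toy column sampler; each worker owns one instance."""

    def __init__(self, n_columns: int, shared_seed: int):
        self.n_columns = n_columns
        # The seed is synchronized once, at construction time.
        self.rng = random.Random(shared_seed)

    def sample(self, fraction: float) -> list:
        """Draw a subset of column indices without any communication."""
        k = max(1, int(self.n_columns * fraction))
        return sorted(self.rng.sample(range(self.n_columns), k))


# Three simulated workers built with the same seed.
workers = [ColumnSampler(n_columns=10, shared_seed=42) for _ in range(3)]
draws_per_round = [[w.sample(0.5) for w in workers] for _ in range(5)]
```

Because every worker advances an identically seeded generator in lockstep, each round yields the same column subset on all workers without a per-sample broadcast.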
Jiaming Yuan
7b9043cf71
Fix clang-tidy warnings. (#4149)
* Upgrade gtest for clang-tidy.
* Use CMake to install GTest instead of mv.
* Don't require clang-tidy to return 0, due to errors in thrust.
* Add a small test for tidy itself.

* Reformat.
2019-03-13 02:25:51 +08:00
Matthew Jones
92b7577c62 [REVIEW] Enable Multi-Node Multi-GPU functionality (#4095)
* Initial commit to support multi-node multi-gpu xgboost using dask

* Fixed NCCL initialization by not ignoring the opg parameter.

- it now crashes on NCCL initialization, but at least we're attempting it properly

* At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers

* Synchronizing in a couple of more places.

- now the workers don't go down, but just hang
- no more "wild" values of gradients
- probably needs syncing in more places

* Added another missing max-allreduce operation inside BuildHistLeftRight

* Removed unnecessary collective operations.

* Simplified rabit::Allreduce() sync of gradient sums.

* Removed unnecessary rabit syncs around ncclAllReduce.

- this improves performance _significantly_ (7x faster for overall training,
  20x faster for xgboost proper)

* pulling in latest xgboost

* removing changes to updater_quantile_hist.cc

* changing use_nccl_opg initialization, removing unnecessary if statements

* added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId

* placing struct definition in guard to avoid duplicate code errors

* addressing linting errors

* removing

* removing additional arguments to AllReducer initialization

* removing distributed flag

* making comm init symmetric

* removing distributed flag

* changing ncclCommInit to support multiple modalities

* fix indenting

* updating ncclCommInitRank block with necessary group calls

* fix indenting

* adding print statement, and updating accessor in vector

* improving print statement to end-line

* generalizing nccl_rank construction using rabit

* assume device_ordinals is the same for every node

* test, assume device_ordinals is identical for all nodes

* test, assume device_ordinals is unique for all nodes

* changing names of offset variable to be more descriptive, editing indenting

* wrapping ncclUniqueId GetUniqueId() and aesthetic changes

* adding synchronization, and tests for distributed

* adding  to tests

* fixing broken #endif

* fixing initialization of gpu histograms, correcting errors in tests

* adding to contributors list

* adding distributed tests to jenkins

* fixing bad path in distributed test

* debugging

* adding kubernetes for distributed tests

* adding proper import for OrderedDict

* adding urllib3==1.22 to address ordered_dict import error

* added sleep to allow workers to save their models for comparison

* adding name to GPU contributors under docs
2019-03-02 10:03:22 +13:00
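The root-node step in #4095, where an allreduce combines each worker's gradient sum, can be simulated in a single process. This is only a sketch of what a sum-allreduce computes; `allreduce_sum` is a hypothetical helper and does not call rabit or NCCL.

```python
from typing import List, Tuple

# A (gradient, hessian) pair, summed over the rows a worker holds.
GradientPair = Tuple[float, float]


def allreduce_sum(local_sums: List[GradientPair]) -> List[GradientPair]:
    """Simulate a sum-allreduce: every worker ends up with the global total."""
    total_grad = sum(grad for grad, _ in local_sums)
    total_hess = sum(hess for _, hess in local_sums)
    return [(total_grad, total_hess) for _ in local_sums]


# Three simulated workers, each with the sum over its own data shard.
local = [(1.0, 2.0), (0.5, 1.5), (2.5, 0.5)]
synced = allreduce_sum(local)
```

After the call, all workers hold the same root-node statistics, so they evaluate the same candidate splits.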
Jiaming Yuan
8905df4a18
Perform clang-tidy on both cpp and cuda source. (#4034)
* Basic script for using compilation database.

* Add `GENERATE_COMPILATION_DATABASE' to CMake.
* Rearrange CMakeLists.txt.
* Add basic python clang-tidy script.
* Remove modernize-use-auto.
* Add clang-tidy to Jenkins
* Refine logic for correct path detection

In Jenkins, the project root is of form /home/ubuntu/workspace/xgboost_PR-XXXX

* Run clang-tidy in CUDA 9.2 container
* Use clang_tidy container
2019-02-05 16:07:43 +08:00
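The path detection mentioned in #4034 might look something like the sketch below. The regular expression and the function `find_project_root` are assumptions made for illustration, based only on the example root `/home/ubuntu/workspace/xgboost_PR-XXXX`; the actual script may work differently.

```python
import re
from typing import Optional

# Assumption: the root directory is named "xgboost", optionally followed by
# a Jenkins pull-request suffix such as "_PR-4034".
_ROOT_PATTERN = re.compile(r"^(.*?/xgboost(?:_PR-\d+)?)(?:/|$)")


def find_project_root(path: str) -> Optional[str]:
    """Return the project root contained in `path`, or None if not found."""
    match = _ROOT_PATTERN.match(path)
    return match.group(1) if match else None
```

Matching the optional `_PR-<number>` suffix lets the same logic serve both a local checkout and a Jenkins workspace.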
Philip Hyunsu Cho
7a652a8c64
Speed up Jenkins by not compiling CMake (#4099) 2019-02-03 00:08:14 -08:00
Jiaming Yuan
2ea0f887c1
Refactor Python tests. (#3897)
* Deprecate nose tests.
* Format python tests.
2018-11-15 13:56:33 +13:00
Philip Hyunsu Cho
411df9f878
Test wheels on CUDA 10.0 container for compatibility (#3838) 2018-11-01 08:34:47 -07:00
Philip Hyunsu Cho
abf2f661be
Fix #3708: Use dmlc::TemporaryDirectory to handle temporaries in cross-platform way (#3783)
* Fix #3708: Use dmlc::TemporaryDirectory to handle temporaries in cross-platform way

Also install git inside NVIDIA GPU container

* Update dmlc-core
2018-10-18 10:16:04 -07:00
Philip Hyunsu Cho
b50bc2c1d4
Add multi-GPU unit test environment (#3741)
* Add multi-GPU unit test environment

* Better assertion message

* Temporarily disable failing test

* Distinguish between multi-GPU and single-GPU CPP tests

* Consolidate Python tests. Use attributes to distinguish multi-GPU Python tests from single-GPU counterparts
2018-09-29 11:20:58 -07:00
Philip Hyunsu Cho
baef5741df
Separate out restricted and unrestricted tasks (#3736) 2018-09-27 23:06:14 -07:00
Philip Hyunsu Cho
51478a39c9
Fix #3730: scikit-learn 0.20 compatibility fix (#3731)
* Fix #3730: scikit-learn 0.20 compatibility fix

sklearn.cross_validation has been removed from scikit-learn 0.20,
so replace it with sklearn.model_selection

* Display test names for Python tests for clarity
2018-09-27 15:03:05 -07:00
trivialfis
55caad6e49 Remove redundant FindGTest.cmake. (#3533)
During removal of FindGTest.cmake, also

* Fix gtest include dirs.
* Remove some blanks and use PWD for gtest dir.
2018-08-07 10:08:08 +12:00
Rory Mitchell
1b59316444
Updates for GPU CI tests (#3467)
* Fail GPU CI after test failure

* Fix GPU linear tests

* Reduced number of GPU tests to speed up CI

* Remove static allocations of device memory

* Resolve illegal memory access for updater_fast_hist.cc

* Fix broken r tests dependency

* Update python install documentation for GPU
2018-07-16 18:05:53 +12:00
Thejaswi
2200939416 Upgrading to NCCL2 (#3404)
* Upgrading to NCCL2

* Part II of NCCL2 upgrade

 - Doc updates to build with nccl2
 - Dockerfile.gpu update for a correct CI build with nccl2
 - Updated FindNccl package to have env-var NCCL_ROOT to take precedence

* Upgrading to v9.2 for CI workflow, since it has the nccl2 binaries available

* Added NCCL2 license + copy the nccl binaries into /usr location for the FindNccl module to find

* Set LD_LIBRARY_PATH variable to pick nccl2 binary at runtime

* Need the nccl2 library download instructions inside Dockerfile.release as well

* Use NCCL2 as a static library
2018-07-10 00:42:15 -07:00
Philip Hyunsu Cho
cafc621914
Do not unzip google test archive if exists (#3416) 2018-06-28 04:10:39 +00:00
Philip Hyunsu Cho
e2743548ed
Fix wget for google tests in tests (#3414)
CI tests were failing because wget prompts "the user" for a response
whenever the google test archive is already on the disk.

Fix: Use `-nc` option to skip download when the archive already
exists
2018-06-27 22:12:56 +00:00
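The effect of the `-nc` option in #3414 (skip the download when the file is already present) can be mirrored in Python. This is an analogue for illustration only, not the fix from the commit, which uses wget itself; `download_if_missing` is a hypothetical helper.

```python
import os
import urllib.request


def download_if_missing(url: str, dest: str) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened and False if it was skipped.
    No prompt is ever shown, so the call is safe in unattended CI runs.
    """
    if os.path.exists(dest):
        return False
    urllib.request.urlretrieve(url, dest)
    return True
```

Returning a flag rather than prompting keeps repeated CI runs idempotent: the second run finds the archive and moves on.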
Rory Mitchell
f8b7686719
Add cuda 8/9.1 centos 6 builds, test GPU wheel on CPU only container. (#3309)
* Add cuda 8/9.1 centos 6 builds, test GPU wheel on CPU only container.

* Add Google test
2018-05-17 10:57:01 +12:00
Rory Mitchell
90a5c4db9d
Update Jenkins CI for GPU (#3294) 2018-05-04 16:50:59 +12:00
Michal Malohlava
33ee7d1615 [BUILD] Dockerfile and Jenkinsfile revisited (#2514)
Includes:
  - Dockerfile changes
    - Dockerfile clean up
    - Fix execution privileges of files used from Dockerfile.
    - New Dockerfile entrypoint to replace with_user script
    - Defined placeholders for CPU testing (script and Dockerfile)
  - Jenkinsfile
    - Jenkins file milestone defined
    - Single source code checkout and propagation via stash/unstash
    - Bash needs to be explicitly used in launching make build, since we need access to environment
    - Jenkinsfile build factory for cmake and make style of jobs
    - Archiving of artifacts (*.so, *.whl, *.egg) produced by cmake build

Missing:
  - CPU testing
  - Python3 env build and testing
2017-07-13 17:51:47 +12:00
Rory Mitchell
1899f9e744 [GPU-Plugin] Add basic continuous integration for GPU plugin. (#2431) 2017-06-22 10:15:28 -04:00