5079 Commits

Author SHA1 Message Date
Philip Hyunsu Cho
487ab0ce73
[BLOCKING] Handle empty rows in data iterators correctly (#5929)
* [jvm-packages] Handle empty rows in data iterators correctly

* Fix clang-tidy error

* last empty row

* Add comments [skip ci]

Co-authored-by: Nan Zhu <nanzhu@uber.com>
2020-07-25 13:46:19 -07:00
FelixYBW
e6cd74ead3
Set a minimal reducer size and parent_down size (#139)
* Set a minimal reducer message size. Receive the same data size from the parent each time.

* When a parent reads from a child, check that it receives the minimal reduce size.
 Fix bug. Rewrite the minimal reducer size check to make sure it is 1~N times the minimal reduce size.

 Assume the minimal reduce size is X. The logic is:
 1: each child uploads total_size of message
 2: each parent receives at least X of the message, up to total_size
 3: parent reduces X or NxX or total_size of the message
 4: parent sends X or NxX or total_size of the message to its parent
 5: parent's parent receives at least X of the message, up to total_size, then reduces X or NxX or total_size of the message
 6: parent's parent sends X or NxX or total_size of the message to its children
 7: parent receives X or NxX or total_size of the message and sends it to its children
 8: child receives X or NxX or total_size of the message.

 During the whole process, each transfer is a (1~N)xX byte message, or up to total_size.

 If X is larger than total_size, then allreduce always reduces the whole message and passes it down.
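
 The chunking rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this commit: `ReducibleBytes` and `SimulateTransfer` are hypothetical names, and the real implementation works on socket buffers rather than a list of arrival sizes. The sketch shows that a node reduces only whole multiples of X until the tail of the message arrives, and reduces everything at once when X exceeds total_size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Rabit): given the bytes buffered so far and
// the bytes of the message still unreduced, return how many bytes may be
// reduced now under a minimal reduce size of min_size (X in the text).
std::size_t ReducibleBytes(std::size_t buffered, std::size_t remaining,
                           std::size_t min_size) {
  assert(min_size > 0);
  assert(buffered <= remaining);
  // The tail of the message has fully arrived: reduce whatever is left.
  if (buffered == remaining) return buffered;
  // Otherwise reduce only a whole multiple of X (possibly zero).
  return (buffered / min_size) * min_size;
}

// Hypothetical simulation: data arrives in pieces of arbitrary size; the
// return value lists the size of every chunk that gets reduced.
std::vector<std::size_t> SimulateTransfer(
    std::size_t total_size, std::size_t min_size,
    const std::vector<std::size_t>& arrivals) {
  std::vector<std::size_t> chunks;
  std::size_t done = 0;      // bytes already reduced
  std::size_t buffered = 0;  // bytes received but not yet reduced
  for (std::size_t a : arrivals) {
    // Never accept more than the message still has to deliver.
    buffered += std::min(a, total_size - done - buffered);
    std::size_t n = ReducibleBytes(buffered, total_size - done, min_size);
    if (n > 0) {
      chunks.push_back(n);
      done += n;
      buffered -= n;
    }
  }
  return chunks;
}
```

 For example, with total_size = 10 and X = 4, arrivals of 3, 3 and 4 bytes give one reduction of 4 bytes followed by the 6-byte tail; with X = 8 and total_size = 5, the whole 5-byte message is reduced in a single step.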

* Follow style check rule

* fix the cpplint check

* fix allreduce_base header seq

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-25 12:46:45 -07:00
Jiaming Yuan
a4de2f68e4
Use cudaOccupancyMaxPotentialBlockSize to calculate the block size. (#5926) 2020-07-23 14:24:42 +08:00
Jiaming Yuan
fbfbd525d8
Cache dependencies on Github Action. (#5928) 2020-07-23 14:00:19 +08:00
Philip Hyunsu Cho
4af857f95d
Add explicit template specialization for portability (#5921)
* Add explicit template specializations

* Adding Specialization for FileAdapterBatch
2020-07-22 12:31:17 -07:00
Jiaming Yuan
bc1d3ee230
Fix R early stop with custom objective. (#5923)
* Specify `ntreelimit`.
2020-07-23 03:28:17 +08:00
Jiaming Yuan
30363d9c35
Remove R and JVM from appveyor. (#5922) 2020-07-23 03:26:48 +08:00
Jiaming Yuan
66cc1e02aa
Setup github action. (#5917) 2020-07-22 15:05:25 +08:00
Philip Hyunsu Cho
627cf41a60
Add option to enable all compiler warnings in GCC/Clang (#5897)
* Add option to enable all compiler warnings in GCC/Clang

* Fix -Wall for CUDA sources

* Make -Wall private req for xgboost-r
2020-07-21 23:34:03 -07:00
Jiaming Yuan
9b688aca3b
Fix mingw build with R. (#5918) 2020-07-22 02:56:49 +08:00
Philip Hyunsu Cho
8d7702766a
[Doc] Document new objectives and metrics available on GPUs (#5909) 2020-07-21 02:10:59 -07:00
Jiaming Yuan
03fb98fbde
Fix typo in CI. [skip ci] (#5919) 2020-07-21 14:25:27 +08:00
Jiaming Yuan
8b1afce316
Add Github Action for R. (#5911)
* Fix lintr errors.
2020-07-20 19:23:36 +08:00
Andy Adinets
b3d2e7644a
Support building XGBoost with CUDA 11 (#5808)
* Change serialization test.
* Add CUDA 11 tests on Linux CI.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-20 07:58:41 +08:00
Philip Hyunsu Cho
ac9136ee49
Further improvements and savings in Jenkins pipeline (#5904)
* Publish artifacts only on the master and release branches

* Build CUDA only for Compute Capability 7.5 when building PRs

* Run all Windows jobs in a single worker image

* Build nightly XGBoost4J SNAPSHOT JARs with Scala 2.12 only

* Show skipped Python tests on Windows

* Make Graphviz optional for Python tests

* Add back C++ tests

* Unstash xgboost_cpp_tests

* Fix label to CUDA 10.1

* Install cuPy for CUDA 10.1

* Install jsonschema

* Address reviewer's feedback
2020-07-18 03:30:40 -07:00
Jiaming Yuan
6c0c87216f
Fix Windows 2016 build. (#5902) 2020-07-18 05:50:17 +08:00
Philip Hyunsu Cho
71b0528a2f
GPU implementation of AFT survival objective and metric (#5714)
* Add interval accuracy

* De-virtualize AFT functions

* Lint

* Refactor AFT metric using GPU-CPU reducer

* Fix R build

* Fix build on Windows

* Fix copyright header

* Clang-tidy

* Fix crashing demo

* Fix typos in comment; explain GPU ID

* Remove unnecessary #include

* Add C++ test for interval accuracy

* Fix a bug in accuracy metric: use log pred

* Refactor AFT objective using GPU-CPU Transform

* Lint

* Fix lint

* Use Ninja to speed up build

* Use time, not /usr/bin/time

* Add cpu_build worker class, with concurrency = 1

* Use concurrency = 1 only for CUDA build

* concurrency = 1 for clang-tidy

* Address reviewer's feedback

* Update link to AFT paper
2020-07-17 01:18:13 -07:00
Jiaming Yuan
7c2686146e
Dask device dmatrix (#5901)
* Fix softprob with empty dmatrix.
2020-07-17 13:17:43 +08:00
Jiaming Yuan
e471056ec4
Fix sketch size calculation. (#5898) 2020-07-17 08:33:16 +08:00
Bobby Wang
730866a7bc
[CI] update spark version to 3.0.0 (#5890)
* [CI] update spark version to 3.0.0

* Update Dockerfile.jvm_cross

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-16 00:23:44 -07:00
Jiaming Yuan
029a8b533f
Simplify the data backends. (#5893) 2020-07-16 15:17:31 +08:00
Philip Hyunsu Cho
7aee0e51ed
Fix R package build with CMake 3.13 (#5895)
* Fix R package build with CMake 3.13

* Require OpenMP for xgboost-r target
2020-07-15 20:22:11 -07:00
Philip Hyunsu Cho
3c40f4a7f5
[CI] Reduce load on Windows CI pipeline (#5892) 2020-07-14 18:47:05 -07:00
Jiaming Yuan
3cae287dea
Fix NDK Build. (#5886)
* Explicit cast for slice.
2020-07-14 18:34:19 +08:00
Alexander Gugel
970b4b3fa2
Add XGBoosterGetNumFeature (#5856)
- add GetNumFeature to Learner
- add XGBoosterGetNumFeature to C API
- update c-api-demo accordingly
2020-07-13 23:25:17 -07:00
Philip Hyunsu Cho
e0c179c7cc
[CI] Enforce daily budget in Jenkins CI (#5884)
* [CI] Throttle Jenkins CI

* Don't use Jenkins master instance
2020-07-13 21:51:11 -07:00
Jiaming Yuan
dd445af56e
Cleanup on device sketch. (#5874)
* Remove old functions.

* Merge weighted and un-weighted into a common interface.
2020-07-14 10:15:54 +08:00
Bobby Wang
9f85e92602
[jvm-packages] update spark dependency to 3.0.0 (#5836) 2020-07-12 20:58:30 -07:00
Philip Hyunsu Cho
23e2c6ec91
Upgrade Rabit (#5876) 2020-07-09 16:18:33 -07:00
Zhang Zhang
1813804e36
Add new parameter singlePrecisionHistogram to xgboost4j-spark (#5811)
Expose the existing 'singlePrecisionHistogram' param to the Spark layer.
2020-07-08 16:29:35 -07:00
Philip Hyunsu Cho
0d411b0397
[CI] Simplify CMake build with modern CMake techniques (#5871)
* [CI] Simplify CMake build

* Make sure that plugins can be built

* [CI] Install lz4 on Mac
2020-07-08 04:23:24 -07:00
Philip Hyunsu Cho
22a31b1faa
[Doc] Document that CUDA 10.0 is required [skip ci] (#5872) 2020-07-07 18:55:19 -07:00
Rong Ou
06320729d4
fix device sketch with weights in external memory mode (#5870) 2020-07-08 08:44:07 +08:00
Jiaming Yuan
d0a29c3135
Remove print. (#5867) 2020-07-08 04:12:14 +08:00
Jiaming Yuan
a3ec964346
Accept iterator in device dmatrix. (#5783)
* Remove Device DMatrix.
2020-07-07 21:44:48 +08:00
Jiaming Yuan
048d969be4
Implement GK sketching on GPU. (#5846)
* Implement GK sketching on GPU.
* Strong tests on quantile building.
* Handle sparse dataset by binary searching the column index.
* Hypothesis test on dask.
2020-07-07 12:16:21 +08:00
Andy Adinets
ac3f0e78dc
Split Features into Groups to Compute Histograms in Shared Memory (#5795) 2020-07-07 15:04:35 +12:00
Jiaming Yuan
93c44a9a64
Move feature names and types of DMatrix from Python to C++. (#5858)
* Add thread local return entry for DMatrix.
* Save feature name and feature type in binary file.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-07 09:40:13 +08:00
Jiaming Yuan
4b0852ee41
Use dmlc stream when URI protocol is not local file. (#5857) 2020-07-07 03:07:12 +08:00
Alexander Gugel
0f17e35bce
Add c-api-demo to .gitignore (#5855) 2020-07-05 04:35:22 +08:00
Philip Hyunsu Cho
efe3e48ae2
Ensure that LoadSequentialFile() actually reads the whole file (#5831) 2020-07-04 16:17:11 +08:00
Jiaming Yuan
1a0801238e
Implement iterative DMatrix. (#5837) 2020-07-03 11:44:52 +08:00
Jiaming Yuan
4d277d750d
Relax linear test. (#5849)
* Increased error in coordinate is mostly due to floating point error.
* Shotgun uses Hogwild!, which is non-deterministic and can have even greater floating point error.
2020-07-03 07:49:53 +08:00
Jiaming Yuan
eb067c1c34
Relax test for shotgun. (#5835) 2020-07-01 19:20:29 +08:00
Jiaming Yuan
90a9c68874
Implement a DMatrix Proxy. (#5803) 2020-06-29 15:03:10 +08:00
Philip Hyunsu Cho
74bf00a5ab
De-duplicate macro _CRT_SECURE_NO_WARNINGS / _CRT_SECURE_NO_DEPRECATE (#136)
* De-duplicate macro _CRT_SECURE_NO_WARNINGS / _CRT_SECURE_NO_DEPRECATE

* Move all macros to base.h

* Fix CI
2020-06-28 09:51:50 -07:00
Jiaming Yuan
47c89775d6
Accept string for ArrayInterface constructor. (#5799) 2020-06-27 00:06:54 +08:00
Yuan Tang
95f11ed27e
Rename Ant Financial to Ant Group (#5827) 2020-06-25 15:25:36 -04:00
Jiaming Yuan
8234091368
Remove unweighted GK quantile. (#5816) 2020-06-23 14:27:46 +08:00
Philip Hyunsu Cho
dcff96ed27
[Doc] Fix rendering of Markdown docs, e.g. R doc (#5821) 2020-06-21 23:49:22 -07:00