3409 Commits

Author SHA1 Message Date
Andy Adinets
72cd1517d6 Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446)
* Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage.

- added distributions to HostDeviceVector
- using HostDeviceVector for labels, weights and base margins in MetaInfo
- using HostDeviceVector for offset and data in SparsePage
- other necessary refactoring

* Added const version of HostDeviceVector API calls.

- const versions added to calls that can trigger data transfers, e.g. DevicePointer()
- updated the code that uses HostDeviceVector
- objective functions now accept const HostDeviceVector<bst_float>& for predictions

* Updated src/linear/updater_gpu_coordinate.cu.

* Added read-only state for HostDeviceVector sync.

- this means no copies are performed if both the host and the devices access
  the HostDeviceVector read-only

* Fixed linter and test errors.

- updated the lz4 plugin
- added ConstDeviceSpan to HostDeviceVector
- using device % dh::NVisibleDevices() for the physical device number,
  e.g. in calls to cudaSetDevice()

* Fixed explicit template instantiation errors for HostDeviceVector.

- replaced HostDeviceVector<unsigned int> with HostDeviceVector<int>

* Fixed HostDeviceVector tests that require multiple GPUs.

- added a mock set device handler; when set, it is called instead of cudaSetDevice()
2018-08-30 14:28:47 +12:00
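The read-only sync state described in the commit above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the HostDeviceVector implementation: the class `SyncedBuffer`, its methods and the `transfers` counter are hypothetical names, and "device memory" is simulated with a plain Python list. It shows the idea that const access from either side leaves both copies valid, so repeated read-only access causes no further copies, while mutable access invalidates the other side.

```python
class SyncedBuffer:
    """Toy model of a buffer mirrored on host and device (hypothetical)."""

    def __init__(self, data):
        self.host = list(data)
        self.device = None        # stand-in for device memory
        self.host_valid = True
        self.device_valid = False
        self.transfers = 0        # counts simulated host<->device copies

    def _sync_to_device(self):
        # Copy host -> device only if the device copy is stale.
        if not self.device_valid:
            self.device = list(self.host)
            self.device_valid = True
            self.transfers += 1

    def _sync_to_host(self):
        # Copy device -> host only if the host copy is stale.
        if not self.host_valid:
            self.host = list(self.device)
            self.host_valid = True
            self.transfers += 1

    def const_device(self):
        # Read-only device access: the host copy stays valid.
        self._sync_to_device()
        return tuple(self.device)

    def const_host(self):
        # Read-only host access: the device copy stays valid.
        self._sync_to_host()
        return tuple(self.host)

    def mutable_device(self):
        # Writable device access: the host copy becomes stale.
        self._sync_to_device()
        self.host_valid = False
        return self.device

    def mutable_host(self):
        # Writable host access: the device copy becomes stale.
        self._sync_to_host()
        self.device_valid = False
        return self.host


buf = SyncedBuffer([1, 2, 3])
buf.const_device()   # first device read: one copy
buf.const_host()     # host still valid: no copy
buf.const_device()   # device still valid: no copy
print(buf.transfers)
```

In this model, alternating read-only access costs a single transfer in total; a copy back to the host is needed only after `mutable_device()` has been called.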
Andy Adinets
58d783df16 Fixed issue 3605. (#3628)
* Fixed issue 3605.

- https://github.com/dmlc/xgboost/issues/3605

* Fixed the bug in a better way.

* Added a test to catch the bug.

* Fixed linter errors.
2018-08-28 10:50:52 -07:00
Rory Mitchell
78bea0d204
Add Google Test for column sampling, restore metainfo tests (#3637)
* Add Google Test for column sampling, restore metainfo tests

* Update metainfo test for visual studio

* Fix multi-GPU bug introduced in #3635
2018-08-28 16:10:26 +12:00
gorogm
7ef2b599c7 Link fixed. (#3640) 2018-08-27 20:25:50 -07:00
Rory Mitchell
686e990ffc
GPU memory usage fixes + column sampling refactor (#3635)
* Remove thrust copy calls

* Fix histogram memory usage

* Cap extreme histogram memory usage

* More efficient column sampling

* Use column sampler across updaters

* More efficient split evaluation on GPU with column sampling
2018-08-27 16:26:46 +12:00
trivialfis
60787ecebc Merge generic device helper functions into gpu set. (#3626)
* Remove the use of old NDevices* functions.
* Use GPUSet in timer.h.
2018-08-26 18:14:23 +12:00
Nan Zhu
3261002099
[jvm-packages] throw ControlThrowable instead of InterruptedException (#3632)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* interrupted exception is not rethrown
2018-08-25 20:30:21 -07:00
Philip Hyunsu Cho
cb4de521c1
Document CUDA requirement, lack of external memory on GPU (#3624)
* Document fact that GPU doesn't support external memory

* Document CUDA requirement
2018-08-22 22:47:10 -07:00
Philip Hyunsu Cho
4ed8a88240
Update Python API doc (#3619)
* Add XGBRanker to Python API doc

* Show inherited members of XGBRegressor in API doc, since XGBRegressor uses default methods from XGBModel

* Add table of contents to Python API doc

* Skip JVM doc download if not available

* Show inherited members for XGBRegressor and XGBRanker

* Expose XGBRanker to Python XGBoost module directory

* Add docstring to XGBRegressor.predict() and XGBRanker.predict()

* Fix rendering errors in Python docstrings

* Fix lint
2018-08-22 18:59:30 -07:00
Nan Zhu
4912c1f9c6
[jvm-packages] fix checkpoint save/load (#3614)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix update checkpoint func
2018-08-21 12:34:24 -07:00
Grant W Schneider
57f3c2f252 Remove errant $ (#3618) 2018-08-21 12:32:38 -07:00
Shiki-H
24a268a2e3 sklearn api for ranking (#3560)
* added xgbranker

* fixed predict method and ranking test

* reformatted code in accordance with pep8

* fixed lint error

* fixed docstring and added checks on objective

* added ranking demo for python

* fixed suffix in rank.py
2018-08-21 08:26:48 -07:00
Philip Hyunsu Cho
b13c3a8bcc
Fix #3609: Removed unused parameter 'use_buffer' (#3610) 2018-08-21 07:54:15 -07:00
trivialfis
cf2d86a4f6 Add travis sanitizers tests. (#3557)
* Add travis sanitizers tests.

* Add gcc-7 in Travis.
* Add SANITIZER_PATH for CMake.
* Enable sanitizer tests in Travis.

* Fix memory leaks in tests.

* Fix all memory leaks reported by Address Sanitizer.
* tests/cpp/helpers.h/CreateDMatrix now returns raw pointer.
2018-08-19 16:40:30 +12:00
Philip Hyunsu Cho
983cb0b374
Add option to disable default metric (#3606) 2018-08-18 11:39:20 -07:00
Grace Lam
993e62b9e7 Add JSON model dump functionality (#3603)
* Add JSON model dump functionality

* Fix lint
2018-08-17 16:18:43 -07:00
Matthew Tovbin
b53a5a262c [jvm-packages] getTreeLimit return type should be Int 2018-08-17 09:36:00 -07:00
Philip Hyunsu Cho
ac7fc1306b
Fix #3598: document that custom objective can't contain colon (:) (#3601) 2018-08-16 19:05:40 -07:00
Grace Lam
caf4a756bf Add JSON dump functionality documentation (#3600) 2018-08-16 16:32:04 -07:00
trivialfis
7c82dc92b2 Fix accessing DMatrix.handle before set. (#3599)
Close #3597.
2018-08-16 15:26:06 -07:00
Jakob Richter
725f4c36f2 replace nround with nrounds to match actual parameter (#3592) 2018-08-15 11:13:53 -07:00
Nan Zhu
73bd590a1d
[jvm-packages] add the missing scm urls (#3589)
For some reason this part was missing from the master branch.
2018-08-14 15:05:23 -07:00
trivialfis
9265964ee7 Fix ptrdiff_t namespace in Span. (#3588)
Fix #3587.
2018-08-15 10:04:55 +12:00
trivialfis
2c502784ff Span class. (#3548)
* Add basic Span class based on ISO C++20.

* Use Span<Entry const> instead of Inst in SparsePage.

* Add DeviceSpan in HostDeviceVector, use it in regression obj.
2018-08-14 17:58:11 +12:00
Matthew Tovbin
2b7a1c5780 [jvm-packages] Avoid losing precision when computing probabilities by converting to Double early (#3576) 2018-08-13 14:05:07 -07:00
Matthew Tovbin
ce0f0568a6 Make sure 'thresholds' are considered when executing predict method (#3577) 2018-08-13 14:04:47 -07:00
Philip Hyunsu Cho
6288f6d563 Update JVM packages version to 0.81-SNAPSHOT (#3584) 2018-08-13 10:17:52 -07:00
Philip Hyunsu Cho
96826a3515
Release version 0.80 (#3541)
* Up versions

* Write release note for 0.80
v0.80
2018-08-13 01:38:37 -07:00
Mathew
06ef4db4cc Fix Spark 2.2 Support (Amending #3062) (#3325)
This pull request amends the broken #3062 to allow Spark 2.2 to work.

Please note this won't work in Spark <=2.1 as sc.removeSparkListener was implemented in Spark 2.2. (So perhaps a more general method is better, although that is what was attempted in #3062)

This PR fixes: #3208, #3151 and the discussion in #1927.

I do find it strange that #3062 does not work in Spark 2.2; it's probably due to some sort of public/private issue in the org.apache.spark.scheduler.LiveListenerBus class inheritance (in Spark itself). The error is: `java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.removeListener(Ljava/lang/Object;)V`
2018-08-12 18:35:20 -07:00
Rory Mitchell
645996b12f Remove accidental SparsePage copies (#3583) 2018-08-12 17:49:38 -07:00
Philip Hyunsu Cho
0b607fb884
Add link to XGBoost4J-Spark tutorial on AWS Yarn tutorial (#3582) 2018-08-12 07:27:28 -07:00
Philip Hyunsu Cho
4202332783
Clarify multi-GPU training, binary wheels, Pandas integration (#3581)
* Clarify multi-GPU training, binary wheels, Pandas integration

* Add a note about multi-GPU on gpu/index.rst
2018-08-11 19:21:28 -07:00
Matthew Tovbin
7300002516 [jvm-packages] Use treeLimit param in getTreeLimit (#3575) 2018-08-10 09:38:58 -07:00
Philip Hyunsu Cho
9c647d8130 Bring XGBoost4J Intro up-to-date (#3574) 2018-08-10 09:08:19 -07:00
Philip Hyunsu Cho
2e7c3a0ed5
Refined logic for locating git branch inside ReadTheDocs (#3573) 2018-08-09 15:28:12 -07:00
Philip Hyunsu Cho
aa4ee6a0e4
[BLOCKING] Adding JVM doc build to Jenkins CI (#3567)
* Adding Java/Scala doc build to Jenkins CI

* Deploy built doc to S3 bucket

* Build doc only for branches

* Build doc first, to get doc faster for branch updates

* Have ReadTheDocs download doc tarball from S3

* Update JVM doc links

* Put doc build commands in a script

* Specify Spark 2.3+ requirement for XGBoost4J-Spark

* Build GPU wheel without NCCL, to reduce binary size
2018-08-09 13:27:01 -07:00
Matthew Tovbin
bad76048d1 Eliminate use of System.out + proper error logging (#3572) 2018-08-09 10:06:17 -07:00
Rory Mitchell
bbb771f32e
Refactor parts of fast histogram utilities (#3564)
* Refactor parts of fast histogram utilities

* Removed byte packing from column matrix
2018-08-09 17:59:57 +12:00
Philip Hyunsu Cho
3c72654e3b
Revert "Fix #3485, #3540: Don't use dropout for predicting test sets" (#3563)
* Revert "Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)"

This reverts commit 44811f233071c5805d70c287abd22b155b732727.

* Document behavior of predict() for DART booster

* Add notice to parameter.rst
2018-08-08 09:48:55 -07:00
Zeno Gantner
e3e776bd58 grammar fixes and typos (#3568) 2018-08-08 09:48:27 -07:00
Nan Zhu
1c08b3b2ea
[jvm-packages] enable predictLeaf/predictContrib/treeLimit in 0.8 (#3532)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* partial finish

* no test

* add test cases

* add test cases

* address comments

* add test for regressor

* fix typo
2018-08-07 14:01:18 -07:00
Philip Hyunsu Cho
246ec92163
Update broken links (#3565)
Fix #3559
Fix #3562
2018-08-07 05:27:39 -07:00
trivialfis
55caad6e49 Remove redundant FindGTest.cmake. (#3533)
During removal of FindGTest.cmake, also

* Fix gtest include dirs.
* Remove some blanks and use PWD for gtest dir.
2018-08-07 10:08:08 +12:00
Henry Gouk
69454d9487 Implementation of hinge loss for binary classification (#3477) 2018-08-07 10:06:42 +12:00
Philip Hyunsu Cho
44811f2330
Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)
* Fix #3485, #3540: Don't use dropout for predicting test sets

Dropout (for DART) should only be used at training time.

* Add regression test
2018-08-05 10:17:21 -07:00
Philip Hyunsu Cho
109473dae2
Fix #3545: XGDMatrixCreateFromCSCEx silently discards empty trailing rows (#3553)
* Fix #3545: XGDMatrixCreateFromCSCEx silently discards empty trailing rows

Description: The bug is triggered when

1. The data matrix has empty rows at the bottom. More precisely, the rows
   `n-k+1`, `n-k+2`, ..., `n` of the matrix have missing values in all
   dimensions (`n` is the number of instances, `k` the number of empty trailing rows).
2. The data matrix is given in Compressed Sparse Column (CSC) format.

Diagnosis: When the CSC matrix is converted to Compressed Sparse Row (CSR)
format (the common format used for DMatrix), the trailing empty rows
are silently ignored. More specifically, the row pointer (`offset`) of the
newly created CSR matrix does not account for these rows.

Fix: Modify the row pointer.

* Add regression test
2018-08-05 10:15:42 -07:00
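The failure mode described in #3553 above can be reproduced in a few lines. The following is a minimal pure-Python sketch, not the code of XGDMatrixCreateFromCSCEx: the function `csc_to_csr` and its argument names are hypothetical. It assumes the caller passes the true number of rows explicitly, so the row pointer gets one entry per row (plus one) even when the last rows hold no values.

```python
def csc_to_csr(col_ptr, row_idx, values, num_rows):
    """Convert CSC arrays to CSR arrays (row_ptr, col_idx, values).

    num_rows is passed explicitly, so empty trailing rows still get
    an entry in row_ptr instead of being dropped.
    """
    # Count the stored entries in each row.
    counts = [0] * num_rows
    for r in row_idx:
        counts[r] += 1

    # Prefix sum gives the row pointer; empty rows repeat the last offset.
    row_ptr = [0] * (num_rows + 1)
    for r in range(num_rows):
        row_ptr[r + 1] = row_ptr[r] + counts[r]

    # Scatter each CSC entry into its row segment.
    next_slot = row_ptr[:-1].copy()
    col_idx = [0] * len(values)
    out_vals = [0.0] * len(values)
    for c in range(len(col_ptr) - 1):
        for k in range(col_ptr[c], col_ptr[c + 1]):
            r = row_idx[k]
            slot = next_slot[r]
            col_idx[slot] = c
            out_vals[slot] = values[k]
            next_slot[r] += 1
    return row_ptr, col_idx, out_vals


# A 4x2 matrix whose last two rows are empty.
col_ptr = [0, 2, 3]
row_idx = [0, 1, 0]
values = [1.0, 2.0, 3.0]

row_ptr, col_idx, vals = csc_to_csr(col_ptr, row_idx, values, num_rows=4)
print(len(row_ptr) - 1)      # number of rows seen by the CSR matrix
print(max(row_idx) + 1)      # row count if inferred from stored entries only
```

Inferring the row count from the largest stored row index yields 2 here, which is the silent truncation the commit describes; sizing the row pointer from the declared row count keeps all 4 rows.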
Philip Hyunsu Cho
8c633d1ca3
Fix #3505: Prevent undefined behavior due to incorrectly sized base_margin (#3555)
The base margin will need to have length `[num_class] * [number of data points]`.
Otherwise, the array holding prediction results will be only partially
initialized, causing undefined behavior.

Fix: check the length of the base margin. If the length is not correct,
use the global bias (`base_score`) instead. Warn the user about the
substitution.
2018-08-05 10:14:07 -07:00
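The length check described in #3555 above can be sketched as follows. This is an illustration of the stated rule only, not the C++ code in the library: the function `initial_predictions` and its parameters are hypothetical, and the warning text is made up for the example.

```python
import warnings


def initial_predictions(base_margin, num_rows, num_class, base_score):
    """Return the starting prediction buffer (hypothetical helper).

    A user-supplied base margin is used only if its length equals
    num_class * num_rows; otherwise the global bias fills the buffer.
    """
    expected = num_class * num_rows
    if base_margin is not None and len(base_margin) == expected:
        return list(base_margin)

    # Warn only when a margin was supplied but has the wrong size.
    if base_margin is not None and len(base_margin) > 0:
        warnings.warn(
            "base_margin has length %d, expected %d; using base_score instead"
            % (len(base_margin), expected)
        )
    # Every slot is initialized, so no part of the buffer is left undefined.
    return [base_score] * expected


print(initial_predictions([0.1] * 6, num_rows=3, num_class=2, base_score=0.5))
```

Filling the whole buffer with `base_score` on a size mismatch mirrors the fallback named in the commit message: the substitution is visible to the user through the warning, and no element is left uninitialized.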
Philip Hyunsu Cho
4a429a7c4f Add reg:tweedie to supported objectives in XGBoost4J-Spark (#3552) 2018-08-05 07:42:59 -07:00
Philip Hyunsu Cho
7fefd6865d
Fix #3402: wrong fid crashes distributed algorithm (#3535)
* Fix #3402: wrong fid crashes distributed algorithm

The bug was introduced by the recent DMatrix refactor (#3301). It was partially
fixed by #3408 but the example in #3402 was still failing. The example in #3402
will succeed after this fix is applied.

* Explicitly specify "this" to prevent compile error

* Add regression test

* Add distributed test to Travis matrix

* Install kubernetes Python package as dependency of dmlc tracker

* Add Python dependencies

* Add compile step

* Reduce size of regression test case

* Further reduce size of test
2018-08-04 19:20:04 -07:00
Nan Zhu
31d1baba3d [jvm-packages] Tutorial of XGBoost4J-Spark (#3534)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* add new

* update doc

* finish Gang Scheduling

* more

* intro

* Add sections: Prediction, Model persistence and ML pipeline.

* Add XGBoost4j-Spark MLlib pipeline example

* partial finished version

* finish the doc

* adjust code

* fix the doc

* use rst

* Convert XGBoost4J-Spark tutorial to reST

* Bring XGBoost4J up to date

* add note about using hdfs

* remove duplicate file

* fix descriptions

* update doc

* Wrap HDFS/S3 export support as a note

* update

* wrap indexing_mode example in code block
2018-08-03 21:17:50 -07:00