3188 Commits

Author SHA1 Message Date
PSEUDOTENSOR / Jonathan McKinney
4d36036fe6 Avoid repeated cuda API call in GPU predictor and only synchronize used GPUs (#2936) 2017-12-09 16:00:42 +13:00
Vadim Khotilovich
e8a6597957 [R] maintenance Nov 2017; SHAP plots (#2888)
* [R] fix predict contributions for data with no colnames

* [R] add a render parameter for xgb.plot.multi.trees; fixes #2628

* [R] update Rd's

* [R] remove unnecessary dep-package from R cmake install

* silence type warnings; readability

* [R] silence complaint about incomplete line at the end

* [R] initial version of xgb.plot.shap()

* [R] more work on xgb.plot.shap

* [R] enforce black font in xgb.plot.tree; fixes #2640

* [R] if feature names are available, check in predict that they are the same; fixes #2857

* [R] cran check and lint fixes

* remove tabs

* [R] add references; a test for plot.shap
2017-12-05 09:45:34 -08:00
Rory Mitchell
1b77903eeb Fix several GPU bugs (#2916)
* Fix #2905

* Fix gpu_exact test failures

* Fix bug in GPU prediction where multiple calls to batch prediction can produce incorrect results

* Fix GPU documentation formatting
2017-12-04 08:27:49 +13:00
jac-stripe
1e3aabbadc Include symlinks to make wheel build work (#2909) 2017-12-01 11:27:58 -05:00
Katrin Leinweber
646db1528d simplify software citation (#2912)
* simplify software citation; answers #309

* fix import issues from dl.acm.org/citation.cfm?id=2939785's BibTeX
2017-12-01 02:58:13 -08:00
Rory Mitchell
c51adb49b6 Monotone constraints for gpu_hist (#2904) 2017-11-30 10:26:19 +13:00
Jerry Dumblauskas
5867c1b96d update doc string for grid parameter (#2647)
* update doc string for grid parameter

* update doc string for grid parameter
2017-11-29 11:22:46 -08:00
LevineHuang
878f307948 Fix minor typos (#2842)
* Some minor changes to the code style

Some minor changes to the code style in file basic_walkthrough.py

* coding style changes

* coding style changes according to PEP8

* Update basic_walkthrough.py

* Fix minor typo

* Minor edits to coding style

Minor edits to coding style following the proposals of PEP8.
2017-11-29 11:22:09 -08:00
Rajiv Abraham
77715d5c62 Update to correct brew gcc command (#1931)
The previous command did not work for me. This one did.
2017-11-29 11:20:49 -08:00
EvanChong
790da458e7 Sync number of features after loaded matrix in different workers. (#2722) 2017-11-29 11:19:12 -08:00
Jay
bb097166b5 build.sh hints for errors related to: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path? (#2229)
* provide hints on how to build this on linux if a new user just clones the repository and is looking for help.

* add the recursive command example
2017-11-29 11:18:49 -08:00
avinocur
0ad20f8fe0 Parameterize host-ip to pass to tracker.py (#2831) 2017-11-29 11:14:34 -08:00
Sam O
602b34ab91 Fix performance of c_array in python core.py (#2786) 2017-11-29 11:12:49 -08:00
Viraj Navkal
9fbeeea46e Small fixes to notation in documentation (#2903)
* make every theta lowercase

* use uniform font and capitalization for function name
2017-11-28 13:32:35 -08:00
Rory Mitchell
c55f14668e Update gpu_hist algorithm (#2901) 2017-11-27 13:44:24 +13:00
Rory Mitchell
24f527a1c0 AVX gradients (#2878)
* AVX gradients

* Add google test for AVX

* Create fallback implementation, remove fma instruction

* Improved accuracy of AVX exp function
2017-11-27 08:56:01 +13:00
yskn67
3dcf966bc3 Fix XGDMatrixFree argument type (#2898) 2017-11-23 10:49:05 -08:00
tomisuker
70a4c419e9 FIX typo in doc (#2892)
* FIX link

* typo
2017-11-21 18:04:48 +01:00
Sergei Lebedev
8e141427aa [jvm-packages] Exposed train-time evaluation metrics (#2836)
* [jvm-packages] Exposed train-time evaluation metrics

They are accessible via 'XGBoostModel.summary'. The summary is not
serialized with the model and is only available after the training.

* Addressed review comments

* Extracted model-related tests into 'XGBoostModelSuite'

* Added tests for copying the 'XGBoostModel'

* [jvm-packages] Fixed a subtle bug in train/test split

Iterator.partition (naturally) assumes that the predicate is deterministic
but this is not the case for

    r.nextDouble() <= trainTestRatio

therefore sometimes the DMatrix(...) call got a NoSuchElementException
and crashed the JVM due to lack of exception handling in
XGBoost4jCallbackDataIterNext.

* Make sure train/test objectives are different
2017-11-20 22:21:54 +01:00
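The train/test split bug described in 8e141427aa can be reproduced outside Scala. The sketch below is a Python analogue, not the xgboost4j code. It assumes a partition that evaluates the predicate independently for each side; `partition_two_pass`, `partition_single_pass` and `make_unstable_pred` are hypothetical helpers, and a call-counting predicate stands in for `r.nextDouble() <= trainTestRatio` so that the outcome is reproducible.

```python
import itertools


def partition_two_pass(items, pred):
    """Split items by filtering twice; pred runs twice per element."""
    matching = [x for x in items if pred(x)]
    rest = [x for x in items if not pred(x)]
    return matching, rest


def partition_single_pass(items, pred):
    """Split items by evaluating pred exactly once per element."""
    matching, rest = [], []
    for x in items:
        (matching if pred(x) else rest).append(x)
    return matching, rest


def make_unstable_pred():
    """Return a predicate whose answer depends on the call count.

    Stand-in for a random predicate: True on every third call.
    """
    calls = itertools.count()
    return lambda _x: next(calls) % 3 == 0


items = [0, 1, 2, 3]

# ([0, 3], [0, 1, 3]): element 2 is lost, 0 and 3 land on both sides.
two_pass = partition_two_pass(items, make_unstable_pred())

# ([0, 3], [1, 2]): every element lands on exactly one side.
single_pass = partition_single_pass(items, make_unstable_pred())
```

With a predicate that is not deterministic, only the single-pass version guarantees that the two sides together contain each element exactly once.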
Joe Nyland
88177691b8 Update README (#2204)
I found the installation of the Python XGBoost package to be problematic as the documentation around compiler requirements was unclear, as discussed in #1501. I decided that I would improve the README.
2017-11-19 17:12:16 -08:00
Rory Mitchell
40c6e2f0c8 Improved gpu_hist_experimental algorithm (#2866)
- Implement colsampling, subsampling for gpu_hist_experimental

- Optimised multi-GPU implementation for gpu_hist_experimental

- Make nccl optional

- Add Volta architecture flag

- Optimise RegLossObj

- Add timing utilities for debug verbose mode

- Bump required cuda version to 8.0
2017-11-11 13:58:40 +13:00
Rory Mitchell
16c63f30d0 Fix MultiIndex detection (breaks for latest pandas==0.21.0). (#2872) 2017-11-11 11:12:23 +13:00
Dat Le
77ae4c8701 Update OSX build instructions (#2784)
* Update xgboost build for OS X

* Add notes on gcc and brew

* Update build.md

* Update build.md

* Update build.md
2017-11-06 13:07:10 +01:00
ebernhardson
78d0bd6c9d [jvm-packages] Repair spark model eval (#2841)
In the refactor to add base margins, #2532, all of the labels were lost
when creating the dmatrix. This became obvious as metrics like ndcg
always returned 1.0 regardless of the results.

Change-Id: I88be047e1c108afba4784bd3d892bfc9edeabe55
2017-11-04 23:28:47 +01:00
Seth Hendrickson
a8f670d247 [jvm-packages] Add some documentation to xgboost4j-spark plus minor style edits (#2823)
* add scala docs to several methods

* indentation

* license formatting

* clarify distributed boosters

* address some review comments

* reduce doc lengths

* change method name, clarify doc

* reset make config

* delete most comments

* more review feedback
2017-11-02 13:16:02 -07:00
ebernhardson
46f2b820f1 [jvm-packages] Objectives starting with rank: are never classification (#2837)
Training a model with the experimental rank:ndcg objective incorrectly
returns a Classification model. Adjust the classification check to
not recognize rank:* objectives as classification.

Writing tests for isClassificationTask also revealed that
obj_type -> regression was incorrectly identified as a classification
task, so the function was slightly adjusted to pass the new tests.
2017-10-30 17:36:03 +01:00
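The classification check described in 46f2b820f1 might look roughly like the following. This is a Python sketch under assumptions, not the Scala implementation: the function name `is_classification_task`, the order of the checks, and the `binary:`/`multi:` prefixes are illustrative guesses; only the two rules from the commit message (rank:* is never classification, obj_type -> regression is not classification) come from the source.

```python
def is_classification_task(params):
    """Guess whether training parameters describe a classification task.

    Hypothetical rules, loosely following the commit message:
    - rank:* objectives are never classification;
    - an explicit obj_type decides when present;
    - otherwise fall back to the objective name prefix (assumed).
    """
    objective = params.get("objective")
    if objective is not None and objective.startswith("rank:"):
        return False
    obj_type = params.get("obj_type")
    if obj_type is not None:
        return obj_type == "classification"
    if objective is None:
        return False
    return objective.startswith(("binary:", "multi:"))
```

For example, `is_classification_task({"objective": "rank:ndcg"})` and `is_classification_task({"obj_type": "regression"})` both return `False`.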
LevineHuang
91af8f7106 Minor edits to coding style (#2835)
* Some minor changes to the code style

Some minor changes to the code style in file basic_walkthrough.py

* coding style changes

* coding style changes according to PEP8

* Update basic_walkthrough.py
2017-10-26 22:12:54 -05:00
Rory Mitchell
d9d5293cdb Add warnings for large labels when using GPU histogram algorithms (#2834) 2017-10-26 17:31:10 +13:00
Rory Mitchell
13e7a2cff0 Various bug fixes (#2825)
* Fatal error if GPU algorithm selected without GPU support compiled

* Resolve type conversion warnings

* Fix gpu unit test failure

* Fix compressed iterator edge case

* Fix python unit test failures due to flake8 update on pip
2017-10-25 14:45:01 +13:00
LevineHuang
c71b62d48d Minor changes to code style (#2805)
Some minor changes to code style in file 'boost_from_prediction.py'.
2017-10-23 10:46:45 -05:00
Philip Cho
452063c32d Fix issue #2800 (#2817)
Problem:
Fast histogram updater crashes whenever subsampling picks zero rows

Diagnosis:
Row set data structure uses "nullptr" internally to indicate a non-existent
row set. Since you cannot take the address of the first element of an empty
vector, a valid row set ends up getting "nullptr" as well.

Fix:
Use an arbitrary value (not equal to "nullptr") to bypass nullptr check.
2017-10-23 10:46:25 -05:00
caoyi
3610025fb6 Fix typo (#2818)
Fix typo
2017-10-23 10:45:49 -05:00
Seth Hendrickson
ac7a9edb06 remove stale examples (#2816) 2017-10-20 23:18:46 +02:00
Qiang Luo
c09ad421a8 fix bug in loading config for pred task (#2795) 2017-10-20 00:10:14 -05:00
erikdf
5dca6745e1 Fixed typo in doc (#2799) 2017-10-18 18:20:47 -05:00
Justin Mills
b1793da30e Only set OpenMP_CXX_FLAGS when OpenMP is found (#2613)
* Only set OpenMP_CXX_FLAGS when OpenMP is found

I found this trying to get the Mac build working without OpenMP. Tips in
issue #2596 helped to point in the right direction.

* Revise check

* Trigger codecov
2017-10-16 23:02:13 -05:00
Yun Ni
b678e1711d [jvm-packages] Add SparkParallelismTracker to prevent job from hanging (#2697)
* Add SparkParallelismTracker to prevent job from hanging

* Code review comments

* Code Review Comments

* Fix unit tests

* Changes and unit test to catch the corner case.

* Update documentations

* Small improvements

* cancelAllJobs is problematic with scalatest. Remove it

* Code Review Comments

* Check number of executor cores beforehand, and throw exception if any core is lost.

* Address CR Comments

* Add missing class

* Fix flaky unit test

* Address CR comments

* Remove redundant param for TaskFailedListener
2017-10-16 20:18:47 -07:00
Scott Lundberg
78c4188cec SHAP values for feature contributions (#2438)
* SHAP values for feature contributions

* Fix commenting error

* New polynomial time SHAP value estimation algorithm

* Update API to support SHAP values

* Fix merge conflicts with updates in master

* Correct submodule hashes

* Fix variable sized stack allocation

* Make lint happy

* Add docs

* Fix typo

* Adjust tolerances

* Remove unneeded def

* Fixed cpp test setup

* Updated R API and cleaned up

* Fixed test typo
2017-10-12 12:35:51 -07:00
Guang Wei Yu
ff9180cd73 Add a new winning solution to demo/README.md (#2778) 2017-10-09 18:07:07 -04:00
Julian Niedermeier
9a81c74a7b Add xgb_model parameter to sklearn fit (#2623)
Adding the xgb_model parameter allows model training to be continued.
The model has to be saved by calling `model.get_booster().save_model(path)`
2017-10-01 08:47:17 -04:00
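A minimal sketch of the workflow described in 9a81c74a7b, assuming a recent xgboost with the scikit-learn wrapper installed. The helper name `continue_training`, the choice of `XGBRegressor`, the round counts, and the file path are illustrative assumptions; `fit(..., xgb_model=...)` and `get_booster().save_model(path)` are the calls named in the commit message.

```python
def continue_training(X, y, model_path, extra_rounds=5):
    """Train, save the booster, then resume training from the saved file."""
    # Imported here so the helper can be defined without xgboost installed.
    import xgboost as xgb

    first = xgb.XGBRegressor(n_estimators=5)
    first.fit(X, y)
    # Save via the underlying Booster, as the commit message requires.
    first.get_booster().save_model(model_path)

    resumed = xgb.XGBRegressor(n_estimators=extra_rounds)
    # xgb_model points at the saved booster; boosting continues from it.
    resumed.fit(X, y, xgb_model=model_path)
    return first, resumed
```

The resumed model should contain the trees of the saved booster plus the newly trained rounds.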
Icyblade Dai
6e378452f2 coding style update (#2752)
* coding style update

The current coding style varies (for example, the mixed use of single and double quotes), which can be confusing, especially for new users.
This PR tries to follow the proposals of PEP8 to make the documents more readable.

* minor fix
2017-10-01 08:42:15 -04:00
Rory Mitchell
4cb2f7598b Add experimental GPU algorithm for lossguided mode (#2755)
- Improved GPU algorithm unit tests
- Removed some thrust code to improve compile times
2017-10-01 00:18:35 +13:00
Sergei Lebedev
69c3b78a29 [jvm-packages] Implemented early stopping (#2710)
* Allowed subsampling test from the training data frame/RDD

The implementation requires storing 1 - trainTestRatio points in memory
to make the sampling work.

An alternative approach would be to construct the full DMatrix and then
slice it deterministically into train/test. The peak memory consumption
of such a scenario, however, is twice the dataset size.

* Removed duplication from 'XGBoost.train'

Scala callers can (and should) use names to supply a subset of
parameters. Method overloading is not required.

* Reuse XGBoost seed parameter to stabilize train/test splitting

* Added early stopping support to non-distributed XGBoost

Closes #1544

* Added early-stopping to distributed XGBoost

* Moved construction of 'watches' into a separate method

This commit also fixes the handling of 'baseMargin' which previously
was not added to the validation matrix.

* Addressed review comments
2017-09-29 12:06:22 -07:00
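The memory trade-off and the seeded split described in 69c3b78a29 can be sketched as follows. This is a Python illustration under assumptions, not the Scala implementation: `stream_train_buffer_test` is a hypothetical helper that streams training rows while buffering the held-out rows in memory, and it uses one seeded generator so that the same seed reproduces the same split.

```python
import random


def stream_train_buffer_test(rows, train_test_ratio, seed):
    """Return (train_iterator, test_buffer) from a single pass over rows.

    Training rows are yielded lazily. Held-out rows, roughly a
    1 - train_test_ratio share, accumulate in test_buffer, which is only
    complete once the training iterator has been exhausted.
    """
    rng = random.Random(seed)  # seeded, so the split is reproducible
    test_buffer = []

    def train_rows():
        for row in rows:
            # One draw per row decides its side exactly once.
            if rng.random() <= train_test_ratio:
                yield row
            else:
                test_buffer.append(row)

    return train_rows(), test_buffer
```

Only the held-out rows are kept in memory here, in contrast to building the full matrix first and slicing it afterwards.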
Vadim Khotilovich
74db9757b3 [R package] GPU support (#2732)
* [R] MSVC compatibility

* [GPU] allow seed in BernoulliRng up to size_t and scale to uint32_t

* R package build with cmake and CUDA

* R package CUDA build fixes and cleanups

* always export the R package native initialization routine on windows

* update the install instructions doc

* fix lint

* use static_cast directly to set BernoulliRng seed

* [R] demo for GPU accelerated algorithm

* tidy up the R package cmake stuff

* R pack cmake: installs main dependency packages if needed

* [R] version bump in DESCRIPTION

* update NEWS

* added short missing/sparse values explanations to FAQ
2017-09-28 18:15:28 -05:00
Icyblade Dai
5c9f01d0a9 minor typo (#2751)
* minor typo

* typo

* Update discoverYourData.md
2017-09-28 07:45:10 +02:00
Andrew Hannigan
5c9f0ff9d9 Check existence of seed/nthread keys before checking their value. (#2669) 2017-09-27 03:05:59 -04:00
Philip Cho
0eaf43a5e1 A hack to fix broken search bar in doc (#2583)
Current version of xgboost.readthedocs.io has a broken search box.
Enabling themes on ReadTheDocs is known to break the search function, as
reported in
[this document](https://github.com/rtfd/readthedocs.org/issues/1487). To get
around the bug, we replace the `searchtools.js` file with our custom version.
2017-09-27 03:05:10 -04:00
Philip Cho
31ad40b963 Make __del__ method idempotent (#2627)
Addresses Issue #2533.
2017-09-27 03:03:55 -04:00
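An idempotent `__del__`, as named in 31ad40b963, is one that can run more than once without releasing the underlying resource twice. The sketch below shows the general pattern in Python under assumptions: the class `Handle` and its `released` counter are hypothetical stand-ins for an object that owns a native handle, not the xgboost classes themselves.

```python
class Handle:
    """Toy owner of a native resource; `released` counts release calls."""

    released = 0

    def __init__(self):
        self.handle = object()  # stand-in for a ctypes pointer

    def __del__(self):
        # getattr guards against __init__ failing before handle was set.
        if getattr(self, "handle", None) is not None:
            Handle.released += 1  # stand-in for the native free call
            self.handle = None  # later calls become no-ops
```

Calling `__del__` explicitly and then letting the garbage collector call it again releases the resource only once.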
Tsukasa OMOTO
8d15024ac7 python: follow the default warning filters of Python (#2666)
* python: follow the default warning filters of Python

https://docs.python.org/3/library/warnings.html#default-warning-filters

* update tests

* update tests
2017-09-27 03:03:01 -04:00
zhxfl
178517524f fix bug for demo/multiclass_classification/train.py (#2747) 2017-09-25 22:37:21 -05:00