3060 Commits

Author SHA1 Message Date
Nan Zhu
e24f25e0c6 add Qubole example (#2401) 2017-06-09 20:33:26 -07:00
Sergei Lebedev
3820ab6a0b [jvm-packages] Minor improvements to the CMake build (#2379)
* [jvm-packages] Fixed JNI_OnLoad overload

It does not compile on Windows without proper export flags.

* [jvm-packages] Use JNI types directly where appropriate

* Removed lib hack from CMake build

Prior to this commit the CMake build used a hardcoded lib prefix for
libxgboost and libxgboost4j. Unfortunately this did not play well with
Windows, which does not use the lib- prefix.
2017-06-09 08:25:09 -07:00
Sergei Lebedev
37c27ab8e8 [jvm-packages] Replaced create_jni.{bat,sh} with a Python version (#2383)
* [jvm-packages] Replaced create_jni.{bat,sh} with a Python version

This allows a single script to be used on all platforms.

* [jvm-packages] Added all configuration options to create_jni.py
2017-06-07 21:55:45 -07:00
Vadim Khotilovich
c82276386d [R] xgb.importance: fix for multiclass gblinear, new 'trees' parameter (#2388) 2017-06-07 13:13:21 -05:00
Xiaoguang Sun
2ae56ca84f Use int32_t explicitly when serializing version (#2389)
Use int32_t explicitly when serializing the version field of dmatrix in binary
format. On ILP64 architectures, rare as they are, the size of int is 64 bits.
2017-06-07 10:03:42 -07:00
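Note: the width problem described in this commit can be illustrated with Python's `struct` module, which makes the on-disk field size explicit. This is only a sketch under stated assumptions: the helper names `write_version` and `read_version` and the version value are hypothetical and do not reflect XGBoost's actual binary layout.

```python
import struct

def write_version(version: int) -> bytes:
    # Hypothetical helper. "<i" in standard mode is always a 4-byte
    # little-endian signed integer, whatever the native C int width is.
    return struct.pack("<i", version)

def read_version(buf: bytes) -> int:
    # Read back exactly 4 bytes, matching what write_version produced.
    (version,) = struct.unpack("<i", buf[:4])
    return version

payload = write_version(1)
print(len(payload), read_version(payload))
```

Pinning the width in the format string plays the same role as writing `int32_t` instead of `int` in C++: the file format no longer depends on the compiler's data model.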
Thejaswi
85b2fb3eee [GPU-Plugin] Integration of a faster version of grow_gpu plugin into mainstream (#2360)
* Integrating a faster version of grow_gpu plugin
1. Removed the older files to reduce duplication
2. Moved all of the grow_gpu files under 'exact' folder
3. All of them are inside 'exact' namespace to avoid any conflicts
4. Fixed a bug in benchmark.py while running only 'grow_gpu' plugin
5. Added cub and googletest submodules to ease integration and unit-testing
6. Updates to CMakeLists.txt to directly build cuda objects into libxgboost

* Added support for building gpu plugins through make flow
1. updated makefile and config.mk to add right targets
2. added unit-tests for gpu exact plugin code

* 1. Added support for building gpu plugin using 'make' flow as well
2. Updated instructions for building and testing gpu plugin

* Fix travis-ci errors for PR#2360
1. lint errors on unit-tests
2. removed googletest and instead depended on the gtest cache provided by dmlc-core

* Some more fixes to travis-ci lint failures PR#2360

* Added Rory's copyrights to the files containing code from both.

* updated copyright statement as per Rory's request

* moved the static datasets into a script to generate them at runtime

* 1. memory usage print when silent=0
2. tests/ and test/ folder organization
3. removal of the dependency of googletest for just building xgboost
4. coding style updates for .cuh as well

* Fixes for compilation warnings

* add cuda object files as well when JVM_BINDINGS=ON
2017-06-06 09:39:53 +12:00
Sergei Lebedev
2d9052bc7d libxgboost4j is now part of the CMake build (#2373)
* [jvm-packages] Added libxgboost4j to CMake build

* [jvm-packages] Wired CMake build into create_jni.sh

* Use newer CMake version on Travis

* Lowered CMake version constraints

* Fixed various quirks in the new CMake build
2017-06-03 17:14:51 -07:00
Jakub Zakrzewski
ed6384ecbf [Python] Use appropriate integer types when calling native code. (#2361)
Don't use implicit conversions to c_int, which incidentally happen to work
on (some) 64-bit platforms, but:
* may lead to truncation of the input value to a 32-bit signed int,
* cause segfaults on some 32-bit architectures (tested on Ubuntu ARM,
  but is also the likely cause of issue #1707).

Also, when passing references use explicit 64-bit integers, where needed,
instead of c_ulong, which is not guaranteed to be this large.
2017-06-02 10:16:54 -07:00
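Note: the truncation risk mentioned in this commit can be reproduced with `ctypes` alone. The sketch below does not call into libxgboost; it only shows how ctypes integer types behave, and the value `big` is an arbitrary illustration.

```python
import ctypes

big = 2**32 + 5  # needs more than 32 bits

# ctypes integer types do no overflow checking: the value is silently wrapped.
truncated = ctypes.c_int32(big).value

# An explicit 64-bit type keeps the full value on every platform.
preserved = ctypes.c_uint64(big).value

# c_ulong is platform-dependent (for example 4 bytes on 64-bit Windows),
# whereas c_uint64 is always 8 bytes.
print(truncated, preserved)
print(ctypes.sizeof(ctypes.c_ulong), ctypes.sizeof(ctypes.c_uint64))
```

Choosing `c_uint64` (or another fixed-width type) to match the native signature avoids relying on whatever width `c_int` or `c_ulong` happen to have on the host.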
Artem Krylysov
ed8da45f9d Fix C API header compatibility with C compilers (#2369) 2017-06-02 10:14:30 -07:00
Sergei Lebedev
97abfc487a [jvm-packages] Fixed checkstyle excludes on Windows (#2370)
XGBoostJNI.java was not excluded on Windows, probably because the path
specified in 'checkstyle-suppressions.xml' used UNIX file separators.
2017-06-02 10:14:13 -07:00
Michaël Benesty
8e2a1ff2bf Improve setinfo documentation on R package (#2357) 2017-05-30 20:08:31 +02:00
Sergei Lebedev
433269c335 Minor improvements to xgboost/jvm-packages build (#2356)
* Specified 'exec-maven-plugin' version

* Changed 'create_jni.sh' to fail on error

and also report each of the executed commands, which makes it easier
to debug.
2017-05-30 17:51:27 +02:00
davidt0x
b29b7d1d76 Fixed loop bound in create.new.tree.features (#2328)
The for loop in create.new.tree.features was referencing length(trees) as the upper bound of the loop. trees is a base R dataset, not the model that the code is generating. Changed the loop boundary to model$niter, which should be the number of trees.
2017-05-30 17:50:33 +02:00
Juang, Yi-Lin
812300bb7f Update CONTRIBUTORS.md (#2350) 2017-05-27 08:38:32 -07:00
Juang, Yi-Lin
6776292951 Minor cleanup (#2342)
* Clean up demo of multiclass classification

* Remove extra space
2017-05-26 09:40:41 -04:00
Alexander Kiselev
f1dc82e3e1 Update parameter.md (#2348) 2017-05-25 09:27:10 -04:00
gaw89
0f3a404d91 Sklearn kwargs (#2338)
* Added kwargs support for Sklearn API

* Updated NEWS and CONTRIBUTORS

* Fixed CONTRIBUTORS.md

* Added clarification of **kwargs and test for proper usage

* Fixed lint error

* Fixed more lint errors and clf assigned but never used

* Fixed more lint errors

* Fixed more lint errors

* Fixed issue with changes from different branch bleeding over

* Fixed issue with changes from other branch bleeding over

* Added note that kwargs may not be compatible with Sklearn

* Fixed linting on kwargs note
2017-05-23 21:47:53 -05:00
gaw89
6cea1e3fb7 Sklearn convention update (#2323)
* Added n_jobs and random_state to keep up to date with sklearn API.
Deprecated nthread and seed.  Added tests for new params and
deprecations.

* Fixed docstring to reflect updates to n_jobs and random_state.

* Fixed whitespace issues and removed nose import.

* Added deprecation note for nthread and seed in docstring.

* Attempted fix of deprecation tests.

* Second attempted fix to tests.

* Set n_jobs to 1.
2017-05-22 08:22:05 -05:00
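Note: a common way to deprecate parameters such as nthread and seed in favour of n_jobs and random_state is to keep accepting the old names and emit a `DeprecationWarning`. The class below is a hypothetical stand-in written for illustration; it is not the actual code from this commit or the real XGBModel.

```python
import warnings

class SketchModel:
    """Hypothetical estimator used only to illustrate the deprecation pattern."""

    def __init__(self, n_jobs=1, random_state=0, nthread=None, seed=None):
        if nthread is not None:
            # Old name still works, but the caller is told to migrate.
            warnings.warn("nthread is deprecated; use n_jobs instead",
                          DeprecationWarning, stacklevel=2)
            n_jobs = nthread
        if seed is not None:
            warnings.warn("seed is deprecated; use random_state instead",
                          DeprecationWarning, stacklevel=2)
            random_state = seed
        self.n_jobs = n_jobs
        self.random_state = random_state

model = SketchModel(n_jobs=2, random_state=42)
print(model.n_jobs, model.random_state)
```

Defaulting the old parameters to `None` lets the constructor distinguish "not passed" from an explicit value, so the warning fires only for callers that still use the deprecated names.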
Vadim Khotilovich
da1629e848 [gbtree] fix update process to work with multiclass and multitree; fixes #2315 (#2332) 2017-05-21 23:47:57 -05:00
Vadim Khotilovich
b52db87d5c adding feature contributions to R and gblinear (#2295)
* [gblinear] add features contribution prediction; fix DumpModel bug

* [gbtree] minor changes to PredContrib

* [R] add feature contribution prediction to R

* [R] bump up version; update NEWS

* [gblinear] fix the base_margin issue; fixes #1969

* [R] list of matrices as output of multiclass feature contributions

* [gblinear] make order of DumpModel coefficients consistent: group index changes the fastest
2017-05-21 07:41:51 -04:00
Sergei Lebedev
e5e721722e Fix compilation on OS X with GCC 7 (#2256)
* Fix compilation on OS X with GCC 7

Compilation failed with

In file included from src/tree/tree_updater.cc:6:0:
include/xgboost/tree_updater.h:75:46: error: 'function' is not a member of 'std'
                                         std::function<TreeUpdater* ()> > {

caused by a missing <functional> include.

* Fixed another occurrence of that issue spotted by @ClimberPG
2017-05-19 22:04:07 -07:00
PSEUDOTENSOR / Jonathan McKinney
3ca64ffa02 [GPU-Plugin] Improved split finding performance. (#2325) 2017-05-19 19:16:24 -07:00
jayzed82
29289d2302 Add option to choose booster in scikit interface (gbtree by default) (#2303)
* Add option to choose booster in scikit interface (gbtree by default)

* Add option to choose booster in scikit interface: complete docstring.

* Fix XGBClassifier to work with booster option

* Added test case for gblinear booster
2017-05-18 23:12:27 -04:00
Nan Zhu
96f9776ab0 Update ISSUE_TEMPLATE.md (#2308)
* Update ISSUE_TEMPLATE.md

* Update ISSUE_TEMPLATE.md
2017-05-18 08:49:07 -07:00
Nan Zhu
a607f697e3 [jvm-packages] Disable fast histo for spark (#2296)
* add back train method but mark as deprecated

* fix scalastyle error

* disable fast histogram in xgboost4j-spark temporarily
2017-05-15 20:43:16 -07:00
Vadim Khotilovich
c66ca79221 [R] native routines registration (#2290)
* [R] add native routines registration

* c_api.h needs to include <cstdint> since it uses fixed width integer types

* [R] use registered native routines from R code

* [R] bump version; add info on native routine registration to the contributors guide

* make lint happy
2017-05-14 11:00:46 -07:00
Maurus Cuelenaere
6bd1869026 Add prediction of feature contributions (#2003)
* Add prediction of feature contributions

This implements the idea described at http://blog.datadive.net/interpreting-random-forests/
which tries to give insight into how a prediction is composed of its feature contributions
and a bias.

* Support multi-class models

* Calculate learning_rate per-tree instead of using the one from the first tree

* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly

* Add simple test for contributions feature

* Check against param.num_nodes instead of checking for non-zero length

* Loop over all roots instead of only the first
2017-05-14 00:58:10 -05:00
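Note: the idea behind this commit (a prediction decomposed into a bias plus per-feature contributions along the decision path) can be sketched on a toy tree. The dictionary layout, the function name `predict_contributions`, and the node values are assumptions made for illustration; this is not the PredContrib implementation.

```python
def predict_contributions(tree, x, n_features):
    """Attribute a single-tree prediction to features along the decision path.

    Each node stores the mean value of the samples reaching it ("value").
    The change in value at every split is credited to the split feature;
    the root value is the bias and is stored in the last slot.
    """
    contribs = [0.0] * (n_features + 1)
    node = tree
    contribs[-1] = node["value"]  # bias term
    while "feature" in node:      # internal node
        f = node["feature"]
        child = node["left"] if x[f] < node["threshold"] else node["right"]
        contribs[f] += child["value"] - node["value"]
        node = child
    return contribs

# Hypothetical two-level tree over two features.
toy_tree = {
    "value": 0.5, "feature": 0, "threshold": 1.0,
    "left": {"value": 0.25},
    "right": {
        "value": 0.75, "feature": 1, "threshold": 2.0,
        "left": {"value": 0.5},
        "right": {"value": 1.0},
    },
}

contribs = predict_contributions(toy_tree, [3.0, 5.0], n_features=2)
print(contribs, sum(contribs))
```

By construction the contributions and the bias sum to the value of the leaf that the sample reaches, which is what makes the decomposition interpretable.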
Sergei Lebedev
e62be19c70 Removed 'flink.suffix' and added 'flink.version' (#2277)
The former was just Scala binary tag, and the latter was hardcoded in
the 'xgboost4j-flink' POM.
2017-05-10 08:42:40 -07:00
Nan Zhu
428453f7d6 [jvm-packages] fix the persistence of XGBoostEstimator (#2265)
* add back train method but mark as deprecated

* fix scalastyle error

* fix the persistence of XGBoostEstimator

* test persistence of a complete pipeline

* fix compilation issue

* do not allow persist custom_eval and custom_obj

* fix the failed test
2017-05-08 21:58:06 -07:00
Rory Mitchell
6bf968efe6 [GPU Plugin] Fast histogram speed improvements. Updated benchmarks. (#2258) 2017-05-08 09:21:38 -07:00
Dmitry Nikulin
98ea461532 Fix typo (#2264) 2017-05-07 16:54:48 -07:00
ebernhardson
197a9eacc5 [jvm-packages] Expose json dumps to scala (#2247)
* Add parameter passthru of format on Booster.getModelDump
2017-05-02 17:41:27 -07:00
ebernhardson
ccccf8a015 [jvm-packages] Accept groupData in spark model eval (#2244)
* Support model evaluation for ranking tasks by accepting
 groupData in XGBoostModel.eval
2017-05-02 10:03:20 -07:00
Vadim Khotilovich
a375ad2822 [R] maintenance Apr 2017 (#2237)
* [R] make sure things work for a single split model; fixes #2191

* [R] add option use_int_id to xgb.model.dt.tree

* [R] add example of exporting tree plot to a file

* [R] set save_period = NULL as default in xgboost() to be the same as in xgb.train; fixes #2182

* [R] it's a good practice after CRAN releases to bump up package version in dev

* [R] allow xgb.DMatrix construction from integer dense matrices

* [R] xgb.DMatrix: silent parameter; improve documentation

* [R] xgb.model.dt.tree code style changes

* [R] update NEWS with parameter changes

* [R] code safety & style; handle non-strict matrix and inherited classes of input and model; fixes #2242

* [R] change to x.y.z.p R-package versioning scheme and set version to 0.6.4.3

* [R] add an R package versioning section to the contributors guide

* [R] R-package/README.md: clean up the redundant old installation instructions, link the contributors guide
2017-05-01 22:51:34 -07:00
Philip Cho
d769b6bcb5 Fix performance degradation of BuildHist on Windows (#2243)
Reported in issue #2165. Dynamic scheduling of OpenMP loops involves
implicit synchronization. To implement synchronization, libgomp uses futex
(fast userspace mutex), whereas MinGW uses a kernel-space mutex, which is more
costly. With a chunk size of 1, synchronization overhead may become prohibitive
on Windows machines.

Solution: use 'guided' schedule to minimize the number of syncs
2017-05-01 15:54:44 -07:00
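Note: the effect of the schedule on the number of synchronization points can be estimated with a small simulation. This is a simplified model, not OpenMP itself: it assumes every chunk handed out costs one synchronization, and that a guided schedule hands out roughly `remaining / n_threads` iterations per chunk (the exact policy is implementation-defined). The function names are hypothetical.

```python
import math

def count_chunks_dynamic(n_iters, chunk=1):
    # Dynamic schedule: every chunk of fixed size is one trip to the shared queue.
    return math.ceil(n_iters / chunk)

def count_chunks_guided(n_iters, n_threads, min_chunk=1):
    # Guided schedule (simplified): chunk size shrinks with the remaining work.
    remaining, grabs = n_iters, 0
    while remaining > 0:
        size = max(min_chunk, math.ceil(remaining / n_threads))
        remaining -= min(size, remaining)
        grabs += 1
    return grabs

n_iters, n_threads = 1000, 4
print("dynamic, chunk=1:", count_chunks_dynamic(n_iters))
print("guided:", count_chunks_guided(n_iters, n_threads))
```

Under this model the guided schedule needs far fewer chunk hand-outs than dynamic scheduling with a chunk size of 1, which is consistent with the motivation given in the commit message.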
ebernhardson
da58f34ff8 Store metrics with learner (#2241)
Storing and then loading a model loses any eval_metric that was
provided. This causes implementations that always store/load, like
xgboost4j-spark, to be unable to eval with the desired metric.
2017-04-30 14:23:24 -07:00
ebernhardson
d3b866e3fd [jvm-packages] Expose json formatted booster dumps (#2233) (#2234)
* Change Booster dump from XGBoosterDumpModel to XGBoosterDumpModelEx

Allows exposing multiple formatting options of model dumping.
2017-04-29 20:23:09 -07:00
Qiang Kou (KK)
c441d0916e fix #2228 (#2238) 2017-04-29 18:44:08 -07:00
Rory Mitchell
8ab5d4611c [GPU-Plugin] (#2227)
* Add fast histogram algorithm
* Fix Linux build
* Add 'gpu_id' parameter
2017-04-25 16:37:10 -07:00
Tianqi Chen
d281c6aafa Update CONTRIBUTORS.md 2017-04-22 08:46:31 -07:00
Alex Bain
dbaa5d0bdf Disable invalid check for completely sparse batch that results in failed assertion for issue #1827 (#2213) 2017-04-21 09:28:02 -07:00
Nan Zhu
392aa6d1d3 [jvm-packages] make XGBoostModel hold BoosterParams as well (#2214)
* add back train method but mark as deprecated

* fix scalastyle error

* make XGBoostModel hold BoosterParams as well
2017-04-21 08:12:50 -07:00
Benjamin Pachev
e38bea3cdf Update README.md (#2202)
Add a link to a demo for the proposed PHP XGBoost wrapper.
2017-04-17 15:28:37 -07:00
avpronkin
31e800f340 erratum in index.md (#2203)
Mxnet instead of XGBoost
2017-04-17 15:24:18 -07:00
Seong-Jin Kim
8222755564 Fix typo in R-package README.md (#2190) 2017-04-13 20:22:23 +02:00
Preston Parry
1ab8088a09 Removes extraneous log (#2186)
This log appears to fire every time I ask the python package to make a prediction. It's the only log that fires from XGBoost. When we're getting predictions on millions of items a day in production, this log seems out of place.
2017-04-11 17:38:29 -07:00
Nan Zhu
a837fa9620 [jvm-packages] rdds containing boosters should be cleaned once we got boosters to driver (#2183) 2017-04-11 06:12:49 -07:00
Nan Zhu
f08077606c [jvm-packages] Clean external cache (#2181)
* add back train method but mark as deprecated

* fix scalastyle error

* change class to object in examples

* fix compilation error

* small fix for cleanExternalCache
2017-04-10 07:49:58 -07:00
Nan Zhu
8d8cbcc6db [jvm-packages] fixed several issues in unit tests (#2173)
* add back train method but mark as deprecated

* fix scalastyle error

* change class to object in examples

* fix compilation error

* fix several issues in tests
2017-04-06 06:25:23 -07:00
Philip Cho
2715baef64 Fix bugs in multithreaded ApplySplitSparseData() (#2161)
* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData()

When there are more threads than rows in rowset, some threads end up
with empty ranges, causing them to crash. (iend - 1 needs to be
accessible as part of algorithm)

Fix: run only those threads with nonempty ranges.

* Add regression test for Bugfix 1

* Moving python_omp_test to existing python test group

Turns out you don't need to set "OMP_NUM_THREADS" to enable
multithreading. Just add nthread parameter.

* Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature

When split value is less than all cut points, split_cond is set
incorrectly.

Fix: set split_cond = -1 to indicate this scenario

* Bugfix 3: Initialize data layout indicator before using it

data_layout_ is accessed before being set; this variable determines
whether feature 0 is included in feat_set.

Fix: re-order code in InitData() to initialize data_layout_ first

* Adding regression test for Bugfix 2

Unfortunately, there is no regression test for Bugfix 3, as there is no
way to deterministically assign a value to an uninitialized variable.
2017-04-02 11:37:39 -07:00
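Note: the fix described under Bugfix 1 (run only threads with nonempty ranges) can be sketched as a range-partitioning helper. The function `thread_ranges` is a hypothetical illustration and not the code in ApplySplitSparseData(); it assumes `n_threads` is at least 1.

```python
def thread_ranges(n_rows, n_threads):
    """Split [0, n_rows) into contiguous [begin, end) ranges, one per thread,
    dropping ranges that would be empty."""
    chunk = max(1, -(-n_rows // n_threads))  # ceiling division, at least 1
    ranges = []
    for tid in range(n_threads):
        begin = min(tid * chunk, n_rows)
        end = min(begin + chunk, n_rows)
        if begin < end:  # a thread with no rows must not touch row end - 1
            ranges.append((begin, end))
    return ranges

# More threads than rows: only three ranges are produced.
print(thread_ranges(3, 8))
print(thread_ranges(10, 4))
```

Filtering out empty ranges before dispatch means no worker ever dereferences an index outside its slice, which is the failure mode the commit message describes.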