* Disabled excessive Spark logging in tests
* Fixed the signature of XGBoostModel.predict
Prior to this commit XGBoostModel.predict produced an RDD with
an array of predictions for each partition, effectively changing
the shape with respect to the input RDD. A more natural contract for a
prediction API is that, given an RDD, it returns a new RDD with the
same number of elements. This allows users to easily match inputs
with predictions.
This commit removes one layer of nesting in XGBoostModel.predict output.
Even though the change is clearly backward-incompatible, I still
think it is well justified.
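A minimal sketch of the intended contract, using a simplified model class and score function (names are illustrative, not the actual xgboost4j-spark API):

```scala
import org.apache.spark.rdd.RDD

// Illustrative only: the real XGBoostModel evaluates a Booster per partition;
// this sketch just demonstrates the one-prediction-per-input-element shape.
class ExampleModel(score: Array[Float] => Float) extends Serializable {
  // The output RDD has exactly one element per input element, so callers
  // can match inputs with predictions, e.g. via zip.
  def predict(data: RDD[Array[Float]]): RDD[Float] =
    data.mapPartitions(_.map(score))
}

// val predictions = model.predict(features)
// features.zip(predictions)  // valid: same partitioning and element counts
```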
* Removed boxing in XGBoost.fromDenseToSparseLabeledPoints
* Inlined XGBoost.repartitionData
An explicit ``if`` is clearer than an opaque method name.
* Moved XGBoost.convertBoosterToXGBoostModel to XGBoostModel
* Check the input dimension in DMatrix.setBaseMargin
Prior to this commit, providing an array with incorrect dimensions
resulted in memory corruption. Maybe backport this check to C++?
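A sketch of the kind of guard this adds, using hypothetical class and field names and assuming the matrix's row count is available on the JVM side:

```scala
// Illustrative only: validate the dimensions before the array reaches the
// native setter, where a mismatch previously corrupted memory.
class ExampleMatrix(val rowCount: Long) {
  def setBaseMargin(baseMargin: Array[Float]): Unit = {
    require(baseMargin.length == rowCount,
      s"base margin has ${baseMargin.length} elements, expected $rowCount")
    // ... only now forward the array to the JNI call ...
  }
}
```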
* Reduced nesting in XGBoost.buildDistributedBoosters
* Ensured consistent naming of the params map
* Cleaned up DataBatch to make it easier to comprehend
* Made scalastyle happy
* Added baseMargin to XGBoost.train and trainWithRDD
* Deprecated XGBoost.train
It is ambiguous and works only for RDDs.
* Addressed review comments
* Revert "Fixed a singature of XGBoostModel.predict"
This reverts commit 06bd5dcae7780265dd57e93ed7d4135f4e78f9b4.
* Addressed more review comments
* Fixed NullPointerException in buildDistributedBoosters
* Fixed DLL name on Windows in ``xgboost.libpath``
* Added support for OS X to ``xgboost.libpath``
* Use .dylib for shared library on OS X
This does not affect the JNI library, because it is not truly
cross-platform in the Makefile build anyway.
* Exposed prediction feature contribution on the Java side
* was not supplying the newly added argument
* Exposed from Scala-side as well
* formatting (keep declaration in one line unless exceeding 100 chars)
* [jvm-packages] Ensure the native library is loaded once
Previously any class using XGBoostJNI queried NativeLibLoader to make
sure the native library was loaded. This commit moves the initXGBoost
call into XGBoostJNI, effectively delegating the initialization to the
class loader.
Note also that XGBoostJNI now does NOT suppress an IOException if it
occurs in initXGBoost.
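A minimal sketch of the pattern, with hypothetical names; the point is that placing the load in the JNI wrapper itself lets the class loader guarantee one-time initialization, and a failure surfaces to the caller instead of being suppressed:

```scala
// Illustrative only: the object body runs once, when ExampleJNI is first
// referenced, so every caller of a native method gets an initialized library.
object ExampleJNI {
  // A failure here (e.g. an IOException while extracting the library)
  // propagates as an ExceptionInInitializerError rather than being swallowed.
  System.loadLibrary("xgboost4j")

  // The real class declares its native entry points here (@native methods).
  def nativeCallPlaceholder(): Unit = ()
}
```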
* [jvm-packages] Fused JNIErrorHandle with XGBoostJNI
There was no reason for having a separate class.
When using xgboost4j-spark I had executors getting killed by YARN for
overrunning their memory limits much more often than I would expect,
given the memoryOverhead provided. It looks like a significant amount
of this is because DMatrix objects were being created but not released;
they were only released when the GC decided it was time to clean up
the references.
Rather than waiting for the GC, release the DMatrix objects when we
know they are no longer necessary.
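A sketch of the eager-release pattern this moves to, assuming a dispose-style method on the native handle (the exact method name in xgboost4j is not quoted here):

```scala
object NativeResources {
  // Hypothetical wrapper around a native resource with an explicit release.
  trait NativeHandle {
    def dispose(): Unit
  }

  // Free the native memory as soon as the handle is no longer needed,
  // instead of waiting for the garbage collector to reach the reference.
  def withHandle[H <: NativeHandle, T](handle: H)(use: H => T): T =
    try use(handle)
    finally handle.dispose()
}
```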
* [jvm-packages] Fixed compilation on Windows
* [jvm-packages] Build the JNI bindings on Appveyor
* [jvm-packages] Build & test on OS X
* [jvm-packages] Re-applied the CMake build changes reverted by #2395
* Fixed Appveyor JVM build
* Muted Maven on Travis
* Don't link with libawt
* "linux2"->"linux"
Python 2.x and 3.x use slightly different values for ``sys.platform``.
* Support for building GPU plugins for specific GPU architectures
1. Option GPU_COMPUTE_VER exposed from both Makefile and CMakeLists.txt
2. updater_gpu documentation updated accordingly
* Re-introduced GPU_COMPUTE_VER option in the cmake flow.
This seems to fix the compile-time errors related to rdc=true and the
copy constructor that were seen and discussed in PR #2390.
* [jvm-packages] Fixed JNI_OnLoad overload
It does not compile on Windows without proper export flags.
* [jvm-packages] Use JNI types directly where appropriate
* Removed lib hack from CMake build
Prior to this commit the CMake build used a hardcoded lib prefix for
libxgboost and libxgboost4j. Unfortunately this did not play well with
Windows, which does not use the lib- prefix.
* [jvm-packages] Replaced create_jni.{bat,sh} with a Python version
This allows having a single script for all platforms.
* [jvm-packages] Added all configuration options to create_jni.py
Use int32_t explicitly when serializing the version field of DMatrix
in binary format. On ILP64 architectures, although they are rare, the
size of int is 64 bits.
* Integrating a faster version of grow_gpu plugin
1. Removed the older files to reduce duplication
2. Moved all of the grow_gpu files under 'exact' folder
3. All of them are inside 'exact' namespace to avoid any conflicts
4. Fixed a bug in benchmark.py while running only 'grow_gpu' plugin
5. Added cub and googletest submodules to ease integration and unit-testing
6. Updates to CMakeLists.txt to directly build cuda objects into libxgboost
* Added support for building gpu plugins through make flow
1. updated makefile and config.mk to add right targets
2. added unit-tests for gpu exact plugin code
* 1. Added support for building gpu plugin using 'make' flow as well
2. Updated instructions for building and testing gpu plugin
* Fix travis-ci errors for PR#2360
1. lint errors on unit-tests
2. removed googletest; instead depend on dmlc-core to provide the gtest cache
* Some more fixes to travis-ci lint failures PR#2360
* Added Rory's copyrights to the files containing code from both authors.
* updated copyright statement as per Rory's request
* moved the static datasets into a script to generate them at runtime
* 1. memory usage print when silent=0
2. tests/ and test/ folder organization
3. removal of the googletest dependency when just building xgboost
4. coding style updates for .cuh as well
* Fixes for compilation warnings
* add cuda object files as well when JVM_BINDINGS=ON
* [jvm-packages] Added libxgboost4j to CMake build
* [jvm-packages] Wired CMake build into create_jni.sh
* Use newer CMake version on Travis
* Lowered CMake version constraints
* Fixed various quirks in the new CMake build
Don't use implicit conversions to c_int, which incidentally happen to work
on (some) 64-bit platforms, but:
* may lead to truncation of the input value to a 32-bit signed int,
* may cause segfaults on some 32-bit architectures (tested on Ubuntu ARM;
this is also the likely cause of issue #1707).
Also, when passing references, use explicit 64-bit integers where needed,
instead of c_ulong, which is not guaranteed to be that large.
* Specified 'exec-maven-plugin' version
* Changed 'create_jni.sh' to fail on error
and also report each of the executed commands, which makes it easier
to debug.
The for loop in create.new.tree.features referenced length(trees) as the
upper bound of the loop, but trees is a base R dataset and not the model
that the code is generating. Changed the loop boundary to model$niter,
which should be the number of trees.
* Added kwargs support for Sklearn API
* Updated NEWS and CONTRIBUTORS
* Fixed CONTRIBUTORS.md
* Added clarification of **kwargs and test for proper usage
* Fixed lint error
* Fixed more lint errors and clf assigned but never used
* Fixed more lint errors
* Fixed more lint errors
* Fixed issue with changes from different branch bleeding over
* Fixed issue with changes from other branch bleeding over
* Added note that kwargs may not be compatible with Sklearn
* Fixed linting on kwargs note
* Added n_jobs and random_state to keep up to date with sklearn API.
Deprecated nthread and seed. Added tests for new params and
deprecations.
* Fixed docstring to reflect updates to n_jobs and random_state.
* Fixed whitespace issues and removed nose import.
* Added deprecation note for nthread and seed in docstring.
* Attempted fix of deprecation tests.
* Second attempted fix to tests.
* Set n_jobs to 1.
* [gblinear] add features contribution prediction; fix DumpModel bug
* [gbtree] minor changes to PredContrib
* [R] add feature contribution prediction to R
* [R] bump up version; update NEWS
* [gblinear] fix the base_margin issue; fixes #1969
* [R] list of matrices as output of multiclass feature contributions
* [gblinear] make order of DumpModel coefficients consistent: group index changes the fastest
* Fix compilation on OS X with GCC 7
Compilation failed with:
In file included from src/tree/tree_updater.cc:6:0:
include/xgboost/tree_updater.h:75:46: error: 'function' is not a member of 'std'
std::function<TreeUpdater* ()> > {
This was caused by a missing <functional> include.
* Fixed another occurrence of that issue spotted by @ClimberPG
* Add option to choose booster in scikit interface (gbtree by default)
* Add option to choose booster in scikit interface: complete docstring.
* Fix XGBClassifier to work with booster option
* Added test case for gblinear booster
* [R] add native routines registration
* c_api.h needs to include <cstdint> since it uses fixed-width integer types
* [R] use registered native routines from R code
* [R] bump version; add info on native routine registration to the contributors guide
* make lint happy
* Add prediction of feature contributions
This implements the idea described at http://blog.datadive.net/interpreting-random-forests/
which gives insight into how a prediction is composed of its feature
contributions and a bias; the resulting decomposition is written out below.
* Support multi-class models
* Calculate learning_rate per-tree instead of using the one from the first tree
* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly
* Add simple test for contributions feature
* Check against param.num_nodes instead of checking for non-zero length
* Loop over all roots instead of only the first
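For reference, the decomposition described above, in my rendering of the notation from the linked blog post (for multi-class models it is computed per class):

$$\hat{y}(x) = \text{bias} + \sum_{j=1}^{d} \text{contrib}_j(x)$$

where d is the number of features and contrib_j(x) is the contribution of feature j to the prediction for x.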