Move this function into gbtree, and uses only updater for doing so. As now the predictor knows exactly how many trees to predict, there's no need for it to update the prediction cache.
* Simplify DropTrees calling logic
* Add `training` parameter for prediction method.
* [Breaking]: Add `training` to C API.
* Change for R and Python custom objective.
* Correct comment.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
* Use CMake config file for representing version.
* Generate c and Python version file with CMake.
The generated file is written into source tree. But unless XGBoost upgrades
its version, there will be no actual modification. This retains compatibility
with Makefiles for R.
* Add XGBoost version the DMatrix binaries.
* Simplify prefetch detection in CMakeLists.txt
* adding support for matrix slicing with query ID for cross-validation
* hail mary test of unrar installation for windows tests
* trying to modify tests to run in Github CI
* Remove dependency on wget and unrar
* Save error log from R test
* Relax assertion in test_training
* Use int instead of bool in C function interface
* Revise R interface
* Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility
* Refactor CMake scripts.
* Remove CMake CUDA wrapper.
* Bump CMake version for CUDA.
* Use CMake to handle Doxygen.
* Split up CMakeList.
* Export install target.
* Use modern CMake.
* Remove build.sh
* Workaround for gpu_hist test.
* Use cmake 3.12.
* Revert machine.conf.
* Move CLI test to gpu.
* Small cleanup.
* Support using XGBoost as submodule.
* Fix windows
* Fix cpp tests on Windows
* Remove duplicated find_package.
* [gblinear] add features contribution prediction; fix DumpModel bug
* [gbtree] minor changes to PredContrib
* [R] add feature contribution prediction to R
* [R] bump up version; update NEWS
* [gblinear] fix the base_margin issue; fixes#1969
* [R] list of matrices as output of multiclass feature contributions
* [gblinear] make order of DumpModel coefficients consistent: group index changes the fastest
* [R] add native routines registration
* c_api.h needs to include <cstdint> since it uses fixed width integer types
* [R] use registered native routines from R code
* [R] bump version; add info on native routine registration to the contributors guide
* make lint happy
* Add prediction of feature contributions
This implements the idea described at http://blog.datadive.net/interpreting-random-forests/
which tries to give insight in how a prediction is composed of its feature contributions
and a bias.
* Support multi-class models
* Calculate learning_rate per-tree instead of using the one from the first tree
* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly
* Add simple test for contributions feature
* Check against param.num_nodes instead of checking for non-zero length
* Loop over all roots instead of only the first
* Fix various typos
* Add override to functions that are overridden
gcc gives warnings about functions that are being overridden by not
being marked as oveirridden. This fixes it.
* Use bst_float consistently
Use bst_float for all the variables that involve weight,
leaf value, gradient, hessian, gain, loss_chg, predictions,
base_margin, feature values.
In some cases, when due to additions and so on the value can
take a larger value, double is used.
This ensures that type conversions are minimal and reduces loss of
precision.
* Add format to the params accepted by DumpModel
Currently, only the test format is supported when trying to dump
a model. The plan is to add more such formats like JSON which are
easy to read and/or parse by machines. And to make the interface
for this even more generic to allow other formats to be added.
Hence, we make some modifications to make these function generic
and accept a new parameter "format" which signifies the format of
the dump to be created.
* Fix typos and errors in docs
* plugin: Mention all the register macros available
Document the register macros currently available to the plugin
writers so they know what exactly can be extended using hooks.
* sparce_page_source: Use same arg name in .h and .cc
* gbm: Add JSON dump
The dump_format argument can be used to specify what type
of dump file should be created. Add functionality to dump
gblinear and gbtree into a JSON file.
The JSON file has an array, each item is a JSON object for the tree.
For gblinear:
- The item is the bias and weights vectors
For gbtree:
- The item is the root node. The root node has a attribute "children"
which holds the children nodes. This happens recursively.
* core.py: Add arg dump_format for get_dump()
* Changes for Mingw64 compilation to ensure long is a consistent size.
Mainly impacts the Java API which would not compile, but there may be
silent errors on Windows with large datasets before this patch (as long
is 32-bits when compiled with mingw64 even in 64-bit mode).
* Adding ifdefs to ensure it still compiles on MacOS
* Makefile and create_jni.bat changes for Windows.
* Switching XGDMatrixCreateFromCSREx JNI call to use size_t cast
* Fixing lint error, adding profile switching to jvm-packages build to make create-jni.bat get called, adding myself to Contributors.Md