* Specified 'exec-maven-plugin' version
* Changed 'create_jni.sh' to fail on error
and also report each of the executed commands, which makes it easier
to debug.
The for loop in create.new.tree.features used length(trees) as its upper bound. However, trees is a base R dataset, not the model that the code generates. Changed the loop bound to model$niter, which should be the number of trees.
* Added kwargs support for Sklearn API
* Updated NEWS and CONTRIBUTORS
* Fixed CONTRIBUTORS.md
* Added clarification of **kwargs and test for proper usage
* Fixed lint error
* Fixed more lint errors and clf assigned but never used
* Fixed more lint errors
* Fixed more lint errors
* Fixed issue with changes from different branch bleeding over
* Fixed issue with changes from other branch bleeding over
* Added note that kwargs may not be compatible with Sklearn
* Fixed linting on kwargs note
* Added n_jobs and random_state to keep up to date with sklearn API.
Deprecated nthread and seed. Added tests for new params and
deprecations.
* Fixed docstring to reflect updates to n_jobs and random_state.
* Fixed whitespace issues and removed nose import.
* Added deprecation note for nthread and seed in docstring.
* Attempted fix of deprecation tests.
* Second attempted fix to tests.
* Set n_jobs to 1.
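
The nthread/seed deprecation described above follows a common pattern, sketched below. This is only an illustration, not the actual XGBoost code: the class name `SketchModel` is hypothetical, and the default for `random_state` is an assumption (only `n_jobs=1` is stated in the notes above).

```python
import warnings


class SketchModel:
    """Hypothetical estimator illustrating the nthread -> n_jobs and
    seed -> random_state renaming; not the real XGBModel."""

    def __init__(self, n_jobs=1, random_state=0, nthread=None, seed=None):
        # The old names still work, but emit a DeprecationWarning and
        # override the new names so existing callers keep their behaviour.
        if nthread is not None:
            warnings.warn("nthread is deprecated, use n_jobs instead",
                          DeprecationWarning, stacklevel=2)
            n_jobs = nthread
        if seed is not None:
            warnings.warn("seed is deprecated, use random_state instead",
                          DeprecationWarning, stacklevel=2)
            random_state = seed
        self.n_jobs = n_jobs
        self.random_state = random_state
```

A test for the deprecation can record warnings with `warnings.catch_warnings(record=True)` and check that each old parameter triggers exactly one `DeprecationWarning`.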
* [gblinear] add features contribution prediction; fix DumpModel bug
* [gbtree] minor changes to PredContrib
* [R] add feature contribution prediction to R
* [R] bump up version; update NEWS
* [gblinear] fix the base_margin issue; fixes #1969
* [R] list of matrices as output of multiclass feature contributions
* [gblinear] make order of DumpModel coefficients consistent: group index changes the fastest
* Fix compilation on OS X with GCC 7
Compilation failed with
In file included from src/tree/tree_updater.cc:6:0:
include/xgboost/tree_updater.h:75:46: error: 'function' is not a member of 'std'
std::function<TreeUpdater* ()> > {
caused by a missing <functional> include.
* Fixed another occurrence of that issue spotted by @ClimberPG
* Add option to choose booster in scikit interface (gbtree by default)
* Add option to choose booster in scikit interface: complete docstring.
* Fix XGBClassifier to work with booster option
* Added test case for gblinear booster
* [R] add native routines registration
* c_api.h needs to include <cstdint> since it uses fixed width integer types
* [R] use registered native routines from R code
* [R] bump version; add info on native routine registration to the contributors guide
* make lint happy
* Add prediction of feature contributions
This implements the idea described at http://blog.datadive.net/interpreting-random-forests/
which tries to give insight into how a prediction is composed of its feature contributions
and a bias.
* Support multi-class models
* Calculate learning_rate per-tree instead of using the one from the first tree
* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly
* Add simple test for contributions feature
* Check against param.num_nodes instead of checking for non-zero length
* Loop over all roots instead of only the first
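
The idea behind the feature contributions above can be sketched for a single tree as follows. This is a minimal illustration, not the actual PredContrib implementation: the nested-dict tree layout and the function name `predict_contributions` are hypothetical. Each node stores the mean value of the rows reaching it, and every step down the tree credits the change in that value to the feature that was split on.

```python
def predict_contributions(tree, x, n_features):
    """Return per-feature contributions plus a trailing bias term.

    `tree` is a hypothetical nested-dict representation: every node has a
    "value" (mean prediction of the training rows reaching it); internal
    nodes also have "feature", "threshold", "left" and "right".
    """
    contribs = [0.0] * (n_features + 1)
    node = tree
    contribs[-1] = node["value"]  # bias = mean value at the root
    while "feature" in node:
        if x[node["feature"]] < node["threshold"]:
            child = node["left"]
        else:
            child = node["right"]
        # The change in mean value caused by this split is credited
        # to the feature that was split on.
        contribs[node["feature"]] += child["value"] - node["value"]
        node = child
    return contribs


# Toy tree with two features; the values are made up for illustration.
tree = {
    "value": 0.5, "feature": 0, "threshold": 1.0,
    "left": {"value": 0.2},
    "right": {
        "value": 0.8, "feature": 1, "threshold": 3.0,
        "left": {"value": 0.6},
        "right": {"value": 1.0},
    },
}
contribs = predict_contributions(tree, [2.0, 5.0], n_features=2)
```

By construction the contributions and the bias sum to the value of the leaf that is reached. For a boosted model one would add up these vectors over all trees.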
* add back train method but mark as deprecated
* fix scalastyle error
* fix the persistence of XGBoostEstimator
* test persistence of a complete pipeline
* fix compilation issue
* do not allow persisting custom_eval and custom_obj
* fix the failed test
* [R] make sure things work for a single split model; fixes #2191
* [R] add option use_int_id to xgb.model.dt.tree
* [R] add example of exporting tree plot to a file
* [R] set save_period = NULL as default in xgboost() to be the same as in xgb.train; fixes #2182
* [R] it's a good practice after CRAN releases to bump up package version in dev
* [R] allow xgb.DMatrix construction from integer dense matrices
* [R] xgb.DMatrix: silent parameter; improve documentation
* [R] xgb.model.dt.tree code style changes
* [R] update NEWS with parameter changes
* [R] code safety & style; handle non-strict matrix and inherited classes of input and model; fixes #2242
* [R] change to x.y.z.p R-package versioning scheme and set version to 0.6.4.3
* [R] add an R package versioning section to the contributors guide
* [R] R-package/README.md: clean up the redundant old installation instructions, link the contributors guide
Reported in issue #2165. Dynamic scheduling of OpenMP loops involves
implicit synchronization. To implement synchronization, libgomp uses futex
(fast userspace mutex), whereas MinGW uses kernel-space mutex, which is more
costly. With chunk size of 1, synchronization overhead may become prohibitive
on Windows machines.
Solution: use 'guided' schedule to minimize the number of syncs
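
To see why the schedule matters, the number of chunk hand-outs (each one a synchronization point) can be counted with a small simulation. This is a rough sketch under stated assumptions: the function name `count_dispatches` is hypothetical, and the guided chunk-size rule used here (remaining iterations divided by the number of threads) is a common approximation, since the exact sizes are left to the OpenMP runtime. It does not model the cost of a futex versus a kernel-space mutex.

```python
import math


def count_dispatches(n_iters, n_threads, schedule, chunk=1):
    """Count how many chunks are handed out for a loop of n_iters iterations.

    Each hand-out is a point where threads must coordinate. The guided rule
    used here (remaining / n_threads, never below `chunk`) is an
    approximation; real OpenMP runtimes are free to differ.
    """
    remaining, dispatches = n_iters, 0
    while remaining > 0:
        if schedule == "dynamic":
            size = chunk
        elif schedule == "guided":
            size = max(chunk, math.ceil(remaining / n_threads))
        else:
            raise ValueError("unknown schedule: " + schedule)
        remaining -= min(size, remaining)
        dispatches += 1
    return dispatches


dynamic_syncs = count_dispatches(1000, 4, "dynamic")
guided_syncs = count_dispatches(1000, 4, "guided")
```

With a chunk size of 1, the dynamic schedule hands out one chunk per iteration, while the guided schedule starts with large chunks and shrinks them, so far fewer hand-outs are needed.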
Storing and then loading a model loses any eval_metric that was
provided. This causes implementations that always store/load, like
xgboost4j-spark, to be unable to eval with the desired metric.
This log appears to fire every time I ask the Python package to make a prediction. It's the only log that fires from XGBoost. When we're getting predictions on millions of items a day in production, this log seems out of place.
* add back train method but mark as deprecated
* fix scalastyle error
* change class to object in examples
* fix compilation error
* small fix for cleanExternalCache
* add back train method but mark as deprecated
* fix scalastyle error
* change class to object in examples
* fix compilation error
* fix several issues in tests
* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData()
When there are more threads than rows in rowset, some threads end up
with empty ranges, causing them to crash. (iend - 1 needs to be
accessible as part of algorithm)
Fix: run only those threads with nonempty ranges.
* Add regression test for Bugfix 1
* Moving python_omp_test to existing python test group
Turns out you don't need to set "OMP_NUM_THREADS" to enable
multithreading. Just add nthread parameter.
* Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature
When split value is less than all cut points, split_cond is set
incorrectly.
Fix: set split_cond = -1 to indicate this scenario
* Bugfix 3: Initialize data layout indicator before using it
data_layout_ is accessed before being set; this variable determines
whether feature 0 is included in feat_set.
Fix: re-order code in InitData() to initialize data_layout_ first
* Adding regression test for Bugfix 2
Unfortunately, no regression test for Bugfix 3, as there is no
way to deterministically assign value to an uninitialized variable.
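
The empty-range problem behind Bugfix 1 can be sketched as follows. This is an illustration only, not the code of ApplySplitSparseData(): the function name `thread_ranges` and the even-split partitioning are assumptions. It shows how splitting a rowset across more threads than rows produces empty ranges, and how filtering them out keeps every worker away from index iend - 1 of an empty range.

```python
def thread_ranges(n_rows, n_threads):
    """Split [0, n_rows) into one contiguous range per thread.

    Returns only the nonempty (ibegin, iend) pairs, so a worker never
    has to touch index iend - 1 of an empty range.
    """
    chunk = -(-n_rows // n_threads)  # ceiling division
    ranges = []
    for tid in range(n_threads):
        ibegin = min(tid * chunk, n_rows)
        iend = min(ibegin + chunk, n_rows)
        if ibegin < iend:  # skip threads with nothing to do
            ranges.append((ibegin, iend))
    return ranges


# 3 rows spread over 8 threads: only 3 threads get work.
busy = thread_ranges(3, 8)
```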
* Add UpdatePredictionCache() option to updaters
Some updaters (e.g. fast_hist) have enough information to quickly compute
prediction cache for the training data. Each updater may override
UpdatePredictionCache() method to update the prediction cache. Note: this
trick does not apply to validation data.
* Respond to code review
* Disable some debug messages by default
* Document UpdatePredictionCache() interface
* Remove base_margin logic from UpdatePredictionCache() implementation
* Do not take pointer to cfg, as reference may get stale
* Improve multi-threaded performance
* Use columnwise accessor to accelerate ApplySplit() step,
with support for a compressed representation
* Parallel sort for evaluation step
* Inline BuildHist() function
* Cache gradient pairs when building histograms in BuildHist()
* Add missing #if macro
* Respond to code review
* Use wrapper to enable parallel sort on Linux
* Fix C++ compatibility issues
* MSVC doesn't support unsigned in OpenMP loops
* gcc 4.6 doesn't support using keyword
* Fix lint issues
* Respond to code review
* Fix bug in ApplySplitSparseData()
* Attempting to read beyond the end of a sparse column
* Mishandling the case where an entire range of rows has missing values
* Fix training continuation bug
Disable UpdatePredictionCache() in the first iteration. This way, we can
accommodate the scenario where we build off of an existing (nonempty) ensemble.
* Add regression test for fast_hist
* Respond to code review
* Add back old version of ApplySplitSparseData
* Updated sklearn_parallel.py for soon-to-be-deprecated modules
* Updated predict_leaf_indices.py; use python3 print() like the other examples and removed unused module
This commit proposes a simpler single compiler specification for OSX and *nix. It also lets people override the setting on both systems, not just *nix.