This log appears to fire every time I ask the python package to make a prediction. It's the only log that fires from XGBoost. When we're getting predictions on millions of items a day in production, this log seems out of place.
* add back train method but mark as deprecated
* fix scalastyle error
* change class to object in examples
* fix compilation error
* small fix for cleanExternalCache
* add back train method but mark as deprecated
* fix scalastyle error
* change class to object in examples
* fix compilation error
* fix several issues in tests
* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData()
When there are more threads than rows in rowset, some threads end up
with empty ranges, causing them to crash. (iend - 1 needs to be
accessible as part of algorithm)
Fix: run only those threads with nonempty ranges.
* Add regression test for Bugfix 1
* Moving python_omp_test to existing python test group
Turns out you don't need to set "OMP_NUM_THREADS" to enable
multithreading. Just add nthread parameter.
* Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature
When split value is less than all cut points, split_cond is set
incorrectly.
Fix: set split_cond = -1 to indicate this scenario
* Bugfix 3: Initialize data layout indicator before using it
data_layout_ is accessed before being set; this variable determines
whether feature 0 is included in feat_set.
Fix: re-order code in InitData() to initialize data_layout_ first
* Adding regression test for Bugfix 2
Unfortunately, no regression test for Bugfix 3, as there is no
way to deterministically assign value to an uninitialized variable.
* Add UpdatePredictionCache() option to updaters
Some updaters (e.g. fast_hist) has enough information to quickly compute
prediction cache for the training data. Each updater may override
UpdaterPredictionCache() method to update the prediction cache. Note: this
trick does not apply to validation data.
* Respond to code review
* Disable some debug messages by default
* Document UpdatePredictionCache() interface
* Remove base_margin logic from UpdatePredictionCache() implementation
* Do not take pointer to cfg, as reference may get stale
* Improve multi-threaded performance
* Use columnwise accessor to accelerate ApplySplit() step,
with support for a compressed representation
* Parallel sort for evaluation step
* Inline BuildHist() function
* Cache gradient pairs when building histograms in BuildHist()
* Add missing #if macro
* Respond to code review
* Use wrapper to enable parallel sort on Linux
* Fix C++ compatibility issues
* MSVC doesn't support unsigned in OpenMP loops
* gcc 4.6 doesn't support using keyword
* Fix lint issues
* Respond to code review
* Fix bug in ApplySplitSparseData()
* Attempting to read beyond the end of a sparse column
* Mishandling the case where an entire range of rows have missing values
* Fix training continuation bug
Disable UpdatePredictionCache() in the first iteration. This way, we can
accomodate the scenario where we build off of an existing (nonempty) ensemble.
* Add regression test for fast_hist
* Respond to code review
* Add back old version of ApplySplitSparseData
* Updated sklearn_parallel.py for soon-to-be-deprecated modules
* Updated predict_leaf_indices.py; Use python3 print() as other exmaples and removed unused module
This commit proposes a simpler single compiler specification for OSX and *nix. It also let's people override the setting on both systems, not just *nix.
I use the online prediction function(`inline void Predict(const SparseBatch::Inst &inst, ... ) const;`), the results obtained are different from the results of the batch prediction function(` virtual void Predict(DMatrix* data, ...) const = 0`). After the investigation found that the online prediction function using the `base_score_` parameters, and the batch prediction function is not used in this parameter. It is found that the `base_score_` values are different when the same model file is loaded many times.
```
1st times:base_score_: 6.69023e-21
2nd times:base_score_: -3.7668e+19
3rd times:base_score_: 5.40507e+07
```
Online prediction results are affected by `base_score_` parameters. After deleting the if condition(`if (out_preds->size() == 1)`) , the online prediction is consistent with the batch prediction results, and the xgboost prediction results are consistent with python version. Therefore, it is likely that the online prediction function is bug
* [jvm-packages] call setGroup for ranking task
* passing groupData through xgBoostConfMap
* fix original comment position
* make groupData param
* remove groupData variable, use xgBoostConfMap directly
* set default groupData value
* add use groupData tests
* reduce rank-demo size
* use TaskContext.getPartitionId() instead of mapPartitionsWithIndex
* add DF use groupData test
* remove unused varable
* add back train method but mark as deprecated
* fix scalastyle error
* first commit in scala binding for fast histo
* java test
* add missed scala tests
* spark training
* add back train method but mark as deprecated
* fix scalastyle error
* local change
* first commit in scala binding for fast histo
* local change
* fix df frame test
* add back train method but mark as deprecated
* fix scalastyle error
* change class to object in examples
* fix compilation error
* bump spark version to 2.1
* preserve num_class issues
* fix failed test cases
* rivising
* add multi class test
verbose_eval docs claim it will log the last iteration (http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.train). this is also consistent w/the behavior from 0.4. not a huge deal but I found it handy to see the last iter's result b/c my period is usually large.
this doesn't address logging the last stage found by early_stopping (as noted in docs) as I'm not sure how to do that.