115 Commits

Author SHA1 Message Date
Vadim Khotilovich
a44032d095 [CORE] The update process for a tree model, and its application to feature importance (#1670)
* [CORE] allow updating trees in an existing model

* [CORE] in refresh updater, allow keeping old leaf values and update stats only

* [R-package] xgb.train mod to allow updating trees in an existing model

* [R-package] added check for nrounds when is_update

* [CORE] merge parameter declaration changes; unify their code style

* [CORE] move the update-process trees initialization to Configure; rename default process_type to 'default'; fix the trees and trees_to_update sizes comparison check

* [R-package] unit tests for the update process type

* [DOC] documentation for process_type parameter; improved docs for updater, Gamma and Tweedie; added some parameter aliases; metrics indentation and some were non-documented

* fix my sloppy merge conflict resolutions

* [CORE] add a TreeProcessType enum

* whitespace fix
2016-12-04 09:33:52 -08:00
AbdealiJK
6f16f0ef58 Use bst_float consistently throughout (#1824)
* Fix various typos

* Add override to functions that are overridden

gcc warns about functions that override a base-class function but are
not marked as overridden. This fixes it.

* Use bst_float consistently

Use bst_float for all the variables that involve weight,
leaf value, gradient, hessian, gain, loss_chg, predictions,
base_margin, feature values.

In some cases, where additions and similar operations can make the
value grow larger, double is used.

This ensures that type conversions are minimal and reduces loss of
precision.
2016-11-30 10:02:10 -08:00
AbdealiJK
b94fcab4dc Add dump_format=json option (#1726)
* Add format to the params accepted by DumpModel

Currently, only the text format is supported when trying to dump
a model. The plan is to add more formats, such as JSON, which are
easy for machines to read and/or parse, and to make the interface
generic enough to allow other formats to be added.

Hence, we make some modifications to make these functions generic
and accept a new parameter "format" which signifies the format of
the dump to be created.

* Fix typos and errors in docs

* plugin: Mention all the register macros available

Document the register macros currently available to the plugin
writers so they know what exactly can be extended using hooks.

* sparse_page_source: Use same arg name in .h and .cc

* gbm: Add JSON dump

The dump_format argument can be used to specify what type
of dump file should be created. Add functionality to dump
gblinear and gbtree into a JSON file.

The JSON file has an array, each item is a JSON object for the tree.
For gblinear:
 - The item is the bias and weights vectors
For gbtree:
 - The item is the root node. The root node has an attribute "children"
   which holds the child nodes. This happens recursively.

* core.py: Add arg dump_format for get_dump()
2016-11-04 09:55:25 -07:00
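
The nested layout described in the "gbm: Add JSON dump" message above can be walked with a short recursive function. The following is a minimal sketch, not code from the commit: the sample dump is hand-written, and the "nodeid" field and the helper names are assumptions made for illustration. Only the top-level array and the "children" attribute come from the commit message.

```python
import json

# Hypothetical gbtree dump: an array with one JSON object per tree.
# Each object is the root node; "children" nests recursively.
# The "nodeid" field is an assumption used only to label nodes.
dump_text = """
[
  {"nodeid": 0, "children": [
      {"nodeid": 1, "children": [{"nodeid": 3}, {"nodeid": 4}]},
      {"nodeid": 2}
  ]}
]
"""


def count_nodes(node):
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))


def depth(node):
    """Number of levels in the subtree; a leaf has depth 1."""
    children = node.get("children", [])
    return 1 + max((depth(child) for child in children), default=0)


trees = json.loads(dump_text)
node_counts = [count_nodes(tree) for tree in trees]
depths = [depth(tree) for tree in trees]
print(node_counts, depths)
```

A leaf is simply a node without a "children" key, so both helpers fall back to an empty list there.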
Tianqi Chen
c93c9b7ed6 [TREE] Experimental version of monotone constraint (#1516)
* [TREE] Experimental version of monotone constraint

* Allow default detection of monotone option

* Loosen the condition of the strict check

* Update gbtree.cc
2016-09-07 21:28:43 -07:00
Tianqi Chen
ecec5f7959 [CORE] Refactor cache mechanism (#1540) 2016-09-02 20:39:07 -07:00
Vadim Khotilovich
75f401481f no exception throwing within omp parallel; set nthread in Learner (#1421) 2016-07-29 10:08:03 -07:00
anpark
0e61c514a7 fix duplicate loop over output_group when predict (#1342)
* fix sparse page source meta info empty when load from dmatrix

* fix duplicate loop over output_group when predict
2016-07-13 10:03:10 -07:00
Yoshinori Nakano
7cfeb5f012 fix Dart::NormalizeTrees (#1265) 2016-06-09 15:28:24 -07:00
Yoshinori Nakano
949d1e3027 add Dart booster (#1220) 2016-06-08 14:04:01 -07:00
tqchen
96b17971ac Fix continue training in CLI 2016-03-10 12:43:25 -08:00
tqchen
e80d3db64b [DIST] Enable multiple thread and tracker, make rabit and xgboost more thread-safe by using thread local variables. 2016-03-03 20:36:14 -08:00
tqchen
88447ca32e [MEM] Add rowset struct to save memory with billion level rows 2016-02-10 11:17:17 -08:00
tqchen
d75e3ed05d [LIBXGBOOST] pass demo running. 2016-01-16 10:24:01 -08:00
tqchen
0d95e863c9 [LEARNER] refactor learner 2016-01-16 10:24:01 -08:00
tqchen
4b4b36d047 [GBM] remove need to explicit InitModel, rename save/load 2016-01-16 10:24:01 -08:00
tqchen
82ceb4de0a [LEARNER] Init learner interface 2016-01-16 10:24:01 -08:00
tqchen
9042b9e2c7 [GBM] Finish migrate all gbms 2016-01-16 10:24:01 -08:00
tqchen
4f26d98150 [Update] remove rabit subtree, use submodule, move code 2016-01-16 10:24:01 -08:00
tqchen
d530e0c14f [REFACTOR] cleanup structure 2016-01-16 10:24:00 -08:00
Vadim Khotilovich
c70022e6c4 spelling, wording, and doc fixes in c++ code
I was reading through the code and fixing some things in the comments.
Only a few trivial actual code changes were made to make things more
readable.
2015-12-12 21:40:12 -06:00
yoori
981f06b9d1 style fix 2015-10-20 00:58:11 +04:00
yoori
49c1cb6990 GBTree::Predict performance fix: removed excess thread_temp initialization 2015-10-20 00:52:37 +04:00
yoori
c0853967d5 GBTree::Predict performance fix: removed excess thread_temp initialization 2015-10-20 00:06:00 +04:00
tqchen
0162bb7034 lint half way 2015-07-03 18:31:52 -07:00
tqchen
7f7947f31c add with pbuffer info to model, allow xgb model to be saved in a more memory compact way 2015-05-06 15:43:15 -07:00
Tianqi Chen
afdebe8d8f fix platform dependent thing 2015-04-25 20:40:43 -07:00
tqchen
fba9e5c714 quick fix 2015-04-05 12:01:19 -07:00
tqchen
12528c535a fix 2015-03-11 11:22:51 -07:00
tqchen
a16cbedfab try fix memleak when test data have more features than training 2015-02-10 21:49:29 -08:00
tqchen
1211ea40c9 add single instance prediction 2015-01-19 08:07:22 -08:00
tqchen
b762231b02 change makefile to lazy checkpt, fix col splt code 2015-01-15 21:32:31 -08:00
tqchen
c8f422b3b9 add dump to linear model 2014-12-24 02:56:32 -08:00
tqchen
31eedfea59 pass mock, need to fix rabit lib for not initialization 2014-12-21 00:14:00 -08:00
tqchen
deb21351b9 add rabit checkpoint to xgb 2014-12-20 01:05:40 -08:00
tqchen
db2adb6885 start check windows compatibility 2014-11-23 20:59:10 -08:00
Tianqi Chen
2e444f8338 remove warning from MSVC need another round of check 2014-11-23 20:52:13 -08:00
tqchen
168bb0d0c9 add predict leaf indices 2014-11-21 09:32:09 -08:00
tqchen
970dd58dc2 checkin continue training 2014-11-19 20:06:08 -08:00
tqchen
0cf2dd39ea new change for mpi 2014-10-16 15:12:10 -07:00
tqchen
d4ab359be1 fix 2014-09-07 20:01:03 -07:00
tqchen
19a1ee24a5 try predpath 2014-09-07 18:40:15 -07:00
tqchen
10648a1ca7 remove using std from cpp 2014-09-02 22:43:19 -07:00
tqchen
9100ffc12a chg version 2014-09-01 22:32:03 -07:00
tqchen
4592e500cb add ntree limit 2014-09-01 15:10:19 -07:00
tqchen@graphlab.com
075dc9a998 pass build 2014-08-27 19:19:04 -07:00
tqchen@graphlab.com
605269133e complete refactor data.h, now relies on iterator to access column 2014-08-27 17:00:21 -07:00
tqchen@graphlab.com
a59f8945dc rename SparseBatch to RowBatch 2014-08-27 10:56:55 -07:00
tqchen@graphlab.com
d5a5e0a42a rename findex->index 2014-08-27 10:52:27 -07:00
tqchen
414e7f27ff Merge branch 'master' into unity
Conflicts:
	src/learner/evaluation-inl.hpp
	wrapper/xgboost_R.cpp
	wrapper/xgboost_wrapper.cpp
	wrapper/xgboost_wrapper.h
2014-08-26 20:32:07 -07:00
tqchen
7739f57c8b change omp loop var to bst_omp_uint, add XGB_DLL to wrapper 2014-08-26 19:37:04 -07:00