Compare commits

709 Commits
v0.1...v0.32

Author SHA1 Message Date
Tianqi Chen
852ce6be0b Update README.md 2014-09-07 16:48:45 -07:00
Tong He
946f3c7ac5 Update DESCRIPTION 2014-09-07 10:36:50 -07:00
tqchen
5621d9811f remove deprecate 2014-09-07 10:17:34 -07:00
hetong
9e3b878943 refine style with max.depth 2014-09-06 23:20:11 -07:00
hetong
1925321a16 remove incorrect link to old folders 2014-09-06 23:14:38 -07:00
hetong
80636cd804 improve runall.R 2014-09-06 23:06:47 -07:00
hetong
cd35d88a03 remove inst/, improve vignette 2014-09-06 23:05:21 -07:00
hetong
50d77c72eb Merge branch 'master' of https://github.com/tqchen/xgboost 2014-09-06 22:48:24 -07:00
hetong
fbecd163c5 replace iris in docs 2014-09-06 22:48:08 -07:00
tqchen
89b9965cbf change max depth 2014-09-06 22:40:51 -07:00
Tianqi Chen
32a2925be8 Update build.sh 2014-09-06 22:27:25 -07:00
Tianqi Chen
2d2cee879d Update build.sh 2014-09-06 22:26:35 -07:00
tqchen
17ebdde707 chg back to g++ 2014-09-06 22:21:50 -07:00
tqchen
014e830a04 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-09-06 22:20:18 -07:00
tqchen
a7a0b34a54 add auto build script 2014-09-06 22:20:11 -07:00
hetong
ddf715953a forced add doc for test 2014-09-06 22:03:07 -07:00
hetong
d174a79fbd add doc for agaricus.test 2014-09-06 21:54:12 -07:00
hetong
43a781f59b improvement for reducing warnings 2014-09-06 21:28:42 -07:00
hetong
d214013681 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-09-06 19:02:56 -07:00
hetong
e04b6aaec5 add documentation for datasets 2014-09-06 19:02:23 -07:00
Tianqi Chen
e7bce3a940 Update xgb.DMatrix.save.R 2014-09-06 18:38:01 -07:00
Tianqi Chen
67fc1dd990 Update xgb.DMatrix.save.R 2014-09-06 18:37:34 -07:00
hetong
99b7ead5ad re-compress the data 2014-09-06 18:29:13 -07:00
hetong
a9bdf38885 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-09-06 11:23:19 -07:00
tqchen
09e39e5901 chg pack file 2014-09-06 11:21:54 -07:00
hetong
c3cef7e2c7 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-09-06 11:17:43 -07:00
hetong
f1d7b012a6 refine doc, with Rd 2014-09-06 11:17:38 -07:00
tqchen
515befd4f9 remove runall 2014-09-06 11:15:10 -07:00
tqchen
a42bcaf61f add 2014-09-06 11:14:32 -07:00
tqchen
e9ed4eb1a2 ok 2014-09-06 11:13:19 -07:00
tqchen
7879db8702 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-09-06 10:29:42 -07:00
tqchen
35431e664e add boost from prediction 2014-09-06 10:28:48 -07:00
hetong
166df74024 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-09-06 10:20:05 -07:00
hetong
a35d93c736 change data from iris back to mushroom 2014-09-06 10:19:46 -07:00
tqchen
4a8612defc add customize objective 2014-09-06 10:19:19 -07:00
tqchen
b858283ec5 add basic walkthrough 2014-09-06 10:11:45 -07:00
hetong
8ad9293437 expose setinfo 2014-09-06 00:44:24 -07:00
hetong
9e05db7261 add mushroom data 2014-09-06 00:26:02 -07:00
hetong
3014ac6778 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-09-06 00:23:02 -07:00
hetong
bb2c61f7b5 custom eval 2014-09-06 00:16:55 -07:00
tqchen
6157d538c1 check in current iris 2014-09-05 23:22:54 -07:00
hetong
4d00be84c3 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-09-05 23:04:00 -07:00
hetong
905051b7cb in the middle of guide-r 2014-09-05 23:03:04 -07:00
tqchen
ab238ff831 chg cv 2014-09-05 22:46:09 -07:00
tqchen
831a102d48 add cv 2014-09-05 22:36:59 -07:00
tqchen
0ecd6c08f3 add cross validation 2014-09-05 22:34:32 -07:00
tqchen
bc1817ca2f Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-09-05 20:34:46 -07:00
tqchen
984102e586 style cleanup, incomplete CV 2014-09-05 20:34:41 -07:00
hetong
af07f5135a cleaning 2014-09-05 20:33:39 -07:00
hetong
63dd037db6 add r basic walkthrough 2014-09-05 20:25:38 -07:00
hetong
de08c5a3da remove temp files 2014-09-05 19:49:25 -07:00
hetong
801a17fa02 fix iris to Rd files 2014-09-05 19:47:58 -07:00
hetong
d776e0fdf5 fix iris multiclass problem 2014-09-05 19:22:27 -07:00
Tianqi Chen
2b170ecda4 Merge pull request #69 from giuliohome/fix
Fixing Configuration Type for Win32/Debug.

Thanks Giulio!
2014-09-05 08:41:34 -07:00
giuliohome
59e1e75857 same version
reset changes
2014-09-05 13:37:18 +02:00
giuliohome
1d90288655 Fixing Configuration Type for Win32/Debug
Proposed fix to the main repo
Changed the windows wrapper type to DynamicLibrary. It was already ok
for the Win64/Release. maybe it got lost after latest commit
2014-09-05 13:30:02 +02:00
giuliohome
efbd1b21a6 Merge branch 'tqchen-master' 2014-09-05 13:26:20 +02:00
giuliohome
909a61edac Merge branch 'master' of https://github.com/tqchen/xgboost into tqchen-master
Conflicts:
	README.md
2014-09-05 13:24:45 +02:00
giuliohome
73b627d532 Fixing Configuration Type for Win32/Debug
Proposed fix to the main repo
Changed the windows wrapper type to DynamicLibrary. It was already ok
for the Win64/Release. maybe it got lost after latest commit
2014-09-05 13:08:06 +02:00
tqchen
e8df76b131 make it cleaner 2014-09-04 21:22:02 -07:00
tqchen
80bf8b71f2 OK 2014-09-04 21:21:26 -07:00
tqchen
a9dc145433 add what is new 2014-09-04 21:20:27 -07:00
tqchen
0752b8b9f3 change readme 2014-09-04 21:12:25 -07:00
tqchen
512a0f69fd add glm 2014-09-04 21:09:52 -07:00
tqchen
f9f982a7aa Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-09-04 20:58:05 -07:00
tqchen
a1c6e22af9 add create from csc 2014-09-04 20:57:49 -07:00
antinucleon
1222839efa higgs cv 2014-09-04 11:00:42 -06:00
tqchen
2bc1d2e73a fix doc 2014-09-04 09:23:35 -07:00
tqchen
6c6d00261c small fix to the doc 2014-09-04 09:18:52 -07:00
tqchen
da9c856701 add cv for python 2014-09-03 22:43:55 -07:00
Tianqi Chen
586d6ae740 Update basic_walkthrough.py 2014-09-03 22:05:56 -07:00
Tianqi Chen
d4b62e679d Update README.md 2014-09-03 22:05:13 -07:00
Tianqi Chen
b078c159bd Update README.md 2014-09-03 21:42:28 -07:00
giuliohome
3f11354adb Parallel execution of CV plus double inputted model 2014-09-03 23:14:31 +02:00
tqchen
8952d9c357 fix 2014-09-03 13:28:03 -07:00
tqchen
b2586b6130 ok 2014-09-03 13:27:06 -07:00
tqchen
5cd92e33f6 remove R for now 2014-09-03 13:24:34 -07:00
tqchen
e6359b5484 ok 2014-09-03 13:23:36 -07:00
tqchen
60e1167b56 fix doc 2014-09-03 13:20:23 -07:00
tqchen
7a61f0dca2 ok 2014-09-03 13:18:36 -07:00
tqchen
c1e0ff0326 push python examples in 2014-09-03 13:15:17 -07:00
tqchen
41ea0bf97a Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-09-03 13:14:00 -07:00
tqchen
fa11840f4b move python example 2014-09-03 13:13:54 -07:00
Tianqi Chen
3192bf82d8 Update xgboost.py 2014-09-03 12:15:57 -07:00
antinucleon
0c36231ea3 chg 2014-09-03 12:57:05 -06:00
tqchen
998ca3bdc9 make some changes to cv 2014-09-03 11:46:33 -07:00
antinucleon
2182ebcba1 Merge branch 'master' of github.com:tqchen/xgboost 2014-09-03 00:38:06 -06:00
antinucleon
02dd8d1212 chg 2014-09-03 00:37:55 -06:00
Tianqi Chen
85dbaf638b Update xgboost.Rnw 2014-09-02 23:33:04 -07:00
Tianqi Chen
642b5bda0a Update DESCRIPTION 2014-09-02 23:30:53 -07:00
Tianqi Chen
582ef2f9d5 Update DESCRIPTION 2014-09-02 23:29:48 -07:00
tqchen
06b5533209 chg fobj back to obj, to keep parameter name unchanged 2014-09-02 23:15:41 -07:00
tqchen
ac8958b284 move custom obj build in into booster 2014-09-02 23:07:50 -07:00
tqchen
10648a1ca7 remove using std from cpp 2014-09-02 22:43:19 -07:00
tqchen
1dbcebb6fe fix cxx98 2014-09-02 22:12:28 -07:00
tqchen
65340ffda6 quick lint 2014-09-02 17:51:05 -07:00
tqchen
e4817bb4c3 fix ntreelimit 2014-09-02 15:05:49 -07:00
antinucleon
5177fa02e4 adjust weight 2014-09-02 15:22:08 -06:00
tqchen
c75275a861 more movement to beginptr 2014-09-02 11:14:57 -07:00
tqchen
27cabd131e add beginPtr, to make vector address taking safe 2014-09-02 11:01:38 -07:00
tqchen
70219ee1ae move nthread to local var 2014-09-02 09:06:24 -07:00
tqchen
28128a1b6e fix new warning 2014-09-02 09:02:27 -07:00
tqchen
1d5db6877d fix param.h 2014-09-02 08:55:26 -07:00
tqchen
c9f2f47acb fix som solaris 2014-09-02 00:12:15 -07:00
tqchen
bb5c151f57 move sprintf into std 2014-09-01 23:12:50 -07:00
tqchen
29a7027dba fix the zero length vector 2014-09-01 22:50:48 -07:00
tqchen
9100ffc12a chg version 2014-09-01 22:32:03 -07:00
tqchen
42fb7b4d9d some fix to make it more c++ 2014-09-01 22:06:10 -07:00
Tianqi Chen
50f1b5d903 Update README.md 2014-09-01 19:00:37 -07:00
Tianqi Chen
b60b23ed1c Update README.md 2014-09-01 18:58:56 -07:00
Tianqi Chen
48411193ae Update README.md 2014-09-01 18:58:00 -07:00
Tianqi Chen
1841d730af Update README.md 2014-09-01 18:55:20 -07:00
Tianqi Chen
85e3fbb06a Update README.md 2014-09-01 18:54:45 -07:00
Tianqi Chen
51a9a36b51 Update DESCRIPTION 2014-09-01 18:53:24 -07:00
hetong
76d5fc7e78 attemp to fix line breaking issue of doc 2014-09-01 17:43:28 -07:00
hetong
19887dcc37 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-09-01 17:24:37 -07:00
hetong
9ee9d29f13 refine readme.md 2014-09-01 17:24:13 -07:00
tqchen
0d5debcc25 fine fix 2014-09-01 17:23:44 -07:00
tqchen
0c5f2b9409 gard GNU c 2014-09-01 17:15:04 -07:00
tqchen
2f6a64e8fa Merge branch 'master' of ssh://github.com/tqchen/xgboost
Conflicts:
	src/utils/omp.h
2014-09-01 17:03:20 -07:00
tqchen
a6ce55493d make R package strict c99 2014-09-01 17:02:42 -07:00
Tong He
d391becb4e Update omp.h 2014-09-01 16:16:06 -07:00
Tong He
ada9dd94ad Update omp.h 2014-09-01 15:51:48 -07:00
hetong
b973a4dcaa improve doc in predict 2014-09-01 15:38:29 -07:00
tqchen
8863c520e7 some quick fix 2014-09-01 15:32:02 -07:00
Tong He
025ca170ec Update predict.xgb.Booster.R 2014-09-01 15:25:16 -07:00
tqchen
6ac6a3d9c9 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-09-01 15:10:29 -07:00
tqchen
4592e500cb add ntree limit 2014-09-01 15:10:19 -07:00
hetong
24e87e1cf8 fix doc with redirection to inst/examples 2014-09-01 15:07:17 -07:00
tqchen
4c451de90b change message 2014-09-01 09:00:45 -07:00
Tianqi Chen
7393291f81 msvc 2014-09-01 08:59:02 -07:00
tqchen
427ab6434c message 2014-09-01 08:56:40 -07:00
tqchen
6641fa546d change warning to pragma message 2014-09-01 08:50:45 -07:00
tqchen
485e0f140e add 2014-08-31 22:53:35 -07:00
tqchen
8b3465cde0 cleaner makevar 2014-08-31 22:42:15 -07:00
tqchen
b2097b96c7 more clean makevar 2014-08-31 22:39:37 -07:00
giuliohome
0be4f0032c new theory: predict from cv + parametric rounds 2014-09-01 01:50:07 +02:00
giuliohome
dde22976cf format README 2014-09-01 01:17:29 +02:00
giuliohome
c60649d28c README 2014-09-01 01:16:12 +02:00
giuliohome
2d1430ac01 set NFold CV from cmd args 2014-09-01 01:14:10 +02:00
giuliohome
f1d6429e96 Parametric NFold from cmd args 2014-09-01 01:10:29 +02:00
giuliohome
147b7d33fe NFold Refactoring 2014-09-01 00:50:43 +02:00
Tianqi Chen
b49927e602 Update xgboost_R.cpp 2014-08-31 14:32:45 -07:00
tqchen
79fa8b99d4 pack script with cleanup 2014-08-31 14:26:35 -07:00
tqchen
a3187e932a Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-08-31 14:15:53 -07:00
tqchen
88da7839b7 fix random 2014-08-31 14:14:39 -07:00
Tianqi Chen
d5f37d1238 add git ignore 2014-08-31 14:13:44 -07:00
tqchen
9e0cc778e8 fix win 2014-08-31 14:12:47 -07:00
tqchen
c1e9acba17 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-08-31 14:07:51 -07:00
tqchen
168f78623f allow standalone random 2014-08-31 14:07:44 -07:00
Tong He
12d503cec8 Update DESCRIPTION 2014-08-31 13:39:49 -07:00
tqchen
ba4f00d55d Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-08-31 13:13:19 -07:00
tqchen
1ed40e2b46 more strict makefile 2014-08-31 13:13:11 -07:00
Tianqi Chen
172423ca0c Update README.md 2014-08-31 12:19:44 -07:00
tqchen
37499245ea remove GNUism 2014-08-31 10:26:20 -07:00
Tianqi Chen
4d5ec01cd3 change windows 2014-08-31 09:25:25 -07:00
tqchen
e83090a579 change flagname to pass check 2014-08-31 09:17:49 -07:00
tqchen
bba13af922 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-08-31 09:13:07 -07:00
tqchen
26c61dc0a3 remove useless flag 2014-08-31 09:12:58 -07:00
Tianqi Chen
d4aacbf8cf add ignore 2014-08-31 09:08:17 -07:00
giuliohome
f42b25ec82 test my inline cv 2014-08-31 18:04:28 +02:00
giuliohome
21f16eac7b fix: cv2 2014-08-31 18:03:12 +02:00
giuliohome
f88aa8d137 fix: submission format 2014-08-31 18:00:34 +02:00
tqchen
fabe2f39e2 more clean makefile 2014-08-31 08:36:17 -07:00
giuliohome
cd0976202b 5 fold cv implementation in c# for the demo: you see inline cv ams while training (of course on a completely separate set) 2014-08-31 17:23:58 +02:00
giuliohome
442d17501f cv1 + cv2 (inline 5-fold cross validation) 2014-08-31 17:09:52 +02:00
giuliohome
23195ac95b Merge branch 'master' of https://github.com/giuliohome/xgboost 2014-08-31 16:31:11 +02:00
giuliohome
04fc25615c Update README.md 2014-08-31 16:28:49 +02:00
giuliohome
318d57f9d0 CV 5-fold implemented 2014-08-31 16:26:42 +02:00
giuliohome
71e5b4c413 Update README.md 2014-08-31 16:13:20 +02:00
giuliohome
41eef462f0 Update README.md 2014-08-31 15:49:34 +02:00
giuliohome
e4ad70e21c Update README.md 2014-08-31 15:41:34 +02:00
giuliohome
e26c072e83 Update README.md 2014-08-31 15:39:20 +02:00
giuliohome
a7b512a1c8 Update README.md 2014-08-31 15:31:16 +02:00
giuliohome
0f28ee4a8e Update README.md 2014-08-31 15:30:48 +02:00
giuliohome
a68f6680a0 Update README.md 2014-08-31 15:29:03 +02:00
giuliohome
82470ef96b Update README.md 2014-08-31 15:28:23 +02:00
hetong
b123fbbcf9 final revision before CRAN 2014-08-30 22:24:25 -07:00
unknown
22a38d8440 move demo to inst/examples 2014-08-30 21:04:47 -07:00
Tong He
b153ffe451 Update DESCRIPTION 2014-08-30 20:46:21 -07:00
Tianqi Chen
629799df0b Update DESCRIPTION 2014-08-30 20:24:23 -07:00
tqchen
f2c8093ba6 check in description 2014-08-30 20:22:36 -07:00
tqchen
104d1d61c7 add license name 2014-08-30 20:06:31 -07:00
tqchen
273816a3b4 chg data 2014-08-30 18:58:32 -07:00
tqchen
9c0389981a fix print problem, fix Tong's email format 2014-08-30 18:49:30 -07:00
Tong He
9739a1c806 Update DESCRIPTION 2014-08-30 18:17:20 -07:00
hetong
257c864274 remove pdf file 2014-08-30 16:26:26 -07:00
hetong
9b618acba2 add import methods in NAMESPACE 2014-08-30 15:42:57 -07:00
hetong
3e85419428 add back import of methdos 2014-08-30 15:34:36 -07:00
hetong
1abdcaa11d eliminate warnings and notes from R CMD check 2014-08-30 15:17:17 -07:00
hetong
a06f01e8ec improve document format 2014-08-30 15:14:36 -07:00
tqchen
2c1aabf6b0 fix indent 2014-08-30 12:47:04 -07:00
tqchen
6e054e8fa4 fix indent 2014-08-30 12:45:46 -07:00
Tianqi Chen
3f7aeb22c5 fix some windows type conversion warning 2014-08-30 12:40:51 -07:00
Tianqi Chen
99c44f2e51 fix makefile in win 2014-08-30 12:25:41 -07:00
hetong
daf430506e Merge branch 'master' of https://github.com/tqchen/xgboost 2014-08-30 12:11:40 -07:00
hetong
f9fc1aec2f modify licence and desc to standard format 2014-08-30 12:11:15 -07:00
Tianqi Chen
202a17f148 fix windows 2014-08-30 12:10:50 -07:00
hetong
4cebbdae66 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-08-30 12:10:41 -07:00
tqchen
74b27bfad2 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-08-30 12:03:41 -07:00
tqchen
51ef32d73a chg makefile 2014-08-30 12:03:32 -07:00
hetong
70cdd2787c add 00Index 2014-08-30 12:02:01 -07:00
hetong
1b7de855e9 remove logo 2014-08-30 11:53:58 -07:00
hetong
6d36e8460d change getinfo Rd 2014-08-30 11:28:10 -07:00
Tong He
efe8b38a35 fix error in demo 2014-08-30 11:24:15 -07:00
hetong
5e839f6fe7 change location and template of vignette 2014-08-30 10:55:13 -07:00
Tianqi Chen
7845ee0c85 Update CHANGES.md 2014-08-30 09:58:35 -07:00
Tianqi Chen
784ab8d02c Update README.md 2014-08-30 09:58:14 -07:00
Tianqi Chen
86e852d1da edit the doc 2014-08-30 09:31:14 -07:00
giuliohome
6d3eea5056 c# Booster class (almost ready to do cv) 2014-08-30 16:14:09 +02:00
giuliohome
77e967f0e6 Fix: Events Dictionary 2014-08-30 15:19:12 +02:00
giuliohome
473744c5ac conversion from csv to libsvm 2014-08-30 14:55:45 +02:00
giuliohome
b208338098 c# kaggle higgs demo drafted 2014-08-30 10:26:41 +02:00
hetong
84607a34a5 refine vignette 2014-08-29 22:40:07 -07:00
giuliohome
2587da5fea First example of c# wrapper done (marshalling prediction to submission file) 2014-08-30 03:05:40 +02:00
giuliohome
8b26cba148 eval training 2014-08-30 02:03:00 +02:00
giuliohome
4a67296e30 program cleanse
NEXT TO DO: try to predict after training
2014-08-30 01:43:45 +02:00
giuliohome
ba2d062f09 sharp higgs demo - training 2014-08-30 01:36:04 +02:00
giuliohome
db46e7a730 starting to develop a c# wrapper for xgboost:
c# implementation of kaggle higgs demo
2014-08-30 01:01:30 +02:00
giuliohome
6c3bc36a25 starting to develop a c# wrapper for xgboost 2014-08-30 00:36:01 +02:00
hetong
04c520ea3d refine vignette 2014-08-29 11:53:59 -07:00
hetong
8eb00e3916 refinement of document 2014-08-29 11:43:03 -07:00
hetong
cc12ee0d22 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-08-29 11:40:37 -07:00
hetong
5f510c683b add vignette 2014-08-29 11:40:15 -07:00
tqchen@graphlab.com
6db4e99b19 improve pack script 2014-08-29 09:47:50 -07:00
unknown
086433da0d add speedtest.R by -f 2014-08-28 22:40:44 -07:00
Tianqi Chen
23e80413f5 Update README.md 2014-08-28 22:34:12 -07:00
Tianqi Chen
6f6d754d4d Update README.md 2014-08-28 22:33:09 -07:00
tqchen
03127fc07e checkin makefile 2014-08-28 22:21:51 -07:00
unknown
b0130545a6 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-08-28 22:00:44 -07:00
unknown
6ed5d37771 speed test for R, and refinement of item list in doc 2014-08-28 22:00:13 -07:00
tqchen
3e92eb13d3 make it packable 2014-08-28 21:46:12 -07:00
tqchen
2e96bc51f5 do things 2014-08-28 21:23:27 -07:00
unknown
fba591fbf5 add slice document 2014-08-28 09:24:23 -07:00
unknown
26868ebada fix NAMESPACE with import classes 2014-08-28 09:22:11 -07:00
tqchen
8c50cbb6dd checkin slice 2014-08-28 09:04:30 -07:00
tqchen
776e4627de pass pedantic 2014-08-28 08:40:34 -07:00
tqchen
8100006483 fix 2014-08-28 08:34:51 -07:00
hetong
d95bc458e3 fix NAMESPACE 2014-08-28 08:16:45 -07:00
hetong
73419f6cd7 compile Rd files, i.e. R documents 2014-08-28 08:12:48 -07:00
tqchen
df6cd25fd5 OK 2014-08-28 07:43:26 -07:00
tqchen
d79161cfce chg 2014-08-28 07:38:44 -07:00
tqchen
d00302d3ac get a pass in function docstring 2014-08-28 07:35:57 -07:00
unknown
8127f31cdd add documentation notes 2014-08-28 01:44:03 -07:00
unknown
a0f22f6aaa hide xgb.Boost 2014-08-27 22:25:54 -07:00
unknown
8a4e66299a remove default value for nrounds 2014-08-27 22:12:30 -07:00
unknown
4723b8c07e Merge branch 'master' of https://github.com/tqchen/xgboost 2014-08-27 21:36:27 -07:00
unknown
6ed5e713d5 ignore csv 2014-08-27 21:35:55 -07:00
Tianqi Chen
b380e0432f Update DESCRIPTION 2014-08-27 21:35:28 -07:00
Tianqi Chen
d7735512cf Delete LICENSE 2014-08-27 21:35:00 -07:00
Tianqi Chen
077c556179 Update DESCRIPTION 2014-08-27 21:34:41 -07:00
Tianqi Chen
ca3141208f Update README.md 2014-08-27 21:32:33 -07:00
Tianqi Chen
af5abc04b3 Update README.md 2014-08-27 21:31:47 -07:00
unknown
b51b913494 modification of higgs-pred.R 2014-08-27 21:31:13 -07:00
Tianqi Chen
8be3249cb8 Update README.md 2014-08-27 21:16:54 -07:00
Tianqi Chen
582e4e3d8c Merge pull request #51 from tqchen/unity
merge unity into master, R package ready
2014-08-27 21:13:38 -07:00
tqchen
12b19c97fa change higgs script, remove R wrapper 2014-08-27 21:13:04 -07:00
tqchen
7ab45b3e64 add files back 2014-08-27 21:07:31 -07:00
Tianqi Chen
de111a1c26 make windows version in 2010 2014-08-27 21:01:39 -07:00
Bing Xu
211d85f04b make py work 2014-08-27 20:55:44 -06:00
tqchen@graphlab.com
4369bc2bfd chg code guide 2014-08-27 19:31:49 -07:00
tqchen@graphlab.com
b162acb858 adapt R package 2014-08-27 19:30:09 -07:00
Tianqi Chen
f9541efa01 Merge pull request #50 from tqchen/master
pull master into unity
2014-08-27 19:19:48 -07:00
tqchen@graphlab.com
075dc9a998 pass build 2014-08-27 19:19:04 -07:00
tqchen@graphlab.com
8aeb038ddd seems ok, need review destructors 2014-08-27 19:12:13 -07:00
tqchen@graphlab.com
f175e1cfb4 finish refactor, need debug 2014-08-27 18:33:52 -07:00
tqchen@graphlab.com
605269133e complete refactor data.h, now replies on iterator to access column 2014-08-27 17:00:21 -07:00
unknown
ae4128fcb2 styling of else in R 2014-08-27 16:46:47 -07:00
Tong He
114cfb2167 fix a tiny bug in xgboost 2014-08-27 15:51:34 -07:00
unknown
b151617ac1 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-08-27 15:49:26 -07:00
unknown
02df006286 modify readme in R-package 2014-08-27 15:15:22 -07:00
unknown
d693e8d5cc use demo instead of inst 2014-08-27 15:10:07 -07:00
unknown
0f0c12707c modify xgb.getinfo to getinfo 2014-08-27 15:03:24 -07:00
Tianqi Chen
0b5e611c22 Merge pull request #49 from giuliohome/master
Thanks giulio!
2014-08-27 14:49:06 -07:00
giuliohome
f3136c2d92 README 2014-08-27 23:24:57 +02:00
giuliohome
73c42d4574 FIX: If you are using Windows, __declspec(dllexport) is necessary 2014-08-27 23:21:55 +02:00
unknown
a060a2e9a6 remove old R demo files 2014-08-27 13:16:16 -07:00
unknown
247e0d5d78 tidy code by formatR 2014-08-27 13:15:28 -07:00
unknown
4dcc7d7303 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-08-27 12:58:04 -07:00
unknown
d747172d37 refinement of R package 2014-08-27 12:57:37 -07:00
Tianqi Chen
57c0ab2721 Update xgboost.py 2014-08-27 12:27:25 -07:00
Tianqi Chen
2451ba0f1c Merge pull request #48 from giuliohome/master
adding a dll project to the msvc solution for the python wrapper on win64
2014-08-27 12:24:09 -07:00
giuliohome
30b31a6910 win64 python dll project 2014-08-27 20:38:30 +02:00
giuliohome
1383afd8f4 MSVS DLL Project for Python wrapper (ver.3 on win64) 2014-08-27 20:27:05 +02:00
giuliohome
ce1803a40c Merge pull request #1 from tqchen/master
updating fork to current master
2014-08-27 20:17:44 +02:00
tqchen@graphlab.com
a59f8945dc rename SparseBatch to RowBatch 2014-08-27 10:56:55 -07:00
tqchen@graphlab.com
d5a5e0a42a rename findex->index 2014-08-27 10:52:27 -07:00
tqchen@graphlab.com
f3a3470916 make wrapper compile 2014-08-27 10:48:25 -07:00
tqchen@graphlab.com
0fe5470a4f delete extra things 2014-08-27 09:59:39 -07:00
unknown
0130be4acc major change in the design of R interface 2014-08-26 23:41:03 -07:00
Tianqi Chen
84e5fc285b bst_ulong supported by sparsematrix builder 2014-08-26 20:32:33 -07:00
tqchen
414e7f27ff Merge branch 'master' into unity
Conflicts:
	src/learner/evaluation-inl.hpp
	wrapper/xgboost_R.cpp
	wrapper/xgboost_wrapper.cpp
	wrapper/xgboost_wrapper.h
2014-08-26 20:32:07 -07:00
tqchen
4787108b5f change uint64_t to ulong, to make mac happy, this is final change 2014-08-26 20:10:07 -07:00
Tianqi Chen
d00f27dc6b change uint64_t to depend on utils 2014-08-26 20:08:13 -07:00
Tianqi Chen
3e5cb25830 minor fix, add openmp 2014-08-26 20:02:10 -07:00
Tianqi Chen
9d2c1cf9f5 add omp uint when openmp is not there 2014-08-26 19:59:55 -07:00
tqchen
90226035fa chg r package path back 2014-08-26 19:39:34 -07:00
tqchen
7739f57c8b change omp loop var to bst_omp_uint, add XGB_DLL to wrapper 2014-08-26 19:37:04 -07:00
tqchen
97467fe807 chg size_t to uint64_t 2014-08-26 19:12:51 -07:00
tqchen
2623ab0a60 chg size_t to uint64_t unsigned long in wrapper 2014-08-26 19:06:53 -07:00
tqchen
3c1ed847fb remove dependency on bst 2014-08-26 18:06:22 -07:00
Tianqi Chen
636ffaf23b Merge pull request #46 from tqchen/master
merge master into unity
2014-08-26 12:18:26 -07:00
tqchen@graphlab.com
46f14b8c27 fix magic so that it can detect binary file 2014-08-26 12:17:27 -07:00
tqchen@graphlab.com
9eb32b9dd4 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-08-26 10:24:04 -07:00
tqchen@graphlab.com
2e3c214173 improve makefile 2014-08-26 10:23:57 -07:00
hetong
41d290906f fix NAMESPACE with export method predict 2014-08-26 10:14:29 -07:00
hetong
262108cf3b modify demo filenames 2014-08-26 10:02:13 -07:00
hetong
d9f363632a Merge branch 'master' of https://github.com/tqchen/xgboost
Initial development of R pacakge and merge with the modification from tqchen.
2014-08-26 09:57:38 -07:00
hetong
4940fff55b export fewer functions to user and optimize parameter setting 2014-08-26 09:57:28 -07:00
Tianqi Chen
98e92f1a79 more detailed warning 2014-08-26 09:29:17 -07:00
Tianqi Chen
b1bffde6c9 fix compile under rtools 2014-08-26 09:09:28 -07:00
hetong
5f6d5d19b8 import package methods in desc 2014-08-25 23:01:53 -07:00
tqchen@graphlab.com
a1f1015ae1 add package parameter to all calls, test pass in mac 2014-08-25 22:25:03 -07:00
tqchen
7297c0a92b add openmp flags 2014-08-25 22:14:48 -07:00
tqchen
ddc0970c46 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-08-25 22:02:19 -07:00
tqchen
0fca16008e runnable 2014-08-25 22:01:35 -07:00
Tianqi Chen
47a0e84c5f add win make 2014-08-25 21:54:24 -07:00
tqchen
c6eaf01a97 add git ignore 2014-08-25 21:25:49 -07:00
tqchen
68f38cf228 initial trial package 2014-08-25 21:20:55 -07:00
Tianqi Chen
c6d59dac4b Merge pull request #45 from tqchen/master
better error handling
2014-08-25 16:00:33 -07:00
tqchen@graphlab.com
c2484f3134 better error handling 2014-08-25 15:58:52 -07:00
tqchen
4c04cf8728 add grow5 back, seems no changes 2014-08-25 14:08:38 -07:00
tqchen
0066cd13a7 Merge branch 'unity' of ssh://github.com/tqchen/xgboost into unity 2014-08-25 13:57:21 -07:00
tqchen
3e9f8bfac9 change things back 2014-08-25 13:56:03 -07:00
tqchen@graphlab.com
6da62159d0 fix by giulio 2014-08-25 12:10:45 -07:00
tqchen@graphlab.com
e26af5e66c Merge branch 'unity' of ssh://github.com/tqchen/xgboost into unity 2014-08-25 12:08:50 -07:00
tqchen@graphlab.com
b83a96fa21 fix by giulio 2014-08-25 12:08:41 -07:00
tqchen
b708f3f029 Merge branch 'unity' of ssh://github.com/tqchen/xgboost into unity
Conflicts:
	src/learner/evaluation-inl.hpp
2014-08-25 11:56:59 -07:00
tqchen@graphlab.com
d61b0b757f chg 2014-08-25 11:35:38 -07:00
tqchen@graphlab.com
c78a2164c2 fix line from auto spacing by msvc 2014-08-25 11:34:49 -07:00
tqchen
9e5788a47c Merge branch 'master' into unity 2014-08-25 11:22:37 -07:00
tqchen
e4b9ee22fa :Merge branch 'unity'
Conflicts:
	src/gbm/gbtree-inl.hpp
	src/learner/evaluation-inl.hpp
	src/tree/param.h
2014-08-25 11:21:56 -07:00
Tianqi Chen
bd52a7f448 changes 2014-08-25 11:13:06 -07:00
Tianqi Chen
ca0b008fb0 clean up warnings from msvc 2014-08-25 11:01:21 -07:00
tqchen
fd03239b77 fix now today, try to think how to work tmr 2014-08-24 22:08:21 -07:00
tqchen
f62b4a02f9 beta version, do a review 2014-08-24 21:36:30 -07:00
tqchen
ce97f2fdf8 a fixed version 2014-08-24 21:17:13 -07:00
tqchen
6daa1c365d add cvgrad stats, simplify data 2014-08-24 20:07:16 -07:00
tqchen
c640485f1d initial correction for vec tree 2014-08-24 18:48:19 -07:00
Tianqi Chen
4f0b0d2c88 Merge pull request #43 from tqchen/unity
add changes that are not commited
2014-08-24 17:26:21 -07:00
tqchen
7874c2559b add changes 2014-08-24 17:25:17 -07:00
Tianqi Chen
4c023077dd Merge pull request #42 from tqchen/unity
Unity this is final minor change in data structure
2014-08-24 17:23:46 -07:00
tqchen
da75f8f1a4 move ncol, row to booster, add set/get uint info 2014-08-24 17:19:22 -07:00
tqchen
19447cdb12 chg higgs back 2014-08-24 16:09:13 -07:00
tqchen
4889b40abc tstats now depend on param 2014-08-24 16:08:58 -07:00
tqchen
49e6575c86 add set leaf, constructor of tstats now rely on param 2014-08-24 16:07:59 -07:00
Tianqi Chen
d7c6f8e81a Merge pull request #41 from tqchen/unity
Unity
2014-08-24 15:24:20 -07:00
tqchen
ba9fbd380c templatize refresher 2014-08-24 15:22:11 -07:00
tqchen
f71b732e7a refactor grad stats to be like visitor 2014-08-24 15:17:22 -07:00
Tianqi Chen
c0496685c4 Merge pull request #39 from tqchen/unity
fix mac compile issue
2014-08-24 09:52:03 -07:00
tqchen
d49c6e6e84 fix 2014-08-24 09:51:15 -07:00
tqchen
88beee5639 try to fix compile bug 2014-08-24 09:47:08 -07:00
tqchen@graphlab.com
46d41a2b43 fix compilation on mac 2014-08-24 09:32:06 -07:00
Tianqi Chen
40483e6dc3 Merge pull request #38 from tqchen/unity
Unity
2014-08-23 21:16:14 -07:00
tqchen
b381c842f1 link glc 2014-08-23 21:14:53 -07:00
tqchen
5802141d59 add glc comment 2014-08-23 21:12:55 -07:00
Tianqi Chen
cf274e76f4 Merge pull request #37 from tqchen/unity
Unity
2014-08-23 20:54:27 -07:00
tqchen
fea7245fa0 chg python back 2014-08-23 20:53:56 -07:00
tqchen
d16a56814b remove pred.csv 2014-08-23 20:53:16 -07:00
tqchen
ed9d8a1c0e add higgs example 2014-08-23 20:52:56 -07:00
Tianqi Chen
851f3fce86 Merge pull request #36 from tqchen/unity
add acknowledgement
2014-08-23 19:05:22 -07:00
tqchen
d86cd62415 add acknowledgement 2014-08-23 19:04:50 -07:00
Tianqi Chen
cd16a3b124 Merge pull request #35 from tqchen/unity
ok
2014-08-23 18:59:52 -07:00
tqchen
a656e61571 ok 2014-08-23 18:57:19 -07:00
Tianqi Chen
b2b5895634 Merge pull request #34 from tqchen/unity
Unity
2014-08-23 18:56:38 -07:00
tqchen
3b12ff51b9 seems ok 2014-08-23 18:38:39 -07:00
tqchen
de83ac72ea complete R example 2014-08-23 15:26:08 -07:00
tqchen
8bf758c63b chg wrapper 2014-08-23 14:27:56 -07:00
tqchen
08a6b92216 chg 2014-08-23 14:20:29 -07:00
tqchen
3ba7995754 finish dump 2014-08-23 13:09:47 -07:00
tqchen
40da2fa2c0 workable R wrapper 2014-08-23 12:14:44 -07:00
tqchen
5e23f6577f try add R wrapper 2014-08-23 09:30:02 -07:00
tqchen
9d210f9bd3 ok 2014-08-22 20:14:43 -07:00
Tianqi Chen
741bfe015f Merge pull request #32 from tqchen/master
merge master into unity
2014-08-22 20:13:23 -07:00
Tianqi Chen
13b5269855 Update machine.conf 2014-08-22 20:00:04 -07:00
Tianqi Chen
cf69d34d06 Update mq2008.conf 2014-08-22 19:59:30 -07:00
Tianqi Chen
4378f1f039 Update mushroom.conf 2014-08-22 19:58:59 -07:00
Tianqi Chen
3acd10e031 Merge pull request #31 from tqchen/unity
Change master branch into unity
2014-08-22 19:54:48 -07:00
tqchen
58cda4d708 ok 2014-08-22 19:53:52 -07:00
tqchen
104fced9c3 ok 2014-08-22 19:52:43 -07:00
tqchen
ce5b776bdc add change note 2014-08-22 19:47:05 -07:00
tqchen
07ddf98718 add log 2014-08-22 19:41:58 -07:00
tqchen
2ac8cdb873 check in linear model 2014-08-22 19:27:33 -07:00
tqchen
37b707e110 clean up 2014-08-22 16:51:27 -07:00
tqchen
bf71cf52be add 2014-08-22 16:50:28 -07:00
tqchen
24030b26fd add 2014-08-22 16:49:42 -07:00
tqchen
edc539a024 add message about glc 2014-08-22 16:47:50 -07:00
tqchen
4ed67b9c27 Merge branch 'unity' of ssh://github.com/tqchen/xgboost into unity 2014-08-22 16:26:45 -07:00
tqchen
58354643b0 chg root index to booster info, need review 2014-08-22 16:26:37 -07:00
tqchen
a45fb2d737 Merge branch 'unity' of ssh://github.com/tqchen/xgboost into unity 2014-08-22 16:10:23 -07:00
tqchen
3f5b5e1fdc add apratio 2014-08-22 16:10:19 -07:00
tqchen
58d74861b9 fix multiclass 2014-08-22 14:29:32 -07:00
tqchen@graphlab.com
1fd6ff817f ok 2014-08-19 12:20:31 -07:00
tqchen@graphlab.com
9caccd3b36 change row subsample to prob 2014-08-19 12:07:52 -07:00
tqchen@graphlab.com
91e70c76ff refresher test 2014-08-19 11:41:35 -07:00
tqchen
762b360739 fix typo 2014-08-19 08:42:36 -07:00
tqchen
e7de77aa1f chg 2014-08-19 08:08:54 -07:00
tqchen
406db647f2 add pratio 2014-08-19 08:05:05 -07:00
tqchen
fdba6e9c46 add pratio 2014-08-19 08:02:29 -07:00
tqchen
d08d8ed3ed add tree refresher, need review 2014-08-18 21:32:48 -07:00
tqchen
f757520c02 add tree refresher, need review 2014-08-18 21:32:31 -07:00
tqchen
dbf3a21942 change dense fvec logic to tree 2014-08-18 19:03:32 -07:00
tqchen
1d8c2391e8 update tree maker to make it more robust 2014-08-18 14:58:30 -07:00
tqchen
3de07b0abe add more guideline about python path 2014-08-18 14:12:35 -07:00
tqchen@graphlab.com
3b02fb26b0 fix num parallel tree 2014-08-18 13:33:58 -07:00
tqchen@graphlab.com
c4b21775fa some lint 2014-08-18 12:57:31 -07:00
antinucleon
e9bfc026b7 fix typo 2014-08-18 13:38:09 -06:00
antinucleon
0b36c8295d lack include 2014-08-18 13:33:36 -06:00
tqchen@graphlab.com
9da2ced8a2 add base_margin 2014-08-18 12:20:13 -07:00
tqchen@graphlab.com
46fed899ab add more note 2014-08-18 10:57:08 -07:00
tqchen@graphlab.com
f6c763a2a7 fix base score, and print message 2014-08-18 10:53:15 -07:00
tqchen@graphlab.com
04e04ec5a0 chg readme 2014-08-18 10:19:47 -07:00
tqchen@graphlab.com
66ae3a7578 add no omp flag 2014-08-18 10:17:49 -07:00
tqchen@graphlab.com
7c068cbe46 fix mac 2014-08-18 10:14:34 -07:00
tqchen
d3bfc31e6a enforce putting iteration numbers in train 2014-08-18 09:00:23 -07:00
tqchen
3c1c7e2780 Merge branch 'unity' of ssh://github.com/tqchen/xgboost into unity 2014-08-18 08:57:45 -07:00
tqchen
e912dd3364 fix omp 2014-08-18 08:57:26 -07:00
Bing Xu
b76853731c make it compatible with old code 2014-08-18 02:10:54 -04:00
tqchen
0d9a8c042c make xgcombine buffer work 2014-08-17 22:49:36 -07:00
tqchen
4ed4b08146 ok 2014-08-17 20:47:20 -07:00
tqchen
5a472145de check in rank loss 2014-08-17 20:32:02 -07:00
tqchen
9df8bb1397 check in softmax multiclass 2014-08-17 19:16:17 -07:00
tqchen
e77df13815 ok 2014-08-17 18:49:54 -07:00
tqchen
301685e0a4 python module pass basic test 2014-08-17 18:43:25 -07:00
tqchen
af100dd869 remake the wrapper 2014-08-17 17:43:46 -07:00
tqchen
2c969ecf14 first version that reproduce binary classification demo 2014-08-16 15:44:35 -07:00
tqchen
c4acb4fe01 check in io module 2014-08-16 14:06:31 -07:00
tqchen
ac1cc15b90 pass fmatrix as const 2014-08-15 21:24:23 -07:00
tqchen
d9dbd1efc6 modify readme 2014-08-15 21:06:44 -07:00
tqchen
34dd409c5b mv code into src 2014-08-15 21:04:23 -07:00
tqchen
3589e8252f refactor config 2014-08-15 21:02:33 -07:00
tqchen
dafa44753a chg readme 2014-08-15 20:22:54 -07:00
tqchen
2a92c82b92 start unity refactor 2014-08-15 20:15:58 -07:00
tqchen@graphlab.com
5b215742c2 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-08-15 13:36:56 -07:00
tqchen@graphlab.com
5edc4f3775 save name_obj from now 2014-08-15 13:36:19 -07:00
Tianqi Chen
6d7b33a883 Update README.md 2014-08-12 14:57:28 -07:00
Tianqi Chen
f033f88221 Update README.md 2014-08-12 14:57:05 -07:00
Tianqi Chen
048194ce23 Update README.md 2014-08-12 14:56:51 -07:00
Tianqi Chen
e7ae704504 Update README.md 2014-08-12 14:56:12 -07:00
tqchen
662733db31 support for multiclass output prob 2014-08-01 11:21:17 -07:00
Tianqi Chen
8b4f7d7fa2 Update xgboost_regrank.h 2014-07-12 10:14:30 -07:00
Tianqi Chen
497fc86998 Merge pull request #16 from smly/minor-leak
fix (trivial) leak in xgboost_regrank, Thanks for the fix
2014-07-12 09:58:07 -07:00
Kohei Ozaki
0516d09938 fix (trivial) leak in xgboost_regrank 2014-07-12 17:29:49 +09:00
tqchen
1620cfc9e8 fix combine buffer 2014-05-25 16:46:03 -07:00
tqchen
ec62953e54 add rand seeds back 2014-05-25 10:18:04 -07:00
tqchen
86515a2c15 ok 2014-05-25 10:15:57 -07:00
Tianqi Chen
1048561ede change rank order output to follow kaggle convention 2014-05-25 10:08:38 -07:00
tqchen
6abfce620c make python random seed invariant in each round 2014-05-24 20:57:39 -07:00
tqchen
e2999a0efb fix sometimes python cachelist problem 2014-05-20 15:42:19 -07:00
tqchen
89a2fc5e94 more clean demo 2014-05-20 08:33:35 -07:00
tqchen
ea3bf5d57e fix bug in classification, scale_pos_weight initialization 2014-05-20 08:30:19 -07:00
tqchen
f4dedc4d2d chg 2014-05-19 10:02:01 -07:00
Tianqi Chen
1b9372f431 Merge pull request #7 from jrings/master
Compatibility with both Python 2(.7) and 3
2014-05-19 09:48:34 -07:00
Joerg Rings
93d83ca077 Compatibility with both Python 2(.7) and 3 2014-05-19 11:23:53 -05:00
Tianqi Chen
991634a58e Merge pull request #6 from tqchen/dev
Fix the bug in MAC
2014-05-17 11:07:42 -07:00
tqchen
7aae2ec009 add omp flag back 2014-05-17 11:07:12 -07:00
tqchen
1afe894a63 use back g++ 2014-05-17 11:06:36 -07:00
tqchen
29363d6100 force handle as void_p, seems fix mac problem 2014-05-17 11:03:21 -07:00
Tianqi Chen
049e8cfb2d Merge pull request #5 from tqchen/dev
add return type for xgboost, don't know if it is mac problem. #4
2014-05-17 09:19:20 -07:00
tqchen
2507e4403a add return type for xgboost, don't know if it is mac problem 2014-05-17 09:13:54 -07:00
Tianqi Chen
007f60a352 Update README.md 2014-05-16 22:54:24 -07:00
Tianqi Chen
85108e6a65 Merge pull request #2 from tqchen/dev
fix loss_type
2014-05-16 21:30:09 -07:00
tqchen
3975bf1e62 some cleanup 2014-05-16 21:29:14 -07:00
tqchen
baed0d0f08 fix for loss_type problem in outside reset base 2014-05-16 21:28:03 -07:00
tqchen
bf473bd6c8 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-05-16 20:58:03 -07:00
tqchen
71fc734d3b chg 2014-05-16 20:57:54 -07:00
antinucleon
9f3e5a2778 del 2014-05-17 03:57:38 +00:00
Tianqi Chen
59a9b6b325 Merge pull request #1 from tqchen/dev
2.0 version, lots of changes
2014-05-16 20:53:19 -07:00
Tianqi Chen
8e941b2a79 Update README.md 2014-05-16 20:49:05 -07:00
tqchen
877bac216c Merge branch 'dev' of ssh://github.com/tqchen/xgboost into dev 2014-05-16 20:46:18 -07:00
tqchen
348d35a668 add ignore 2014-05-16 20:46:08 -07:00
tqchen
d7bb10eb79 final check 2014-05-16 20:44:02 -07:00
Tianqi Chen
4dadc76652 Update README.md 2014-05-16 20:41:59 -07:00
Tianqi Chen
4218c1ef53 Update README.md 2014-05-16 20:41:43 -07:00
Tianqi Chen
32a3371073 Update README.md 2014-05-16 20:41:21 -07:00
Tianqi Chen
58cbfa0692 Update README.md 2014-05-16 20:41:05 -07:00
tqchen
51482a29bf Merge branch 'dev' of ssh://github.com/tqchen/xgboost into dev 2014-05-16 20:37:55 -07:00
tqchen
d429289ad3 ok 2014-05-16 20:37:45 -07:00
yepyao
1cf41066d9 Merge branch 'dev' of https://github.com/tqchen/xgboost into dev 2014-05-17 11:36:12 +08:00
yepyao
391be10806 small change 2014-05-17 11:35:43 +08:00
yepyao
255bad90cb small change 2014-05-17 11:34:24 +08:00
tqchen
84afaaaa7d Merge branch 'dev' of ssh://github.com/tqchen/xgboost into dev 2014-05-16 20:29:17 -07:00
tqchen
b07ff1ac8d fix softmax 2014-05-16 20:28:07 -07:00
antinucleon
3e4dd2fce0 chg 2014-05-16 21:27:37 -06:00
tqchen
6c72d02205 chg 2014-05-16 20:18:34 -07:00
Tianqi Chen
cfd6c9e3b7 Update train.py 2014-05-16 20:16:10 -07:00
tqchen
8e5e3340a2 multi class 2014-05-16 20:12:04 -07:00
antinucleon
f52f7b7899 demo 2014-05-16 21:05:11 -06:00
antinucleon
f971d1b554 Merge branch 'dev' of github.com:tqchen/xgboost into dev 2014-05-16 21:03:32 -06:00
Tianqi Chen
7537d691d9 Update README.md 2014-05-16 20:00:20 -07:00
antinucleon
c67b098bd6 demo 2014-05-17 02:59:10 +00:00
antinucleon
d05cb13751 demo 2014-05-16 20:57:42 -06:00
tqchen
2cae28087a do not need to dump in rank 2014-05-16 19:52:39 -07:00
tqchen
12bf54d4ef Merge branch 'dev' of ssh://github.com/tqchen/xgboost into dev 2014-05-16 19:51:41 -07:00
tqchen
6a9438ac86 before commit 2014-05-16 19:51:33 -07:00
yepyao
c4a783f408 small change 2014-05-17 10:50:15 +08:00
yepyao
e872f488a5 Merge branch 'dev' of https://github.com/tqchen/xgboost into dev
Conflicts:
	demo/rank/mq2008.conf
	demo/rank/runexp.sh
	regrank/xgboost_regrank_obj.h
2014-05-17 10:40:12 +08:00
yepyao
e565916c1c fix small bug 2014-05-17 10:35:10 +08:00
tqchen
a70454e3ce add bing to author list 2014-05-16 19:33:59 -07:00
Tianqi Chen
1150fb59a8 Update demo.py 2014-05-16 19:30:32 -07:00
tqchen
53633ae9c2 chgs 2014-05-16 19:24:53 -07:00
tqchen
98e507451c chg all settings to obj 2014-05-16 19:10:52 -07:00
tqchen
213375baca pre-release version 2014-05-16 18:49:02 -07:00
tqchen
8a0f8a93c7 chg scripts 2014-05-16 18:46:43 -07:00
tqchen
02cefb8f1b cleanup 2014-05-16 18:40:46 -07:00
tqchen
bee87cfce7 chg rank demo 2014-05-16 18:38:40 -07:00
tqchen
4743cc98ec Merge branch 'dev' of ssh://github.com/tqchen/xgboost into dev 2014-05-16 18:29:37 -07:00
tqchen
bf66d31b49 chng few things 2014-05-16 18:25:01 -07:00
tqchen
c67b4d1864 minor changes 2014-05-16 18:19:57 -07:00
antinucleon
4bf23cfbb1 new speed test 2014-05-16 18:05:17 -06:00
antinucleon
4bcf947408 speedtest 2014-05-16 17:48:03 -06:00
yepyao
4d03729683 use ndcg@all in lambdarank for ndcg 2014-05-16 23:06:24 +08:00
yepyao
5db373e73c small change 2014-05-16 21:20:41 +08:00
yepyao
e3a0c0efe5 Download data set from web site 2014-05-16 21:18:32 +08:00
kalenhaha
07e98254f5 Impement new Lambda rank interface 2014-05-16 20:42:46 +08:00
tqchen
2baeeabac4 new lambda rank interface 2014-05-16 00:02:26 -07:00
Bing Xu
da0bb3f44e Update README.md 2014-05-16 01:30:29 -04:00
tqchen
92d1df2d2e ok 2014-05-15 21:17:17 -07:00
tqchen
6af6d64f0b a correct version 2014-05-15 21:11:46 -07:00
tqchen
2be3f6ece0 fix numpy convert 2014-05-15 20:28:34 -07:00
tqchen
a7f3d7edd7 ok 2014-05-15 20:05:22 -07:00
tqchen
c22df2b31a ok 2014-05-15 18:56:28 -07:00
tqchen
e2d13db24e bug fix in pairwise rank 2014-05-15 15:37:58 -07:00
tqchen
37e1473cea cleanup code 2014-05-15 15:01:41 -07:00
tqchen
3960ac9cb4 add xgcombine_buffer with weights 2014-05-15 14:41:11 -07:00
tqchen
a59969cd52 change data format to include weight in binary file, add get weight to python 2014-05-15 14:37:56 -07:00
tqchen
3cb42d3f87 ok 2014-05-15 14:25:44 -07:00
tqchen
88526668f5 add ams 2014-05-14 23:23:27 -07:00
tqchen
31a0823e6d some fix 2014-05-14 16:55:59 -07:00
tqchen
ae9d937510 add AMS metric 2014-05-14 11:30:45 -07:00
kalenhaha
121348c0d7 add in grad and hess rescale in lambdarank 2014-05-14 23:13:27 +08:00
kalenhaha
671c34be63 small bug in ndcg eval 2014-05-13 14:30:42 +08:00
kalenhaha
8967be4af5 Merge branch 'dev' of https://github.com/tqchen/xgboost into dev 2014-05-12 22:22:32 +08:00
kalenhaha
5411e2a500 Add LETOR MQ2008 for rank demo 2014-05-12 22:21:07 +08:00
kalenhaha
e858523d19 remove sampler 2014-05-11 14:31:57 +08:00
kalenhaha
6648a15817 small change 2014-05-11 14:25:30 +08:00
kalenhaha
faf35c409e small change 2014-05-11 14:03:21 +08:00
tqchen
604568b512 simple chgs 2014-05-09 20:39:15 -07:00
kalenhaha
f7b2281510 fix some warnings 2014-05-09 14:14:43 +08:00
kalenhaha
0794dd0f6f Merge branch 'dev' of https://github.com/tqchen/xgboost into dev 2014-05-09 14:07:06 +08:00
kalenhaha
4b6024c563 Separating Lambda MAP and Lambda NDCG 2014-05-09 14:05:52 +08:00
tqchen
41edad7b3d add python o3 2014-05-08 20:15:23 -07:00
tqchen
2ccd28339e faster convert to numpy array 2014-05-08 19:35:06 -07:00
tqchen
a0c0fbbb61 commit the fix 2014-05-08 19:31:32 -07:00
tqchen
06327ff8d0 Merge branch 'dev' of ssh://github.com/tqchen/xgboost into dev 2014-05-07 12:00:17 -07:00
tqchen
0bf6261961 fix omp for bug in obj 2014-05-07 11:52:12 -07:00
kalenhaha
8b3fc78999 Merge branch 'dev' of https://github.com/tqchen/xgboost into dev
Conflicts:
	regrank/xgboost_regrank_obj.hpp
2014-05-07 22:15:59 +08:00
tqchen
833cf29867 fix 2014-05-06 16:53:37 -07:00
tqchen
4b00b3e565 Merge branch 'dev' of ssh://github.com/tqchen/xgboost into dev 2014-05-06 16:51:18 -07:00
tqchen
abe5309977 Merge branch 'dev' of ssh://github.com/tqchen/xgboost into dev
Conflicts:
	regrank/xgboost_regrank_data.h
2014-05-06 16:51:11 -07:00
tqchen
7ddff7b570 add regrank utils 2014-05-06 16:50:46 -07:00
tqchen
c39e1f2f30 right group size 2014-05-06 16:49:10 -07:00
tqchen
4f9833ed76 add cutomized training 2014-05-04 13:57:10 -07:00
tqchen
9bc699fd0e add cutomized training 2014-05-04 13:55:58 -07:00
tqchen
8c0c10463e add boost group support to xgboost. now have beta multi-class classification 2014-05-04 12:10:03 -07:00
kalenhaha
8eae8d956d c++11 features removed 2014-05-04 16:58:44 +08:00
kalenhaha
7161618b4c c++11 features removed 2014-05-04 16:56:57 +08:00
tqchen
21f93ffd6a fix 2014-05-04 00:09:16 -07:00
tqchen
2057dda560 add interact mode 2014-05-03 23:24:22 -07:00
tqchen
6fd77cbb24 add python interface for xgboost 2014-05-03 23:04:02 -07:00
tqchen
adc9400736 finish python lib 2014-05-03 22:18:25 -07:00
tqchen
20de7f8f97 finish matrix 2014-05-03 17:12:25 -07:00
tqchen
5bab27cfa6 good 2014-05-03 16:15:44 -07:00
tqchen
30e725a28c ok 2014-05-03 14:24:00 -07:00
tqchen
aab1b0e7b3 important change to regrank interface, need some more test 2014-05-03 14:20:27 -07:00
tqchen
2305ea7af7 try python 2014-05-03 10:54:08 -07:00
tqchen
c1223bfdef pass test 2014-05-02 18:04:45 -07:00
tqchen
cc91c73160 add new combine tool as promised 2014-05-02 12:55:34 -07:00
tqchen
cbceeb8ca6 Merge branch 'dev' of ssh://github.com/tqchen/xgboost into dev 2014-05-01 11:01:05 -07:00
tqchen
ef7df40bc8 cleanup of evaluation metric, move c++11 codes into sample.h for backup, add lambda in a clean way latter 2014-05-01 11:00:50 -07:00
Tianqi Chen
f93ccda075 Update xgboost_omp.h 2014-05-01 10:16:05 -07:00
kalenhaha
f17d400fd3 fix some bugs in linux 2014-05-02 00:16:12 +08:00
kalenhaha
b836b1123e lambda rank added 2014-05-01 22:17:26 +08:00
tqchen
bf64608cc9 add softmax 2014-04-30 22:11:26 -07:00
tqchen
54c482ffd5 add pre @ n 2014-04-30 22:00:53 -07:00
tqchen
223bb5638b use omp parallel sortting 2014-04-30 09:48:41 -07:00
tqchen
bb93c0aaac add rank 2014-04-30 09:32:42 -07:00
tqchen
a383f11759 add pairwise rank first version 2014-04-29 21:12:30 -07:00
tqchen
81414c0e5b new AUC code 2014-04-29 17:26:58 -07:00
tqchen
87a9c22795 new AUC evaluator, now compatible with weighted loss 2014-04-29 17:03:34 -07:00
tqchen
31edfda03c make regression module compatible with rank loss, now support weighted loss 2014-04-29 16:16:02 -07:00
tqchen
7a79c009ce chg fmap format 2014-04-29 09:59:10 -07:00
tqchen
ea354683b4 add auc evaluation metric 2014-04-24 22:20:40 -07:00
tqchen
7f9637aae4 remove unwanted private field 2014-04-21 10:42:19 -07:00
tqchen
5f0018b070 expose fmatrixs 2014-04-18 18:18:19 -07:00
tqchen
c3592dc06c Merge branch 'master' of ssh://github.com/tqchen/xgboost
Conflicts:
	regression/xgboost_reg_data.h
2014-04-18 17:46:44 -07:00
tqchen
3d327503fd simplify data 2014-04-18 17:43:44 -07:00
kalenhaha
91bb4777b0 Lambda rank added 2014-04-11 10:50:13 +08:00
kalenhaha
efeea99283 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-04-11 10:48:45 +08:00
kalenhaha
07eea71010 Lambda rank added 2014-04-10 22:11:15 +08:00
kalenhaha
c8b2f46b89 lambda rank added 2014-04-10 22:09:19 +08:00
Tianqi Chen
a022a783ce Update xgboost_utils.h 2014-04-07 16:25:21 -07:00
kalenhaha
a10f594644 rank pass toy 2014-04-07 23:25:35 +08:00
tqchen
40c380e40a add deleted main back 2014-04-06 09:32:27 -07:00
kalenhaha
1fa367b220 small fix 2014-04-06 22:54:41 +08:00
kalenhaha
6bc71df494 compiled 2014-04-06 22:51:52 +08:00
tqchen
ddb8a6982c add dev 2014-04-04 10:42:13 -07:00
kalenhaha
c62dea8325 pairwise ranking implemented 2014-04-05 00:14:55 +08:00
kalenhaha
0b1e584d73 Adding ranking task 2014-04-03 16:22:55 +08:00
tqchen
dc239376c7 add dump nice to regression demo 2014-03-26 16:47:01 -07:00
tqchen
7d97d6b1d4 update regression 2014-03-26 16:25:44 -07:00
kalenhaha
0a971cb466 small fix 2014-03-27 00:08:47 +08:00
kalenhaha
52992442ad Merge branch 'master' of https://github.com/tqchen/xgboost 2014-03-26 23:50:56 +08:00
tqchen
c751d6ead3 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-03-25 17:18:27 -07:00
tqchen
c7869a7855 small fix 2014-03-25 17:17:00 -07:00
Tianqi Chen
87fc848b12 Update README.md 2014-03-26 08:01:47 +08:00
Tianqi Chen
159ed0f7e1 Update README.md 2014-03-26 08:01:24 +08:00
Tianqi Chen
f7d9c774d7 Update README 2014-03-26 07:21:15 +08:00
kalenhaha
feb914c35b change the regression demo data set 2014-03-24 23:23:11 +08:00
tqchen
d93e8717c1 fix test to pred 2014-03-24 00:31:53 -07:00
kalenhaha
57713be940 remove test directory 2014-03-23 00:05:46 +08:00
kalenhaha
77901f2428 adding regression demo 2014-03-22 21:52:29 +08:00
kalenhaha
55d1b1e109 Merge branch 'master' of https://github.com/tqchen/xgboost 2014-03-22 21:50:31 +08:00
kalenhaha
193d1d165f separate binary classification and regression demo 2014-03-22 21:48:27 +08:00
Tianqi Chen
bc071cac4f Update README.md 2014-03-20 23:12:41 -07:00
Tianqi Chen
50c76ec0d3 Update README.md 2014-03-20 23:12:16 -07:00
tqchen
db285cc4ba add batch running 2014-03-20 16:27:24 -07:00
tqchen
255b1f4043 add feature constraint 2014-03-19 10:47:56 -07:00
tqchen
d3fe4b26a9 fixed remove bug 2014-03-13 13:42:40 -07:00
tqchen
c13126191d neglok 2014-03-12 20:28:21 -07:00
tqchen
8c8dd1a740 support int type 2014-03-12 17:58:14 -07:00
tqchen
329cc61795 more compact 2014-03-11 13:07:20 -07:00
tqchen
a191863213 add accuracy 2014-03-11 13:06:22 -07:00
tqchen
d9ff9fadf6 fix delete 2014-03-11 12:40:51 -07:00
tqchen
377a573097 add remove tree 2014-03-11 11:25:50 -07:00
tqchen
364b4a0f77 add name dumpath 2014-03-06 11:23:51 -08:00
tqchen
d960550933 add add and remove 2014-03-05 16:39:07 -08:00
tqchen
ef5a389ecf try interact mode 2014-03-05 15:28:53 -08:00
tqchen
2bdcad9630 add a test folder 2014-03-05 15:20:11 -08:00
tqchen
74828295fe complete row maker 2014-03-05 14:38:13 -08:00
tqchen
73dfdc539b add row tree maker, to be finished 2014-03-05 11:00:03 -08:00
tqchen
cf14b11130 split new base treemaker, not very good abstraction, but ok 2014-03-05 10:20:36 -08:00
tqchen
8ef7d6beb4 fix reg model_out 2014-03-05 09:34:37 -08:00
tqchen
0fdda29470 reupdate data 2014-03-04 22:47:39 -08:00
tqchen
1479adba58 fix text 2014-03-04 16:22:24 -08:00
tqchen
ae5c26daf6 fix fmatrix 2014-03-04 11:45:22 -08:00
tqchen
ffcfb12515 add simple text loader 2014-03-04 11:33:33 -08:00
tqchen
cba130c40c ok fix 2014-03-03 22:20:45 -08:00
tqchen
9da9861377 big change, change interface to template, everything still OK 2014-03-03 22:16:37 -08:00
tqchen
fad6522a53 backup makefile 2014-03-03 15:21:50 -08:00
tqchen
bbbbe6bc4e compatibility issue with openmp 2014-03-03 15:11:41 -08:00
tqchen
5a65f4b958 ok 2014-03-03 12:26:40 -08:00
tqchen
f0b38810bb maptree is not needed 2014-03-03 11:06:24 -08:00
tqchen
623e003923 fix fmap 2014-03-03 11:05:10 -08:00
tqchen
074a861e7b auto do reboost 2014-03-02 16:42:22 -08:00
tqchen
d534c22094 chg file name of reg 2014-03-02 16:39:00 -08:00
tqchen
4ebdd3cdd2 chg file name of reg 2014-03-02 16:38:59 -08:00
tqchen
c2460da2ab change test task to pred 2014-03-02 16:20:42 -08:00
tqchen
2dd03b1963 make style more like Google style 2014-03-02 13:30:24 -08:00
tqchen
7761d562b1 add smart decision of nfeatures 2014-03-01 21:49:29 -08:00
tqchen
0f410ac54a fix type 2014-03-01 21:29:07 -08:00
tqchen
75427938c3 add smart load 2014-03-01 21:15:54 -08:00
tqchen
5cdc38648b full omp support for regression 2014-03-01 20:56:25 -08:00
tqchen
550010e9d2 fix col maker, make it default 2014-03-01 15:16:30 -08:00
tqchen
394d325078 add col maker 2014-03-01 14:00:09 -08:00
Tianqi Chen
1f04893784 Update README.md 2014-02-28 20:13:01 -08:00
Tianqi Chen
260cbcd3c0 Update README.md 2014-02-28 20:10:57 -08:00
tqchen
e4a4f7d315 chg license, README 2014-02-28 20:09:40 -08:00
tqchen
b57656902e start add coltree maker 2014-02-28 11:44:50 -08:00
tqchen
82807b3a55 add dump2json 2014-02-26 18:54:12 -08:00
tqchen
733f8ae393 add pathdump 2014-02-26 17:08:23 -08:00
tqchen
4a612eb3ba modify tree so that training is standalone 2014-02-26 16:03:00 -08:00
tqchen
2c6922f432 modify tree so that training is standalone 2014-02-26 16:02:58 -08:00
tqchen
9b09cd3d49 change input data structure 2014-02-26 11:51:58 -08:00
tqchen
6fa5c30777 fix mushroom 2014-02-24 23:19:58 -08:00
tqchen
c4949c0937 finish mushroom 2014-02-24 23:06:57 -08:00
tqchen
9d6ef11eb5 add mushroom classification 2014-02-24 22:25:43 -08:00
tqchen
4aa4faa625 add mushroom 2014-02-24 22:19:40 -08:00
tqchen
daab1fef19 pass simple test 2014-02-20 22:28:05 -08:00
tqchen
e52720976c changes to reg booster 2014-02-20 22:08:31 -08:00
kalenhaha
a0dddaf224 tab eliminated 2014-02-19 13:25:01 +08:00
kalenhaha
a20b1d1866 add toy data 2014-02-19 13:01:15 +08:00
kalenhaha
e1b5b99113 add in reg.conf for configuration demo 2014-02-18 16:49:23 +08:00
kalenhaha
7821ef3a7c Merge branch 'master' of https://github.com/tqchen/xgboost 2014-02-16 14:34:35 +08:00
kalenhaha
6d500b2964 fix some bugs 2014-02-16 11:44:03 +08:00
tqchen
f204dd7fcf fix nboosters 2014-02-15 19:42:02 -08:00
tqchen
c38399b989 update license 2014-02-15 17:45:48 -08:00
tqchen
ece5f00ca1 Merge branch 'master' of ssh://github.com/tqchen/xgboost 2014-02-15 17:42:31 -08:00
tqchen
db938ff595 update license 2014-02-15 17:42:23 -08:00
tqchen
5c09686c78 Update README.md 2014-02-15 11:22:50 -08:00
kalenhaha
32e670a4da Comments added 2014-02-13 13:04:55 +08:00
kalenhaha
4dfc4491c2 GBRT Train and Test Phase added 2014-02-12 23:30:32 +08:00
tqchen
d6261c25f2 Update README.md 2014-02-11 20:38:06 -08:00
tqchen
bf81263301 chg fmt to libsvm 2014-02-10 21:41:43 -08:00
tqchen
45a452b27e cleanup reg 2014-02-10 21:09:09 -08:00
tqchen
56e4a2ced1 add regression data 2014-02-10 20:32:23 -08:00
kalenhaha
4d1d3712ea Merge branch 'master' of https://github.com/tqchen/xgboost 2014-02-11 11:19:27 +08:00
kalenhaha
fb568a7a47 gbrt modified 2014-02-11 11:07:00 +08:00
kalenhaha
3afd186ea9 gbrt implemented 2014-02-10 23:40:38 +08:00
tqchen
365b8c4bdc Update README.md 2014-02-08 19:02:33 -08:00
tqchen
6c38e35ffb Update README.md 2014-02-08 13:01:10 -08:00
tqchen
08604d35fc Update README.md 2014-02-08 13:00:49 -08:00
tqchen
52058735d0 Update README.md 2014-02-08 12:50:24 -08:00
tqchen
6a43247bc3 finish readme 2014-02-08 11:47:37 -08:00
tqchen
33acaaa3ae add linear booster 2014-02-08 11:24:35 -08:00
tqchen
d656d9df2c add ok 2014-02-07 22:51:16 -08:00
tqchen
e8feddc6a8 chg makefile 2014-02-07 22:43:13 -08:00
tqchen
bed2e26019 adapt tree booster 2014-02-07 22:41:32 -08:00
tqchen
5d052b9e14 adapt svdfeature tree 2014-02-07 22:38:26 -08:00
tqchen
bf36374678 add detailed comment about gbmcore 2014-02-07 20:30:39 -08:00
tqchen
1e7ac402e6 add empty folder for regression. TODO 2014-02-07 20:20:09 -08:00
tqchen
9ee1048fe9 move core code to booster 2014-02-07 20:13:27 -08:00
tqchen
0d3ecd9033 add base code 2014-02-07 18:40:53 -08:00
tqchen
4e2d67b81a sync everything 2014-02-06 21:28:47 -08:00
tqchen
51d8409e30 add config 2014-02-06 21:26:27 -08:00
tqchen
ee7643bdf6 update this folder 2014-02-06 16:06:59 -08:00
tqchen
5a2b8678fc update this folder 2014-02-06 16:06:18 -08:00
tqchen
750871a158 initial cleanup of interface 2014-02-06 16:03:04 -08:00
tqchen
aecfbf5096 init commit 2014-02-06 15:50:50 -08:00
143 changed files with 28181 additions and 204 deletions

.gitignore (vendored): 35 lines changed

@@ -6,8 +6,41 @@
# Compiled Dynamic libraries
*.so
*.dylib
*.page
# Compiled Static libraries
*.lai
*.la
*.a
*~
*.Rcheck
*.rds
*.tar.gz
*txt*
*conf
*buffer
*model
*pyc
*train
*test
*group
*rar
*vali
*data
*sdf
Release
*exe*
*exp
ipch
*.filters
*.user
*log
Debug
*suo
*test*
.Rhistory
*.dll
*i386
*x64
*dump
*save
*csv

CHANGES.md (new file): 22 lines changed

@@ -0,0 +1,22 @@
Change Log
=====
xgboost-0.1
=====
* Initial release
xgboost-0.2x
=====
* Python module
* Weighted samples instances
* Initial version of pairwise rank
xgboost-0.3
=====
* Faster tree construction module
- Allows subsample columns during tree construction via ```bst:col_samplebytree=ratio```
* Support for boosting from initial predictions
* Experimental version of LambdaRank
* Linear booster is now parallelized, using parallel coordinated descent.
* Add [Code Guide](src/README.md) for customizing objective function and evaluation
* Add R module
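The changelog above points to a Code Guide for customizing the objective function. As a rough illustration only (not the repository's own code, and not requiring xgboost to be installed), a custom objective in xgboost's Python wrapper is simply a function that returns the per-instance gradient and hessian of the loss with respect to the predictions; the function name and arrays here are hypothetical:

```python
import numpy as np

def squared_error_obj(preds, labels):
    """Gradient and hessian of 0.5 * (pred - label)^2,
    the (grad, hess) pair a custom-objective hook expects."""
    grad = preds - labels          # first derivative of the loss
    hess = np.ones_like(preds)     # second derivative is constant 1
    return grad, hess

# Example inputs (made up for illustration).
preds = np.array([0.5, 2.0, -1.0])
labels = np.array([1.0, 2.0, 0.0])
grad, hess = squared_error_obj(preds, labels)
```

In the actual wrapper, such a function would be passed to the training routine in place of a built-in loss; the booster only ever sees the gradient and hessian, which is what makes the objective pluggable.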

LICENSE: 211 lines changed

@@ -1,202 +1,13 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
Copyright (c) 2014 by Tianqi Chen and Contributors
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright {yyyy} {name of copyright owner}
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

65
Makefile Normal file
View File

@@ -0,0 +1,65 @@
export CC = gcc
export CXX = g++
export LDFLAGS= -pthread -lm
export CFLAGS = -Wall -O3 -msse2 -Wno-unknown-pragmas -fPIC -pedantic
ifeq ($(no_omp),1)
CFLAGS += -DDISABLE_OPENMP
else
CFLAGS += -fopenmp
endif
# specify tensor path
BIN = xgboost
OBJ = updater.o gbm.o io.o
SLIB = wrapper/libxgboostwrapper.so
.PHONY: clean all python Rpack
all: $(BIN) $(OBJ) $(SLIB)
python: wrapper/libxgboostwrapper.so
# the wrapper now takes in two parts: the io code and the wrapper code
wrapper/libxgboostwrapper.so: wrapper/xgboost_wrapper.cpp $(OBJ)
updater.o: src/tree/updater.cpp src/tree/*.hpp src/*.h src/tree/*.h
gbm.o: src/gbm/gbm.cpp src/gbm/*.hpp src/gbm/*.h
io.o: src/io/io.cpp src/io/*.hpp src/utils/*.h src/learner/dmatrix.h src/*.h
xgboost: src/xgboost_main.cpp src/utils/*.h src/*.h src/learner/*.hpp src/learner/*.h $(OBJ)
wrapper/libxgboostwrapper.so: wrapper/xgboost_wrapper.cpp src/utils/*.h src/*.h src/learner/*.hpp src/learner/*.h $(OBJ)
$(BIN) :
$(CXX) $(CFLAGS) $(LDFLAGS) -o $@ $(filter %.cpp %.o %.c, $^)
$(SLIB) :
$(CXX) $(CFLAGS) -fPIC $(LDFLAGS) -shared -o $@ $(filter %.cpp %.o %.c, $^)
$(OBJ) :
$(CXX) -c $(CFLAGS) -o $@ $(firstword $(filter %.cpp %.c, $^) )
install:
cp -f -r $(BIN) $(INSTALL_PATH)
Rpack:
make clean
rm -rf xgboost xgboost*.tar.gz
cp -r R-package xgboost
rm -rf xgboost/inst/examples/*.buffer
rm -rf xgboost/inst/examples/*.model
rm -rf xgboost/inst/examples/dump*
rm -rf xgboost/src/*.o xgboost/src/*.so xgboost/src/*.dll
rm -rf xgboost/demo/*.model xgboost/demo/*.buffer xgboost/demo/*.txt
rm -rf xgboost/demo/runall.R
cp -r src xgboost/src/src
mkdir xgboost/src/wrapper
cp wrapper/xgboost_wrapper.h xgboost/src/wrapper
cp wrapper/xgboost_wrapper.cpp xgboost/src/wrapper
cp ./LICENSE xgboost
cat R-package/src/Makevars|sed '2s/.*/PKGROOT=./' > xgboost/src/Makevars
cat R-package/src/Makevars.win|sed '2s/.*/PKGROOT=./' > xgboost/src/Makevars.win
R CMD build xgboost
rm -rf xgboost
R CMD check --as-cran xgboost*.tar.gz
clean:
$(RM) $(OBJ) $(BIN) $(SLIB) *.o */*.o */*/*.o *~ */*~ */*/*~

24
R-package/DESCRIPTION Normal file
View File

@@ -0,0 +1,24 @@
Package: xgboost
Type: Package
Title: eXtreme Gradient Boosting
Version: 0.3-2
Date: 2014-08-23
Author: Tianqi Chen <tianqi.tchen@gmail.com>, Tong He <hetong007@gmail.com>
Maintainer: Tong He <hetong007@gmail.com>
Description: This package is an R wrapper of xgboost, which is short for eXtreme
Gradient Boosting. It is an efficient and scalable implementation of the
gradient boosting framework. The package includes an efficient linear model
solver and tree learning algorithms. The package can automatically do
parallel computation with OpenMP, and it can be more than 10 times faster
than existing gradient boosting packages such as gbm. It supports various
objective functions, including regression, classification and ranking. The
package is made to be extensible, so that users are also allowed to define
their own objectives easily.
License: Apache License (== 2.0) | file LICENSE
URL: https://github.com/tqchen/xgboost
BugReports: https://github.com/tqchen/xgboost/issues
Depends:
R (>= 2.10)
Imports:
Matrix (>= 1.1-0),
methods

13
R-package/LICENSE Normal file
View File

@@ -0,0 +1,13 @@
Copyright (c) 2014 by Tianqi Chen and Contributors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

17
R-package/NAMESPACE Normal file
View File

@@ -0,0 +1,17 @@
# Generated by roxygen2 (4.0.1): do not edit by hand
export(getinfo)
export(setinfo)
export(slice)
export(xgb.DMatrix)
export(xgb.DMatrix.save)
export(xgb.cv)
export(xgb.dump)
export(xgb.load)
export(xgb.save)
export(xgb.train)
export(xgboost)
exportMethods(predict)
import(methods)
importClassesFrom(Matrix,dgCMatrix)
importClassesFrom(Matrix,dgeMatrix)

View File

@@ -0,0 +1,41 @@
setClass('xgb.DMatrix')
#' Get information of an xgb.DMatrix object
#'
#' Get information of an xgb.DMatrix object
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' train <- agaricus.train
#' dtrain <- xgb.DMatrix(train$data, label=train$label)
#' labels <- getinfo(dtrain, 'label')
#' setinfo(dtrain, 'label', 1-labels)
#' labels2 <- getinfo(dtrain, 'label')
#' stopifnot(all(labels2 == 1-labels))
#' @rdname getinfo
#' @export
#'
getinfo <- function(object, ...){
UseMethod("getinfo")
}
#' @param object Object of class "xgb.DMatrix"
#' @param name the name of the field to get
#' @param ... other parameters
#' @rdname getinfo
#' @method getinfo xgb.DMatrix
setMethod("getinfo", signature = "xgb.DMatrix",
definition = function(object, name) {
if (typeof(name) != "character") {
stop("xgb.getinfo: name must be character")
}
if (class(object) != "xgb.DMatrix") {
stop("xgb.setinfo: first argument dtrain must be xgb.DMatrix")
}
if (name != "label" && name != "weight" && name != "base_margin") {
stop(paste("xgb.getinfo: unknown info name", name))
}
ret <- .Call("XGDMatrixGetInfo_R", object, name, PACKAGE = "xgboost")
return(ret)
})

View File

@@ -0,0 +1,42 @@
setClass("xgb.Booster")
#' Predict method for eXtreme Gradient Boosting model
#'
#' Predicted values based on xgboost model object.
#'
#' @param object Object of class "xgb.Booster"
#' @param newdata takes \code{matrix}, \code{dgCMatrix}, local data file or
#' \code{xgb.DMatrix}.
#' @param outputmargin whether the prediction should be returned as the
#' untransformed sum of functions. When outputmargin=TRUE, the prediction is
#' the raw margin value; in logistic regression, for example, outputmargin=TRUE
#' returns the value before the logistic transformation.
#' @param ntreelimit limit the number of trees used in prediction. This
#' parameter is only valid for gbtree, not for gblinear. Set it to a value
#' greater than 0 to take effect; all trees are used by default.
#' @examples
#' data(agaricus.train, package='xgboost')
#' data(agaricus.test, package='xgboost')
#' train <- agaricus.train
#' test <- agaricus.test
#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
#' eta = 1, nround = 2,objective = "binary:logistic")
#' pred <- predict(bst, test$data)
#' @export
#'
setMethod("predict", signature = "xgb.Booster",
definition = function(object, newdata, outputmargin = FALSE, ntreelimit = NULL) {
if (class(newdata) != "xgb.DMatrix") {
newdata <- xgb.DMatrix(newdata)
}
if (is.null(ntreelimit)) {
ntreelimit <- 0
} else {
if (ntreelimit < 1){
stop("predict: ntreelimit must be equal to or greater than 1")
}
}
ret <- .Call("XGBoosterPredict_R", object, newdata, as.integer(outputmargin), as.integer(ntreelimit), PACKAGE = "xgboost")
return(ret)
})
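The \code{outputmargin} behaviour documented above can be illustrated without a trained model. For the \code{binary:logistic} objective, the transformed prediction is (to our understanding) the logistic transform of the raw margin; a minimal base-R sketch with made-up margin values:

```r
# Sketch (not part of the package): relationship between the raw margin
# returned with outputmargin = TRUE and the transformed prediction for
# the "binary:logistic" objective. The margin values here are made up.
margin <- c(-2, 0, 1.5)
prob <- 1 / (1 + exp(-margin))
stopifnot(isTRUE(all.equal(prob, plogis(margin))))
```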

View File

@@ -0,0 +1,29 @@
#' Set information of an xgb.DMatrix object
#'
#' Set information of an xgb.DMatrix object
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' train <- agaricus.train
#' dtrain <- xgb.DMatrix(train$data, label=train$label)
#' labels <- getinfo(dtrain, 'label')
#' setinfo(dtrain, 'label', 1-labels)
#' labels2 <- getinfo(dtrain, 'label')
#' stopifnot(all(labels2 == 1-labels))
#' @rdname setinfo
#' @export
#'
setinfo <- function(object, ...){
UseMethod("setinfo")
}
#' @param object Object of class "xgb.DMatrix"
#' @param name the name of the field to get
#' @param info the specific field of information to set
#' @param ... other parameters
#' @rdname setinfo
#' @method setinfo xgb.DMatrix
setMethod("setinfo", signature = "xgb.DMatrix",
definition = function(object, name, info) {
xgb.setinfo(object, name, info)
})

View File

@@ -0,0 +1,33 @@
setClass('xgb.DMatrix')
#' Get a new DMatrix containing the specified rows of
#' the original xgb.DMatrix object
#'
#' Get a new DMatrix containing the specified rows of
#' the original xgb.DMatrix object
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' train <- agaricus.train
#' dtrain <- xgb.DMatrix(train$data, label=train$label)
#' dsub <- slice(dtrain, 1:3)
#' @rdname slice
#' @export
#'
slice <- function(object, ...){
UseMethod("slice")
}
#' @param object Object of class "xgb.DMatrix"
#' @param idxset an integer vector of indices of the rows needed
#' @param ... other parameters
#' @rdname slice
#' @method slice xgb.DMatrix
setMethod("slice", signature = "xgb.DMatrix",
definition = function(object, idxset, ...) {
if (class(object) != "xgb.DMatrix") {
stop("slice: first argument dtrain must be xgb.DMatrix")
}
ret <- .Call("XGDMatrixSliceDMatrix_R", object, idxset, PACKAGE = "xgboost")
return(structure(ret, class = "xgb.DMatrix"))
})

214
R-package/R/utils.R Normal file
View File

@@ -0,0 +1,214 @@
#' @importClassesFrom Matrix dgCMatrix dgeMatrix
#' @import methods
# depends on matrix
.onLoad <- function(libname, pkgname) {
library.dynam("xgboost", pkgname, libname)
}
.onUnload <- function(libpath) {
library.dynam.unload("xgboost", libpath)
}
# set information into dmatrix; this mutates the dmatrix
xgb.setinfo <- function(dmat, name, info) {
if (class(dmat) != "xgb.DMatrix") {
stop("xgb.setinfo: first argument dtrain must be xgb.DMatrix")
}
if (name == "label") {
.Call("XGDMatrixSetInfo_R", dmat, name, as.numeric(info),
PACKAGE = "xgboost")
return(TRUE)
}
if (name == "weight") {
.Call("XGDMatrixSetInfo_R", dmat, name, as.numeric(info),
PACKAGE = "xgboost")
return(TRUE)
}
if (name == "base_margin") {
.Call("XGDMatrixSetInfo_R", dmat, name, as.numeric(info),
PACKAGE = "xgboost")
return(TRUE)
}
if (name == "group") {
.Call("XGDMatrixSetInfo_R", dmat, name, as.integer(info),
PACKAGE = "xgboost")
return(TRUE)
}
stop(paste("xgb.setinfo: unknown info name", name))
return(FALSE)
}
# construct a Booster from cachelist
xgb.Booster <- function(params = list(), cachelist = list(), modelfile = NULL) {
if (typeof(cachelist) != "list") {
stop("xgb.Booster: only accepts list of DMatrix as cachelist")
}
for (dm in cachelist) {
if (class(dm) != "xgb.DMatrix") {
stop("xgb.Booster: only accepts list of DMatrix as cachelist")
}
}
handle <- .Call("XGBoosterCreate_R", cachelist, PACKAGE = "xgboost")
if (length(params) != 0) {
for (i in 1:length(params)) {
p <- params[i]
.Call("XGBoosterSetParam_R", handle, gsub("\\.", "_", names(p)), as.character(p),
PACKAGE = "xgboost")
}
}
if (!is.null(modelfile)) {
if (typeof(modelfile) != "character") {
stop("xgb.Booster: modelfile must be character")
}
.Call("XGBoosterLoadModel_R", handle, modelfile, PACKAGE = "xgboost")
}
return(structure(handle, class = "xgb.Booster"))
}
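The loop in `xgb.Booster` above translates R-style dotted parameter names into the underscore form the C++ core expects via `gsub`. A standalone sketch of that translation (parameter values are illustrative):

```r
# Sketch of the parameter-name munging in xgb.Booster: dotted R names
# (e.g. max.depth) become underscore names (max_depth) before being
# passed down to the core library.
params <- list(max.depth = 2, eta = 1)
munged <- gsub("\\.", "_", names(params))
stopifnot(identical(munged, c("max_depth", "eta")))
```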
## ----the following are low-level iterative functions, not needed if
## you do not want to use them ---------------------------------------
# get dmatrix from data, label
xgb.get.DMatrix <- function(data, label = NULL) {
inClass <- class(data)
if (inClass == "dgCMatrix" || inClass == "matrix") {
if (is.null(label)) {
stop("xgboost: need label when data is a matrix")
}
dtrain <- xgb.DMatrix(data, label = label)
} else {
if (!is.null(label)) {
warning("xgboost: label will be ignored.")
}
if (inClass == "character") {
dtrain <- xgb.DMatrix(data)
} else if (inClass == "xgb.DMatrix") {
dtrain <- data
} else {
stop("xgboost: Invalid input of data")
}
}
return (dtrain)
}
xgb.numrow <- function(dmat) {
nrow <- .Call("XGDMatrixNumRow_R", dmat, PACKAGE="xgboost")
return(nrow)
}
# iteratively update booster with customized statistics
xgb.iter.boost <- function(booster, dtrain, gpair) {
if (class(booster) != "xgb.Booster") {
stop("xgb.iter.update: first argument must be type xgb.Booster")
}
if (class(dtrain) != "xgb.DMatrix") {
stop("xgb.iter.update: second argument must be type xgb.DMatrix")
}
.Call("XGBoosterBoostOneIter_R", booster, dtrain, gpair$grad, gpair$hess,
PACKAGE = "xgboost")
return(TRUE)
}
# iteratively update booster with dtrain
xgb.iter.update <- function(booster, dtrain, iter, obj = NULL) {
if (class(booster) != "xgb.Booster") {
stop("xgb.iter.update: first argument must be type xgb.Booster")
}
if (class(dtrain) != "xgb.DMatrix") {
stop("xgb.iter.update: second argument must be type xgb.DMatrix")
}
if (is.null(obj)) {
.Call("XGBoosterUpdateOneIter_R", booster, as.integer(iter), dtrain,
PACKAGE = "xgboost")
} else {
pred <- predict(booster, dtrain)
gpair <- obj(pred, dtrain)
succ <- xgb.iter.boost(booster, dtrain, gpair)
}
return(TRUE)
}
# iteratively evaluate one iteration
xgb.iter.eval <- function(booster, watchlist, iter, feval = NULL) {
if (class(booster) != "xgb.Booster") {
stop("xgb.eval: first argument must be type xgb.Booster")
}
if (typeof(watchlist) != "list") {
stop("xgb.eval: only accepts list of DMatrix as watchlist")
}
for (w in watchlist) {
if (class(w) != "xgb.DMatrix") {
stop("xgb.eval: watch list can only contain xgb.DMatrix")
}
}
if (length(watchlist) != 0) {
if (is.null(feval)) {
evnames <- list()
for (i in 1:length(watchlist)) {
w <- watchlist[i]
if (length(names(w)) == 0) {
stop("xgb.eval: a name tag must be present for every element in watchlist")
}
evnames <- append(evnames, names(w))
}
msg <- .Call("XGBoosterEvalOneIter_R", booster, as.integer(iter), watchlist,
evnames, PACKAGE = "xgboost")
} else {
msg <- paste("[", iter, "]", sep="")
for (j in 1:length(watchlist)) {
w <- watchlist[j]
if (length(names(w)) == 0) {
stop("xgb.eval: a name tag must be present for every element in watchlist")
}
ret <- feval(predict(booster, w[[1]]), w[[1]])
msg <- paste(msg, "\t", names(w), "-", ret$metric, ":", ret$value, sep="")
}
}
} else {
msg <- ""
}
return(msg)
}
#------------------------------------------
# helper functions for cross validation
#
xgb.cv.mknfold <- function(dall, nfold, param) {
randidx <- sample(1 : xgb.numrow(dall))
kstep <- length(randidx) / nfold
idset <- list()
for (i in 1:nfold) {
idset[[i]] <- randidx[ ((i-1) * kstep + 1) : min(i * kstep, length(randidx)) ]
}
ret <- list()
for (k in 1:nfold) {
dtest <- slice(dall, idset[[k]])
didx = c()
for (i in 1:nfold) {
if (i != k) {
didx <- append(didx, idset[[i]])
}
}
dtrain <- slice(dall, didx)
bst <- xgb.Booster(param, list(dtrain, dtest))
watchlist = list(train=dtrain, test=dtest)
ret[[k]] <- list(dtrain=dtrain, booster=bst, watchlist=watchlist)
}
return (ret)
}
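The fold construction in `xgb.cv.mknfold` can be sketched with plain index vectors, no xgb.DMatrix needed. Note this sketch adds a `ceiling()` call, since `length(randidx) / nfold` in the code above can be non-integer; `n` and `nfold` are made-up values:

```r
# Sketch of the k-fold index partition used by xgb.cv.mknfold, on plain
# integer indices instead of DMatrix rows.
set.seed(1)
n <- 10; nfold <- 5
randidx <- sample(1:n)
kstep <- ceiling(n / nfold)  # ceiling() guards against non-integer steps
idset <- lapply(1:nfold, function(i)
  randidx[((i - 1) * kstep + 1):min(i * kstep, n)])
# every row lands in exactly one held-out fold
stopifnot(setequal(unlist(idset), 1:n))
stopifnot(sum(sapply(idset, length)) == n)
```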
xgb.cv.aggcv <- function(res, showsd = TRUE) {
header <- res[[1]]
ret <- header[1]
for (i in 2:length(header)) {
kv <- strsplit(header[i], ":")[[1]]
ret <- paste(ret, "\t", kv[1], ":", sep="")
stats <- c()
stats[1] <- as.numeric(kv[2])
for (j in 2:length(res)) {
tkv <- strsplit(res[[j]][i], ":")[[1]]
stats[j] <- as.numeric(tkv[2])
}
ret <- paste(ret, sprintf("%f", mean(stats)), sep="")
if (showsd) {
ret <- paste(ret, sprintf("+%f", sd(stats)), sep="")
}
}
return (ret)
}
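`xgb.cv.aggcv` above parses each fold's tab-separated "metric:value" strings and reassembles one mean(+sd) summary line. A self-contained sketch of that aggregation, with fabricated per-fold results:

```r
# Sketch of the aggregation performed by xgb.cv.aggcv; the per-fold
# evaluation strings below are fabricated.
res <- list(c("[0]", "test-error:0.10"),
            c("[0]", "test-error:0.20"))
vals <- sapply(res, function(r) as.numeric(strsplit(r[2], ":")[[1]][2]))
line <- paste0("[0]\ttest-error:", sprintf("%f", mean(vals)),
               sprintf("+%f", sd(vals)))
stopifnot(grepl("test-error:0.150000", line, fixed = TRUE))
```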

45
R-package/R/xgb.DMatrix.R Normal file
View File

@@ -0,0 +1,45 @@
#' Construct xgb.DMatrix object
#'
#' Construct xgb.DMatrix object from a dense matrix, a sparse matrix or a local data file.
#'
#' @param data a \code{matrix} object, a \code{dgCMatrix} object or a character
#' indicating the data file.
#' @param info a list of information of the xgb.DMatrix object
#' @param missing Missing is only used when input is a dense matrix; pick a float
#' value that represents the missing value. Sometimes a dataset uses 0 or
#' another extreme value to represent missing values.
#' @param ... other information to pass to \code{info}.
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' train <- agaricus.train
#' dtrain <- xgb.DMatrix(train$data, label=train$label)
#' xgb.DMatrix.save(dtrain, 'xgb.DMatrix.data')
#' dtrain <- xgb.DMatrix('xgb.DMatrix.data')
#' @export
#'
xgb.DMatrix <- function(data, info = list(), missing = 0, ...) {
if (typeof(data) == "character") {
handle <- .Call("XGDMatrixCreateFromFile_R", data, as.integer(FALSE),
PACKAGE = "xgboost")
} else if (is.matrix(data)) {
handle <- .Call("XGDMatrixCreateFromMat_R", data, missing,
PACKAGE = "xgboost")
} else if (class(data) == "dgCMatrix") {
handle <- .Call("XGDMatrixCreateFromCSC_R", data@p, data@i, data@x,
PACKAGE = "xgboost")
} else {
stop(paste("xgb.DMatrix: does not support construction from ",
typeof(data)))
}
dmat <- structure(handle, class = "xgb.DMatrix")
info <- append(info, list(...))
if (length(info) == 0)
return(dmat)
for (i in 1:length(info)) {
p <- info[i]
xgb.setinfo(dmat, names(p), p[[1]])
}
return(dmat)
}
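The `dgCMatrix` branch above reads the matrix directly from its compressed-sparse-column slots. A small sketch of what those slots hold (uses the Matrix package, which ships with R; the example matrix is made up):

```r
# Sketch of the dgCMatrix slots consumed by the CSC constructor:
# @p = 0-based column pointers, @i = 0-based row indices, @x = values.
library(Matrix)
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(10, 20, 30))
stopifnot(inherits(m, "dgCMatrix"))
stopifnot(identical(as.integer(m@p), c(0L, 2L, 3L)))  # col 1 has 2 entries, col 2 has 1
stopifnot(identical(m@x, c(10, 20, 30)))              # values stored column-major
```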

View File

@@ -0,0 +1,27 @@
#' Save xgb.DMatrix object to binary file
#'
#' Save xgb.DMatrix object to binary file
#'
#' @param DMatrix the DMatrix object
#' @param fname the name of the binary file.
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' train <- agaricus.train
#' dtrain <- xgb.DMatrix(train$data, label=train$label)
#' xgb.DMatrix.save(dtrain, 'xgb.DMatrix.data')
#' dtrain <- xgb.DMatrix('xgb.DMatrix.data')
#' @export
#'
xgb.DMatrix.save <- function(DMatrix, fname) {
if (typeof(fname) != "character") {
stop("xgb.save: fname must be character")
}
if (class(DMatrix) == "xgb.DMatrix") {
.Call("XGDMatrixSaveBinary_R", DMatrix, fname, as.integer(FALSE),
PACKAGE = "xgboost")
return(TRUE)
}
stop("xgb.DMatrix.save: the input must be xgb.DMatrix")
return(FALSE)
}

86
R-package/R/xgb.cv.R Normal file
View File

@@ -0,0 +1,86 @@
#' Cross Validation
#'
#' The cross validation function of xgboost
#'
#' @param params the list of parameters. Commonly used ones are:
#' \itemize{
#' \item \code{objective} objective function, common ones are
#' \itemize{
#' \item \code{reg:linear} linear regression
#' \item \code{binary:logistic} logistic regression for classification
#' }
#' \item \code{eta} step size of each boosting step
#' \item \code{max.depth} maximum depth of the tree
#' \item \code{nthread} number of thread used in training, if not set, all threads are used
#' }
#'
#' See \url{https://github.com/tqchen/xgboost/wiki/Parameters} for
#' further details. See also demo/ for walkthrough example in R.
#' @param data takes an \code{xgb.DMatrix} as the input.
#' @param nrounds the max number of iterations
#' @param nfold number of folds used
#' @param label optional field; only used when data is a matrix
#' @param showsd boolean, whether show standard deviation of cross validation
#' @param metrics list of evaluation metrics to be used in cross validation,
#' when it is not specified, the evaluation metric is chosen according to objective function.
#' Possible options are:
#' \itemize{
#' \item \code{error} binary classification error rate
#' \item \code{rmse} Root mean square error
#' \item \code{logloss} negative log-likelihood function
#' \item \code{auc} Area under curve
#' \item \code{merror} Exact matching error, used to evaluate multi-class classification
#' }
#' @param obj customized objective function. Returns gradient and second order
#' gradient with given prediction and dtrain.
#' @param feval customized evaluation function. Returns
#' \code{list(metric='metric-name', value='metric-value')} with given
#' prediction and dtrain.
#' @param ... other parameters to pass to \code{params}.
#'
#' @details
#' This is the cross validation function for xgboost
#'
#' Parallelization is automatically enabled if OpenMP is present.
#' Number of threads can also be manually specified via "nthread" parameter.
#'
#' This function only accepts an \code{xgb.DMatrix} object as the input.
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
#' history <- xgb.cv(data = dtrain, nround=3, nfold = 5, metrics=list("rmse","auc"),
#' "max.depth"=3, "eta"=1, "objective"="binary:logistic")
#' @export
#'
xgb.cv <- function(params=list(), data, nrounds, nfold, label = NULL,
showsd = TRUE, metrics=list(), obj = NULL, feval = NULL, ...) {
if (typeof(params) != "list") {
stop("xgb.cv: first argument params must be list")
}
if (nfold <= 1) {
stop("nfold must be bigger than 1")
}
dtrain <- xgb.get.DMatrix(data, label)
params <- append(params, list(...))
params <- append(params, list(silent=1))
for (mc in metrics) {
params <- append(params, list("eval_metric"=mc))
}
folds <- xgb.cv.mknfold(dtrain, nfold, params)
history <- list()
for (i in 1:nrounds) {
msg <- list()
for (k in 1:nfold) {
fd <- folds[[k]]
succ <- xgb.iter.update(fd$booster, fd$dtrain, i - 1, obj)
msg[[k]] <- strsplit(xgb.iter.eval(fd$booster, fd$watchlist, i - 1, feval),
"\t")[[1]]
}
ret <- xgb.cv.aggcv(msg, showsd)
history <- append(history, ret)
cat(paste(ret, "\n", sep=""))
}
return (TRUE)
}

33
R-package/R/xgb.dump.R Normal file
View File

@@ -0,0 +1,33 @@
#' Save xgboost model to text file
#'
#' Save an xgboost model to a text file, which can be parsed later.
#'
#' @param model the model object.
#' @param fname the name of the binary file.
#' @param fmap feature map file representing the type of each feature.
#' A detailed description can be found at
#' \url{https://github.com/tqchen/xgboost/wiki/Binary-Classification#dump-model}.
#' See demo/ for a walkthrough example in R, and
#' \url{https://github.com/tqchen/xgboost/blob/master/demo/data/featmap.txt}
#' for an example format.
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' data(agaricus.test, package='xgboost')
#' train <- agaricus.train
#' test <- agaricus.test
#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
#' eta = 1, nround = 2,objective = "binary:logistic")
#' xgb.dump(bst, 'xgb.model.dump')
#' @export
#'
xgb.dump <- function(model, fname, fmap = "") {
if (class(model) != "xgb.Booster") {
stop("xgb.dump: first argument must be type xgb.Booster")
}
if (typeof(fname) != "character") {
stop("xgb.dump: second argument must be type character")
}
.Call("XGBoosterDumpModel_R", model, fname, fmap, PACKAGE = "xgboost")
return(TRUE)
}

23
R-package/R/xgb.load.R Normal file
View File

@@ -0,0 +1,23 @@
#' Load xgboost model from binary file
#'
#' Load xgboost model from the binary model file
#'
#' @param modelfile the name of the binary file.
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' data(agaricus.test, package='xgboost')
#' train <- agaricus.train
#' test <- agaricus.test
#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
#' eta = 1, nround = 2,objective = "binary:logistic")
#' xgb.save(bst, 'xgb.model')
#' bst <- xgb.load('xgb.model')
#' pred <- predict(bst, test$data)
#' @export
#'
xgb.load <- function(modelfile) {
if (is.null(modelfile))
stop("xgb.load: modelfile cannot be NULL")
xgb.Booster(modelfile = modelfile)
}

31
R-package/R/xgb.save.R Normal file
View File

@@ -0,0 +1,31 @@
#' Save xgboost model to binary file
#'
#' Save an xgboost model produced by xgboost or xgb.train
#'
#' @param model the model object.
#' @param fname the name of the binary file.
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' data(agaricus.test, package='xgboost')
#' train <- agaricus.train
#' test <- agaricus.test
#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
#' eta = 1, nround = 2,objective = "binary:logistic")
#' xgb.save(bst, 'xgb.model')
#' bst <- xgb.load('xgb.model')
#' pred <- predict(bst, test$data)
#' @export
#'
xgb.save <- function(model, fname) {
if (typeof(fname) != "character") {
stop("xgb.save: fname must be character")
}
if (class(model) == "xgb.Booster") {
.Call("XGBoosterSaveModel_R", model, fname, PACKAGE = "xgboost")
return(TRUE)
}
stop("xgb.save: the input must be xgb.Booster. Use xgb.DMatrix.save to save
xgb.DMatrix object.")
return(FALSE)
}

98
R-package/R/xgb.train.R Normal file
View File

@@ -0,0 +1,98 @@
#' eXtreme Gradient Boosting Training
#'
#' The training function of xgboost
#'
#' @param params the list of parameters. Commonly used ones are:
#' \itemize{
#' \item \code{objective} objective function, common ones are
#' \itemize{
#' \item \code{reg:linear} linear regression
#' \item \code{binary:logistic} logistic regression for classification
#' }
#' \item \code{eta} step size of each boosting step
#' \item \code{max.depth} maximum depth of the tree
#' \item \code{nthread} number of thread used in training, if not set, all threads are used
#' }
#'
#' See \url{https://github.com/tqchen/xgboost/wiki/Parameters} for
#' further details. See also demo/ for walkthrough example in R.
#' @param data takes an \code{xgb.DMatrix} as the input.
#' @param nrounds the max number of iterations
#' @param watchlist what information should be printed when \code{verbose=1} or
#' \code{verbose=2}. Watchlist is used to specify validation sets to monitor
#' during training. For example, a user can specify
#' watchlist=list(validation1=mat1, validation2=mat2) to watch
#' the performance of each round's model on mat1 and mat2
#'
#' @param obj customized objective function. Returns gradient and second order
#' gradient with given prediction and dtrain.
#' @param feval customized evaluation function. Returns
#' \code{list(metric='metric-name', value='metric-value')} with given
#' prediction and dtrain.
#' @param verbose If 0, xgboost will stay silent. If 1, xgboost will print
#' information of performance. If 2, xgboost will print information of both
#' performance and construction progress information
#'
#' @param ... other parameters to pass to \code{params}.
#'
#' @details
#' This is the training function for xgboost.
#'
#' Parallelization is automatically enabled if OpenMP is present.
#' Number of threads can also be manually specified via "nthread" parameter.
#'
#' This function only accepts an \code{xgb.DMatrix} object as the input.
#' It supports advanced features such as watchlist and customized objective
#' functions, and is therefore more flexible than \code{\link{xgboost}}.
#'
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
#' dtest <- dtrain
#' watchlist <- list(eval = dtest, train = dtrain)
#' param <- list(max.depth = 2, eta = 1, silent = 1)
#' logregobj <- function(preds, dtrain) {
#' labels <- getinfo(dtrain, "label")
#' preds <- 1/(1 + exp(-preds))
#' grad <- preds - labels
#' hess <- preds * (1 - preds)
#' return(list(grad = grad, hess = hess))
#' }
#' evalerror <- function(preds, dtrain) {
#' labels <- getinfo(dtrain, "label")
#' err <- as.numeric(sum(labels != (preds > 0)))/length(labels)
#' return(list(metric = "error", value = err))
#' }
#' bst <- xgb.train(param, dtrain, nround = 2, watchlist, logregobj, evalerror)
#' @export
#'
xgb.train <- function(params=list(), data, nrounds, watchlist = list(),
obj = NULL, feval = NULL, verbose = 1, ...) {
dtrain <- data
if (typeof(params) != "list") {
stop("xgb.train: first argument params must be list")
}
if (class(dtrain) != "xgb.DMatrix") {
stop("xgb.train: second argument dtrain must be xgb.DMatrix")
}
if (verbose > 1) {
params <- append(params, list(silent = 0))
} else {
params <- append(params, list(silent = 1))
}
if (length(watchlist) != 0 && verbose == 0) {
warning('watchlist is provided but verbose=0, no evaluation information will be printed')
watchlist <- list()
}
params = append(params, list(...))
bst <- xgb.Booster(params, append(watchlist, dtrain))
for (i in 1:nrounds) {
succ <- xgb.iter.update(bst, dtrain, i - 1, obj)
if (length(watchlist) != 0) {
msg <- xgb.iter.eval(bst, watchlist, i - 1, feval)
cat(paste(msg, "\n", sep=""))
}
}
return(bst)
}
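The `logregobj` example in the documentation above returns `grad = p - y` and `hess = p * (1 - p)`; these are the analytic first and second derivatives of the logistic loss with respect to the margin, which can be checked numerically (the values of `m` and `y` below are made up):

```r
# Numerical check (not from the package) that grad = p - y matches the
# derivative of the logistic loss l(m) = -y*log(p) - (1-y)*log(1-p),
# where p = 1/(1+exp(-m)).
m <- 0.3; y <- 1
p <- 1 / (1 + exp(-m))
loss <- function(m) { p <- 1 / (1 + exp(-m)); -y * log(p) - (1 - y) * log(1 - p) }
num_grad <- (loss(m + 1e-6) - loss(m - 1e-6)) / 2e-6  # central difference
stopifnot(abs(num_grad - (p - y)) < 1e-6)
```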

115
R-package/R/xgboost.R Normal file
View File

@@ -0,0 +1,115 @@
#' eXtreme Gradient Boosting (Tree) library
#'
#' A simple interface for xgboost in R
#'
#' @param data takes \code{matrix}, \code{dgCMatrix}, local data file or
#' \code{xgb.DMatrix}.
#' @param label the response variable. The user should not set this field
#' if data is a local data file or \code{xgb.DMatrix}.
#' @param params the list of parameters. Commonly used ones are:
#' \itemize{
#' \item \code{objective} objective function, common ones are
#' \itemize{
#' \item \code{reg:linear} linear regression
#' \item \code{binary:logistic} logistic regression for classification
#' }
#' \item \code{eta} step size of each boosting step
#' \item \code{max.depth} maximum depth of the tree
#' \item \code{nthread} number of thread used in training, if not set, all threads are used
#' }
#'
#' See \url{https://github.com/tqchen/xgboost/wiki/Parameters} for
#' further details. See also demo/ for walkthrough example in R.
#' @param nrounds the max number of iterations
#' @param verbose If 0, xgboost will stay silent. If 1, xgboost will print
#' information of performance. If 2, xgboost will print information of both
#' performance and construction progress information
#' @param ... other parameters to pass to \code{params}.
#'
#' @details
#' This is the modeling function for xgboost.
#'
#' Parallelization is automatically enabled if OpenMP is present.
#' Number of threads can also be manually specified via "nthread" parameter
#'
#' @examples
#' data(agaricus.train, package='xgboost')
#' data(agaricus.test, package='xgboost')
#' train <- agaricus.train
#' test <- agaricus.test
#' bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
#' eta = 1, nround = 2,objective = "binary:logistic")
#' pred <- predict(bst, test$data)
#'
#' @export
#'
xgboost <- function(data = NULL, label = NULL, params = list(), nrounds,
verbose = 1, ...) {
dtrain <- xgb.get.DMatrix(data, label)
params <- append(params, list(...))
if (verbose > 0) {
watchlist <- list(train = dtrain)
} else {
watchlist <- list()
}
bst <- xgb.train(params, dtrain, nrounds, watchlist, verbose=verbose)
return(bst)
}
#' Training part from Mushroom Data Set
#'
#' This data set is originally from the Mushroom data set,
#' UCI Machine Learning Repository.
#'
#' This data set includes the following fields:
#'
#' \itemize{
#' \item \code{label} the label for each record
#' \item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 127 columns.
#' }
#'
#' @references
#' https://archive.ics.uci.edu/ml/datasets/Mushroom
#'
#' Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
#' [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
#' School of Information and Computer Science.
#'
#' @docType data
#' @keywords datasets
#' @name agaricus.train
#' @usage data(agaricus.train)
#' @format A list containing a label vector, and a dgCMatrix object with 6513
#' rows and 127 variables
NULL
#' Test part from Mushroom Data Set
#'
#' This data set is originally from the Mushroom data set,
#' UCI Machine Learning Repository.
#'
#' This data set includes the following fields:
#'
#' \itemize{
#' \item \code{label} the label for each record
#' \item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 127 columns.
#' }
#'
#' @references
#' https://archive.ics.uci.edu/ml/datasets/Mushroom
#'
#' Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
#' [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
#' School of Information and Computer Science.
#'
#' @docType data
#' @keywords datasets
#' @name agaricus.test
#' @usage data(agaricus.test)
#' @format A list containing a label vector, and a dgCMatrix object with 1611
#' rows and 127 variables
NULL

21
R-package/README.md Normal file
View File

@@ -0,0 +1,21 @@
# R package for xgboost.
## Installation
For the up-to-date version (recommended), please install from GitHub. Windows users will need to install [RTools](http://cran.r-project.org/bin/windows/Rtools/) first.
```r
require(devtools)
install_github('xgboost','tqchen',subdir='R-package')
```
For the stable version on CRAN, please run
```r
install.packages('xgboost')
```
## Examples
* Please visit the [walkthrough examples](https://github.com/tqchen/xgboost/blob/master/R-package/demo).
* See also the [example scripts](https://github.com/tqchen/xgboost/tree/master/demo/kaggle-higgs) for the Kaggle Higgs Challenge, including a [speed-test script](https://github.com/tqchen/xgboost/blob/master/demo/kaggle-higgs/speedtest.R) on that dataset.
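Once installed, a minimal train/predict round-trip with the bundled agaricus demo data looks like the sketch below (the parameter values are the same illustrative ones used in the package demos):

```r
require(xgboost)
# load the bundled mushroom (agaricus) data sets
data(agaricus.train, package = 'xgboost')
data(agaricus.test, package = 'xgboost')
# train a small boosted-tree classifier (2 rounds, depth-2 trees)
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max.depth = 2, eta = 1, nround = 2,
               objective = "binary:logistic")
# predict on the held-out test set and compute the error rate
pred <- predict(bst, agaricus.test$data)
err <- mean(as.numeric(pred > 0.5) != agaricus.test$label)
```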

Binary file not shown.

Binary file not shown.

R-package/demo/00Index (new file, +6)
basic_walkthrough Basic feature walkthrough
custom_objective Customize loss function and evaluation metric
boost_from_prediction Boosting from existing prediction
predict_first_ntree Predicting using first n trees
generalized_linear_model Generalized Linear Model
cross_validation Cross validation

R-package/demo/README.md (new file, +17)
XGBoost R Feature Walkthrough
====
* [Basic walkthrough of wrappers](basic_walkthrough.R)
* [Customize loss function and evaluation metric](custom_objective.R)
* [Boosting from existing prediction](boost_from_prediction.R)
* [Predicting using first n trees](predict_first_ntree.R)
* [Generalized Linear Model](generalized_linear_model.R)
* [Cross validation](cross_validation.R)
Benchmarks
====
* [Starter script for Kaggle Higgs Boson](../../demo/kaggle-higgs)
Notes
====
* Contributions of examples and benchmarks are more than welcome!
* If you would like to share how you use xgboost to solve your problem, please send a pull request :)


@@ -0,0 +1,93 @@
require(xgboost)
require(methods)
# we load in the agaricus dataset
# in this example, we aim to predict whether a mushroom is edible
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
# the loaded data is stored in a sparse matrix, and the label is a numeric vector in {0,1}
class(train$label)
class(train$data)
#-------------Basic Training using XGBoost-----------------
# this is the basic usage of xgboost: you can put a matrix in the data field
# note: we are putting in a sparse matrix here; xgboost naturally handles sparse input
# use a sparse matrix when your features are sparse (e.g. when you are using one-hot encoding)
print("training xgboost with sparseMatrix")
bst <- xgboost(data = train$data, label = train$label, max.depth = 2, eta = 1, nround = 2,
objective = "binary:logistic")
# alternatively, you can put in a dense matrix, i.e. a basic R matrix
print("training xgboost with Matrix")
bst <- xgboost(data = as.matrix(train$data), label = train$label, max.depth = 2, eta = 1, nround = 2,
objective = "binary:logistic")
# you can also put in an xgb.DMatrix object, which stores label, data and other metadata needed for advanced features
print("training xgboost with xgb.DMatrix")
dtrain <- xgb.DMatrix(data = train$data, label = train$label)
bst <- xgboost(data = dtrain, max.depth = 2, eta = 1, nround = 2, objective = "binary:logistic")
# Verbose = 0,1,2
print ('train xgboost with verbose 0, no message')
bst <- xgboost(data = dtrain, max.depth = 2, eta = 1, nround = 2,
objective = "binary:logistic", verbose = 0)
print ('train xgboost with verbose 1, print evaluation metric')
bst <- xgboost(data = dtrain, max.depth = 2, eta = 1, nround = 2,
objective = "binary:logistic", verbose = 1)
print ('train xgboost with verbose 2, also print information about tree')
bst <- xgboost(data = dtrain, max.depth = 2, eta = 1, nround = 2,
objective = "binary:logistic", verbose = 2)
# you can also specify data as file path to a LibSVM format input
# since we do not have this file with us, the following line is just for illustration
# bst <- xgboost(data = 'agaricus.train.svm', max.depth = 2, eta = 1, nround = 2,objective = "binary:logistic")
#--------------------basic prediction using xgboost--------------
# you can do prediction using the following line
# you can put in Matrix, sparseMatrix, or xgb.DMatrix
pred <- predict(bst, test$data)
err <- mean(as.numeric(pred > 0.5) != test$label)
print(paste("test-error=", err))
#-------------------save and load models-------------------------
# save model to binary local file
xgb.save(bst, "xgboost.model")
# load binary model to R
bst2 <- xgb.load("xgboost.model")
pred2 <- predict(bst2, test$data)
# pred2 should be identical to pred
print(paste("sum(abs(pred2-pred))=", sum(abs(pred2-pred))))
#----------------Advanced features --------------
# to use advanced features, we need to put data in xgb.DMatrix
dtrain <- xgb.DMatrix(data = train$data, label=train$label)
dtest <- xgb.DMatrix(data = test$data, label=test$label)
#---------------Using watchlist----------------
# a watchlist is a named list of xgb.DMatrix objects
watchlist <- list(train=dtrain, test=dtest)
# to train with watchlist, use xgb.train, which contains more advanced features
# watchlist allows us to monitor the evaluation result on all data in the list
print ('train xgboost using xgb.train with watchlist')
bst <- xgb.train(data=dtrain, max.depth=2, eta=1, nround=2, watchlist=watchlist,
objective = "binary:logistic")
# we can change evaluation metrics, or use multiple evaluation metrics
print ('train xgboost using xgb.train with watchlist, watch logloss and error')
bst <- xgb.train(data=dtrain, max.depth=2, eta=1, nround=2, watchlist=watchlist,
eval.metric = "error", eval.metric = "logloss",
objective = "binary:logistic")
# xgb.DMatrix can also be saved using xgb.DMatrix.save
xgb.DMatrix.save(dtrain, "dtrain.buffer")
# to load it in, simply call xgb.DMatrix
dtrain2 <- xgb.DMatrix("dtrain.buffer")
bst <- xgb.train(data=dtrain2, max.depth=2, eta=1, nround=2, watchlist=watchlist,
objective = "binary:logistic")
# information can be extracted from xgb.DMatrix using getinfo
label = getinfo(dtest, "label")
pred <- predict(bst, dtest)
err <- as.numeric(sum(as.integer(pred > 0.5) != label))/length(label)
print(paste("test-error=", err))
# Finally, you can dump the tree you learned using xgb.dump into a text file
xgb.dump(bst, "dump.raw.txt")


@@ -0,0 +1,26 @@
require(xgboost)
# load in the agaricus dataset
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
watchlist <- list(eval = dtest, train = dtrain)
###
# advanced: start from an initial base prediction
#
print('start running example to start from an initial prediction')
# train xgboost for 1 round
param <- list(max.depth=2,eta=1,silent=1,objective='binary:logistic')
bst <- xgb.train( param, dtrain, 1, watchlist )
# note: we need the margin value instead of the transformed prediction when setting base_margin
# predicting with outputmargin=TRUE will always give you margin values before the logistic transformation
ptrain <- predict(bst, dtrain, outputmargin=TRUE)
ptest <- predict(bst, dtest, outputmargin=TRUE)
# set the base_margin property of dtrain and dtest
# base margin is the base prediction we will boost from
setinfo(dtrain, "base_margin", ptrain)
setinfo(dtest, "base_margin", ptest)
print('this is result of boost from initial prediction')
bst <- xgb.train( param, dtrain, 1, watchlist )


@@ -0,0 +1,47 @@
require(xgboost)
# load in the agaricus dataset
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
nround <- 2
param <- list(max.depth=2,eta=1,silent=1,objective='binary:logistic')
cat('running cross validation\n')
# do cross validation, this will print result out as
# [iteration] metric_name:mean_value+std_value
# std_value is standard deviation of the metric
xgb.cv(param, dtrain, nround, nfold=5, metrics=list('error'))
cat('running cross validation, disable standard deviation display\n')
# do cross validation; with showsd = FALSE the result is printed as
# [iteration] metric_name:mean_value
# without the standard deviation
xgb.cv(param, dtrain, nround, nfold=5,
       metrics=list('error'), showsd = FALSE)
###
# you can also do cross validation with a customized loss function
# See custom_objective.R
##
print('running cross validation, with customized loss function')
logregobj <- function(preds, dtrain) {
labels <- getinfo(dtrain, "label")
preds <- 1/(1 + exp(-preds))
grad <- preds - labels
hess <- preds * (1 - preds)
return(list(grad = grad, hess = hess))
}
evalerror <- function(preds, dtrain) {
labels <- getinfo(dtrain, "label")
err <- as.numeric(sum(labels != (preds > 0)))/length(labels)
return(list(metric = "error", value = err))
}
param <- list(max.depth=2,eta=1,silent=1)
# train with customized objective
xgb.cv(param, dtrain, nround, nfold = 5,
obj = logregobj, feval=evalerror)


@@ -0,0 +1,39 @@
require(xgboost)
# load in the agaricus dataset
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
# note: for a customized objective function, we leave objective as default
# note: what we get in prediction is the margin value
# you must know what you are doing
param <- list(max.depth=2,eta=1,silent=1)
watchlist <- list(eval = dtest, train = dtrain)
num_round <- 2
# user-defined objective function: given the prediction, return gradient and second order gradient
# this is the logistic log-likelihood loss
logregobj <- function(preds, dtrain) {
labels <- getinfo(dtrain, "label")
preds <- 1/(1 + exp(-preds))
grad <- preds - labels
hess <- preds * (1 - preds)
return(list(grad = grad, hess = hess))
}
# user-defined evaluation function, returns a pair (metric_name, result)
# NOTE: when you use a customized objective function, the default prediction value is the margin,
# which may make built-in evaluation metrics malfunction
# for example, with logistic loss the prediction is the score before the logistic transformation,
# while the built-in evaluation error assumes input after the logistic transformation
# keep this in mind when customizing; you may need to write a customized evaluation function
evalerror <- function(preds, dtrain) {
labels <- getinfo(dtrain, "label")
err <- as.numeric(sum(labels != (preds > 0)))/length(labels)
return(list(metric = "error", value = err))
}
print ('start training with user customized objective')
# training with customized objective; we can also do step-by-step training
# simply look at how xgb.train is implemented for details
bst <- xgb.train(param, dtrain, num_round, watchlist, logregobj, evalerror)
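The analytic derivatives used in logregobj above (grad = p - y, hess = p * (1 - p), with p the logistic transform of the margin) can be sanity-checked against finite differences of the logistic loss in plain base R; this sketch is independent of xgboost, and the margin/label values are illustrative:

```r
# logistic (negative log-likelihood) loss as a function of the margin
logloss <- function(margin, label) {
  p <- 1 / (1 + exp(-margin))
  -(label * log(p) + (1 - label) * log(1 - p))
}
margin <- 0.3; label <- 1; eps <- 1e-4
p <- 1 / (1 + exp(-margin))
grad <- p - label     # analytic first derivative, as in logregobj
hess <- p * (1 - p)   # analytic second derivative, as in logregobj
# central finite differences of the loss with respect to the margin
num_grad <- (logloss(margin + eps, label) - logloss(margin - eps, label)) / (2 * eps)
num_hess <- (logloss(margin + eps, label) - 2 * logloss(margin, label) +
             logloss(margin - eps, label)) / eps^2
stopifnot(abs(grad - num_grad) < 1e-6, abs(hess - num_hess) < 1e-6)
```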


@@ -0,0 +1,34 @@
require(xgboost)
# load in the agaricus dataset
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
##
# this script demonstrates how to fit a generalized linear model in xgboost
# basically, we are using a linear model instead of trees for our boosters
# you can fit a linear regression or logistic regression model
##
# change booster to gblinear, so that we are fitting a linear model
# alpha is the L1 regularizer
# lambda is the L2 regularizer
# you can also set lambda_bias which is L2 regularizer on the bias term
param <- list(objective = "binary:logistic", booster = "gblinear",
alpha = 0.0001, lambda = 1)
# normally, you do not need to set eta (step size)
# XGBoost uses a parallel coordinate descent algorithm (shotgun);
# parallelization can affect convergence in certain cases
# setting eta to a smaller value, e.g. 0.5, can make the optimization more stable
##
# the rest of settings are the same
##
watchlist <- list(eval = dtest, train = dtrain)
num_round <- 2
bst <- xgb.train(param, dtrain, num_round, watchlist)
ypred <- predict(bst, dtest)
labels <- getinfo(dtest, 'label')
cat('error of preds=', mean(as.numeric(ypred>0.5)!=labels),'\n')


@@ -0,0 +1,23 @@
require(xgboost)
# load in the agaricus dataset
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
param <- list(max.depth=2,eta=1,silent=1,objective='binary:logistic')
watchlist <- list(eval = dtest, train = dtrain)
nround = 2
# training the model for two rounds
bst = xgb.train(param, dtrain, nround, watchlist)
cat('start testing prediction from first n trees\n')
labels <- getinfo(dtest,'label')
### predict using only the first tree
ypred1 = predict(bst, dtest, ntreelimit=1)
# by default, we predict using all the trees
ypred2 = predict(bst, dtest)
cat('error of ypred1=', mean(as.numeric(ypred1>0.5)!=labels),'\n')
cat('error of ypred2=', mean(as.numeric(ypred2>0.5)!=labels),'\n')

R-package/demo/runall.R (new file, +8)
# running all scripts in demo folder
demo(basic_walkthrough)
demo(custom_objective)
demo(boost_from_prediction)
demo(predict_first_ntree)
demo(generalized_linear_model)
demo(cross_validation)


@@ -0,0 +1,31 @@
% Generated by roxygen2 (4.0.1): do not edit by hand
\docType{data}
\name{agaricus.test}
\alias{agaricus.test}
\title{Test part from Mushroom Data Set}
\format{A list containing a label vector, and a dgCMatrix object with 1611
rows and 127 variables}
\usage{
data(agaricus.test)
}
\description{
This data set is originally from the Mushroom data set,
UCI Machine Learning Repository.
}
\details{
This data set includes the following fields:
\itemize{
\item \code{label} the label for each record
\item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 127 columns.
}
}
\references{
https://archive.ics.uci.edu/ml/datasets/Mushroom
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
School of Information and Computer Science.
}
\keyword{datasets}


@@ -0,0 +1,31 @@
% Generated by roxygen2 (4.0.1): do not edit by hand
\docType{data}
\name{agaricus.train}
\alias{agaricus.train}
\title{Training part from Mushroom Data Set}
\format{A list containing a label vector, and a dgCMatrix object with 6513
rows and 127 variables}
\usage{
data(agaricus.train)
}
\description{
This data set is originally from the Mushroom data set,
UCI Machine Learning Repository.
}
\details{
This data set includes the following fields:
\itemize{
\item \code{label} the label for each record
\item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 127 columns.
}
}
\references{
https://archive.ics.uci.edu/ml/datasets/Mushroom
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
School of Information and Computer Science.
}
\keyword{datasets}

R-package/man/getinfo.Rd (new file, +31)
% Generated by roxygen2 (4.0.1): do not edit by hand
\docType{methods}
\name{getinfo}
\alias{getinfo}
\alias{getinfo,xgb.DMatrix-method}
\title{Get information of an xgb.DMatrix object}
\usage{
getinfo(object, ...)
\S4method{getinfo}{xgb.DMatrix}(object, name)
}
\arguments{
\item{object}{Object of class "xgb.DMatrix"}
\item{name}{the name of the field to get}
\item{...}{other parameters}
}
\description{
Get information of an xgb.DMatrix object
}
\examples{
data(agaricus.train, package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(train$data, label=train$label)
labels <- getinfo(dtrain, 'label')
setinfo(dtrain, 'label', 1-labels)
labels2 <- getinfo(dtrain, 'label')
stopifnot(all(labels2 == 1-labels))
}


@@ -0,0 +1,37 @@
% Generated by roxygen2 (4.0.1): do not edit by hand
\docType{methods}
\name{predict,xgb.Booster-method}
\alias{predict,xgb.Booster-method}
\title{Predict method for eXtreme Gradient Boosting model}
\usage{
\S4method{predict}{xgb.Booster}(object, newdata, outputmargin = FALSE,
ntreelimit = NULL)
}
\arguments{
\item{object}{Object of class "xgb.Booster"}
\item{newdata}{takes \code{matrix}, \code{dgCMatrix}, local data file or
\code{xgb.DMatrix}.}
\item{outputmargin}{whether the prediction should be returned as the
untransformed sum of the boosted functions. When outputmargin=TRUE, the
prediction is the margin value; in logistic regression, this is the value
before the logistic transformation.}
\item{ntreelimit}{limit the number of trees used in prediction. This
parameter is only valid for gbtree, not gblinear; set it to a value
bigger than 0. All trees are used by default.}
}
\description{
Predicted values based on xgboost model object.
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
eta = 1, nround = 2,objective = "binary:logistic")
pred <- predict(bst, test$data)
}

R-package/man/setinfo.Rd (new file, +33)
% Generated by roxygen2 (4.0.1): do not edit by hand
\docType{methods}
\name{setinfo}
\alias{setinfo}
\alias{setinfo,xgb.DMatrix-method}
\title{Set information of an xgb.DMatrix object}
\usage{
setinfo(object, ...)
\S4method{setinfo}{xgb.DMatrix}(object, name, info)
}
\arguments{
\item{object}{Object of class "xgb.DMatrix"}
\item{name}{the name of the field to set}
\item{info}{the specific field of information to set}
\item{...}{other parameters}
}
\description{
Set information of an xgb.DMatrix object
}
\examples{
data(agaricus.train, package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(train$data, label=train$label)
labels <- getinfo(dtrain, 'label')
setinfo(dtrain, 'label', 1-labels)
labels2 <- getinfo(dtrain, 'label')
stopifnot(all(labels2 == 1-labels))
}

R-package/man/slice.Rd (new file, +30)
% Generated by roxygen2 (4.0.1): do not edit by hand
\docType{methods}
\name{slice}
\alias{slice}
\alias{slice,xgb.DMatrix-method}
\title{Get a new DMatrix containing the specified rows of the
original xgb.DMatrix object}
\usage{
slice(object, ...)
\S4method{slice}{xgb.DMatrix}(object, idxset, ...)
}
\arguments{
\item{object}{Object of class "xgb.DMatrix"}
\item{idxset}{an integer vector of indices of rows needed}
\item{...}{other parameters}
}
\description{
Get a new DMatrix containing the specified rows of the
original xgb.DMatrix object
}
\examples{
data(agaricus.train, package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(train$data, label=train$label)
dsub <- slice(dtrain, 1:3)
}


@@ -0,0 +1,28 @@
% Generated by roxygen2 (4.0.1): do not edit by hand
\name{xgb.DMatrix}
\alias{xgb.DMatrix}
\title{Construct xgb.DMatrix object}
\usage{
xgb.DMatrix(data, info = list(), missing = 0, ...)
}
\arguments{
\item{data}{a \code{matrix} object, a \code{dgCMatrix} object or a character
indicating the data file.}
\item{info}{a list of information of the xgb.DMatrix object}
\item{missing}{only used when the input is a dense matrix; the float value that represents missing entries}
\item{...}{other information to pass to \code{info}.}
}
\description{
Construct an xgb.DMatrix object from a dense matrix, a sparse matrix or a local file.
}
\examples{
data(agaricus.train, package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(train$data, label=train$label)
xgb.DMatrix.save(dtrain, 'xgb.DMatrix.data')
dtrain <- xgb.DMatrix('xgb.DMatrix.data')
}


@@ -0,0 +1,23 @@
% Generated by roxygen2 (4.0.1): do not edit by hand
\name{xgb.DMatrix.save}
\alias{xgb.DMatrix.save}
\title{Save xgb.DMatrix object to binary file}
\usage{
xgb.DMatrix.save(DMatrix, fname)
}
\arguments{
\item{DMatrix}{the DMatrix object}
\item{fname}{the name of the binary file.}
}
\description{
Save xgb.DMatrix object to binary file
}
\examples{
data(agaricus.train, package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(train$data, label=train$label)
xgb.DMatrix.save(dtrain, 'xgb.DMatrix.data')
dtrain <- xgb.DMatrix('xgb.DMatrix.data')
}

R-package/man/xgb.cv.Rd (new file, +72)
% Generated by roxygen2 (4.0.1): do not edit by hand
\name{xgb.cv}
\alias{xgb.cv}
\title{Cross Validation}
\usage{
xgb.cv(params = list(), data, nrounds, nfold, label = NULL, showsd = TRUE,
metrics = list(), obj = NULL, feval = NULL, ...)
}
\arguments{
\item{params}{the list of parameters. Commonly used ones are:
\itemize{
\item \code{objective} objective function, common ones are
\itemize{
\item \code{reg:linear} linear regression
\item \code{binary:logistic} logistic regression for classification
}
\item \code{eta} step size of each boosting step
\item \code{max.depth} maximum depth of the tree
\item \code{nthread} number of thread used in training, if not set, all threads are used
}
See \url{https://github.com/tqchen/xgboost/wiki/Parameters} for
further details. See also demo/ for walkthrough example in R.}
\item{data}{takes an \code{xgb.DMatrix} as the input.}
\item{nrounds}{the max number of iterations}
\item{nfold}{number of folds used}
\item{label}{optional field, used when data is a Matrix}
\item{showsd}{boolean, whether to show the standard deviation of cross validation}
\item{metrics}{list of evaluation metrics to be used in cross validation;
when it is not specified, the evaluation metric is chosen according to the objective function.
Possible options are:
\itemize{
\item \code{error} binary classification error rate
\item \code{rmse} Root mean squared error
\item \code{logloss} negative log-likelihood function
\item \code{auc} Area under curve
\item \code{merror} Exact matching error, used to evaluate multi-class classification
}}
\item{obj}{customized objective function. Returns gradient and second order
gradient with given prediction and dtrain.}
\item{feval}{customized evaluation function. Returns
\code{list(metric='metric-name', value='metric-value')} with given
prediction and dtrain.}
\item{...}{other parameters to pass to \code{params}.}
}
\description{
The cross validation function of xgboost
}
\details{
This is the cross validation function for xgboost
Parallelization is automatically enabled if OpenMP is present.
Number of threads can also be manually specified via "nthread" parameter.
This function only accepts an \code{xgb.DMatrix} object as the input.
}
\examples{
data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
history <- xgb.cv(data = dtrain, nround=3, nfold = 5, metrics=list("rmse","auc"),
"max.depth"=3, "eta"=1, "objective"="binary:logistic")
}

R-package/man/xgb.dump.Rd (new file, +32)
% Generated by roxygen2 (4.0.1): do not edit by hand
\name{xgb.dump}
\alias{xgb.dump}
\title{Save xgboost model to text file}
\usage{
xgb.dump(model, fname, fmap = "")
}
\arguments{
\item{model}{the model object.}
\item{fname}{the name of the binary file.}
\item{fmap}{feature map file describing the types of the features.
A detailed description can be found at
\url{https://github.com/tqchen/xgboost/wiki/Binary-Classification#dump-model}.
See demo/ for a walkthrough example in R, and
\url{https://github.com/tqchen/xgboost/blob/master/demo/data/featmap.txt}
for an example of the format.}
}
\description{
Save an xgboost model to a text file that can be parsed later.
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
eta = 1, nround = 2,objective = "binary:logistic")
xgb.dump(bst, 'xgb.model.dump')
}

R-package/man/xgb.load.Rd (new file, +25)
% Generated by roxygen2 (4.0.1): do not edit by hand
\name{xgb.load}
\alias{xgb.load}
\title{Load xgboost model from binary file}
\usage{
xgb.load(modelfile)
}
\arguments{
\item{modelfile}{the name of the binary file.}
}
\description{
Load xgboost model from the binary model file
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
eta = 1, nround = 2,objective = "binary:logistic")
xgb.save(bst, 'xgb.model')
bst <- xgb.load('xgb.model')
pred <- predict(bst, test$data)
}

R-package/man/xgb.save.Rd (new file, +27)
% Generated by roxygen2 (4.0.1): do not edit by hand
\name{xgb.save}
\alias{xgb.save}
\title{Save xgboost model to binary file}
\usage{
xgb.save(model, fname)
}
\arguments{
\item{model}{the model object.}
\item{fname}{the name of the binary file.}
}
\description{
Save an xgboost model from \code{xgboost} or \code{xgb.train} to a binary file
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
eta = 1, nround = 2,objective = "binary:logistic")
xgb.save(bst, 'xgb.model')
bst <- xgb.load('xgb.model')
pred <- predict(bst, test$data)
}


@@ -0,0 +1,80 @@
% Generated by roxygen2 (4.0.1): do not edit by hand
\name{xgb.train}
\alias{xgb.train}
\title{eXtreme Gradient Boosting Training}
\usage{
xgb.train(params = list(), data, nrounds, watchlist = list(), obj = NULL,
feval = NULL, verbose = 1, ...)
}
\arguments{
\item{params}{the list of parameters. Commonly used ones are:
\itemize{
\item \code{objective} objective function, common ones are
\itemize{
\item \code{reg:linear} linear regression
\item \code{binary:logistic} logistic regression for classification
}
\item \code{eta} step size of each boosting step
\item \code{max.depth} maximum depth of the tree
\item \code{nthread} number of thread used in training, if not set, all threads are used
}
See \url{https://github.com/tqchen/xgboost/wiki/Parameters} for
further details. See also demo/ for walkthrough example in R.}
\item{data}{takes an \code{xgb.DMatrix} as the input.}
\item{nrounds}{the max number of iterations}
\item{watchlist}{what information should be printed when \code{verbose=1} or
\code{verbose=2}. Watchlist is used to specify validation set monitoring
during training. For example, the user can specify
watchlist=list(validation1=mat1, validation2=mat2) to watch
the performance of each round's model on mat1 and mat2}
\item{obj}{customized objective function. Returns gradient and second order
gradient with given prediction and dtrain.}
\item{feval}{customized evaluation function. Returns
\code{list(metric='metric-name', value='metric-value')} with given
prediction and dtrain.}
\item{verbose}{If 0, xgboost will stay silent. If 1, xgboost will print
information of performance. If 2, xgboost will print information of both
performance and construction progress information}
\item{...}{other parameters to pass to \code{params}.}
}
\description{
The training function of xgboost
}
\details{
This is the training function for xgboost.
Parallelization is automatically enabled if OpenMP is present.
Number of threads can also be manually specified via "nthread" parameter.
This function only accepts an \code{xgb.DMatrix} object as the input.
It supports advanced features such as watchlist, customized objective function,
therefore it is more flexible than \code{\link{xgboost}}.
}
\examples{
data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- dtrain
watchlist <- list(eval = dtest, train = dtrain)
param <- list(max.depth = 2, eta = 1, silent = 1)
logregobj <- function(preds, dtrain) {
labels <- getinfo(dtrain, "label")
preds <- 1/(1 + exp(-preds))
grad <- preds - labels
hess <- preds * (1 - preds)
return(list(grad = grad, hess = hess))
}
evalerror <- function(preds, dtrain) {
labels <- getinfo(dtrain, "label")
err <- as.numeric(sum(labels != (preds > 0)))/length(labels)
return(list(metric = "error", value = err))
}
bst <- xgb.train(param, dtrain, nround = 2, watchlist, logregobj, evalerror)
}

R-package/man/xgboost.Rd (new file, +56)
% Generated by roxygen2 (4.0.1): do not edit by hand
\name{xgboost}
\alias{xgboost}
\title{eXtreme Gradient Boosting (Tree) library}
\usage{
xgboost(data = NULL, label = NULL, params = list(), nrounds,
verbose = 1, ...)
}
\arguments{
\item{data}{takes \code{matrix}, \code{dgCMatrix}, local data file or
\code{xgb.DMatrix}.}
\item{label}{the response variable. The user should not set this field
if data is a local data file or an \code{xgb.DMatrix}.}
\item{params}{the list of parameters. Commonly used ones are:
\itemize{
\item \code{objective} objective function, common ones are
\itemize{
\item \code{reg:linear} linear regression
\item \code{binary:logistic} logistic regression for classification
}
\item \code{eta} step size of each boosting step
\item \code{max.depth} maximum depth of the tree
\item \code{nthread} number of thread used in training, if not set, all threads are used
}
See \url{https://github.com/tqchen/xgboost/wiki/Parameters} for
further details. See also demo/ for walkthrough example in R.}
\item{nrounds}{the max number of iterations}
\item{verbose}{If 0, xgboost will stay silent. If 1, xgboost will print
information of performance. If 2, xgboost will print information of both
performance and construction progress information}
\item{...}{other parameters to pass to \code{params}.}
}
\description{
A simple interface for xgboost in R
}
\details{
This is the modeling function for xgboost.
Parallelization is automatically enabled if OpenMP is present.
Number of threads can also be manually specified via "nthread" parameter
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
eta = 1, nround = 2,objective = "binary:logistic")
pred <- predict(bst, test$data)
}

R-package/src/Makevars (new file, +9)
# package root
PKGROOT=../../
# _*_ mode: Makefile; _*_
PKG_CPPFLAGS= -DXGBOOST_CUSTOMIZE_MSG_ -DXGBOOST_CUSTOMIZE_PRNG_ -DXGBOOST_STRICT_CXX98_ -I$(PKGROOT)
PKG_CXXFLAGS= $(SHLIB_OPENMP_CFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CFLAGS)
OBJECTS= xgboost_R.o xgboost_assert.o $(PKGROOT)/wrapper/xgboost_wrapper.o $(PKGROOT)/src/io/io.o $(PKGROOT)/src/gbm/gbm.o $(PKGROOT)/src/tree/updater.o


@@ -0,0 +1,7 @@
# package root
PKGROOT=../../
# _*_ mode: Makefile; _*_
PKG_CPPFLAGS= -DXGBOOST_CUSTOMIZE_MSG_ -DXGBOOST_CUSTOMIZE_PRNG_ -DXGBOOST_STRICT_CXX98_ -I$(PKGROOT)
PKG_CXXFLAGS= $(SHLIB_OPENMP_CFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CFLAGS)
OBJECTS= xgboost_R.o xgboost_assert.o $(PKGROOT)/wrapper/xgboost_wrapper.o $(PKGROOT)/src/io/io.o $(PKGROOT)/src/gbm/gbm.o $(PKGROOT)/src/tree/updater.o

R-package/src/xgboost_R.cpp (new file, +289)
#include <vector>
#include <string>
#include <utility>
#include <cstring>
#include <cstdio>
#include "xgboost_R.h"
#include "wrapper/xgboost_wrapper.h"
#include "src/utils/utils.h"
#include "src/utils/omp.h"
using namespace std;
using namespace xgboost;
extern "C" {
void XGBoostAssert_R(int exp, const char *fmt, ...);
void XGBoostCheck_R(int exp, const char *fmt, ...);
int XGBoostSPrintf_R(char *buf, size_t size, const char *fmt, ...);
}
// implements error handling
namespace xgboost {
namespace utils {
extern "C" {
void (*Printf)(const char *fmt, ...) = Rprintf;
int (*SPrintf)(char *buf, size_t size, const char *fmt, ...) = XGBoostSPrintf_R;
void (*Assert)(int exp, const char *fmt, ...) = XGBoostAssert_R;
void (*Check)(int exp, const char *fmt, ...) = XGBoostCheck_R;
void (*Error)(const char *fmt, ...) = error;
}
} // namespace utils
namespace random {
void Seed(unsigned seed) {
warning("parameter seed is ignored, please set random seed using set.seed");
}
double Uniform(void) {
return unif_rand();
}
double Normal(void) {
return norm_rand();
}
} // namespace random
} // namespace xgboost
// call before wrapper starts
inline void _WrapperBegin(void) {
GetRNGstate();
}
// call after wrapper finishes
inline void _WrapperEnd(void) {
PutRNGstate();
}
extern "C" {
void _DMatrixFinalizer(SEXP ext) {
if (R_ExternalPtrAddr(ext) == NULL) return;
XGDMatrixFree(R_ExternalPtrAddr(ext));
R_ClearExternalPtr(ext);
}
SEXP XGDMatrixCreateFromFile_R(SEXP fname, SEXP silent) {
_WrapperBegin();
void *handle = XGDMatrixCreateFromFile(CHAR(asChar(fname)), asInteger(silent));
SEXP ret = PROTECT(R_MakeExternalPtr(handle, R_NilValue, R_NilValue));
R_RegisterCFinalizerEx(ret, _DMatrixFinalizer, TRUE);
UNPROTECT(1);
_WrapperEnd();
return ret;
}
SEXP XGDMatrixCreateFromMat_R(SEXP mat,
SEXP missing) {
_WrapperBegin();
SEXP dim = getAttrib(mat, R_DimSymbol);
int nrow = INTEGER(dim)[0];
int ncol = INTEGER(dim)[1];
double *din = REAL(mat);
std::vector<float> data(nrow * ncol);
#pragma omp parallel for schedule(static)
for (int i = 0; i < nrow; ++i) {
for (int j = 0; j < ncol; ++j) {
data[i * ncol + j] = din[i + nrow * j];
}
}
void *handle = XGDMatrixCreateFromMat(BeginPtr(data), nrow, ncol, asReal(missing));
SEXP ret = PROTECT(R_MakeExternalPtr(handle, R_NilValue, R_NilValue));
R_RegisterCFinalizerEx(ret, _DMatrixFinalizer, TRUE);
UNPROTECT(1);
_WrapperEnd();
return ret;
}
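The nested loop above copies R's column-major matrix into the row-major buffer that xgboost consumes. The same index mapping can be sketched in Python (the function name is ours, purely illustrative):

```python
def column_major_to_row_major(din, nrow, ncol):
    """Map a flat column-major buffer (R's layout) into row-major
    order, mirroring data[i * ncol + j] = din[i + nrow * j] above."""
    data = [0.0] * (nrow * ncol)
    for i in range(nrow):
        for j in range(ncol):
            data[i * ncol + j] = din[i + nrow * j]
    return data

# columns of a 2x3 matrix stored column-major: [1,2], [3,4], [5,6]
row_major = column_major_to_row_major([1, 2, 3, 4, 5, 6], 2, 3)
```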
SEXP XGDMatrixCreateFromCSC_R(SEXP indptr,
SEXP indices,
SEXP data) {
_WrapperBegin();
const int *p_indptr = INTEGER(indptr);
const int *p_indices = INTEGER(indices);
const double *p_data = REAL(data);
int nindptr = length(indptr);
int ndata = length(data);
std::vector<bst_ulong> col_ptr_(nindptr);
std::vector<unsigned> indices_(ndata);
std::vector<float> data_(ndata);
for (int i = 0; i < nindptr; ++i) {
col_ptr_[i] = static_cast<bst_ulong>(p_indptr[i]);
}
#pragma omp parallel for schedule(static)
for (int i = 0; i < ndata; ++i) {
indices_[i] = static_cast<unsigned>(p_indices[i]);
data_[i] = static_cast<float>(p_data[i]);
}
void *handle = XGDMatrixCreateFromCSC(BeginPtr(col_ptr_), BeginPtr(indices_),
BeginPtr(data_), nindptr, ndata);
SEXP ret = PROTECT(R_MakeExternalPtr(handle, R_NilValue, R_NilValue));
R_RegisterCFinalizerEx(ret, _DMatrixFinalizer, TRUE);
UNPROTECT(1);
_WrapperEnd();
return ret;
}
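XGDMatrixCreateFromCSC_R above forwards the three CSC arrays essentially unchanged. As an illustration of what those arrays encode, here is a hedged Python sketch (not part of the package) that expands CSC form back into a dense matrix:

```python
def csc_to_dense(indptr, indices, data, nrow):
    """Expand compressed-sparse-column arrays into a dense row-major
    list of lists; indptr[j]..indptr[j+1] spans column j's entries."""
    ncol = len(indptr) - 1
    dense = [[0.0] * ncol for _ in range(nrow)]
    for j in range(ncol):
        for k in range(indptr[j], indptr[j + 1]):
            dense[indices[k]][j] = data[k]
    return dense
```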
SEXP XGDMatrixSliceDMatrix_R(SEXP handle, SEXP idxset) {
_WrapperBegin();
int len = length(idxset);
std::vector<int> idxvec(len);
for (int i = 0; i < len; ++i) {
idxvec[i] = INTEGER(idxset)[i] - 1;
}
void *res = XGDMatrixSliceDMatrix(R_ExternalPtrAddr(handle), BeginPtr(idxvec), len);
SEXP ret = PROTECT(R_MakeExternalPtr(res, R_NilValue, R_NilValue));
R_RegisterCFinalizerEx(ret, _DMatrixFinalizer, TRUE);
UNPROTECT(1);
_WrapperEnd();
return ret;
}
void XGDMatrixSaveBinary_R(SEXP handle, SEXP fname, SEXP silent) {
_WrapperBegin();
XGDMatrixSaveBinary(R_ExternalPtrAddr(handle),
CHAR(asChar(fname)), asInteger(silent));
_WrapperEnd();
}
void XGDMatrixSetInfo_R(SEXP handle, SEXP field, SEXP array) {
_WrapperBegin();
int len = length(array);
const char *name = CHAR(asChar(field));
if (!strcmp("group", name)) {
std::vector<unsigned> vec(len);
#pragma omp parallel for schedule(static)
for (int i = 0; i < len; ++i) {
vec[i] = static_cast<unsigned>(INTEGER(array)[i]);
}
XGDMatrixSetGroup(R_ExternalPtrAddr(handle), BeginPtr(vec), len);
_WrapperEnd();
return;
}
{
std::vector<float> vec(len);
#pragma omp parallel for schedule(static)
for (int i = 0; i < len; ++i) {
vec[i] = REAL(array)[i];
}
XGDMatrixSetFloatInfo(R_ExternalPtrAddr(handle),
CHAR(asChar(field)),
BeginPtr(vec), len);
}
_WrapperEnd();
}
SEXP XGDMatrixGetInfo_R(SEXP handle, SEXP field) {
_WrapperBegin();
bst_ulong olen;
const float *res = XGDMatrixGetFloatInfo(R_ExternalPtrAddr(handle),
CHAR(asChar(field)), &olen);
SEXP ret = PROTECT(allocVector(REALSXP, olen));
for (size_t i = 0; i < olen; ++i) {
REAL(ret)[i] = res[i];
}
UNPROTECT(1);
_WrapperEnd();
return ret;
}
SEXP XGDMatrixNumRow_R(SEXP handle) {
bst_ulong nrow = XGDMatrixNumRow(R_ExternalPtrAddr(handle));
return ScalarInteger(static_cast<int>(nrow));
}
// functions related to booster
void _BoosterFinalizer(SEXP ext) {
if (R_ExternalPtrAddr(ext) == NULL) return;
XGBoosterFree(R_ExternalPtrAddr(ext));
R_ClearExternalPtr(ext);
}
SEXP XGBoosterCreate_R(SEXP dmats) {
_WrapperBegin();
int len = length(dmats);
std::vector<void*> dvec;
for (int i = 0; i < len; ++i){
dvec.push_back(R_ExternalPtrAddr(VECTOR_ELT(dmats, i)));
}
void *handle = XGBoosterCreate(BeginPtr(dvec), dvec.size());
SEXP ret = PROTECT(R_MakeExternalPtr(handle, R_NilValue, R_NilValue));
R_RegisterCFinalizerEx(ret, _BoosterFinalizer, TRUE);
UNPROTECT(1);
_WrapperEnd();
return ret;
}
void XGBoosterSetParam_R(SEXP handle, SEXP name, SEXP val) {
_WrapperBegin();
XGBoosterSetParam(R_ExternalPtrAddr(handle),
CHAR(asChar(name)),
CHAR(asChar(val)));
_WrapperEnd();
}
void XGBoosterUpdateOneIter_R(SEXP handle, SEXP iter, SEXP dtrain) {
_WrapperBegin();
XGBoosterUpdateOneIter(R_ExternalPtrAddr(handle),
asInteger(iter),
R_ExternalPtrAddr(dtrain));
_WrapperEnd();
}
void XGBoosterBoostOneIter_R(SEXP handle, SEXP dtrain, SEXP grad, SEXP hess) {
_WrapperBegin();
utils::Check(length(grad) == length(hess), "gradient and hess must have same length");
int len = length(grad);
std::vector<float> tgrad(len), thess(len);
#pragma omp parallel for schedule(static)
for (int j = 0; j < len; ++j) {
tgrad[j] = REAL(grad)[j];
thess[j] = REAL(hess)[j];
}
XGBoosterBoostOneIter(R_ExternalPtrAddr(handle),
R_ExternalPtrAddr(dtrain),
BeginPtr(tgrad), BeginPtr(thess), len);
_WrapperEnd();
}
SEXP XGBoosterEvalOneIter_R(SEXP handle, SEXP iter, SEXP dmats, SEXP evnames) {
_WrapperBegin();
utils::Check(length(dmats) == length(evnames), "dmats and evnames must have same length");
int len = length(dmats);
std::vector<void*> vec_dmats;
std::vector<std::string> vec_names;
std::vector<const char*> vec_sptr;
for (int i = 0; i < len; ++i) {
vec_dmats.push_back(R_ExternalPtrAddr(VECTOR_ELT(dmats, i)));
vec_names.push_back(std::string(CHAR(asChar(VECTOR_ELT(evnames, i)))));
}
for (int i = 0; i < len; ++i) {
vec_sptr.push_back(vec_names[i].c_str());
}
const char *ret = XGBoosterEvalOneIter(R_ExternalPtrAddr(handle),
asInteger(iter),
BeginPtr(vec_dmats), BeginPtr(vec_sptr), len);
_WrapperEnd();
return mkString(ret);
}
SEXP XGBoosterPredict_R(SEXP handle, SEXP dmat, SEXP output_margin, SEXP ntree_limit) {
_WrapperBegin();
bst_ulong olen;
const float *res = XGBoosterPredict(R_ExternalPtrAddr(handle),
R_ExternalPtrAddr(dmat),
asInteger(output_margin),
asInteger(ntree_limit),
&olen);
SEXP ret = PROTECT(allocVector(REALSXP, olen));
for (size_t i = 0; i < olen; ++i) {
REAL(ret)[i] = res[i];
}
UNPROTECT(1);
_WrapperEnd();
return ret;
}
void XGBoosterLoadModel_R(SEXP handle, SEXP fname) {
_WrapperBegin();
XGBoosterLoadModel(R_ExternalPtrAddr(handle), CHAR(asChar(fname)));
_WrapperEnd();
}
void XGBoosterSaveModel_R(SEXP handle, SEXP fname) {
_WrapperBegin();
XGBoosterSaveModel(R_ExternalPtrAddr(handle), CHAR(asChar(fname)));
_WrapperEnd();
}
void XGBoosterDumpModel_R(SEXP handle, SEXP fname, SEXP fmap) {
_WrapperBegin();
bst_ulong olen;
const char **res = XGBoosterDumpModel(R_ExternalPtrAddr(handle),
CHAR(asChar(fmap)),
&olen);
FILE *fo = utils::FopenCheck(CHAR(asChar(fname)), "w");
for (size_t i = 0; i < olen; ++i) {
fprintf(fo, "booster[%u]:\n", static_cast<unsigned>(i));
fprintf(fo, "%s", res[i]);
}
fclose(fo);
_WrapperEnd();
}
}

R-package/src/xgboost_R.h

@@ -0,0 +1,138 @@
#ifndef XGBOOST_WRAPPER_R_H_
#define XGBOOST_WRAPPER_R_H_
/*!
* \file xgboost_R.h
* \author Tianqi Chen
* \brief R wrapper of xgboost
*/
extern "C" {
#include <Rinternals.h>
#include <R_ext/Random.h>
}
extern "C" {
/*!
* \brief load a data matrix from file
* \param fname name of the file
* \param silent whether to print messages while loading
* \return a loaded data matrix
*/
SEXP XGDMatrixCreateFromFile_R(SEXP fname, SEXP silent);
/*!
* \brief create matrix content from a dense matrix
* This assumes the matrix is stored in column major format
* \param mat R matrix object
* \param missing which value represents a missing value
* \return created dmatrix
*/
SEXP XGDMatrixCreateFromMat_R(SEXP mat,
SEXP missing);
/*!
* \brief create a matrix content from CSC format
* \param indptr pointer to column headers
* \param indices row indices
* \param data content of the data
* \return created dmatrix
*/
SEXP XGDMatrixCreateFromCSC_R(SEXP indptr,
SEXP indices,
SEXP data);
/*!
* \brief create a new dmatrix from sliced content of existing matrix
* \param handle instance of data matrix to be sliced
* \param idxset index set
* \return a sliced new matrix
*/
SEXP XGDMatrixSliceDMatrix_R(SEXP handle, SEXP idxset);
/*!
* \brief save a data matrix to a binary file
* \param handle an instance of data matrix
* \param fname file name
* \param silent whether to suppress printing statistics when saving
*/
void XGDMatrixSaveBinary_R(SEXP handle, SEXP fname, SEXP silent);
/*!
* \brief set information in a dmatrix
* \param handle an instance of data matrix
* \param field field name, e.g. label or weight
* \param array pointer to float vector
*/
void XGDMatrixSetInfo_R(SEXP handle, SEXP field, SEXP array);
/*!
* \brief get info vector from matrix
* \param handle an instance of data matrix
* \param field field name
* \return info vector
*/
SEXP XGDMatrixGetInfo_R(SEXP handle, SEXP field);
/*!
* \brief return number of rows
* \param handle an instance of data matrix
*/
SEXP XGDMatrixNumRow_R(SEXP handle);
/*!
* \brief create xgboost learner
* \param dmats a list of dmatrix handles that will be cached
*/
SEXP XGBoosterCreate_R(SEXP dmats);
/*!
* \brief set parameters
* \param handle handle
* \param name parameter name
* \param val value of parameter
*/
void XGBoosterSetParam_R(SEXP handle, SEXP name, SEXP val);
/*!
* \brief update the model in one round using dtrain
* \param handle handle
* \param iter current iteration rounds
* \param dtrain training data
*/
void XGBoosterUpdateOneIter_R(SEXP handle, SEXP iter, SEXP dtrain);
/*!
* \brief update the model, by directly specify gradient and second order gradient,
* this can be used to replace UpdateOneIter, to support customized loss function
* \param handle handle
* \param dtrain training data
* \param grad gradient statistics
* \param hess second order gradient statistics
*/
void XGBoosterBoostOneIter_R(SEXP handle, SEXP dtrain, SEXP grad, SEXP hess);
/*!
* \brief get evaluation statistics for xgboost
* \param handle handle
* \param iter current iteration rounds
* \param dmats list of handles to dmatrices
* \param evnames names of the evaluation data sets
* \return the string containing evaluation statistics
*/
SEXP XGBoosterEvalOneIter_R(SEXP handle, SEXP iter, SEXP dmats, SEXP evnames);
/*!
* \brief make prediction based on dmat
* \param handle handle
* \param dmat data matrix
* \param output_margin whether to only output the raw margin value
* \param ntree_limit limit number of trees used in prediction
*/
SEXP XGBoosterPredict_R(SEXP handle, SEXP dmat, SEXP output_margin, SEXP ntree_limit);
/*!
* \brief load model from existing file
* \param handle handle
* \param fname file name
*/
void XGBoosterLoadModel_R(SEXP handle, SEXP fname);
/*!
* \brief save model into existing file
* \param handle handle
* \param fname file name
*/
void XGBoosterSaveModel_R(SEXP handle, SEXP fname);
/*!
* \brief dump model into text file
* \param handle handle
* \param fname name of the text file the model is dumped into
* \param fmap path to the feature map file; can be an empty string
*/
void XGBoosterDumpModel_R(SEXP handle, SEXP fname, SEXP fmap);
}
#endif // XGBOOST_WRAPPER_R_H_


@@ -0,0 +1,33 @@
#include <stdio.h>
#include <stdarg.h>
#include <Rinternals.h>
// implements error handling
void XGBoostAssert_R(int exp, const char *fmt, ...) {
char buf[1024];
if (exp == 0) {
va_list args;
va_start(args, fmt);
vsnprintf(buf, sizeof(buf), fmt, args);
va_end(args);
error("AssertError:%s\n", buf);
}
}
void XGBoostCheck_R(int exp, const char *fmt, ...) {
char buf[1024];
if (exp == 0) {
va_list args;
va_start(args, fmt);
vsnprintf(buf, sizeof(buf), fmt, args);
va_end(args);
error("%s\n", buf);
}
}
int XGBoostSPrintf_R(char *buf, size_t size, const char *fmt, ...) {
int ret;
va_list args;
va_start(args, fmt);
ret = vsnprintf(buf, size, fmt, args);
va_end(args);
return ret;
}


@@ -0,0 +1,216 @@
\documentclass{article}
\RequirePackage{url}
\usepackage{hyperref}
\RequirePackage{amsmath}
\RequirePackage{natbib}
\RequirePackage[a4paper,lmargin={1.25in},rmargin={1.25in},tmargin={1in},bmargin={1in}]{geometry}
\makeatletter
% \VignetteIndexEntry{xgboost: eXtreme Gradient Boosting}
%\VignetteKeywords{xgboost, gbm, gradient boosting machines}
%\VignettePackage{xgboost}
% \VignetteEngine{knitr::knitr}
\makeatother
\begin{document}
%\SweaveOpts{concordance=TRUE}
<<knitropts,echo=FALSE,message=FALSE>>=
if (require('knitr')) opts_chunk$set(fig.width = 5, fig.height = 5, fig.align = 'center', tidy = FALSE, warning = FALSE, cache = TRUE)
@
%
<<prelim,echo=FALSE>>=
xgboost.version = '0.3-0'
@
%
\begin{center}
\vspace*{6\baselineskip}
\rule{\textwidth}{1.6pt}\vspace*{-\baselineskip}\vspace*{2pt}
\rule{\textwidth}{0.4pt}\\[2\baselineskip]
{\LARGE \textbf{xgboost: eXtreme Gradient Boosting}}\\[1.2\baselineskip]
\rule{\textwidth}{0.4pt}\vspace*{-\baselineskip}\vspace{3.2pt}
\rule{\textwidth}{1.6pt}\\[2\baselineskip]
{\Large Tianqi Chen, Tong He}\\[\baselineskip]
{\large Package Version: \Sexpr{xgboost.version}}\\[\baselineskip]
{\large \today}\par
\vfill
\end{center}
\thispagestyle{empty}
\clearpage
\setcounter{page}{1}
\section{Introduction}
This is an introductory document for using the \verb@xgboost@ package in R.
\verb@xgboost@ is short for eXtreme Gradient Boosting. It is an efficient
and scalable implementation of the gradient boosting framework of \citet{friedman2001greedy}.
The package includes an efficient linear model solver and a tree learning algorithm.
It supports various objective functions, including regression, classification
and ranking. The package is designed to be extensible, so users can also easily define their own objectives. It has several features:
\begin{enumerate}
\item{Speed: }{\verb@xgboost@ can automatically do parallel computation on
Windows and Linux, with OpenMP. It is generally over 10 times faster than
\verb@gbm@.}
\item{Input Type: }{\verb@xgboost@ takes several types of input data:}
\begin{itemize}
\item{Dense Matrix: }{R's dense matrix, i.e. \verb@matrix@}
\item{Sparse Matrix: }{R's sparse matrix \verb@Matrix::dgCMatrix@}
\item{Data File: }{Local data files}
\item{xgb.DMatrix: }{\verb@xgboost@'s own class. Recommended.}
\end{itemize}
\item{Sparsity: }{\verb@xgboost@ accepts sparse input for both tree booster
and linear booster, and is optimized for sparse input.}
\item{Customization: }{\verb@xgboost@ supports customized objective functions
and evaluation functions.}
\item{Performance: }{\verb@xgboost@ has better performance on several different
datasets.}
\end{enumerate}
\section{Example with Mushroom data}
In this section, we will illustrate some common usage of \verb@xgboost@. The
Mushroom data is from the UCI Machine Learning Repository \citep{Bache+Lichman:2013}.
<<Training and prediction with Mushroom data>>=
library(xgboost)
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgboost(data = train$data, label = train$label, max.depth = 2, eta = 1,
nround = 2, objective = "binary:logistic")
xgb.save(bst, 'model.save')
bst = xgb.load('model.save')
pred <- predict(bst, test$data)
@
\verb@xgboost@ is the main function to train a \verb@Booster@, i.e. a model,
and \verb@predict@ makes predictions with the model.
Here we save the model to a local binary file and load it back when needed.
We cannot inspect the trees inside this binary format; however, another function
can dump the model as plain text.
<<Dump Model>>=
xgb.dump(bst, 'model.dump')
@
The output looks like
\begin{verbatim}
booster[0]:
0:[f28<1.00001] yes=1,no=2,missing=2
1:[f108<1.00001] yes=3,no=4,missing=4
3:leaf=1.85965
4:leaf=-1.94071
2:[f55<1.00001] yes=5,no=6,missing=6
5:leaf=-1.70044
6:leaf=1.71218
booster[1]:
0:[f59<1.00001] yes=1,no=2,missing=2
1:leaf=-6.23624
2:[f28<1.00001] yes=3,no=4,missing=4
3:leaf=-0.96853
4:leaf=0.784718
\end{verbatim}
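For \verb@binary:logistic@, a prediction is obtained by summing the leaf values a sample reaches in each booster and applying the sigmoid. A Python sketch of that arithmetic, assuming (hypothetically) the sample reaches leaf 3 of booster[0] and leaf 4 of booster[1]:

```python
import math

def logistic_prediction(leaf_values):
    """Sum the leaf outputs of all boosters (the raw margin),
    then apply the sigmoid, as binary:logistic does at prediction time."""
    margin = sum(leaf_values)
    return 1.0 / (1.0 + math.exp(-margin))

# hypothetical path: leaf 3 of booster[0] and leaf 4 of booster[1]
p = logistic_prediction([1.85965, 0.784718])
```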
It is important to know \verb@xgboost@'s own data type: \verb@xgb.DMatrix@.
It speeds up \verb@xgboost@, and is needed for advanced features such as
training from an initial prediction value or weighting training instances.
We can construct an \verb@xgb.DMatrix@ object with:
<<xgb.DMatrix>>=
dtrain <- xgb.DMatrix(train$data, label = train$label)
class(dtrain)
head(getinfo(dtrain,'label'))
@
We can also save the matrix to a binary file, and then load it simply with
\verb@xgb.DMatrix@:
<<save model>>=
xgb.DMatrix.save(dtrain, 'xgb.DMatrix')
dtrain = xgb.DMatrix('xgb.DMatrix')
@
\section{Advanced Examples}
The function \verb@xgboost@ is a simple function with fewer parameters, in order
to be R-friendly. The core training function is wrapped in \verb@xgb.train@. It is more flexible than \verb@xgboost@, but it requires users to read the documentation a bit more carefully.
\verb@xgb.train@ only accepts an \verb@xgb.DMatrix@ object as its input, while it supports advanced features such as customized objective and evaluation functions.
<<Customized loss function>>=
logregobj <- function(preds, dtrain) {
labels <- getinfo(dtrain, "label")
preds <- 1/(1 + exp(-preds))
grad <- preds - labels
hess <- preds * (1 - preds)
return(list(grad = grad, hess = hess))
}
evalerror <- function(preds, dtrain) {
labels <- getinfo(dtrain, "label")
err <- sqrt(mean((preds-labels)^2))
return(list(metric = "MSE", value = err))
}
dtest <- xgb.DMatrix(test$data, label = test$label)
watchlist <- list(eval = dtest, train = dtrain)
param <- list(max.depth = 2, eta = 1, silent = 1)
bst <- xgb.train(param, dtrain, nround = 2, watchlist, logregobj, evalerror)
@
A customized objective function is required to return both the gradient and the
second order gradient.
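The gradient and hessian computed by \verb@logregobj@ above can be sketched in Python for readers more familiar with that notation (the helper name is ours, purely illustrative):

```python
import math

def logreg_grad_hess(preds, labels):
    """Gradient and hessian of the logistic loss w.r.t. the raw margin,
    matching logregobj: grad = p - y, hess = p * (1 - p)."""
    probs = [1.0 / (1.0 + math.exp(-m)) for m in preds]
    grad = [p - y for p, y in zip(probs, labels)]
    hess = [p * (1.0 - p) for p in probs]
    return grad, hess
```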
We also provide \verb@slice@ for row extraction, which is useful in
cross-validation.
For a walkthrough demo, please see \verb@R-package/demo/@ for further
details.
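The index bookkeeping behind slice-based cross-validation can be sketched as follows (a Python illustration of the fold arithmetic only; the actual row extraction in R would use \verb@slice@):

```python
def kfold_indices(nrow, k):
    """Partition row indices 0..nrow-1 into k contiguous folds and
    return (train_idx, test_idx) pairs, the shape slice would consume."""
    folds = []
    for f in range(k):
        test = list(range(f * nrow // k, (f + 1) * nrow // k))
        held_out = set(test)
        train = [i for i in range(nrow) if i not in held_out]
        folds.append((train, test))
    return folds
```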
\section{The Higgs Boson competition}
We have made a demo for \href{http://www.kaggle.com/c/higgs-boson}{the Higgs
Boson Machine Learning Challenge}.
Here are the instructions to make a submission
\begin{enumerate}
\item Download the \href{http://www.kaggle.com/c/higgs-boson/data}{datasets}
and extract them to \verb@data/@.
\item Run scripts under \verb@xgboost/demo/kaggle-higgs/@:
\href{https://github.com/tqchen/xgboost/blob/master/demo/kaggle-higgs/higgs-train.R}{higgs-train.R}
and \href{https://github.com/tqchen/xgboost/blob/master/demo/kaggle-higgs/higgs-pred.R}{higgs-pred.R}.
The computation will take less than a minute on an Intel i7.
\item Go to the \href{http://www.kaggle.com/c/higgs-boson/submissions/attach}{submission page}
and submit your result.
\end{enumerate}
We provide \href{https://github.com/tqchen/xgboost/blob/master/demo/kaggle-higgs/speedtest.R}{a script}
to compare the time cost on the higgs dataset with \verb@gbm@ and \verb@xgboost@.
The training set contains 350000 records and 30 features.
\verb@xgboost@ can automatically do parallel computation. On a machine with an Intel
i7-4700MQ and 24GB of memory, we found that \verb@xgboost@ takes about 35 seconds, which is about 20 times faster
than \verb@gbm@. When we limited \verb@xgboost@ to use only one thread, it was
still about twice as fast as \verb@gbm@.
Meanwhile, the result from \verb@xgboost@ reaches
\href{http://www.kaggle.com/c/higgs-boson/details/evaluation}{3.60@AMS} with a
single model. This result stands in the
\href{http://www.kaggle.com/c/higgs-boson/leaderboard}{top 30\%} of the
competition.
\bibliographystyle{jss}
\nocite{*} % list uncited references
\bibliography{xgboost}
\end{document}


@@ -0,0 +1,30 @@
@article{friedman2001greedy,
title={Greedy function approximation: a gradient boosting machine},
author={Friedman, Jerome H},
journal={Annals of Statistics},
pages={1189--1232},
year={2001},
publisher={JSTOR}
}
@article{friedman2000additive,
title={Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors)},
author={Friedman, Jerome and Hastie, Trevor and Tibshirani, Robert and others},
journal={The annals of statistics},
volume={28},
number={2},
pages={337--407},
year={2000},
publisher={Institute of Mathematical Statistics}
}
@misc{Bache+Lichman:2013,
author = "K. Bache and M. Lichman",
year = "2013",
title = "{UCI} Machine Learning Repository",
url = "http://archive.ics.uci.edu/ml",
institution = "University of California, Irvine, School of Information and Computer Sciences"
}


@@ -1,4 +1,52 @@
xgboost: eXtreme Gradient Boosting
======
An optimized general purpose gradient boosting library. The library is parallelized using OpenMP. It implements machine learning algorithms under the gradient boosting framework, including generalized linear models and gradient boosted regression trees.
Contributors: https://github.com/tqchen/xgboost/graphs/contributors
Tutorial and Documentation: https://github.com/tqchen/xgboost/wiki
Questions and Issues: [https://github.com/tqchen/xgboost/issues](https://github.com/tqchen/xgboost/issues?q=is%3Aissue+label%3Aquestion)
Example Code: [Learning to use xgboost by examples](demo)
Notes on the Code: [Code Guide](src)
What's New
=====
* See the updated [demo folder](demo) for feature walkthrough
* Thanks to Tong He, the new [R package](R-package) is available
Features
======
* Sparse feature format:
- Sparse feature format allows easy handling of missing values, and improves computational efficiency.
* Push the limit on single machine:
- Efficient implementation that optimizes memory and computation.
* Speed: XGBoost is very fast
- In [demo/kaggle-higgs/speedtest.py](demo/kaggle-higgs/speedtest.py), on the Kaggle Higgs data it is faster than sklearn.ensemble.GradientBoostingClassifier (on our machine, 20 times faster using 4 threads)
* Layout of the gradient boosting algorithm supports user-defined objectives
* Python interface, works with numpy and scipy.sparse matrices
Build
=====
* Run ```bash build.sh``` (you can also type make)
* If your compiler does not come with OpenMP support, it will issue a warning telling you that the code will compile in single-thread mode, and you will get a single-thread xgboost
* You may get an error: -lgomp is not found
- You can type ```make no_omp=1```; this will get you a single-thread xgboost
- Alternatively, you can upgrade your compiler to compile the multi-thread version
* Windows (VS 2010): see the [windows](windows) folder
- In principle, you put all the cpp files in the Makefile into the project, and build
Version
======
* This is version xgboost-0.3; the code has been refactored from 0.2x to be cleaner and more flexible
* This version of xgboost is not compatible with 0.2x, due to the huge amount of changes in code structure
- This means the model and buffer files of the previous version can not be loaded in xgboost-0.3
* For the legacy 0.2x code, refer to [Here](https://github.com/tqchen/xgboost/releases/tag/v0.22)
* Change log in [CHANGES.md](CHANGES.md)
XGBoost in Graphlab Create
======
* XGBoost is adopted as part of the boosted tree toolkit in Graphlab Create (GLC). Graphlab Create is a powerful python toolkit that allows you to do data manipulation, graph processing, hyper-parameter search, and visualization of terabyte-scale data in one framework. Try Graphlab Create at http://graphlab.com/products/create/quick-start-guide.html
* A nice blog post by Jay Gu on using GLC boosted trees to solve the Kaggle bike sharing challenge: http://blog.graphlab.com/using-gradient-boosted-trees-to-predict-bike-sharing-demand

build.sh

@@ -0,0 +1,15 @@
#!/bin/bash
# this is a simple script to make xgboost on Mac and Linux
# basically, it first tries to make with OpenMP; if that fails, it disables OpenMP and makes again
# This will automatically make xgboost for Mac users who do not have OpenMP support
# In most cases, typing make will give what you want
if make; then
echo "Successfully built multi-thread xgboost"
else
echo "-----------------------------"
echo "Building multi-thread xgboost failed"
echo "Start to build single-thread xgboost"
make clean
make no_omp=1
echo "Successfully built single-thread xgboost"
fi

demo/README.md

@@ -0,0 +1,27 @@
XGBoost Examples
====
This folder contains all the example code using xgboost.
* Contributions of examples and benchmarks are more than welcome!
* If you would like to share how you use xgboost to solve your problem, send a pull request :)
Features Walkthrough
====
This is a list of short examples introducing different functionalities of xgboost and its wrappers.
* Basic walkthrough of wrappers [python](guide-python/basic_walkthrough.py)
* Customized loss function and evaluation metric [python](guide-python/custom_objective.py)
* Boosting from existing prediction [python](guide-python/boost_from_prediction.py)
* Predicting using first n trees [python](guide-python/predict_first_ntree.py)
* Generalized Linear Model [python](guide-python/generalized_linear_model.py)
* Cross validation [python](guide-python/cross_validation.py)
Basic Examples by Tasks
====
* [Binary classification](binary_classification)
* [Multiclass classification](multiclass_classification)
* [Regression](regression)
* [Learning to Rank](rank)
Benchmarks
====
* [Starter script for Kaggle Higgs Boson](kaggle-higgs)


@@ -0,0 +1,14 @@
Demonstrating how to use XGBoost to accomplish binary classification tasks on the UCI mushroom dataset http://archive.ics.uci.edu/ml/datasets/Mushroom
Run: ./runexp.sh
Format of input: LIBSVM format
Format of ```featmap.txt: <featureid> <featurename> <q or i or int>\n ```:
- Feature ids must run from 0 to the number of features, in sorted order.
- i means this feature is a binary indicator feature
- q means this feature is a quantitative value, such as age or time, and can be missing
- int means this feature is an integer value (when int is hinted, the decision boundary will be integer)
Explanations: https://github.com/tqchen/xgboost/wiki/Binary-Classification
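A minimal sketch of parsing one featmap.txt line according to the format above (the helper name is hypothetical; the project's own loaders may differ):

```python
def parse_featmap_line(line):
    """Split one featmap.txt line into (feature_id, name, type),
    where type is one of 'q', 'i', or 'int'."""
    fid, name, ftype = line.split()
    assert ftype in ('q', 'i', 'int'), "unknown feature type"
    return int(fid), name, ftype
```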

File diff suppressed because it is too large


@@ -0,0 +1,32 @@
1. cap-shape: bell=b,conical=c,convex=x,flat=f,knobbed=k,sunken=s
2. cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
3. cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y
4. bruises?: bruises=t,no=f
5. odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,
musty=m,none=n,pungent=p,spicy=s
6. gill-attachment: attached=a,descending=d,free=f,notched=n
7. gill-spacing: close=c,crowded=w,distant=d
8. gill-size: broad=b,narrow=n
9. gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g,
green=r,orange=o,pink=p,purple=u,red=e,
white=w,yellow=y
10. stalk-shape: enlarging=e,tapering=t
11. stalk-root: bulbous=b,club=c,cup=u,equal=e,
rhizomorphs=z,rooted=r,missing=?
12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
14. stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,
pink=p,red=e,white=w,yellow=y
15. stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,
pink=p,red=e,white=w,yellow=y
16. veil-type: partial=p,universal=u
17. veil-color: brown=n,orange=o,white=w,yellow=y
18. ring-number: none=n,one=o,two=t
19. ring-type: cobwebby=c,evanescent=e,flaring=f,large=l,
none=n,pendant=p,sheathing=s,zone=z
20. spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r,
orange=o,purple=u,white=w,yellow=y
21. population: abundant=a,clustered=c,numerous=n,
scattered=s,several=v,solitary=y
22. habitat: grasses=g,leaves=l,meadows=m,paths=p,
urban=u,waste=w,woods=d


@@ -0,0 +1,148 @@
1. Title: Mushroom Database
2. Sources:
(a) Mushroom records drawn from The Audubon Society Field Guide to North
American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred
A. Knopf
(b) Donor: Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu)
(c) Date: 27 April 1987
3. Past Usage:
1. Schlimmer,J.S. (1987). Concept Acquisition Through Representational
Adjustment (Technical Report 87-19). Doctoral dissertation, Department
of Information and Computer Science, University of California, Irvine.
--- STAGGER: asymptoted to 95% classification accuracy after reviewing
1000 instances.
2. Iba,W., Wogulis,J., & Langley,P. (1988). Trading off Simplicity
and Coverage in Incremental Concept Learning. In Proceedings of
the 5th International Conference on Machine Learning, 73-79.
Ann Arbor, Michigan: Morgan Kaufmann.
-- approximately the same results with their HILLARY algorithm
3. In the following references a set of rules (given below) were
learned for this data set which may serve as a point of
comparison for other researchers.
Duch W, Adamczak R, Grabczewski K (1996) Extraction of logical rules
from training data using backpropagation networks, in: Proc. of the
The 1st Online Workshop on Soft Computing, 19-30.Aug.1996, pp. 25-30,
available on-line at: http://www.bioele.nuee.nagoya-u.ac.jp/wsc1/
Duch W, Adamczak R, Grabczewski K, Ishikawa M, Ueda H, Extraction of
crisp logical rules using constrained backpropagation networks -
comparison of two new approaches, in: Proc. of the European Symposium
on Artificial Neural Networks (ESANN'97), Bruge, Belgium 16-18.4.1997,
pp. xx-xx
Wlodzislaw Duch, Department of Computer Methods, Nicholas Copernicus
University, 87-100 Torun, Grudziadzka 5, Poland
e-mail: duch@phys.uni.torun.pl
WWW http://www.phys.uni.torun.pl/kmk/
Date: Mon, 17 Feb 1997 13:47:40 +0100
From: Wlodzislaw Duch <duch@phys.uni.torun.pl>
Organization: Dept. of Computer Methods, UMK
I have attached a file containing logical rules for mushrooms.
It should be helpful for other people since only in the last year I
have seen about 10 papers analyzing this dataset and obtaining quite
complex rules. We will try to contribute other results later.
With best regards, Wlodek Duch
________________________________________________________________
Logical rules for the mushroom data sets.
Logical rules given below seem to be the simplest possible for the
mushroom dataset and therefore should be treated as benchmark results.
Disjunctive rules for poisonous mushrooms, from most general
to most specific:
P_1) odor=NOT(almond.OR.anise.OR.none)
120 poisonous cases missed, 98.52% accuracy
P_2) spore-print-color=green
48 cases missed, 99.41% accuracy
P_3) odor=none.AND.stalk-surface-below-ring=scaly.AND.
(stalk-color-above-ring=NOT.brown)
8 cases missed, 99.90% accuracy
P_4) habitat=leaves.AND.cap-color=white
100% accuracy
Rule P_4) may also be
P_4') population=clustered.AND.cap_color=white
These rule involve 6 attributes (out of 22). Rules for edible
mushrooms are obtained as negation of the rules given above, for
example the rule:
odor=(almond.OR.anise.OR.none).AND.spore-print-color=NOT.green
gives 48 errors, or 99.41% accuracy on the whole dataset.
Several slightly more complex variations on these rules exist,
involving other attributes, such as gill_size, gill_spacing,
stalk_surface_above_ring, but the rules given above are the simplest
we have found.
4. Relevant Information:
This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the Agaricus and
Lepiota Family (pp. 500-525). Each species is identified as
definitely edible, definitely poisonous, or of unknown edibility and
not recommended. This latter class was combined with the poisonous
one. The Guide clearly states that there is no simple rule for
determining the edibility of a mushroom; no rule like ``leaflets
three, let it be'' for Poisonous Oak and Ivy.
5. Number of Instances: 8124
6. Number of Attributes: 22 (all nominally valued)
7. Attribute Information: (classes: edible=e, poisonous=p)
1. cap-shape: bell=b,conical=c,convex=x,flat=f,
knobbed=k,sunken=s
2. cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
3. cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,
pink=p,purple=u,red=e,white=w,yellow=y
4. bruises?: bruises=t,no=f
5. odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,
musty=m,none=n,pungent=p,spicy=s
6. gill-attachment: attached=a,descending=d,free=f,notched=n
7. gill-spacing: close=c,crowded=w,distant=d
8. gill-size: broad=b,narrow=n
9. gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g,
green=r,orange=o,pink=p,purple=u,red=e,
white=w,yellow=y
10. stalk-shape: enlarging=e,tapering=t
11. stalk-root: bulbous=b,club=c,cup=u,equal=e,
rhizomorphs=z,rooted=r,missing=?
12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
14. stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,
pink=p,red=e,white=w,yellow=y
15. stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,
pink=p,red=e,white=w,yellow=y
16. veil-type: partial=p,universal=u
17. veil-color: brown=n,orange=o,white=w,yellow=y
18. ring-number: none=n,one=o,two=t
19. ring-type: cobwebby=c,evanescent=e,flaring=f,large=l,
none=n,pendant=p,sheathing=s,zone=z
20. spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r,
orange=o,purple=u,white=w,yellow=y
21. population: abundant=a,clustered=c,numerous=n,
scattered=s,several=v,solitary=y
22. habitat: grasses=g,leaves=l,meadows=m,paths=p,
urban=u,waste=w,woods=d
8. Missing Attribute Values: 2480 of them (denoted by "?"), all for
attribute #11.
9. Class Distribution:
-- edible: 4208 (51.8%)
-- poisonous: 3916 (48.2%)
-- total: 8124 instances


@@ -0,0 +1,50 @@
#!/usr/bin/python
import sys
def loadfmap( fname ):
fmap = {}
nmap = {}
for l in open( fname ):
arr = l.split()
if arr[0].find('.') != -1:
idx = int( arr[0].strip('.') )
assert idx not in fmap
fmap[ idx ] = {}
ftype = arr[1].strip(':')
content = arr[2]
else:
content = arr[0]
for it in content.split(','):
if it.strip() == '':
continue
k , v = it.split('=')
fmap[ idx ][ v ] = len(nmap)
nmap[ len(nmap) ] = ftype+'='+k
return fmap, nmap
def write_nmap( fo, nmap ):
for i in range( len(nmap) ):
fo.write('%d\t%s\ti\n' % (i, nmap[i]) )
# start here
fmap, nmap = loadfmap( 'agaricus-lepiota.fmap' )
fo = open( 'featmap.txt', 'w' )
write_nmap( fo, nmap )
fo.close()
fo = open( 'agaricus.txt', 'w' )
for l in open( 'agaricus-lepiota.data' ):
arr = l.split(',')
if arr[0] == 'p':
fo.write('1')
else:
assert arr[0] == 'e'
fo.write('0')
for i in range( 1,len(arr) ):
fo.write( ' %d:1' % fmap[i][arr[i].strip()] )
fo.write('\n')
fo.close()


@@ -0,0 +1,29 @@
#!/usr/bin/python
import sys
import random
if len(sys.argv) < 3:
    print ('Usage: <filename> <k> [nfold = 5]')
    exit(0)
random.seed( 10 )
k = int( sys.argv[2] )
if len(sys.argv) > 3:
nfold = int( sys.argv[3] )
else:
nfold = 5
fi = open( sys.argv[1], 'r' )
ftr = open( sys.argv[1]+'.train', 'w' )
fte = open( sys.argv[1]+'.test', 'w' )
for l in fi:
if random.randint( 1 , nfold ) == k:
fte.write( l )
else:
ftr.write( l )
fi.close()
ftr.close()
fte.close()
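The split above sends a line to the test file when randint(1, nfold) == k, which holds with probability 1/nfold (randint bounds are inclusive), so roughly 1/nfold of the data ends up in the test set. A quick sanity sketch:

```python
import random

# With nfold = 5, each line lands in the test split with probability 1/5.
random.seed(10)
nfold, k = 5, 1
n = 10000
ntest = sum(1 for _ in range(n) if random.randint(1, nfold) == k)
print(ntest / n)  # close to 0.2
```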


@@ -0,0 +1,29 @@
# General Parameters, see comment for each definition
# choose the booster, can be gbtree or gblinear
booster = gbtree
# choose logistic regression loss function for binary classification
objective = binary:logistic
# Tree Booster Parameters
# step size shrinkage
eta = 1.0
# minimum loss reduction required to make a further partition
gamma = 1.0
# minimum sum of instance weight(hessian) needed in a child
min_child_weight = 1
# maximum depth of a tree
max_depth = 3
# Task Parameters
# the number of round to do boosting
num_round = 2
# 0 means do not save any model except the final round model
save_period = 0
# The path of training data
data = "agaricus.txt.train"
# The path of validation data, used to monitor training process, here [test] sets name of the validation set
eval[test] = "agaricus.txt.test"
# evaluate on training data as well each round
eval_train = 1
# The path of test data
test:data = "agaricus.txt.test"


@@ -0,0 +1,15 @@
#!/bin/bash
# map feature using indicator encoding, also produce featmap.txt
python mapfeat.py
# split train and test
python mknfold.py agaricus.txt 1
# training and output the models
../../xgboost mushroom.conf
# output prediction task=pred
../../xgboost mushroom.conf task=pred model_in=0002.model
# print the boosters of 0002.model in dump.raw.txt
../../xgboost mushroom.conf task=dump model_in=0002.model name_dump=dump.raw.txt
# use the feature map in printing for better visualization
../../xgboost mushroom.conf task=dump model_in=0002.model fmap=featmap.txt name_dump=dump.nice.txt
cat dump.nice.txt

2
demo/data/README.md Normal file

@@ -0,0 +1,2 @@
This folder contains the processed example dataset used by the demos.
Copyright of the dataset belongs to the original copyright holder.

1611
demo/data/agaricus.txt.test Normal file

File diff suppressed because it is too large

6513
demo/data/agaricus.txt.train Normal file

File diff suppressed because it is too large

126
demo/data/featmap.txt Normal file

@@ -0,0 +1,126 @@
0 cap-shape=bell i
1 cap-shape=conical i
2 cap-shape=convex i
3 cap-shape=flat i
4 cap-shape=knobbed i
5 cap-shape=sunken i
6 cap-surface=fibrous i
7 cap-surface=grooves i
8 cap-surface=scaly i
9 cap-surface=smooth i
10 cap-color=brown i
11 cap-color=buff i
12 cap-color=cinnamon i
13 cap-color=gray i
14 cap-color=green i
15 cap-color=pink i
16 cap-color=purple i
17 cap-color=red i
18 cap-color=white i
19 cap-color=yellow i
20 bruises?=bruises i
21 bruises?=no i
22 odor=almond i
23 odor=anise i
24 odor=creosote i
25 odor=fishy i
26 odor=foul i
27 odor=musty i
28 odor=none i
29 odor=pungent i
30 odor=spicy i
31 gill-attachment=attached i
32 gill-attachment=descending i
33 gill-attachment=free i
34 gill-attachment=notched i
35 gill-spacing=close i
36 gill-spacing=crowded i
37 gill-spacing=distant i
38 gill-size=broad i
39 gill-size=narrow i
40 gill-color=black i
41 gill-color=brown i
42 gill-color=buff i
43 gill-color=chocolate i
44 gill-color=gray i
45 gill-color=green i
46 gill-color=orange i
47 gill-color=pink i
48 gill-color=purple i
49 gill-color=red i
50 gill-color=white i
51 gill-color=yellow i
52 stalk-shape=enlarging i
53 stalk-shape=tapering i
54 stalk-root=bulbous i
55 stalk-root=club i
56 stalk-root=cup i
57 stalk-root=equal i
58 stalk-root=rhizomorphs i
59 stalk-root=rooted i
60 stalk-root=missing i
61 stalk-surface-above-ring=fibrous i
62 stalk-surface-above-ring=scaly i
63 stalk-surface-above-ring=silky i
64 stalk-surface-above-ring=smooth i
65 stalk-surface-below-ring=fibrous i
66 stalk-surface-below-ring=scaly i
67 stalk-surface-below-ring=silky i
68 stalk-surface-below-ring=smooth i
69 stalk-color-above-ring=brown i
70 stalk-color-above-ring=buff i
71 stalk-color-above-ring=cinnamon i
72 stalk-color-above-ring=gray i
73 stalk-color-above-ring=orange i
74 stalk-color-above-ring=pink i
75 stalk-color-above-ring=red i
76 stalk-color-above-ring=white i
77 stalk-color-above-ring=yellow i
78 stalk-color-below-ring=brown i
79 stalk-color-below-ring=buff i
80 stalk-color-below-ring=cinnamon i
81 stalk-color-below-ring=gray i
82 stalk-color-below-ring=orange i
83 stalk-color-below-ring=pink i
84 stalk-color-below-ring=red i
85 stalk-color-below-ring=white i
86 stalk-color-below-ring=yellow i
87 veil-type=partial i
88 veil-type=universal i
89 veil-color=brown i
90 veil-color=orange i
91 veil-color=white i
92 veil-color=yellow i
93 ring-number=none i
94 ring-number=one i
95 ring-number=two i
96 ring-type=cobwebby i
97 ring-type=evanescent i
98 ring-type=flaring i
99 ring-type=large i
100 ring-type=none i
101 ring-type=pendant i
102 ring-type=sheathing i
103 ring-type=zone i
104 spore-print-color=black i
105 spore-print-color=brown i
106 spore-print-color=buff i
107 spore-print-color=chocolate i
108 spore-print-color=green i
109 spore-print-color=orange i
110 spore-print-color=purple i
111 spore-print-color=white i
112 spore-print-color=yellow i
113 population=abundant i
114 population=clustered i
115 population=numerous i
116 population=scattered i
117 population=several i
118 population=solitary i
119 habitat=grasses i
120 habitat=leaves i
121 habitat=meadows i
122 habitat=paths i
123 habitat=urban i
124 habitat=waste i
125 habitat=woods i


@@ -0,0 +1,8 @@
XGBoost Python Feature Walkthrough
====
* [Basic walkthrough of wrappers](basic_walkthrough.py)
* [Customize loss function and evaluation metric](custom_objective.py)
* [Boosting from existing prediction](boost_from_prediction.py)
* [Predicting using first n trees](predict_first_ntree.py)
* [Generalized Linear Model](generalized_linear_model.py)
* [Cross validation](cross_validation.py)


@@ -0,0 +1,76 @@
#!/usr/bin/python
import sys
import numpy as np
import scipy.sparse
# append the path to xgboost, you may need to change the following line
# alternatively, you can add the path to PYTHONPATH environment variable
sys.path.append('../../wrapper')
import xgboost as xgb
### simple example
# load data from a text file; binary buffers generated by xgboost also work
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')
# specify parameters via dict; definitions are the same as in the C++ version
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
# specify validation sets to watch performance
watchlist = [(dtest,'eval'), (dtrain,'train')]
num_round = 2
bst = xgb.train(param, dtrain, num_round, watchlist)
# make prediction
preds = bst.predict(dtest)
labels = dtest.get_label()
print ('error=%f' % ( sum(1 for i in range(len(preds)) if int(preds[i]>0.5)!=labels[i]) /float(len(preds))))
bst.save_model('0001.model')
# dump model
bst.dump_model('dump.raw.txt')
# dump model with feature map
bst.dump_model('dump.nice.txt','../data/featmap.txt')
# save dmatrix into binary buffer
dtest.save_binary('dtest.buffer')
bst.save_model('xgb.model')
# load model and data in
bst2 = xgb.Booster(model_file='xgb.model')
dtest2 = xgb.DMatrix('dtest.buffer')
preds2 = bst2.predict(dtest2)
# assert they are the same
assert np.sum(np.abs(preds2-preds)) == 0
###
# build dmatrix from scipy.sparse
print ('start running example of build DMatrix from scipy.sparse CSR Matrix')
labels = []
row = []; col = []; dat = []
i = 0
for l in open('../data/agaricus.txt.train'):
arr = l.split()
labels.append( int(arr[0]))
for it in arr[1:]:
k,v = it.split(':')
row.append(i); col.append(int(k)); dat.append(float(v))
i += 1
csr = scipy.sparse.csr_matrix( (dat, (row,col)) )
dtrain = xgb.DMatrix( csr, label = labels )
watchlist = [(dtest,'eval'), (dtrain,'train')]
bst = xgb.train( param, dtrain, num_round, watchlist )
print ('start running example of build DMatrix from scipy.sparse CSC Matrix')
# we can also construct from csc matrix
csc = scipy.sparse.csc_matrix( (dat, (row,col)) )
dtrain = xgb.DMatrix(csc, label=labels)
watchlist = [(dtest,'eval'), (dtrain,'train')]
bst = xgb.train( param, dtrain, num_round, watchlist )
print ('start running example of build DMatrix from numpy array')
# NOTE: npymat is a numpy array; internally it is converted to a
# scipy.sparse.csr_matrix and then to a DMatrix
npymat = csr.todense()
dtrain = xgb.DMatrix(npymat, label = labels)
watchlist = [(dtest,'eval'), (dtrain,'train')]
bst = xgb.train( param, dtrain, num_round, watchlist )


@@ -0,0 +1,26 @@
#!/usr/bin/python
import sys
import numpy as np
sys.path.append('../../wrapper')
import xgboost as xgb
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')
watchlist = [(dtest,'eval'), (dtrain,'train')]
###
# advanced: start from an initial base prediction
#
print ('start running example to start from an initial prediction')
# specify parameters via dict; definitions are the same as in the C++ version
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
# train xgboost for 1 round
bst = xgb.train( param, dtrain, 1, watchlist )
# Note: set_base_margin expects the margin value, not the transformed prediction
# predicting with output_margin=True always returns margin values before the logistic transformation
ptrain = bst.predict(dtrain, output_margin=True)
ptest = bst.predict(dtest, output_margin=True)
dtrain.set_base_margin(ptrain)
dtest.set_base_margin(ptest)
print ('this is result of running from initial prediction')
bst = xgb.train( param, dtrain, 1, watchlist )


@@ -0,0 +1,63 @@
#!/usr/bin/python
import sys
import numpy as np
sys.path.append('../../wrapper')
import xgboost as xgb
### load data and do training
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic'}
num_round = 2
print ('running cross validation')
# do cross validation, this will print result out as
# [iteration] metric_name:mean_value+std_value
# std_value is standard deviation of the metric
xgb.cv(param, dtrain, num_round, nfold=5,
metrics={'error'}, seed = 0)
print ('running cross validation, disable standard deviation display')
# do cross validation, this will print result out as
# [iteration] metric_name:mean_value+std_value
# std_value is standard deviation of the metric
xgb.cv(param, dtrain, num_round, nfold=5,
metrics={'error'}, seed = 0, show_stdv = False)
print ('running cross validation, with preprocessing function')
# define the preprocessing function
# used to return the preprocessed training, test data, and parameter
# we can use this to do weight rescale, etc.
# as an example, we try to set scale_pos_weight
def fpreproc(dtrain, dtest, param):
label = dtrain.get_label()
ratio = float(np.sum(label == 0)) / np.sum(label==1)
param['scale_pos_weight'] = ratio
return (dtrain, dtest, param)
# do cross validation, for each fold
# the dtrain, dtest, param will be passed into fpreproc
# then the return value of fpreproc will be used to generate
# results of that fold
xgb.cv(param, dtrain, num_round, nfold=5,
metrics={'auc'}, seed = 0, fpreproc = fpreproc)
###
# you can also do cross validation with a customized loss function
# See custom_objective.py
##
print ('running cross validation, with customized loss function')
def logregobj(preds, dtrain):
labels = dtrain.get_label()
preds = 1.0 / (1.0 + np.exp(-preds))
grad = preds - labels
hess = preds * (1.0-preds)
return grad, hess
def evalerror(preds, dtrain):
labels = dtrain.get_label()
return 'error', float(sum(labels != (preds > 0.0))) / len(labels)
param = {'max_depth':2, 'eta':1, 'silent':1}
# train with customized objective
xgb.cv(param, dtrain, num_round, nfold = 5, seed = 0,
obj = logregobj, feval=evalerror)


@@ -0,0 +1,44 @@
#!/usr/bin/python
import sys
import numpy as np
sys.path.append('../../wrapper')
import xgboost as xgb
###
# advanced: customized loss function
#
print ('start running example of using a customized objective function')
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')
# note: for a customized objective function, we leave the objective as default
# note: what we get in prediction is the margin value
# you must know what you are doing
param = {'max_depth':2, 'eta':1, 'silent':1 }
watchlist = [(dtest,'eval'), (dtrain,'train')]
num_round = 2
# user-defined objective function: given predictions, return the gradient and second-order gradient
# this is log-likelihood loss
def logregobj(preds, dtrain):
labels = dtrain.get_label()
preds = 1.0 / (1.0 + np.exp(-preds))
grad = preds - labels
hess = preds * (1.0-preds)
return grad, hess
# user-defined evaluation function: returns a pair (metric_name, result)
# NOTE: with a customized loss function, the default prediction value is the margin
# this may make built-in evaluation metrics not function properly
# for example, with logistic loss the prediction is the score before the logistic transformation
# while the built-in error metric assumes the input is after the logistic transformation
# keep this in mind when customizing; you may also need to write a customized evaluation function
def evalerror(preds, dtrain):
labels = dtrain.get_label()
# return a pair metric_name, result
# since preds are margin(before logistic transformation, cutoff at 0)
return 'error', float(sum(labels != (preds > 0.0))) / len(labels)
# training with customized objective, we can also do step by step training
# simply look at xgboost.py's implementation of train
bst = xgb.train(param, dtrain, num_round, watchlist, logregobj, evalerror)
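The grad and hess returned by logregobj above follow from differentiating the logistic loss with respect to the raw margin x (a standard derivation, stated here for reference):

```latex
\ell(x, y) = -\,y \log \sigma(x) - (1-y)\log\bigl(1-\sigma(x)\bigr),
\qquad \sigma(x) = \frac{1}{1+e^{-x}}
\frac{\partial \ell}{\partial x} = \sigma(x) - y,
\qquad
\frac{\partial^2 \ell}{\partial x^2} = \sigma(x)\bigl(1-\sigma(x)\bigr)
```

which matches grad = preds - labels and hess = preds * (1.0 - preds) once the sigmoid has been applied to the margin.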


@@ -0,0 +1,32 @@
#!/usr/bin/python
import sys
sys.path.append('../../wrapper')
import xgboost as xgb
##
# this script demonstrates how to fit a generalized linear model in xgboost
# basically, we use a linear model instead of trees as the booster
##
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')
# change booster to gblinear, so that we are fitting a linear model
# alpha is the L1 regularizer
# lambda is the L2 regularizer
# you can also set lambda_bias which is L2 regularizer on the bias term
param = {'silent':1, 'objective':'binary:logistic', 'booster':'gblinear',
'alpha': 0.0001, 'lambda': 1 }
# normally, you do not need to set eta (step_size)
# XGBoost uses a parallel coordinate descent algorithm (shotgun);
# parallelization can affect convergence in certain cases,
# so setting eta to a smaller value, e.g. 0.5, can make the optimization more stable
# param['eta'] = 1
##
# the rest of settings are the same
##
watchlist = [(dtest,'eval'), (dtrain,'train')]
num_round = 4
bst = xgb.train(param, dtrain, num_round, watchlist)
preds = bst.predict(dtest)
labels = dtest.get_label()
print ('error=%f' % ( sum(1 for i in range(len(preds)) if int(preds[i]>0.5)!=labels[i]) /float(len(preds))))


@@ -0,0 +1,22 @@
#!/usr/bin/python
import sys
import numpy as np
sys.path.append('../../wrapper')
import xgboost as xgb
### load data and do training
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic' }
watchlist = [(dtest,'eval'), (dtrain,'train')]
num_round = 3
bst = xgb.train(param, dtrain, num_round, watchlist)
print ('start testing prediction from first n trees')
### predict using first 1 tree
label = dtest.get_label()
ypred1 = bst.predict(dtest, ntree_limit=1)
# by default, we predict using all the trees
ypred2 = bst.predict(dtest)
print ('error of ypred1=%f' % (np.sum((ypred1>0.5)!=label) /float(len(label))))
print ('error of ypred2=%f' % (np.sum((ypred2>0.5)!=label) /float(len(label))))

7
demo/guide-python/runall.sh Executable file

@@ -0,0 +1,7 @@
#!/bin/bash
python basic_walkthrough.py
python custom_objective.py
python boost_from_prediction.py
python generalized_linear_model.py
python cross_validation.py
rm -rf *~ *.model *.buffer


@@ -0,0 +1,26 @@
Guide for Kaggle Higgs Challenge
=====
This folder gives an example of how to use the XGBoost Python module to run the Kaggle Higgs competition.
The script achieves an AMS score of about 3.600 on the public leaderboard. To get started, follow these steps:
1. Compile the XGBoost python lib
```bash
cd ../..
make
```
2. Put training.csv and test.csv in the folder './data' (you can create a symbolic link)
3. Run ./run.sh
Speed
=====
speedtest.py compares xgboost's speed on this dataset with sklearn.GBM
Using R module
=====
* Alternatively, you can run the R scripts higgs-train.R and higgs-pred.R.

39
demo/kaggle-higgs/higgs-cv.py Executable file

@@ -0,0 +1,39 @@
#!/usr/bin/python
import sys
import numpy as np
sys.path.append('../../wrapper')
import xgboost as xgb
### load data and do training
train = np.loadtxt('./data/training.csv', delimiter=',', skiprows=1, converters={32: lambda x:int(x=='s'.encode('utf-8')) } )
label = train[:,32]
data = train[:,1:31]
weight = train[:,31]
dtrain = xgb.DMatrix( data, label=label, missing = -999.0, weight=weight )
param = {'max_depth':6, 'eta':0.1, 'silent':1, 'objective':'binary:logitraw', 'nthread':4}
num_round = 120
print ('running cross validation, with preprocessing function')
# define the preprocessing function
# used to return the preprocessed training, test data, and parameter
# we can use this to do weight rescale, etc.
# as an example, we try to set scale_pos_weight
def fpreproc(dtrain, dtest, param):
label = dtrain.get_label()
ratio = float(np.sum(label == 0)) / np.sum(label==1)
param['scale_pos_weight'] = ratio
wtrain = dtrain.get_weight()
wtest = dtest.get_weight()
sum_weight = sum(wtrain) + sum(wtest)
wtrain *= sum_weight / sum(wtrain)
wtest *= sum_weight / sum(wtest)
dtrain.set_weight(wtrain)
dtest.set_weight(wtest)
return (dtrain, dtest, param)
# do cross validation, for each fold
# the dtrain, dtest, param will be passed into fpreproc
# then the return value of fpreproc will be used to generate
# results of that fold
xgb.cv(param, dtrain, num_round, nfold=5,
metrics={'ams@0.15', 'auc'}, seed = 0, fpreproc = fpreproc)


@@ -0,0 +1,62 @@
#!/usr/bin/python
# this is the example script to use xgboost to train
import inspect
import os
import sys
import numpy as np
# add path of xgboost python module
code_path = os.path.join(
os.path.split(inspect.getfile(inspect.currentframe()))[0], "../../wrapper")
sys.path.append(code_path)
import xgboost as xgb
test_size = 550000
# path to where the data lies
dpath = 'data'
# load in training data, directly use numpy
dtrain = np.loadtxt( dpath+'/training.csv', delimiter=',', skiprows=1, converters={32: lambda x:int(x=='s'.encode('utf-8')) } )
print ('finish loading from csv ')
label = dtrain[:,32]
data = dtrain[:,1:31]
# rescale weight to make it the same as the test set
weight = dtrain[:,31] * float(test_size) / len(label)
sum_wpos = sum( weight[i] for i in range(len(label)) if label[i] == 1.0 )
sum_wneg = sum( weight[i] for i in range(len(label)) if label[i] == 0.0 )
# print weight statistics
print ('weight statistics: wpos=%g, wneg=%g, ratio=%g' % ( sum_wpos, sum_wneg, sum_wneg/sum_wpos ))
# construct xgboost.DMatrix from numpy array, treat -999.0 as missing value
xgmat = xgb.DMatrix( data, label=label, missing = -999.0, weight=weight )
# setup parameters for xgboost
param = {}
# use logistic regression loss, use raw prediction before logistic transformation
# since we only need the rank
param['objective'] = 'binary:logitraw'
# scale weight of positive examples
param['scale_pos_weight'] = sum_wneg/sum_wpos
param['eta'] = 0.1
param['max_depth'] = 6
param['eval_metric'] = 'auc'
param['silent'] = 1
param['nthread'] = 16
# you can pass param in directly, but here we want to watch multiple metrics
plst = list(param.items())+[('eval_metric', 'ams@0.15')]
watchlist = [ (xgmat,'train') ]
# boost 120 trees
num_round = 120
print ('loading data end, start to boost trees')
bst = xgb.train( plst, xgmat, num_round, watchlist );
# save out model
bst.save_model('higgs.model')
print ('finish training')
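The ams@0.15 metric watched above is the approximate median significance used by the Higgs challenge; as a reference sketch (with the regularization term b_r = 10 from the competition definition):

```python
import math

def ams(s, b, br=10.0):
    # Approximate median significance with regularization term br
    # (reference sketch of the Higgs challenge metric).
    return math.sqrt(2.0 * ((s + b + br) * math.log(1.0 + s / (b + br)) - s))

print(ams(0.0, 100.0))  # 0.0: no signal weight, no significance
```

The xgboost-side metric evaluates this at a signal-selection threshold of 0.15, which is why the prediction script labels the top 15% as 's'.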


@@ -0,0 +1,24 @@
# install xgboost package, see R-package in root folder
require(xgboost)
require(methods)
modelfile <- "higgs.model"
outfile <- "higgs.pred.csv"
dtest <- read.csv("data/test.csv", header=TRUE)
data <- as.matrix(dtest[2:31])
idx <- dtest[[1]]
xgmat <- xgb.DMatrix(data, missing = -999.0)
bst <- xgb.load(modelfile=modelfile)
ypred <- predict(bst, xgmat)
rorder <- rank(ypred, ties.method="first")
threshold <- 0.15
# to be completed
ntop <- length(rorder) - as.integer(threshold*length(rorder))
plabel <- ifelse(rorder > ntop, "s", "b")
outdata <- list("EventId" = idx,
"RankOrder" = rorder,
"Class" = plabel)
write.csv(outdata, file = outfile, quote=FALSE, row.names=FALSE)

53
demo/kaggle-higgs/higgs-pred.py Executable file

@@ -0,0 +1,53 @@
#!/usr/bin/python
# make prediction
import sys
import numpy as np
# add path of xgboost python module
sys.path.append('../../wrapper/')
import xgboost as xgb
# path to where the data lies
dpath = 'data'
modelfile = 'higgs.model'
outfile = 'higgs.pred.csv'
# make top 15% as positive
threshold_ratio = 0.15
# load in training data, directly use numpy
dtest = np.loadtxt( dpath+'/test.csv', delimiter=',', skiprows=1 )
data = dtest[:,1:31]
idx = dtest[:,0]
print ('finish loading from csv ')
xgmat = xgb.DMatrix( data, missing = -999.0 )
bst = xgb.Booster({'nthread':16}, model_file = modelfile)
ypred = bst.predict( xgmat )
res = [ ( int(idx[i]), ypred[i] ) for i in range(len(ypred)) ]
rorder = {}
for k, v in sorted( res, key = lambda x:-x[1] ):
rorder[ k ] = len(rorder) + 1
# write out predictions
ntop = int( threshold_ratio * len(rorder ) )
fo = open(outfile, 'w')
nhit = 0
ntot = 0
fo.write('EventId,RankOrder,Class\n')
for k, v in res:
if rorder[k] <= ntop:
lb = 's'
nhit += 1
else:
lb = 'b'
# change output rank order to follow Kaggle convention
fo.write('%s,%d,%s\n' % ( k, len(rorder)+1-rorder[k], lb ) )
ntot += 1
fo.close()
print ('finished writing into prediction file')


@@ -0,0 +1,33 @@
# install xgboost package, see R-package in root folder
require(xgboost)
require(methods)
testsize <- 550000
dtrain <- read.csv("data/training.csv", header=TRUE)
dtrain[33] <- dtrain[33] == "s"
label <- as.numeric(dtrain[[33]])
data <- as.matrix(dtrain[2:31])
weight <- as.numeric(dtrain[[32]]) * testsize / length(label)
sumwpos <- sum(weight * (label==1.0))
sumwneg <- sum(weight * (label==0.0))
print(paste("weight statistics: wpos=", sumwpos, "wneg=", sumwneg, "ratio=", sumwneg / sumwpos))
xgmat <- xgb.DMatrix(data, label = label, weight = weight, missing = -999.0)
param <- list("objective" = "binary:logitraw",
"scale_pos_weight" = sumwneg / sumwpos,
"bst:eta" = 0.1,
"bst:max_depth" = 6,
"eval_metric" = "auc",
"eval_metric" = "ams@0.15",
"silent" = 1,
"nthread" = 16)
watchlist <- list("train" = xgmat)
nround = 120
print ("loading data end, start to boost trees")
bst = xgb.train(param, xgmat, nround, watchlist );
# save out model
xgb.save(bst, "higgs.model")
print ('finish training')

14
demo/kaggle-higgs/run.sh Executable file

@@ -0,0 +1,14 @@
#!/bin/bash
python -u higgs-numpy.py
ret=$?
if [[ $ret != 0 ]]; then
echo "ERROR in higgs-numpy.py"
exit $ret
fi
python -u higgs-pred.py
ret=$?
if [[ $ret != 0 ]]; then
echo "ERROR in higgs-pred.py"
exit $ret
fi


@@ -0,0 +1,71 @@
# install xgboost package, see R-package in root folder
require(xgboost)
require(gbm)
require(methods)
testsize <- 550000
dtrain <- read.csv("data/training.csv", header=TRUE, nrows=350001)
# gbm.time = system.time({
# gbm.model <- gbm(Label ~ ., data = dtrain[, -c(1,32)], n.trees = 120,
# interaction.depth = 6, shrinkage = 0.1, bag.fraction = 1,
# verbose = TRUE)
# })
# print(gbm.time)
# Test result: 761.48 secs
dtrain[33] <- dtrain[33] == "s"
label <- as.numeric(dtrain[[33]])
data <- as.matrix(dtrain[2:31])
weight <- as.numeric(dtrain[[32]]) * testsize / length(label)
sumwpos <- sum(weight * (label==1.0))
sumwneg <- sum(weight * (label==0.0))
print(paste("weight statistics: wpos=", sumwpos, "wneg=", sumwneg, "ratio=", sumwneg / sumwpos))
xgboost.time = list()
threads = c(1,2,4,8,16)
for (i in 1:length(threads)){
thread = threads[i]
xgboost.time[[i]] = system.time({
xgmat <- xgb.DMatrix(data, label = label, weight = weight, missing = -999.0)
param <- list("objective" = "binary:logitraw",
"scale_pos_weight" = sumwneg / sumwpos,
"bst:eta" = 0.1,
"bst:max_depth" = 6,
"eval_metric" = "auc",
"eval_metric" = "ams@0.15",
"silent" = 1,
"nthread" = thread)
watchlist <- list("train" = xgmat)
nround = 120
print ("loading data end, start to boost trees")
bst = xgb.train(param, xgmat, nround, watchlist );
# save out model
xgb.save(bst, "higgs.model")
print ('finish training')
})
}
xgboost.time
# [[1]]
# user system elapsed
# 444.98 1.96 450.22
#
# [[2]]
# user system elapsed
# 188.15 0.82 102.41
#
# [[3]]
# user system elapsed
# 143.29 0.79 44.18
#
# [[4]]
# user system elapsed
# 176.60 1.45 34.04
#
# [[5]]
# user system elapsed
# 180.15 2.85 35.26

66
demo/kaggle-higgs/speedtest.py Executable file

@@ -0,0 +1,66 @@
#!/usr/bin/python
# this is the example script to use xgboost to train
import sys
import numpy as np
# add path of xgboost python module
sys.path.append('../../wrapper/')
import xgboost as xgb
from sklearn.ensemble import GradientBoostingClassifier
import time
test_size = 550000
# path to where the data lies
dpath = 'data'
# load in training data, directly use numpy
dtrain = np.loadtxt( dpath+'/training.csv', delimiter=',', skiprows=1, converters={32: lambda x:int(x=='s') } )
print ('finish loading from csv ')
label = dtrain[:,32]
data = dtrain[:,1:31]
# rescale weight to make it the same as the test set
weight = dtrain[:,31] * float(test_size) / len(label)
sum_wpos = sum( weight[i] for i in range(len(label)) if label[i] == 1.0 )
sum_wneg = sum( weight[i] for i in range(len(label)) if label[i] == 0.0 )
# print weight statistics
print ('weight statistics: wpos=%g, wneg=%g, ratio=%g' % ( sum_wpos, sum_wneg, sum_wneg/sum_wpos ))
# construct xgboost.DMatrix from numpy array, treat -999.0 as missing value
xgmat = xgb.DMatrix( data, label=label, missing = -999.0, weight=weight )
# setup parameters for xgboost
param = {}
# use logistic regression loss
param['objective'] = 'binary:logitraw'
# scale weight of positive examples
param['scale_pos_weight'] = sum_wneg/sum_wpos
param['bst:eta'] = 0.1
param['bst:max_depth'] = 6
param['eval_metric'] = 'auc'
param['silent'] = 1
param['nthread'] = 4
plst = list(param.items())+[('eval_metric', 'ams@0.15')]
watchlist = [ (xgmat,'train') ]
# boost 10 trees
num_round = 10
print ('loading data end, start to boost trees')
print ("training GBM from sklearn")
tmp = time.time()
gbm = GradientBoostingClassifier(n_estimators=num_round, max_depth=6, verbose=2)
gbm.fit(data, label)
print ("sklearn.GBM costs: %s seconds" % str(time.time() - tmp))
#raw_input()
print ("training xgboost")
threads = [1, 2, 4, 16]
for i in threads:
param['nthread'] = i
tmp = time.time()
    plst = list(param.items())+[('eval_metric', 'ams@0.15')]
bst = xgb.train( plst, xgmat, num_round, watchlist );
print ("XGBoost with %d thread costs: %s seconds" % (i, str(time.time() - tmp)))
print ('finish training')


@@ -0,0 +1,10 @@
Demonstrating how to use XGBoost to accomplish a multi-class classification task on the [UCI Dermatology dataset](https://archive.ics.uci.edu/ml/datasets/Dermatology)
Make sure you have built the xgboost python module in ../../python
1. Run runexp.sh
```bash
./runexp.sh
```
Explanations can be found in the [wiki](https://github.com/tqchen/xgboost/wiki)


@@ -0,0 +1,9 @@
#!/bin/bash
if [ -f dermatology.data ]
then
echo "use existing data to run multi class classification"
else
echo "getting data from UCI, make sure you are connected to the internet"
wget https://archive.ics.uci.edu/ml/machine-learning-databases/dermatology/dermatology.data
fi
python train.py


@@ -0,0 +1,50 @@
#! /usr/bin/python
import sys
import numpy as np
sys.path.append('../../wrapper/')
import xgboost as xgb
# labels need to be in the range 0 to num_class - 1
data = np.loadtxt('./dermatology.data', delimiter=',',converters={33: lambda x:int(x == '?'), 34: lambda x:int(x)-1 } )
sz = data.shape
train = data[:int(sz[0] * 0.7), :]
test = data[int(sz[0] * 0.7):, :]
train_X = train[:,0:33]
train_Y = train[:, 34]
test_X = test[:,0:33]
test_Y = test[:, 34]
xg_train = xgb.DMatrix( train_X, label=train_Y)
xg_test = xgb.DMatrix(test_X, label=test_Y)
# setup parameters for xgboost
param = {}
# use softmax multi-class classification
param['objective'] = 'multi:softmax'
param['eta'] = 0.1
param['max_depth'] = 6
param['silent'] = 1
param['nthread'] = 4
param['num_class'] = 6
watchlist = [ (xg_train,'train'), (xg_test, 'test') ]
num_round = 5
bst = xgb.train(param, xg_train, num_round, watchlist );
# get prediction
pred = bst.predict( xg_test );
print ('predicting, classification error=%f' % (sum( int(pred[i]) != test_Y[i] for i in range(len(test_Y))) / float(len(test_Y)) ))
# do the same thing again, but output probabilities
param['objective'] = 'multi:softprob'
bst = xgb.train(param, xg_train, num_round, watchlist );
# Note: this convention has been changed since xgboost-unity
# get prediction; this is a 1D array, so reshape it to (ndata, nclass)
yprob = bst.predict( xg_test ).reshape( test_Y.shape[0], 6 )
ylabel = np.argmax(yprob, axis=1)
print ('predicting, classification error=%f' % (sum( int(ylabel[i]) != test_Y[i] for i in range(len(test_Y))) / float(len(test_Y)) ))
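The classification-error formula used twice above can be checked by hand. A minimal pure-Python sketch with hypothetical predictions and labels (not values from an actual demo run):

```python
# Toy predictions vs. ground-truth labels (hypothetical values).
pred = [0, 2, 1, 1, 3]
truth = [0, 1, 1, 2, 3]

# Same formula as in train.py: fraction of mismatched labels.
error = sum(int(p) != t for p, t in zip(pred, truth)) / float(len(truth))
print(error)  # 2 mismatches out of 5 -> 0.4
```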

demo/rank/README Normal file

@@ -0,0 +1,13 @@
Instructions:
The dataset for the ranking demo is LETOR 4.0 MQ2008 Fold 1.
Use the following commands to run the example:
Get the data: ./wgetdata.sh
Run the example: ./runexp.sh

demo/rank/mq2008.conf Normal file

@@ -0,0 +1,28 @@
# General Parameters, see comment for each definition
# specify objective
objective="rank:pairwise"
# Tree Booster Parameters
# step size shrinkage
eta = 0.1
# minimum loss reduction required to make a further partition
gamma = 1.0
# minimum sum of instance weight(hessian) needed in a child
min_child_weight = 0.1
# maximum depth of a tree
max_depth = 6
# Task parameters
# the number of rounds for boosting
num_round = 4
# 0 means do not save any model except the final round model
save_period = 0
# The path of training data
data = "mq2008.train"
# The path of validation data, used to monitor training process, here [test] sets name of the validation set
eval[test] = "mq2008.vali"
# The path of test data
test:data = "mq2008.test"

demo/rank/runexp.sh Executable file

@@ -0,0 +1,11 @@
#!/bin/bash
python trans_data.py train.txt mq2008.train mq2008.train.group
python trans_data.py test.txt mq2008.test mq2008.test.group
python trans_data.py vali.txt mq2008.vali mq2008.vali.group
../../xgboost mq2008.conf
../../xgboost mq2008.conf task=pred model_in=0004.model

demo/rank/trans_data.py Normal file

@@ -0,0 +1,41 @@
import sys

def save_data(group_data, output_feature, output_group):
    if len(group_data) == 0:
        return
    output_group.write(str(len(group_data)) + "\n")
    for data in group_data:
        # only include nonzero features
        feats = [p for p in data[2:] if float(p.split(':')[1]) != 0.0]
        output_feature.write(data[0] + " " + " ".join(feats) + "\n")

if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("Usage: python trans_data.py [Ranksvm Format Input] [Output Feature File] [Output Group File]")
        sys.exit(0)
    fi = open(sys.argv[1])
    output_feature = open(sys.argv[2], "w")
    output_group = open(sys.argv[3], "w")
    group_data = []
    group = ""
    for line in fi:
        if not line:
            break
        if "#" in line:
            line = line[:line.index("#")]
        splits = line.strip().split(" ")
        if splits[1] != group:
            save_data(group_data, output_feature, output_group)
            group_data = []
        group = splits[1]
        group_data.append(splits)
    save_data(group_data, output_feature, output_group)
    fi.close()
    output_feature.close()
    output_group.close()
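The group file written above simply lists the number of consecutive rows belonging to each query. A minimal sketch of that convention, using a hypothetical qid column (the `splits[1]` field trans_data.py reads from the RankSVM input):

```python
from itertools import groupby

# Hypothetical qid column from a RankSVM-format file; rows for the
# same query are assumed to be consecutive, as trans_data.py requires.
qids = ["qid:10", "qid:10", "qid:10", "qid:20", "qid:20"]

# One line per query in the group file: the size of each run of equal qids.
group_sizes = [len(list(g)) for _, g in groupby(qids)]
print(group_sizes)  # [3, 2]
```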

demo/rank/wgetdata.sh Executable file

@@ -0,0 +1,4 @@
#!/bin/bash
wget http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/MQ2008.rar
unrar x MQ2008.rar
mv -f MQ2008/Fold1/*.txt .

demo/regression/README Normal file

@@ -0,0 +1,13 @@
Demonstrating how to use XGBoost to accomplish regression tasks on the computer hardware dataset: https://archive.ics.uci.edu/ml/datasets/Computer+Hardware
Run: ./runexp.sh
Format of input: LIBSVM format
Format of ```featmap.txt: <featureid> <featurename> <q or i or int>\n```:
- Feature ids must run from 0 to the number of features, in sorted order.
- i means the feature is a binary indicator feature
- q means the feature is a quantitative value, such as age or time, and can be missing
- int means the feature is an integer value (when int is hinted, the decision boundary will be an integer)
Explanations: https://github.com/tqchen/xgboost/wiki/Regression
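As an illustration of the featmap format, the tab-separated lines can be generated programmatically; a minimal sketch (the specific ids and names here are hypothetical examples in the style mapfeat.py produces for this dataset):

```python
# Hypothetical featmap.txt rows: (feature id, feature name, type hint).
rows = [(0, "MYCT", "int"), (1, "MMIN", "int"), (6, "vendor=adviser", "i")]

# One "<featureid>\t<featurename>\t<type>" line per feature.
lines = ["%d\t%s\t%s" % r for r in rows]
print("\n".join(lines))
```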


@@ -0,0 +1,30 @@
# General Parameters, see comment for each definition
# choose the tree booster, can also change to gblinear
booster = gbtree
# this is the only difference from classification; use reg:linear for linear regression
# when labels are in [0,1] we can also use reg:logistic
objective = reg:linear
# Tree Booster Parameters
# step size shrinkage
eta = 1.0
# minimum loss reduction required to make a further partition
gamma = 1.0
# minimum sum of instance weight(hessian) needed in a child
min_child_weight = 1
# maximum depth of a tree
max_depth = 3
# Task parameters
# the number of rounds for boosting
num_round = 2
# 0 means do not save any model except the final round model
save_period = 0
# The path of training data
data = "machine.txt.train"
# The path of validation data, used to monitor training process, here [test] sets name of the validation set
eval[test] = "machine.txt.test"
# The path of test data
test:data = "machine.txt.test"


@@ -0,0 +1,209 @@
adviser,32/60,125,256,6000,256,16,128,198,199
amdahl,470v/7,29,8000,32000,32,8,32,269,253
amdahl,470v/7a,29,8000,32000,32,8,32,220,253
amdahl,470v/7b,29,8000,32000,32,8,32,172,253
amdahl,470v/7c,29,8000,16000,32,8,16,132,132
amdahl,470v/b,26,8000,32000,64,8,32,318,290
amdahl,580-5840,23,16000,32000,64,16,32,367,381
amdahl,580-5850,23,16000,32000,64,16,32,489,381
amdahl,580-5860,23,16000,64000,64,16,32,636,749
amdahl,580-5880,23,32000,64000,128,32,64,1144,1238
apollo,dn320,400,1000,3000,0,1,2,38,23
apollo,dn420,400,512,3500,4,1,6,40,24
basf,7/65,60,2000,8000,65,1,8,92,70
basf,7/68,50,4000,16000,65,1,8,138,117
bti,5000,350,64,64,0,1,4,10,15
bti,8000,200,512,16000,0,4,32,35,64
burroughs,b1955,167,524,2000,8,4,15,19,23
burroughs,b2900,143,512,5000,0,7,32,28,29
burroughs,b2925,143,1000,2000,0,5,16,31,22
burroughs,b4955,110,5000,5000,142,8,64,120,124
burroughs,b5900,143,1500,6300,0,5,32,30,35
burroughs,b5920,143,3100,6200,0,5,20,33,39
burroughs,b6900,143,2300,6200,0,6,64,61,40
burroughs,b6925,110,3100,6200,0,6,64,76,45
c.r.d,68/10-80,320,128,6000,0,1,12,23,28
c.r.d,universe:2203t,320,512,2000,4,1,3,69,21
c.r.d,universe:68,320,256,6000,0,1,6,33,28
c.r.d,universe:68/05,320,256,3000,4,1,3,27,22
c.r.d,universe:68/137,320,512,5000,4,1,5,77,28
c.r.d,universe:68/37,320,256,5000,4,1,6,27,27
cdc,cyber:170/750,25,1310,2620,131,12,24,274,102
cdc,cyber:170/760,25,1310,2620,131,12,24,368,102
cdc,cyber:170/815,50,2620,10480,30,12,24,32,74
cdc,cyber:170/825,50,2620,10480,30,12,24,63,74
cdc,cyber:170/835,56,5240,20970,30,12,24,106,138
cdc,cyber:170/845,64,5240,20970,30,12,24,208,136
cdc,omega:480-i,50,500,2000,8,1,4,20,23
cdc,omega:480-ii,50,1000,4000,8,1,5,29,29
cdc,omega:480-iii,50,2000,8000,8,1,5,71,44
cambex,1636-1,50,1000,4000,8,3,5,26,30
cambex,1636-10,50,1000,8000,8,3,5,36,41
cambex,1641-1,50,2000,16000,8,3,5,40,74
cambex,1641-11,50,2000,16000,8,3,6,52,74
cambex,1651-1,50,2000,16000,8,3,6,60,74
dec,decsys:10:1091,133,1000,12000,9,3,12,72,54
dec,decsys:20:2060,133,1000,8000,9,3,12,72,41
dec,microvax-1,810,512,512,8,1,1,18,18
dec,vax:11/730,810,1000,5000,0,1,1,20,28
dec,vax:11/750,320,512,8000,4,1,5,40,36
dec,vax:11/780,200,512,8000,8,1,8,62,38
dg,eclipse:c/350,700,384,8000,0,1,1,24,34
dg,eclipse:m/600,700,256,2000,0,1,1,24,19
dg,eclipse:mv/10000,140,1000,16000,16,1,3,138,72
dg,eclipse:mv/4000,200,1000,8000,0,1,2,36,36
dg,eclipse:mv/6000,110,1000,4000,16,1,2,26,30
dg,eclipse:mv/8000,110,1000,12000,16,1,2,60,56
dg,eclipse:mv/8000-ii,220,1000,8000,16,1,2,71,42
formation,f4000/100,800,256,8000,0,1,4,12,34
formation,f4000/200,800,256,8000,0,1,4,14,34
formation,f4000/200ap,800,256,8000,0,1,4,20,34
formation,f4000/300,800,256,8000,0,1,4,16,34
formation,f4000/300ap,800,256,8000,0,1,4,22,34
four-phase,2000/260,125,512,1000,0,8,20,36,19
gould,concept:32/8705,75,2000,8000,64,1,38,144,75
gould,concept:32/8750,75,2000,16000,64,1,38,144,113
gould,concept:32/8780,75,2000,16000,128,1,38,259,157
hp,3000/30,90,256,1000,0,3,10,17,18
hp,3000/40,105,256,2000,0,3,10,26,20
hp,3000/44,105,1000,4000,0,3,24,32,28
hp,3000/48,105,2000,4000,8,3,19,32,33
hp,3000/64,75,2000,8000,8,3,24,62,47
hp,3000/88,75,3000,8000,8,3,48,64,54
hp,3000/iii,175,256,2000,0,3,24,22,20
harris,100,300,768,3000,0,6,24,36,23
harris,300,300,768,3000,6,6,24,44,25
harris,500,300,768,12000,6,6,24,50,52
harris,600,300,768,4500,0,1,24,45,27
harris,700,300,384,12000,6,1,24,53,50
harris,80,300,192,768,6,6,24,36,18
harris,800,180,768,12000,6,1,31,84,53
honeywell,dps:6/35,330,1000,3000,0,2,4,16,23
honeywell,dps:6/92,300,1000,4000,8,3,64,38,30
honeywell,dps:6/96,300,1000,16000,8,2,112,38,73
honeywell,dps:7/35,330,1000,2000,0,1,2,16,20
honeywell,dps:7/45,330,1000,4000,0,3,6,22,25
honeywell,dps:7/55,140,2000,4000,0,3,6,29,28
honeywell,dps:7/65,140,2000,4000,0,4,8,40,29
honeywell,dps:8/44,140,2000,4000,8,1,20,35,32
honeywell,dps:8/49,140,2000,32000,32,1,20,134,175
honeywell,dps:8/50,140,2000,8000,32,1,54,66,57
honeywell,dps:8/52,140,2000,32000,32,1,54,141,181
honeywell,dps:8/62,140,2000,32000,32,1,54,189,181
honeywell,dps:8/20,140,2000,4000,8,1,20,22,32
ibm,3033:s,57,4000,16000,1,6,12,132,82
ibm,3033:u,57,4000,24000,64,12,16,237,171
ibm,3081,26,16000,32000,64,16,24,465,361
ibm,3081:d,26,16000,32000,64,8,24,465,350
ibm,3083:b,26,8000,32000,0,8,24,277,220
ibm,3083:e,26,8000,16000,0,8,16,185,113
ibm,370/125-2,480,96,512,0,1,1,6,15
ibm,370/148,203,1000,2000,0,1,5,24,21
ibm,370/158-3,115,512,6000,16,1,6,45,35
ibm,38/3,1100,512,1500,0,1,1,7,18
ibm,38/4,1100,768,2000,0,1,1,13,20
ibm,38/5,600,768,2000,0,1,1,16,20
ibm,38/7,400,2000,4000,0,1,1,32,28
ibm,38/8,400,4000,8000,0,1,1,32,45
ibm,4321,900,1000,1000,0,1,2,11,18
ibm,4331-1,900,512,1000,0,1,2,11,17
ibm,4331-11,900,1000,4000,4,1,2,18,26
ibm,4331-2,900,1000,4000,8,1,2,22,28
ibm,4341,900,2000,4000,0,3,6,37,28
ibm,4341-1,225,2000,4000,8,3,6,40,31
ibm,4341-10,225,2000,4000,8,3,6,34,31
ibm,4341-11,180,2000,8000,8,1,6,50,42
ibm,4341-12,185,2000,16000,16,1,6,76,76
ibm,4341-2,180,2000,16000,16,1,6,66,76
ibm,4341-9,225,1000,4000,2,3,6,24,26
ibm,4361-4,25,2000,12000,8,1,4,49,59
ibm,4361-5,25,2000,12000,16,3,5,66,65
ibm,4381-1,17,4000,16000,8,6,12,100,101
ibm,4381-2,17,4000,16000,32,6,12,133,116
ibm,8130-a,1500,768,1000,0,0,0,12,18
ibm,8130-b,1500,768,2000,0,0,0,18,20
ibm,8140,800,768,2000,0,0,0,20,20
ipl,4436,50,2000,4000,0,3,6,27,30
ipl,4443,50,2000,8000,8,3,6,45,44
ipl,4445,50,2000,8000,8,1,6,56,44
ipl,4446,50,2000,16000,24,1,6,70,82
ipl,4460,50,2000,16000,24,1,6,80,82
ipl,4480,50,8000,16000,48,1,10,136,128
magnuson,m80/30,100,1000,8000,0,2,6,16,37
magnuson,m80/31,100,1000,8000,24,2,6,26,46
magnuson,m80/32,100,1000,8000,24,3,6,32,46
magnuson,m80/42,50,2000,16000,12,3,16,45,80
magnuson,m80/43,50,2000,16000,24,6,16,54,88
magnuson,m80/44,50,2000,16000,24,6,16,65,88
microdata,seq.ms/3200,150,512,4000,0,8,128,30,33
nas,as/3000,115,2000,8000,16,1,3,50,46
nas,as/3000-n,115,2000,4000,2,1,5,40,29
nas,as/5000,92,2000,8000,32,1,6,62,53
nas,as/5000-e,92,2000,8000,32,1,6,60,53
nas,as/5000-n,92,2000,8000,4,1,6,50,41
nas,as/6130,75,4000,16000,16,1,6,66,86
nas,as/6150,60,4000,16000,32,1,6,86,95
nas,as/6620,60,2000,16000,64,5,8,74,107
nas,as/6630,60,4000,16000,64,5,8,93,117
nas,as/6650,50,4000,16000,64,5,10,111,119
nas,as/7000,72,4000,16000,64,8,16,143,120
nas,as/7000-n,72,2000,8000,16,6,8,105,48
nas,as/8040,40,8000,16000,32,8,16,214,126
nas,as/8050,40,8000,32000,64,8,24,277,266
nas,as/8060,35,8000,32000,64,8,24,370,270
nas,as/9000-dpc,38,16000,32000,128,16,32,510,426
nas,as/9000-n,48,4000,24000,32,8,24,214,151
nas,as/9040,38,8000,32000,64,8,24,326,267
nas,as/9060,30,16000,32000,256,16,24,510,603
ncr,v8535:ii,112,1000,1000,0,1,4,8,19
ncr,v8545:ii,84,1000,2000,0,1,6,12,21
ncr,v8555:ii,56,1000,4000,0,1,6,17,26
ncr,v8565:ii,56,2000,6000,0,1,8,21,35
ncr,v8565:ii-e,56,2000,8000,0,1,8,24,41
ncr,v8575:ii,56,4000,8000,0,1,8,34,47
ncr,v8585:ii,56,4000,12000,0,1,8,42,62
ncr,v8595:ii,56,4000,16000,0,1,8,46,78
ncr,v8635,38,4000,8000,32,16,32,51,80
ncr,v8650,38,4000,8000,32,16,32,116,80
ncr,v8655,38,8000,16000,64,4,8,100,142
ncr,v8665,38,8000,24000,160,4,8,140,281
ncr,v8670,38,4000,16000,128,16,32,212,190
nixdorf,8890/30,200,1000,2000,0,1,2,25,21
nixdorf,8890/50,200,1000,4000,0,1,4,30,25
nixdorf,8890/70,200,2000,8000,64,1,5,41,67
perkin-elmer,3205,250,512,4000,0,1,7,25,24
perkin-elmer,3210,250,512,4000,0,4,7,50,24
perkin-elmer,3230,250,1000,16000,1,1,8,50,64
prime,50-2250,160,512,4000,2,1,5,30,25
prime,50-250-ii,160,512,2000,2,3,8,32,20
prime,50-550-ii,160,1000,4000,8,1,14,38,29
prime,50-750-ii,160,1000,8000,16,1,14,60,43
prime,50-850-ii,160,2000,8000,32,1,13,109,53
siemens,7.521,240,512,1000,8,1,3,6,19
siemens,7.531,240,512,2000,8,1,5,11,22
siemens,7.536,105,2000,4000,8,3,8,22,31
siemens,7.541,105,2000,6000,16,6,16,33,41
siemens,7.551,105,2000,8000,16,4,14,58,47
siemens,7.561,52,4000,16000,32,4,12,130,99
siemens,7.865-2,70,4000,12000,8,6,8,75,67
siemens,7.870-2,59,4000,12000,32,6,12,113,81
siemens,7.872-2,59,8000,16000,64,12,24,188,149
siemens,7.875-2,26,8000,24000,32,8,16,173,183
siemens,7.880-2,26,8000,32000,64,12,16,248,275
siemens,7.881-2,26,8000,32000,128,24,32,405,382
sperry,1100/61-h1,116,2000,8000,32,5,28,70,56
sperry,1100/81,50,2000,32000,24,6,26,114,182
sperry,1100/82,50,2000,32000,48,26,52,208,227
sperry,1100/83,50,2000,32000,112,52,104,307,341
sperry,1100/84,50,4000,32000,112,52,104,397,360
sperry,1100/93,30,8000,64000,96,12,176,915,919
sperry,1100/94,30,8000,64000,128,12,176,1150,978
sperry,80/3,180,262,4000,0,1,3,12,24
sperry,80/4,180,512,4000,0,1,3,14,24
sperry,80/5,180,262,4000,0,1,3,18,24
sperry,80/6,180,512,4000,0,1,3,21,24
sperry,80/8,124,1000,8000,0,1,8,42,37
sperry,90/80-model-3,98,1000,8000,32,2,8,46,50
sratus,32,125,2000,8000,0,2,14,52,41
wang,vs-100,480,512,8000,32,0,0,67,47
wang,vs-90,480,1000,4000,0,0,0,45,25


@@ -0,0 +1,72 @@
1. Title: Relative CPU Performance Data
2. Source Information
-- Creators: Phillip Ein-Dor and Jacob Feldmesser
-- Ein-Dor: Faculty of Management; Tel Aviv University; Ramat-Aviv;
Tel Aviv, 69978; Israel
-- Donor: David W. Aha (aha@ics.uci.edu) (714) 856-8779
-- Date: October, 1987
3. Past Usage:
1. Ein-Dor and Feldmesser (CACM 4/87, pp 308-317)
-- Results:
-- linear regression prediction of relative cpu performance
-- Recorded 34% average deviation from actual values
2. Kibler,D. & Aha,D. (1988). Instance-Based Prediction of
Real-Valued Attributes. In Proceedings of the CSCSI (Canadian
AI) Conference.
-- Results:
-- instance-based prediction of relative cpu performance
-- similar results; no transformations required
- Predicted attribute: cpu relative performance (numeric)
4. Relevant Information:
-- The estimated relative performance values were estimated by the authors
using a linear regression method. See their article (pp 308-313) for
more details on how the relative performance values were set.
5. Number of Instances: 209
6. Number of Attributes: 10 (6 predictive attributes, 2 non-predictive,
1 goal field, and the linear regression's guess)
7. Attribute Information:
1. vendor name: 30
(adviser, amdahl,apollo, basf, bti, burroughs, c.r.d, cambex, cdc, dec,
dg, formation, four-phase, gould, honeywell, hp, ibm, ipl, magnuson,
microdata, nas, ncr, nixdorf, perkin-elmer, prime, siemens, sperry,
sratus, wang)
2. Model Name: many unique symbols
3. MYCT: machine cycle time in nanoseconds (integer)
4. MMIN: minimum main memory in kilobytes (integer)
5. MMAX: maximum main memory in kilobytes (integer)
6. CACH: cache memory in kilobytes (integer)
7. CHMIN: minimum channels in units (integer)
8. CHMAX: maximum channels in units (integer)
9. PRP: published relative performance (integer)
10. ERP: estimated relative performance from the original article (integer)
8. Missing Attribute Values: None
9. Class Distribution: the class value (PRP) is continuously valued.
PRP Value Range: Number of Instances in Range:
0-20 31
21-100 121
101-200 27
201-300 13
301-400 7
401-500 4
501-600 2
above 600 4
Summary Statistics:
Min Max Mean SD PRP Correlation
MYCT: 17 1500 203.8 260.3 -0.3071
MMIN: 64 32000 2868.0 3878.7 0.7949
MMAX: 64 64000 11796.1 11726.6 0.8630
CACH: 0 256 25.2 40.6 0.6626
CHMIN: 0 52 4.7 6.8 0.6089
CHMAX: 0 176 18.2 26.0 0.6052
PRP: 6 1150 105.6 160.8 1.0000
ERP: 15 1238 99.3 154.8 0.9665

demo/regression/mapfeat.py Executable file

@@ -0,0 +1,32 @@
#!/usr/bin/python
import sys

fo = open('machine.txt', 'w')
# feature ids 0-5 hold the numeric attributes; vendor indicators start at 6
cnt = 6
fmap = {}
for l in open('machine.data'):
    arr = l.split(',')
    fo.write(arr[8])
    for i in range(0, 6):
        fo.write(' %d:%s' % (i, arr[i + 2]))
    if arr[0] not in fmap:
        fmap[arr[0]] = cnt
        cnt += 1
    fo.write(' %d:1' % fmap[arr[0]])
    fo.write('\n')
fo.close()

# create feature map for machine data
fo = open('featmap.txt', 'w')
# list from machine.names
names = ['vendor', 'MYCT', 'MMIN', 'MMAX', 'CACH', 'CHMIN', 'CHMAX', 'PRP', 'ERP']
for i in range(0, 6):
    fo.write('%d\t%s\tint\n' % (i, names[i + 1]))
for v, k in sorted(fmap.items(), key=lambda x: x[1]):
    fo.write('%d\tvendor=%s\ti\n' % (k, v))
fo.close()
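The mapping above can be traced on a single row of machine.data. A self-contained sketch that replicates the per-row logic (the vendor id 6 assumes "adviser" is the first vendor seen, as with mapfeat.py's running counter):

```python
# One raw row from machine.data: vendor, model, 6 numeric attrs, PRP, ERP.
row = "adviser,32/60,125,256,6000,256,16,128,198,199"
arr = row.split(",")

# Label is PRP (arr[8]); features 0-5 are the numeric attributes arr[2:8];
# the vendor gets a one-hot indicator feature (id 6 for the first vendor).
feats = ["%d:%s" % (i, arr[i + 2]) for i in range(6)]
line = arr[8] + " " + " ".join(feats) + " 6:1"
print(line)  # 198 0:125 1:256 2:6000 3:256 4:16 5:128 6:1
```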

demo/regression/mknfold.py Executable file

@@ -0,0 +1,29 @@
#!/usr/bin/python
import sys
import random

# both the filename and k are required (sys.argv[2] is read below)
if len(sys.argv) < 3:
    print('Usage: <filename> <k> [nfold = 5]')
    exit(0)

random.seed(10)

k = int(sys.argv[2])
if len(sys.argv) > 3:
    nfold = int(sys.argv[3])
else:
    nfold = 5

fi = open(sys.argv[1], 'r')
ftr = open(sys.argv[1] + '.train', 'w')
fte = open(sys.argv[1] + '.test', 'w')
for l in fi:
    if random.randint(1, nfold) == k:
        fte.write(l)
    else:
        ftr.write(l)
fi.close()
ftr.close()
fte.close()
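Each line lands in the test split when the random fold draw equals k, so with nfold = 5 roughly one fifth of the rows go to the test file. A minimal sketch of that holdout logic (the row count here is arbitrary, chosen only to make the fraction visible):

```python
import random

# Same draw as mknfold.py: a line is held out when randint(1, nfold) == k,
# so the expected test fraction is 1/nfold (~20% for nfold = 5).
random.seed(10)
nfold, k, n = 5, 1, 10000
test_count = sum(1 for _ in range(n) if random.randint(1, nfold) == k)
frac = test_count / float(n)
print(frac)
```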

demo/regression/runexp.sh Executable file

@@ -0,0 +1,16 @@
#!/bin/bash
# map the data to features; for convenience we use only 7 of the original attributes and encode them as features in a trivial way
python mapfeat.py
# split train and test
python mknfold.py machine.txt 1
# training and output the models
../../xgboost machine.conf
# output predictions of test data
../../xgboost machine.conf task=pred model_in=0002.model
# print the boosters of 0002.model in dump.raw.txt
../../xgboost machine.conf task=dump model_in=0002.model name_dump=dump.raw.txt
# print the boosters of 0002.model in dump.nice.txt with feature map
../../xgboost machine.conf task=dump model_in=0002.model fmap=featmap.txt name_dump=dump.nice.txt
# cat the result
cat dump.nice.txt

Some files were not shown because too many files have changed in this diff.