change the regression demo data set
parent 57713be940
commit feb914c35b
@ -1,11 +1,5 @@
-Demonstrating how to use XGBoost to accomplish regression tasks on the UCI mushroom dataset http://archive.ics.uci.edu/ml/datasets/Mushroom
+Demonstrating how to use XGBoost to accomplish regression tasks on the computer hardware dataset https://archive.ics.uci.edu/ml/datasets/Computer+Hardware
 
 Run: ./runexp.sh
 
 Format of input: LIBSVM format
-
-Format of featmap.txt:
-<featureid> <featurename> <q or i>\n
-
-q means continuous quantities, i means indicator features.
-Feature id must be from 0 to num_features, in sorted order.
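The README names LIBSVM as the input format without spelling it out. As a rough sketch (the helpers `to_libsvm_line`, `parse_libsvm_line` and `featmap_lines` are hypothetical, not part of the demo), a LIBSVM line is a label followed by space-separated `index:value` pairs in ascending index order, and the featmap.txt rows described in the removed README lines are tab-separated, matching the `'%d\t%s\ti\n'` format used by the removed `write_nmap`.

```python
from typing import Dict, List, Tuple


def to_libsvm_line(label: float, features: Dict[int, float]) -> str:
    """Format one example as '<label> <index>:<value> ...' with ascending indices."""
    pairs = " ".join(f"{idx}:{features[idx]:.10g}" for idx in sorted(features))
    return f"{label:.10g} {pairs}".rstrip()


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Inverse of to_libsvm_line: return (label, {index: value})."""
    label_str, *pairs = line.split()
    features: Dict[int, float] = {}
    for pair in pairs:
        idx, value = pair.split(":", 1)
        features[int(idx)] = float(value)
    return float(label_str), features


def featmap_lines(names: List[Tuple[str, str]]) -> List[str]:
    """Tab-separated featmap.txt rows: feature id, feature name, 'q' or 'i'.

    Ids start at 0 and follow list order; 'q' marks a continuous quantity,
    'i' an indicator feature.
    """
    return [f"{i}\t{name}\t{kind}" for i, (name, kind) in enumerate(names)]
```

In the regression demo the label is the real-valued PRP target rather than a 0/1 class, but the line layout is the same.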
File diff suppressed because it is too large
@ -1,32 +0,0 @@
1. cap-shape: bell=b,conical=c,convex=x,flat=f,knobbed=k,sunken=s
2. cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
3. cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y
4. bruises?: bruises=t,no=f
5. odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,
   musty=m,none=n,pungent=p,spicy=s
6. gill-attachment: attached=a,descending=d,free=f,notched=n
7. gill-spacing: close=c,crowded=w,distant=d
8. gill-size: broad=b,narrow=n
9. gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g,
   green=r,orange=o,pink=p,purple=u,red=e,
   white=w,yellow=y
10. stalk-shape: enlarging=e,tapering=t
11. stalk-root: bulbous=b,club=c,cup=u,equal=e,
    rhizomorphs=z,rooted=r,missing=?
12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
14. stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,
    pink=p,red=e,white=w,yellow=y
15. stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,
    pink=p,red=e,white=w,yellow=y
16. veil-type: partial=p,universal=u
17. veil-color: brown=n,orange=o,white=w,yellow=y
18. ring-number: none=n,one=o,two=t
19. ring-type: cobwebby=c,evanescent=e,flaring=f,large=l,
    none=n,pendant=p,sheathing=s,zone=z
20. spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r,
    orange=o,purple=u,white=w,yellow=y
21. population: abundant=a,clustered=c,numerous=n,
    scattered=s,several=v,solitary=y
22. habitat: grasses=g,leaves=l,meadows=m,paths=p,
    urban=u,waste=w,woods=d
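The removed attribute list above follows an `N. name: value=code,...` layout, with long value lists wrapped onto continuation lines, and the removed `loadfmap` in mapfeat.py turned it into one indicator feature per (attribute, value) pair. A Python 3 sketch of the same idea, assuming that layout; `parse_attribute_list` and `encode_row` are hypothetical names, not functions from the demo.

```python
from typing import Dict, List, Optional, Tuple


def parse_attribute_list(text: str) -> Tuple[Dict[int, Dict[str, int]], List[str]]:
    """Parse 'N. name: value=code,...' lines plus wrapped continuation lines.

    Returns (code_to_id, id_to_name): code_to_id[N][code] is the indicator
    feature id for that attribute value, id_to_name[id] is 'name=value'.
    """
    code_to_id: Dict[int, Dict[str, int]] = {}
    id_to_name: List[str] = []
    attr_no: Optional[int] = None
    attr_name = ""
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0].endswith(".") and tokens[0][:-1].isdigit():
            # Header of a new attribute, e.g. '5. odor: almond=a,...'
            attr_no = int(tokens[0][:-1])
            attr_name = tokens[1].rstrip(":")
            code_to_id[attr_no] = {}
            content = "".join(tokens[2:])
        else:
            # Continuation of the previous attribute's value list
            if attr_no is None:
                raise ValueError(f"continuation line before any attribute: {raw!r}")
            content = "".join(tokens)
        for item in content.split(","):
            if not item.strip():
                continue
            value_name, code = item.split("=")
            code_to_id[attr_no][code] = len(id_to_name)
            id_to_name.append(f"{attr_name}={value_name}")
    return code_to_id, id_to_name


def encode_row(line: str, code_to_id: Dict[int, Dict[str, int]]) -> str:
    """Turn 'p,x,s,...' (class first, then one code per attribute) into LIBSVM.

    The label is 1 for poisonous ('p') and 0 for edible ('e'), as in the removed script.
    """
    label, *codes = line.strip().split(",")
    if label not in ("p", "e"):
        raise ValueError(f"unexpected class label {label!r}")
    ids = sorted(code_to_id[i][c] for i, c in enumerate(codes, start=1))
    return ("1" if label == "p" else "0") + "".join(f" {f}:1" for f in ids)
```

The feature ids are assigned in file order, so each attribute's values occupy a contiguous block of ids, which is what makes the `:1` indicator encoding sparse and sorted.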
@ -1,148 +0,0 @@
1. Title: Mushroom Database

2. Sources:
    (a) Mushroom records drawn from The Audubon Society Field Guide to North
        American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred
        A. Knopf
    (b) Donor: Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu)
    (c) Date: 27 April 1987

3. Past Usage:
    1. Schlimmer,J.S. (1987). Concept Acquisition Through Representational
       Adjustment (Technical Report 87-19). Doctoral dissertation, Department
       of Information and Computer Science, University of California, Irvine.
       --- STAGGER: asymptoted to 95% classification accuracy after reviewing
           1000 instances.
    2. Iba,W., Wogulis,J., & Langley,P. (1988). Trading off Simplicity
       and Coverage in Incremental Concept Learning. In Proceedings of
       the 5th International Conference on Machine Learning, 73-79.
       Ann Arbor, Michigan: Morgan Kaufmann.
       -- approximately the same results with their HILLARY algorithm
    3. In the following references a set of rules (given below) was
       learned for this data set which may serve as a point of
       comparison for other researchers.

       Duch W, Adamczak R, Grabczewski K (1996) Extraction of logical rules
       from training data using backpropagation networks, in: Proc. of the
       1st Online Workshop on Soft Computing, 19-30.Aug.1996, pp. 25-30,
       available on-line at: http://www.bioele.nuee.nagoya-u.ac.jp/wsc1/

       Duch W, Adamczak R, Grabczewski K, Ishikawa M, Ueda H, Extraction of
       crisp logical rules using constrained backpropagation networks -
       comparison of two new approaches, in: Proc. of the European Symposium
       on Artificial Neural Networks (ESANN'97), Bruges, Belgium 16-18.4.1997,
       pp. xx-xx

       Wlodzislaw Duch, Department of Computer Methods, Nicholas Copernicus
       University, 87-100 Torun, Grudziadzka 5, Poland
       e-mail: duch@phys.uni.torun.pl
       WWW http://www.phys.uni.torun.pl/kmk/

       Date: Mon, 17 Feb 1997 13:47:40 +0100
       From: Wlodzislaw Duch <duch@phys.uni.torun.pl>
       Organization: Dept. of Computer Methods, UMK

       I have attached a file containing logical rules for mushrooms.
       It should be helpful for other people since only in the last year I
       have seen about 10 papers analyzing this dataset and obtaining quite
       complex rules. We will try to contribute other results later.

       With best regards, Wlodek Duch
       ________________________________________________________________

       Logical rules for the mushroom data sets.

       Logical rules given below seem to be the simplest possible for the
       mushroom dataset and therefore should be treated as benchmark results.

       Disjunctive rules for poisonous mushrooms, from most general
       to most specific:

       P_1) odor=NOT(almond.OR.anise.OR.none)
            120 poisonous cases missed, 98.52% accuracy

       P_2) spore-print-color=green
            48 cases missed, 99.41% accuracy

       P_3) odor=none.AND.stalk-surface-below-ring=scaly.AND.
                (stalk-color-above-ring=NOT.brown)
            8 cases missed, 99.90% accuracy

       P_4) habitat=leaves.AND.cap-color=white
            100% accuracy

       Rule P_4) may also be

       P_4') population=clustered.AND.cap-color=white

       These rules involve 6 attributes (out of 22). Rules for edible
       mushrooms are obtained as negation of the rules given above, for
       example the rule:

       odor=(almond.OR.anise.OR.none).AND.spore-print-color=NOT.green

       gives 48 errors, or 99.41% accuracy on the whole dataset.

       Several slightly more complex variations on these rules exist,
       involving other attributes, such as gill-size, gill-spacing,
       stalk-surface-above-ring, but the rules given above are the simplest
       we have found.


4. Relevant Information:
    This data set includes descriptions of hypothetical samples
    corresponding to 23 species of gilled mushrooms in the Agaricus and
    Lepiota Family (pp. 500-525). Each species is identified as
    definitely edible, definitely poisonous, or of unknown edibility and
    not recommended. This latter class was combined with the poisonous
    one. The Guide clearly states that there is no simple rule for
    determining the edibility of a mushroom; no rule like ``leaflets
    three, let it be'' for Poisonous Oak and Ivy.

5. Number of Instances: 8124

6. Number of Attributes: 22 (all nominally valued)

7. Attribute Information: (classes: edible=e, poisonous=p)
     1. cap-shape: bell=b,conical=c,convex=x,flat=f,
        knobbed=k,sunken=s
     2. cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
     3. cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,
        pink=p,purple=u,red=e,white=w,yellow=y
     4. bruises?: bruises=t,no=f
     5. odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,
        musty=m,none=n,pungent=p,spicy=s
     6. gill-attachment: attached=a,descending=d,free=f,notched=n
     7. gill-spacing: close=c,crowded=w,distant=d
     8. gill-size: broad=b,narrow=n
     9. gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g,
        green=r,orange=o,pink=p,purple=u,red=e,
        white=w,yellow=y
    10. stalk-shape: enlarging=e,tapering=t
    11. stalk-root: bulbous=b,club=c,cup=u,equal=e,
        rhizomorphs=z,rooted=r,missing=?
    12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
    13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
    14. stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,
        pink=p,red=e,white=w,yellow=y
    15. stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,
        pink=p,red=e,white=w,yellow=y
    16. veil-type: partial=p,universal=u
    17. veil-color: brown=n,orange=o,white=w,yellow=y
    18. ring-number: none=n,one=o,two=t
    19. ring-type: cobwebby=c,evanescent=e,flaring=f,large=l,
        none=n,pendant=p,sheathing=s,zone=z
    20. spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r,
        orange=o,purple=u,white=w,yellow=y
    21. population: abundant=a,clustered=c,numerous=n,
        scattered=s,several=v,solitary=y
    22. habitat: grasses=g,leaves=l,meadows=m,paths=p,
        urban=u,waste=w,woods=d

8. Missing Attribute Values: 2480 of them (denoted by "?"), all for
   attribute #11.

9. Class Distribution:
    -- edible: 4208 (51.8%)
    -- poisonous: 3916 (48.2%)
    -- total: 8124 instances
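Rules P_1 to P_4 from the removed description can be written as a single predicate over the one-letter codes in the attribute list. A sketch, assuming each mushroom is a dict keyed by attribute name (e.g. `"odor"`) with code values (e.g. `"n"` for none); `is_poisonous` is a hypothetical name, and the accuracy figures quoted with the rules are not re-checked here.

```python
from typing import Mapping


def is_poisonous(m: Mapping[str, str]) -> bool:
    """Apply rules P_1..P_4 in order (most general first); True means poisonous."""
    # P_1: odor is not almond (a), anise (l) or none (n)
    if m["odor"] not in {"a", "l", "n"}:
        return True
    # P_2: spore-print-color = green (r)
    if m["spore-print-color"] == "r":
        return True
    # P_3: odor = none AND stalk-surface-below-ring = scaly (y)
    #      AND stalk-color-above-ring != brown (n)
    if (m["odor"] == "n"
            and m["stalk-surface-below-ring"] == "y"
            and m["stalk-color-above-ring"] != "n"):
        return True
    # P_4: habitat = leaves (l) AND cap-color = white (w)
    return m["habitat"] == "l" and m["cap-color"] == "w"
```

Swapping the last line for `m["population"] == "c" and m["cap-color"] == "w"` gives the P_4' variant.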
209
demo/regression/machine.data
Normal file
@ -0,0 +1,209 @@
adviser,32/60,125,256,6000,256,16,128,198,199
amdahl,470v/7,29,8000,32000,32,8,32,269,253
amdahl,470v/7a,29,8000,32000,32,8,32,220,253
amdahl,470v/7b,29,8000,32000,32,8,32,172,253
amdahl,470v/7c,29,8000,16000,32,8,16,132,132
amdahl,470v/b,26,8000,32000,64,8,32,318,290
amdahl,580-5840,23,16000,32000,64,16,32,367,381
amdahl,580-5850,23,16000,32000,64,16,32,489,381
amdahl,580-5860,23,16000,64000,64,16,32,636,749
amdahl,580-5880,23,32000,64000,128,32,64,1144,1238
apollo,dn320,400,1000,3000,0,1,2,38,23
apollo,dn420,400,512,3500,4,1,6,40,24
basf,7/65,60,2000,8000,65,1,8,92,70
basf,7/68,50,4000,16000,65,1,8,138,117
bti,5000,350,64,64,0,1,4,10,15
bti,8000,200,512,16000,0,4,32,35,64
burroughs,b1955,167,524,2000,8,4,15,19,23
burroughs,b2900,143,512,5000,0,7,32,28,29
burroughs,b2925,143,1000,2000,0,5,16,31,22
burroughs,b4955,110,5000,5000,142,8,64,120,124
burroughs,b5900,143,1500,6300,0,5,32,30,35
burroughs,b5920,143,3100,6200,0,5,20,33,39
burroughs,b6900,143,2300,6200,0,6,64,61,40
burroughs,b6925,110,3100,6200,0,6,64,76,45
c.r.d,68/10-80,320,128,6000,0,1,12,23,28
c.r.d,universe:2203t,320,512,2000,4,1,3,69,21
c.r.d,universe:68,320,256,6000,0,1,6,33,28
c.r.d,universe:68/05,320,256,3000,4,1,3,27,22
c.r.d,universe:68/137,320,512,5000,4,1,5,77,28
c.r.d,universe:68/37,320,256,5000,4,1,6,27,27
cdc,cyber:170/750,25,1310,2620,131,12,24,274,102
cdc,cyber:170/760,25,1310,2620,131,12,24,368,102
cdc,cyber:170/815,50,2620,10480,30,12,24,32,74
cdc,cyber:170/825,50,2620,10480,30,12,24,63,74
cdc,cyber:170/835,56,5240,20970,30,12,24,106,138
cdc,cyber:170/845,64,5240,20970,30,12,24,208,136
cdc,omega:480-i,50,500,2000,8,1,4,20,23
cdc,omega:480-ii,50,1000,4000,8,1,5,29,29
cdc,omega:480-iii,50,2000,8000,8,1,5,71,44
cambex,1636-1,50,1000,4000,8,3,5,26,30
cambex,1636-10,50,1000,8000,8,3,5,36,41
cambex,1641-1,50,2000,16000,8,3,5,40,74
cambex,1641-11,50,2000,16000,8,3,6,52,74
cambex,1651-1,50,2000,16000,8,3,6,60,74
dec,decsys:10:1091,133,1000,12000,9,3,12,72,54
dec,decsys:20:2060,133,1000,8000,9,3,12,72,41
dec,microvax-1,810,512,512,8,1,1,18,18
dec,vax:11/730,810,1000,5000,0,1,1,20,28
dec,vax:11/750,320,512,8000,4,1,5,40,36
dec,vax:11/780,200,512,8000,8,1,8,62,38
dg,eclipse:c/350,700,384,8000,0,1,1,24,34
dg,eclipse:m/600,700,256,2000,0,1,1,24,19
dg,eclipse:mv/10000,140,1000,16000,16,1,3,138,72
dg,eclipse:mv/4000,200,1000,8000,0,1,2,36,36
dg,eclipse:mv/6000,110,1000,4000,16,1,2,26,30
dg,eclipse:mv/8000,110,1000,12000,16,1,2,60,56
dg,eclipse:mv/8000-ii,220,1000,8000,16,1,2,71,42
formation,f4000/100,800,256,8000,0,1,4,12,34
formation,f4000/200,800,256,8000,0,1,4,14,34
formation,f4000/200ap,800,256,8000,0,1,4,20,34
formation,f4000/300,800,256,8000,0,1,4,16,34
formation,f4000/300ap,800,256,8000,0,1,4,22,34
four-phase,2000/260,125,512,1000,0,8,20,36,19
gould,concept:32/8705,75,2000,8000,64,1,38,144,75
gould,concept:32/8750,75,2000,16000,64,1,38,144,113
gould,concept:32/8780,75,2000,16000,128,1,38,259,157
hp,3000/30,90,256,1000,0,3,10,17,18
hp,3000/40,105,256,2000,0,3,10,26,20
hp,3000/44,105,1000,4000,0,3,24,32,28
hp,3000/48,105,2000,4000,8,3,19,32,33
hp,3000/64,75,2000,8000,8,3,24,62,47
hp,3000/88,75,3000,8000,8,3,48,64,54
hp,3000/iii,175,256,2000,0,3,24,22,20
harris,100,300,768,3000,0,6,24,36,23
harris,300,300,768,3000,6,6,24,44,25
harris,500,300,768,12000,6,6,24,50,52
harris,600,300,768,4500,0,1,24,45,27
harris,700,300,384,12000,6,1,24,53,50
harris,80,300,192,768,6,6,24,36,18
harris,800,180,768,12000,6,1,31,84,53
honeywell,dps:6/35,330,1000,3000,0,2,4,16,23
honeywell,dps:6/92,300,1000,4000,8,3,64,38,30
honeywell,dps:6/96,300,1000,16000,8,2,112,38,73
honeywell,dps:7/35,330,1000,2000,0,1,2,16,20
honeywell,dps:7/45,330,1000,4000,0,3,6,22,25
honeywell,dps:7/55,140,2000,4000,0,3,6,29,28
honeywell,dps:7/65,140,2000,4000,0,4,8,40,29
honeywell,dps:8/44,140,2000,4000,8,1,20,35,32
honeywell,dps:8/49,140,2000,32000,32,1,20,134,175
honeywell,dps:8/50,140,2000,8000,32,1,54,66,57
honeywell,dps:8/52,140,2000,32000,32,1,54,141,181
honeywell,dps:8/62,140,2000,32000,32,1,54,189,181
honeywell,dps:8/20,140,2000,4000,8,1,20,22,32
ibm,3033:s,57,4000,16000,1,6,12,132,82
ibm,3033:u,57,4000,24000,64,12,16,237,171
ibm,3081,26,16000,32000,64,16,24,465,361
ibm,3081:d,26,16000,32000,64,8,24,465,350
ibm,3083:b,26,8000,32000,0,8,24,277,220
ibm,3083:e,26,8000,16000,0,8,16,185,113
ibm,370/125-2,480,96,512,0,1,1,6,15
ibm,370/148,203,1000,2000,0,1,5,24,21
ibm,370/158-3,115,512,6000,16,1,6,45,35
ibm,38/3,1100,512,1500,0,1,1,7,18
ibm,38/4,1100,768,2000,0,1,1,13,20
ibm,38/5,600,768,2000,0,1,1,16,20
ibm,38/7,400,2000,4000,0,1,1,32,28
ibm,38/8,400,4000,8000,0,1,1,32,45
ibm,4321,900,1000,1000,0,1,2,11,18
ibm,4331-1,900,512,1000,0,1,2,11,17
ibm,4331-11,900,1000,4000,4,1,2,18,26
ibm,4331-2,900,1000,4000,8,1,2,22,28
ibm,4341,900,2000,4000,0,3,6,37,28
ibm,4341-1,225,2000,4000,8,3,6,40,31
ibm,4341-10,225,2000,4000,8,3,6,34,31
ibm,4341-11,180,2000,8000,8,1,6,50,42
ibm,4341-12,185,2000,16000,16,1,6,76,76
ibm,4341-2,180,2000,16000,16,1,6,66,76
ibm,4341-9,225,1000,4000,2,3,6,24,26
ibm,4361-4,25,2000,12000,8,1,4,49,59
ibm,4361-5,25,2000,12000,16,3,5,66,65
ibm,4381-1,17,4000,16000,8,6,12,100,101
ibm,4381-2,17,4000,16000,32,6,12,133,116
ibm,8130-a,1500,768,1000,0,0,0,12,18
ibm,8130-b,1500,768,2000,0,0,0,18,20
ibm,8140,800,768,2000,0,0,0,20,20
ipl,4436,50,2000,4000,0,3,6,27,30
ipl,4443,50,2000,8000,8,3,6,45,44
ipl,4445,50,2000,8000,8,1,6,56,44
ipl,4446,50,2000,16000,24,1,6,70,82
ipl,4460,50,2000,16000,24,1,6,80,82
ipl,4480,50,8000,16000,48,1,10,136,128
magnuson,m80/30,100,1000,8000,0,2,6,16,37
magnuson,m80/31,100,1000,8000,24,2,6,26,46
magnuson,m80/32,100,1000,8000,24,3,6,32,46
magnuson,m80/42,50,2000,16000,12,3,16,45,80
magnuson,m80/43,50,2000,16000,24,6,16,54,88
magnuson,m80/44,50,2000,16000,24,6,16,65,88
microdata,seq.ms/3200,150,512,4000,0,8,128,30,33
nas,as/3000,115,2000,8000,16,1,3,50,46
nas,as/3000-n,115,2000,4000,2,1,5,40,29
nas,as/5000,92,2000,8000,32,1,6,62,53
nas,as/5000-e,92,2000,8000,32,1,6,60,53
nas,as/5000-n,92,2000,8000,4,1,6,50,41
nas,as/6130,75,4000,16000,16,1,6,66,86
nas,as/6150,60,4000,16000,32,1,6,86,95
nas,as/6620,60,2000,16000,64,5,8,74,107
nas,as/6630,60,4000,16000,64,5,8,93,117
nas,as/6650,50,4000,16000,64,5,10,111,119
nas,as/7000,72,4000,16000,64,8,16,143,120
nas,as/7000-n,72,2000,8000,16,6,8,105,48
nas,as/8040,40,8000,16000,32,8,16,214,126
nas,as/8050,40,8000,32000,64,8,24,277,266
nas,as/8060,35,8000,32000,64,8,24,370,270
nas,as/9000-dpc,38,16000,32000,128,16,32,510,426
nas,as/9000-n,48,4000,24000,32,8,24,214,151
nas,as/9040,38,8000,32000,64,8,24,326,267
nas,as/9060,30,16000,32000,256,16,24,510,603
ncr,v8535:ii,112,1000,1000,0,1,4,8,19
ncr,v8545:ii,84,1000,2000,0,1,6,12,21
ncr,v8555:ii,56,1000,4000,0,1,6,17,26
ncr,v8565:ii,56,2000,6000,0,1,8,21,35
ncr,v8565:ii-e,56,2000,8000,0,1,8,24,41
ncr,v8575:ii,56,4000,8000,0,1,8,34,47
ncr,v8585:ii,56,4000,12000,0,1,8,42,62
ncr,v8595:ii,56,4000,16000,0,1,8,46,78
ncr,v8635,38,4000,8000,32,16,32,51,80
ncr,v8650,38,4000,8000,32,16,32,116,80
ncr,v8655,38,8000,16000,64,4,8,100,142
ncr,v8665,38,8000,24000,160,4,8,140,281
ncr,v8670,38,4000,16000,128,16,32,212,190
nixdorf,8890/30,200,1000,2000,0,1,2,25,21
nixdorf,8890/50,200,1000,4000,0,1,4,30,25
nixdorf,8890/70,200,2000,8000,64,1,5,41,67
perkin-elmer,3205,250,512,4000,0,1,7,25,24
perkin-elmer,3210,250,512,4000,0,4,7,50,24
perkin-elmer,3230,250,1000,16000,1,1,8,50,64
prime,50-2250,160,512,4000,2,1,5,30,25
prime,50-250-ii,160,512,2000,2,3,8,32,20
prime,50-550-ii,160,1000,4000,8,1,14,38,29
prime,50-750-ii,160,1000,8000,16,1,14,60,43
prime,50-850-ii,160,2000,8000,32,1,13,109,53
siemens,7.521,240,512,1000,8,1,3,6,19
siemens,7.531,240,512,2000,8,1,5,11,22
siemens,7.536,105,2000,4000,8,3,8,22,31
siemens,7.541,105,2000,6000,16,6,16,33,41
siemens,7.551,105,2000,8000,16,4,14,58,47
siemens,7.561,52,4000,16000,32,4,12,130,99
siemens,7.865-2,70,4000,12000,8,6,8,75,67
siemens,7.870-2,59,4000,12000,32,6,12,113,81
siemens,7.872-2,59,8000,16000,64,12,24,188,149
siemens,7.875-2,26,8000,24000,32,8,16,173,183
siemens,7.880-2,26,8000,32000,64,12,16,248,275
siemens,7.881-2,26,8000,32000,128,24,32,405,382
sperry,1100/61-h1,116,2000,8000,32,5,28,70,56
sperry,1100/81,50,2000,32000,24,6,26,114,182
sperry,1100/82,50,2000,32000,48,26,52,208,227
sperry,1100/83,50,2000,32000,112,52,104,307,341
sperry,1100/84,50,4000,32000,112,52,104,397,360
sperry,1100/93,30,8000,64000,96,12,176,915,919
sperry,1100/94,30,8000,64000,128,12,176,1150,978
sperry,80/3,180,262,4000,0,1,3,12,24
sperry,80/4,180,512,4000,0,1,3,14,24
sperry,80/5,180,262,4000,0,1,3,18,24
sperry,80/6,180,512,4000,0,1,3,21,24
sperry,80/8,124,1000,8000,0,1,8,42,37
sperry,90/80-model-3,98,1000,8000,32,2,8,46,50
sratus,32,125,2000,8000,0,2,14,52,41
wang,vs-100,480,512,8000,32,0,0,67,47
wang,vs-90,480,1000,4000,0,0,0,45,25
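Each row of machine.data carries the ten comma-separated fields listed under "Attribute Information" in machine.names. A sketch of reading one row into a typed record; `Machine` and `parse_machine_line` are hypothetical helpers, and the lowercase field names are taken from that attribute list.

```python
from typing import NamedTuple


class Machine(NamedTuple):
    vendor: str
    model: str
    myct: int   # machine cycle time (ns)
    mmin: int   # minimum main memory (KB)
    mmax: int   # maximum main memory (KB)
    cach: int   # cache memory (KB)
    chmin: int  # minimum channels
    chmax: int  # maximum channels
    prp: int    # published relative performance (the regression target)
    erp: int    # estimated relative performance from the original article


def parse_machine_line(line: str) -> Machine:
    """Split one machine.data row and convert the eight numeric fields to int."""
    fields = line.strip().split(",")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    vendor, model, *numbers = fields
    return Machine(vendor, model, *(int(x) for x in numbers))
```

Note that the new mapfeat.py uses only the vendor and MYCT through CHMAX as inputs; ERP, the original authors' own regression estimate of PRP, is not used as a feature.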
72
demo/regression/machine.names
Normal file
@ -0,0 +1,72 @@
1. Title: Relative CPU Performance Data

2. Source Information
   -- Creators: Phillip Ein-Dor and Jacob Feldmesser
     -- Ein-Dor: Faculty of Management; Tel Aviv University; Ramat-Aviv;
        Tel Aviv, 69978; Israel
   -- Donor: David W. Aha (aha@ics.uci.edu) (714) 856-8779
   -- Date: October, 1987

3. Past Usage:
    1. Ein-Dor and Feldmesser (CACM 4/87, pp 308-317)
       -- Results:
          -- linear regression prediction of relative cpu performance
          -- Recorded 34% average deviation from actual values
    2. Kibler,D. & Aha,D. (1988). Instance-Based Prediction of
       Real-Valued Attributes. In Proceedings of the CSCSI (Canadian
       AI) Conference.
       -- Results:
          -- instance-based prediction of relative cpu performance
          -- similar results; no transformations required
    - Predicted attribute: cpu relative performance (numeric)

4. Relevant Information:
   -- The estimated relative performance (ERP) values were computed by the
      authors using a linear regression method. See their article (pp 308-313)
      for more details on how the relative performance values were set.

5. Number of Instances: 209

6. Number of Attributes: 10 (6 predictive attributes, 2 non-predictive,
   1 goal field, and the linear regression's guess)

7. Attribute Information:
   1. vendor name: 30
      (adviser, amdahl, apollo, basf, bti, burroughs, c.r.d, cambex, cdc, dec,
       dg, formation, four-phase, gould, harris, honeywell, hp, ibm, ipl, magnuson,
       microdata, nas, ncr, nixdorf, perkin-elmer, prime, siemens, sperry,
       sratus, wang)
   2. Model Name: many unique symbols
   3. MYCT: machine cycle time in nanoseconds (integer)
   4. MMIN: minimum main memory in kilobytes (integer)
   5. MMAX: maximum main memory in kilobytes (integer)
   6. CACH: cache memory in kilobytes (integer)
   7. CHMIN: minimum channels in units (integer)
   8. CHMAX: maximum channels in units (integer)
   9. PRP: published relative performance (integer)
  10. ERP: estimated relative performance from the original article (integer)

8. Missing Attribute Values: None

9. Class Distribution: the class value (PRP) is continuously valued.
   PRP Value Range:   Number of Instances in Range:
   0-20               31
   21-100             121
   101-200            27
   201-300            13
   301-400            7
   401-500            4
   501-600            2
   above 600          4

Summary Statistics:
           Min    Max     Mean       SD   PRP Correlation
   MYCT:    17   1500    203.8    260.3   -0.3071
   MMIN:    64  32000   2868.0   3878.7    0.7949
   MMAX:    64  64000  11796.1  11726.6    0.8630
   CACH:     0    256     25.2     40.6    0.6626
  CHMIN:     0     52      4.7      6.8    0.6089
  CHMAX:     0    176     18.2     26.0    0.6052
    PRP:     6   1150    105.6    160.8    1.0000
    ERP:    15   1238     99.3    154.8    0.9665
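The summary table gives, for each numeric column, its min, max, mean, standard deviation and Pearson correlation with PRP, and the class-distribution table buckets PRP into ranges. A standard-library sketch of both. The description does not say whether SD is the sample or population value, so `statistics.pstdev` here is an assumption, and `mean_abs_pct_deviation` is only one plausible reading of the "34% average deviation" figure, not necessarily the definition Ein-Dor and Feldmesser used. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(column: Sequence[float], prp: Sequence[float]) -> Dict[str, float]:
    """One row of the 'Summary Statistics' table for a numeric column."""
    return {
        "min": min(column),
        "max": max(column),
        "mean": mean(column),
        "sd": pstdev(column),  # assumption: population standard deviation
        "prp_corr": pearson(column, prp),
    }


# Inclusive PRP ranges from the class-distribution table; anything larger
# falls into the final "above 600" bucket.
PRP_RANGES = [(0, 20), (21, 100), (101, 200), (201, 300),
              (301, 400), (401, 500), (501, 600)]


def prp_histogram(prp: Sequence[int]) -> List[int]:
    """Count PRP values per range; the last slot counts values above 600."""
    counts = [0] * (len(PRP_RANGES) + 1)
    for value in prp:
        for i, (lo, hi) in enumerate(PRP_RANGES):
            if lo <= value <= hi:
                counts[i] += 1
                break
        else:
            counts[-1] += 1
    return counts


def mean_abs_pct_deviation(estimates: Sequence[float], actual: Sequence[float]) -> float:
    """Mean of |estimate - actual| / actual, in percent (one possible definition)."""
    return 100 * mean(abs(e - a) / a for e, a in zip(estimates, actual))
```

Fed the full MYCT and PRP columns of machine.data, `summarize` should land close to the MYCT row of the table if the population-SD assumption matches; a small mismatch in SD would point to the sample definition instead.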
@ -1,48 +1,21 @@
 #!/usr/bin/python
 import sys
 
-def loadfmap( fname ):
-    fmap = {}
-    nmap = {}
-
-    for l in open( fname ):
-        arr = l.split()
-        if arr[0].find('.') != -1:
-            idx = int( arr[0].strip('.') )
-            assert idx not in fmap
-            fmap[ idx ] = {}
-            ftype = arr[1].strip(':')
-            content = arr[2]
-        else:
-            content = arr[0]
-        for it in content.split(','):
-            if it.strip() == '':
-                continue
-            k , v = it.split('=')
-            fmap[ idx ][ v ] = len(nmap)
-            nmap[ len(nmap) ] = ftype+'='+k
-    return fmap, nmap
-
-def write_nmap( fo, nmap ):
-    for i in xrange( len(nmap) ):
-        fo.write('%d\t%s\ti\n' % (i, nmap[i]) )
-
-# start here
-fmap, nmap = loadfmap( 'agaricus-lepiota.fmap' )
-fo = open( 'featmap.txt', 'w' )
-write_nmap( fo, nmap )
-fo.close()
-
-fo = open( 'agaricus.txt', 'w' )
-for l in open( 'agaricus-lepiota.data' ):
+fo = open( 'machine.txt', 'w' )
+cnt = 6
+fmap = {}
+for l in open( 'machine.data' ):
     arr = l.split(',')
-    if arr[0] == 'p':
-        fo.write('1')
-    else:
-        assert arr[0] == 'e'
-        fo.write('0')
-    for i in xrange( 1,len(arr) ):
-        fo.write( ' %d:1' % fmap[i][arr[i].strip()] )
+    fo.write(arr[8])
+    for i in xrange( 0,6 ):
+        fo.write( ' %d:%s' %(i,arr[i+2]) )
+
+    if arr[0] not in fmap.keys():
+        fmap[arr[0]] = cnt
+        cnt += 1
+
+    fo.write( ' %d:1' % fmap[arr[0]] )
+
     fo.write('\n')
 
 fo.close()
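The new mapfeat.py is written for Python 2 (`xrange`). A Python 3 sketch of the same transformation: PRP (`arr[8]`) becomes the label, MYCT through CHMAX (`arr[2]` to `arr[7]`) become features 0 to 5, and each vendor gets an indicator feature numbered from 6 in order of first appearance. `machine_to_libsvm` is a hypothetical name; it takes an iterable of lines instead of opening machine.data so it can be exercised without the file.

```python
from typing import Dict, Iterable, List


def machine_to_libsvm(lines: Iterable[str], first_vendor_id: int = 6) -> List[str]:
    """Convert machine.data rows to LIBSVM lines, mirroring the new mapfeat.py.

    label    : PRP (arr[8])
    0..5     : MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX (arr[2]..arr[7])
    vendor   : one indicator per vendor, ids from first_vendor_id upward
               in order of first appearance
    """
    vendor_ids: Dict[str, int] = {}
    out: List[str] = []
    for line in lines:
        arr = line.strip().split(",")
        if len(arr) < 9:
            continue  # skip blank or malformed lines
        vendor = arr[0]
        if vendor not in vendor_ids:
            vendor_ids[vendor] = first_vendor_id + len(vendor_ids)
        numeric = " ".join(f"{i}:{arr[i + 2]}" for i in range(6))
        out.append(f"{arr[8]} {numeric} {vendor_ids[vendor]}:1")
    return out
```

To reproduce machine.txt, pass an open machine.data file handle and write each returned string followed by a newline. Using a dict membership test instead of `fmap.keys()` keeps the vendor lookup constant-time, though with 30 vendors the difference is negligible.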
@ -2,13 +2,10 @@
 # map feature using indicator encoding, also produce featmap.txt
 python mapfeat.py
 # split train and test
-python mknfold.py agaricus.txt 1
+python mknfold.py machine.txt 1
 # training and output the models
-../../xgboost mushroom.conf
+../../xgboost machine.conf
 # output predictions of test data
-../../xgboost mushroom.conf task=pred model_in=0003.model
+../../xgboost machine.conf task=pred model_in=0003.model
 # print the boosters of 0003.model in dump.raw.txt
-../../xgboost mushroom.conf task=dump model_in=0003.model name_dump=dump.raw.txt
-# use the feature map in printing for better visualization
-../../xgboost mushroom.conf task=dump model_in=0003.model fmap=featmap.txt name_dump=dump.nice.txt
-cat dump.nice.txt
+../../xgboost machine.conf task=dump model_in=0003.model name_dump=dump.raw.txt
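The `task=pred` step writes one prediction per test row to a prediction file whose name comes from the configuration, and machine.conf is not part of this diff. Assuming you have the test LIBSVM lines and the predictions as lists, a sketch of scoring them with root-mean-square error; RMSE is a choice made here, not something the demo specifies, and both helper names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def labels_from_libsvm(lines: Iterable[str]) -> List[float]:
    """The first token of each non-empty LIBSVM line is the label (PRP here)."""
    return [float(line.split()[0]) for line in lines if line.strip()]


def rmse(labels: Sequence[float], preds: Sequence[float]) -> float:
    """Root-mean-square error between true labels and predictions."""
    if len(labels) != len(preds) or not labels:
        raise ValueError("need the same, non-zero number of labels and predictions")
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(labels, preds)) / len(labels))
```

For example, `rmse(labels_from_libsvm(open(test_file)), [float(x) for x in open(pred_file)])`, where `test_file` and `pred_file` are placeholders for whatever mknfold.py and machine.conf actually produce.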