SPIE Discovery Challenge
Problem:
The SPIE Discovery Challenge was posted on the web at: http://www.cs.uncc.edu/~zytkow/spie_challenge/.
A description of the data can be found at: http://www.cs.uncc.edu/~zytkow/spie_challenge/data_described-5.htm.
Method:
A software package named Cluzzifier was used to carry out the following steps:
1) Data pre-processing and coding;
2) Training of a three-layer (input, hidden and output) feed-forward neural network using back-propagation;
3) Weight interpretation;
4) Node pruning and network restructuring;
5) Iteration of steps 2, 3 and 4;
6) Result assessment; and
7) Data post-processing and report generation.
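The train, interpret and prune loop of steps 2 to 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Cluzzifier's actual procedure: the class TinyNet, the helpers project, train and mse, the toy data, and the use of summed absolute weights as the "weight interpretation" are all hypothetical choices made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Three-layer (input, hidden, output) network with a single sigmoid output."""

    def __init__(self, n_in, n_hidden, rng):
        # One row per hidden node: n_in input weights followed by a bias weight.
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                     for _ in range(n_hidden)]
        # One weight per hidden node followed by the output bias.
        self.w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_ih]
        out = sigmoid(sum(w * h for w, h in zip(self.w_ho, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr):
        """One back-propagation update for squared error on a single example."""
        hidden, out = self.forward(x)
        xb = list(x) + [1.0]
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas are computed before the output weights are changed.
        delta_hidden = [delta_out * self.w_ho[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden + [1.0]):
            self.w_ho[j] -= lr * delta_out * h
        for j, row in enumerate(self.w_ih):
            for i, v in enumerate(xb):
                row[i] -= lr * delta_hidden[j] * v

    def input_relevance(self):
        """Assumed weight interpretation: summed |weight| leaving each input node."""
        n_in = len(self.w_ih[0]) - 1
        return [sum(abs(row[i]) for row in self.w_ih) for i in range(n_in)]

    def prune_input(self, i):
        """Remove input node i by deleting its outgoing weights."""
        for row in self.w_ih:
            del row[i]


def project(rows, kept):
    """Keep only the input columns listed in kept."""
    return [([x[i] for i in kept], t) for x, t in rows]


def train(net, data, epochs, lr):
    for _ in range(epochs):
        for x, t in data:
            net.train_step(x, t, lr)


def mse(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Toy data (hypothetical): the target copies input 0; inputs 1 and 2 carry no information.
rows = [((a, b, c), float(a)) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
kept = [0, 1, 2]
net = TinyNet(len(kept), 2, random.Random(0))
initial_error = mse(net, rows)

for _ in range(2):                          # steps 2 to 4, iterated (step 5)
    train(net, project(rows, kept), 500, 0.5)
    relevance = net.input_relevance()
    weakest = relevance.index(min(relevance))
    net.prune_input(weakest)
    del kept[weakest]

train(net, project(rows, kept), 500, 0.5)   # retrain the restructured network
final_error = mse(net, project(rows, kept))
print("inputs kept:", kept)
print("error before/after:", round(initial_error, 4), round(final_error, 4))
```

On this toy data the loop is expected to discard the two uninformative inputs and keep the informative one, which mirrors in miniature the kind of attribute reduction reported under Results.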
Results:
Of the original 23 attributes, 10 are useless, if not harmful, for correctly classifying the type of unbalance. The highest classification accuracy was achieved when the following 13 inputs were used: 1) RtSp; 2) A04D1; 3) A04D2; 4) A18D1; 5) A18D2; 6) R02D1; 7) R16D1; 8) R16D2; 9) R16TZ (obtained as R02TZ - RdifTZ); 10) node1_#; 11) node2_#; 12) unbal1; and 13) unbal2. The confusion matrices for four runs using only these 13 inputs are shown below.
          Classified As
   ----------------------------
      b     s     q     m     d   Accu.(%)
------------------------------------------
b  1080     0     0     0     0     100.00
s     0   968     4     0     0      99.59
q     0    72   900     0     0      92.59
m     0     0     0   972     0     100.00
d     0     0     4     0  1076      99.62
Overall accuracy: 98.42%
------------------------------------------
b  1080     0     0     0     0     100.00
s     0   950    22     0     0      97.74
q     0    36   936     0     0      96.30
m     0     0     0   972     0     100.00
d     0     0     0    16  1064      98.52
Overall accuracy: 98.54%
------------------------------------------
b  1080     0     0     0     0     100.00
s     0   958    14     0     0      98.56
q     0    76   896     0     0      92.18
m     0     0     0   972     0     100.00
d     0     0     0    14  1066      98.70
Overall accuracy: 97.95%
------------------------------------------
b  1080     0     0     0     0     100.00
s     0   956    16     0     0      98.35
q     0    54   918     0     0      94.44
m     0     0     0   972     0     100.00
d     0     0     0     6  1074      99.44
Overall accuracy: 98.50%
------------------------------------------
          Classified As
   ----------------------------
      b     s     q     m     d   Accu.(%)
------------------------------------------
b  1080     0     0     0     0     100.00
s     0   958    14     0     0      98.56
q     0    80   892     0     0      91.77
m     0     0     0   972     0     100.00
d     0     0     0    18  1062      98.33
Overall accuracy: 97.79%
------------------------------------------
b  1080     0     0     0     0     100.00
s     0   950    22     0     0      97.74
q     0    50   922     0     0      94.86
m     0     0     0   972     0     100.00
d     2     0     0    46  1032      95.56
Overall accuracy: 97.64%
------------------------------------------
b  1080     0     0     0     0     100.00
s     0   952    20     0     0      97.94
q     0    94   878     0     0      90.33
m     0     0     0   972     0     100.00
d     0     0     0    24  1056      97.78
Overall accuracy: 97.28%
------------------------------------------
b  1080     0     0     0     0     100.00
s     0   914    58     0     0      94.03
q     0    72   900     0     0      92.59
m     0     0     0   972     0     100.00
d     0     0     0    62  1010      93.52
Overall accuracy: 96.06%
------------------------------------------
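The accuracy figures in the matrices above follow from simple row arithmetic: each per-class value is the diagonal count divided by its row total, and the overall value is the sum of the diagonal divided by the number of cases. The sketch below recomputes them for the second matrix of the first block, assuming that rows are the true class and columns the assigned class (as the "Classified As" heading suggests); the function names are illustrative only.

```python
def per_class_accuracy(matrix):
    """Percentage of each true class (row) that lands on the diagonal."""
    return [round(100.0 * row[i] / sum(row), 2) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Percentage of all cases that land on the diagonal."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return round(100.0 * correct / total, 2)


# Rows and columns are ordered b, s, q, m, d.
run = [
    [1080, 0, 0, 0, 0],
    [0, 950, 22, 0, 0],
    [0, 36, 936, 0, 0],
    [0, 0, 0, 972, 0],
    [0, 0, 0, 16, 1064],
]

print(per_class_accuracy(run))  # [100.0, 97.74, 96.3, 100.0, 98.52]
print(overall_accuracy(run))    # 98.54
```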
For comparison, the confusion matrices obtained when all 23 attributes were used are:
          Classified As
   ----------------------------
      b     s     q     m     d   Accu.(%)
------------------------------------------
b  1075     0     0     4     1      99.54
s     0   859   111     0     2      88.37
q     0    95   873     1     3      89.81
m     0     0     1   970     1      99.79
d     5     0    13    66   996      92.22
Overall accuracy: 94.03%
------------------------------------------
b  1064     0     1    13     2      98.52
s     0   866   106     0     0      89.09
q     0   106   864     2     0      88.89
m     0     0     1   964     7      99.18
d    16     2    19    28  1015      93.98
Overall accuracy: 94.03%
------------------------------------------
b  1073     0     1     0     6      99.35
s     6   844   122     0     0      86.83
q     0   108   841    22     1      86.52
m     1     0    10   960     1      98.77
d     7     2    42    45   984      91.11
Overall accuracy: 92.63%
------------------------------------------
b  1050     0     3    27     0      97.22
s     0   870   102     0     0      89.51
q     0   130   839     3     0      86.32
m     7     0     3   959     3      98.66
d    26     4    65    41   944      87.41
Overall accuracy: 91.84%
------------------------------------------
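To put the comparison in numbers, the sketch below averages the overall accuracies of the first block of four 13-input runs and of the four all-attribute runs, and lists the largest off-diagonal cells of the first all-attribute matrix. The values are copied from the tables above; the variable and function names are illustrative, and rows are assumed to be the true class.

```python
from statistics import mean

LABELS = ["b", "s", "q", "m", "d"]

# Overall accuracies (%) copied from the tables above.
acc_13_inputs = [98.42, 98.54, 97.95, 98.50]   # first block of four runs
acc_23_inputs = [94.03, 94.03, 92.63, 91.84]

# First all-attribute matrix; rows = true class, columns = assigned class.
matrix_23 = [
    [1075, 0, 0, 4, 1],
    [0, 859, 111, 0, 2],
    [0, 95, 873, 1, 3],
    [0, 0, 1, 970, 1],
    [5, 0, 13, 66, 996],
]


def top_confusions(matrix, labels, k=3):
    """Return the k largest off-diagonal cells as (true, assigned, count)."""
    cells = [(labels[i], labels[j], n)
             for i, row in enumerate(matrix)
             for j, n in enumerate(row) if i != j]
    return sorted(cells, key=lambda c: c[2], reverse=True)[:k]


gain = mean(acc_13_inputs) - mean(acc_23_inputs)
print(round(gain, 2))                       # 5.22
print(top_confusions(matrix_23, LABELS))    # [('s', 'q', 111), ('q', 's', 95), ('d', 'm', 66)]
```

In these figures the reduced input set gains roughly five percentage points on average, and most of the errors with all 23 attributes are confusions between classes s and q.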
Knowledge discovered: