Diabetes Mellitus Forecast
Smith et al. [1] used an early neural network model to forecast the onset of diabetes mellitus. The data were downloaded from ftp://ftp.ncc.up.pt/pub/statlog/. From the 768 samples, 170 were selected at random for each of the two possible results of the diabetes test, positive and negative, giving 340 training samples. The remaining 428 were used as validation samples.
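The split described above can be sketched as follows. This is only an illustration, not the code used in the study: the function name `balanced_split`, the seed, and the placeholder label list (including its class counts) are assumptions introduced here.

```python
import random

def balanced_split(labels, per_class=170, seed=0):
    """Pick `per_class` random samples of each class for training;
    every remaining sample goes to the validation set."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train = []
    for indices in by_class.values():
        train.extend(rng.sample(indices, per_class))
    chosen = set(train)
    valid = [i for i in range(len(labels)) if i not in chosen]
    return sorted(train), valid

# Placeholder labels: 768 samples in two classes. The class counts here
# are illustrative assumptions, not taken from the text above.
labels = [1] * 268 + [0] * 500
train_idx, valid_idx = balanced_split(labels)
print(len(train_idx), len(valid_idx))  # 340 428
```

Sampling each class separately keeps the training set balanced even when one test result is much more common than the other in the full data set.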
The data set consists of eight input variables:
Each feature selection process started with eight input nodes, one per variable, and two output nodes, one per possible test result. Each training session ran for 500 cycles. The process was run five times. The table below shows the order in which the inputs were deleted, together with the average MCSR* and average CASR** at each iteration of the five processes:
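The text does not state which criterion decides the input to delete at each iteration. As a rough sketch only, the loop below assumes a greedy rule: drop the input whose removal leaves the best-scoring subset. The function `backward_elimination`, the `score` callback, and the toy weights are all hypothetical; in the actual study the score would come from training the network (500 cycles per session) and measuring its success rate.

```python
def backward_elimination(inputs, score):
    """Greedy backward elimination (assumed criterion): at each iteration,
    remove the input whose deletion gives the highest-scoring subset.
    Returns a list of (subset, score) pairs, one per iteration."""
    current = list(inputs)
    history = [("".join(current), score(current))]
    while len(current) > 1:
        candidates = [[x for x in current if x != drop] for drop in current]
        current = max(candidates, key=score)
        history.append(("".join(current), score(current)))
    return history

# Toy stand-in for "train the network and measure its success rate".
# These weights are invented so that the example runs; they are not
# measured values and carry no meaning about the real variables.
TOY_WEIGHTS = {"a": 3, "b": 8, "c": 2, "d": 2.5, "e": 1, "f": 6, "g": 4, "h": 0.5}

def toy_score(subset):
    return sum(TOY_WEIGHTS[x] for x in subset)

history = backward_elimination("abcdefgh", toy_score)
for subset, value in history:
    print(subset, value)
```

With eight inputs the loop produces eight iterations, from the full set down to a single remaining input, which matches the number of iterations reported for each process.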
| Quantity | Iteration | Process 1 | Process 2 | Process 3 | Process 4 | Process 5 |
|---|---|---|---|---|---|---|
| Inputs used | 1 | abcdefgh | abcdefgh | abcdefgh | abcdefgh | abcdefgh |
| Inputs used | 2 | abcdefg | abcdefg | abcdefg | abcdefg | abcdefg |
| Inputs used | 3 | abcd fg | abcd fg | abcd fg | abcd fg | abcd fg |
| Inputs used | 4 | ab d fg | ab d fg | ab d fg | ab d fg | ab d fg |
| Inputs used | 5 | ab fg | ab fg | ab fg | ab fg | ab fg |
| Inputs used | 6 | b fg | ab f | ab f | ab f | ab f |
| Inputs used | 7 | b f | b f | b f | b f | b f |
| Inputs used | 8 | b | b | b | b | b |
| Average MCSR | 1 | 67.143 | 66.939 | 67.143 | 66.735 | 67.143 |
| Average MCSR | 2 | 66.122 | 65.714 | 65.510 | 65.918 | 65.918 |
| Average MCSR | 3 | 69.388 | 70.000 | 69.592 | 69.388 | 69.388 |
| Average MCSR | 4 | 67.143 | 67.551 | 68.163 | 67.347 | 67.551 |
| Average MCSR | 5 | 73.265 | 73.469 | 73.469 | 73.469 | 73.469 |
| Average MCSR | 6 | 67.347 | 68.163 | 68.163 | 68.163 | 68.571 |
| Average MCSR | 7 | 60.204 | 60.204 | 60.204 | 60.204 | 60.612 |
| Average MCSR | 8 | 60.816 | 60.816 | 60.816 | 61.429 | 61.429 |
| Average CASR | 1 | 75.047 | 75.234 | 74.907 | 74.860 | 75.561 |
| Average CASR | 2 | 74.159 | 74.252 | 74.720 | 74.206 | 74.299 |
| Average CASR | 3 | 74.486 | 74.579 | 74.486 | 74.486 | 74.579 |
| Average CASR | 4 | 74.019 | 74.159 | 74.439 | 74.206 | 74.346 |
| Average CASR | 5 | 73.738 | 73.645 | 73.692 | 73.598 | 73.598 |
| Average CASR | 6 | 75.280 | 74.065 | 73.972 | 74.159 | 73.879 |
| Average CASR | 7 | 74.299 | 74.299 | 74.299 | 74.299 | 74.252 |
| Average CASR | 8 | 72.570 | 72.570 | 72.570 | 72.477 | 72.477 |
As inputs were deleted one after another, the MCSRs and CASRs reversed their declining trend when four and three inputs, respectively, were kept. In particular, the highest MCSRs were reached when four inputs were used. In other words, to achieve the highest success rate for both "positive" and "negative" cases, only four inputs (a, b, f and g) should be used. The success rate remains above 50% even when only one input, b, is used.
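The choice of iteration can be checked directly from the tabulated average MCSR values. The short sketch below copies those values and picks the iteration with the highest mean across the five processes; the variable names are introduced here for illustration only.

```python
# Average MCSR per iteration, one value per process (copied from the table).
avg_mcsr = {
    1: [67.143, 66.939, 67.143, 66.735, 67.143],
    2: [66.122, 65.714, 65.510, 65.918, 65.918],
    3: [69.388, 70.000, 69.592, 69.388, 69.388],
    4: [67.143, 67.551, 68.163, 67.347, 67.551],
    5: [73.265, 73.469, 73.469, 73.469, 73.469],
    6: [67.347, 68.163, 68.163, 68.163, 68.571],
    7: [60.204, 60.204, 60.204, 60.204, 60.612],
    8: [60.816, 60.816, 60.816, 61.429, 61.429],
}

# Mean over the five processes for each iteration.
mean_mcsr = {it: sum(vals) / len(vals) for it, vals in avg_mcsr.items()}
best_iteration = max(mean_mcsr, key=mean_mcsr.get)
print(best_iteration)  # 5
```

Iteration 5 is the one at which four inputs (a, b, f and g) remain in every process.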
[1] J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler and R. S. Johannes, "Using the ADAP learning algorithm to forecast the onset of diabetes mellitus," Proceedings of 12th Symposium on Computer Applications in Medical Care (R. A. Greenes, Ed.), IEEE Computer Society Press, pp. 261-265, 1988.
__________
*MCSR: The Minimum Class Success Rate is the lowest success rate among all the target classes. The average MCSR is the MCSR averaged over the five training sessions within each process.
**CASR: The Class Average Success Rate is the success rate averaged over all the target classes. The average CASR is the CASR averaged over the five training sessions within each process.
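The two measures defined above can be computed from per-class success rates as in the following sketch. The function names and the small example labels are made up for illustration; they are not data from the study.

```python
def class_success_rates(y_true, y_pred):
    """Percentage of correctly classified samples within each target class."""
    totals, hits = {}, {}
    for truth, guess in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        hits[truth] = hits.get(truth, 0) + (truth == guess)
    return {c: 100.0 * hits[c] / totals[c] for c in totals}

def mcsr(rates):
    """Minimum Class Success Rate: the lowest per-class success rate."""
    return min(rates.values())

def casr(rates):
    """Class Average Success Rate: the mean of the per-class success rates."""
    return sum(rates.values()) / len(rates)

# Invented example: 4 positive and 6 negative cases.
y_true = ["pos"] * 4 + ["neg"] * 6
y_pred = ["pos", "pos", "pos", "neg",
          "neg", "neg", "neg", "neg", "neg", "pos"]

rates = class_success_rates(y_true, y_pred)
print(mcsr(rates))  # 75.0
print(casr(rates))
```

Because both measures are built from per-class rates, neither is dominated by the larger class, which matters when the validation set is unbalanced.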