Title | Influence of missing values on artificial neural network performance |
Author(s) | Ennett CM, Frize M, Walker CR |
Source | Medinfo, Vol. 10, Pages 449-453 |
Publication Date | 2001 |
Abstract | The problem of databases containing missing values is a common one in the medical environment. Researchers must find a way to incorporate the incomplete data into the data set to use those cases in their experiments. Artificial neural networks (ANNs) cannot interpret missing values, and when a database is highly skewed, ANNs have difficulty identifying the factors leading to a rare outcome. This study investigates the impact on ANN performance when predicting neonatal mortality of increasing the number of cases with missing values in the data sets. Although previous work using the Canadian Neonatal Intensive Care Unit (NICU) Network's database showed that the ANN could not correctly classify any patients who died when the missing values were replaced with normal or mean values, this problem did not arise as expected in this study. Instead, the ANN consistently performed better than the constant predictor (which classifies all cases as belonging to the outcome with the highest training set a priori probability) with a 0.6-1.3% improvement over the constant predictor. The sensitivity of the models ranged from 14.5-20.3% and the specificity ranged from 99.2-99.7%. These results indicate that nearly 1 in 5 babies who will eventually die are correctly classified by the ANN, and very few babies were incorrectly identified as patients who will die. These findings are important for patient care, counselling of parents and resource allocation. |
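
The metrics named in the abstract (the constant predictor, sensitivity and specificity) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy labels are hypothetical, invented only to show how the quantities are computed on a skewed outcome where 1 means the patient died and 0 means the patient survived.

```python
from collections import Counter


def constant_predictor_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training-set outcome."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in test_labels if y == majority) / len(test_labels)


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up, highly skewed data (NOT from the study): 10 deaths in 200 cases.
y_true = [1] * 10 + [0] * 190
# Hypothetical model output: 2 deaths caught, 8 missed, 1 false alarm.
y_pred = [1, 1] + [0] * 8 + [1] + [0] * 189

sens, spec = sensitivity_specificity(y_true, y_pred)
# For simplicity the same labels stand in for the training set here.
baseline = constant_predictor_accuracy(y_true, y_true)
model_acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"constant predictor={baseline:.3f} model accuracy={model_acc:.3f}")
```

With an outcome this rare, the constant predictor already scores high accuracy while identifying no deaths at all, which is why the abstract reports sensitivity and specificity alongside the improvement over that baseline.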