Wrappers Feature Selection in Alzheimer's Biomarkers Using kNN and SMOTE Oversampling
DOI:
https://doi.org/10.5540/tema.2017.018.01.0015
Keywords:
k-nearest neighbors, SMOTE, feature selection, Alzheimer's biomarkers, classification problem
Abstract
A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention. Combining different biomarker modalities often allows an accurate diagnostic classification. In Alzheimer's disease (AD), biomarkers are indispensable for identifying cognitively normal individuals destined to develop dementia symptoms. However, studies using combinations of canonical AD biomarkers have repeatedly shown poor classification rates when differentiating between AD, mild cognitive impairment, and control individuals. Furthermore, designing classifiers to assess multiple biomarker combinations involves issues such as imbalanced classes and missing data. Since the number of biomarker combinations is large, wrappers are used to avoid multiple comparisons. Here, we compare the ability of three wrapper feature selection methods to obtain biomarker combinations that maximize classification rates. As the criterion for the wrapper feature selection, we use the k-nearest neighbor classifier with two balancing aids: random undersampling and SMOTE. Overall, our analyses show how biomarker combinations affect classifier accuracy and how imbalance strategies improve it. We show that non-defining and non-cognitive biomarkers yield lower accuracy than cognitive measures when classifying AD. On average, our approach surpasses the support vector machine and weighted k-nearest neighbor classifiers and reaches 94.34 ± 3.91% accuracy in reproducing class definitions.
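The abstract names the ingredients (a wrapper search, a k-nearest neighbor criterion, SMOTE balancing) but not their details. As a rough illustration only, the sketch below wires them together under several assumptions that are not taken from the paper: a binary problem, a single hold-out split, greedy forward selection as the wrapper, k = 3, and synthetic toy data in which only feature 0 is informative. All function names (`knn_predict`, `smote`, `holdout_accuracy`, `forward_selection`, `toy_data`) are hypothetical.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def smote(X, y, minority, k=3, rng=None):
    """Add synthetic minority points until the two classes have equal size.

    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority neighbours (binary case only).
    """
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_new = len(X) - 2 * len(minority_pts)  # majority count minus minority count
    out_X, out_y = list(X), list(y)
    for _ in range(max(n_new, 0)):
        base = rng.choice(minority_pts)
        others = [p for p in minority_pts if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(p, base))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        out_X.append([b + gap * (q - b) for b, q in zip(base, partner)])
        out_y.append(minority)
    return out_X, out_y


def project(X, features):
    """Keep only the listed feature columns."""
    return [[row[j] for j in features] for row in X]


def holdout_accuracy(features, train, test, minority):
    """Wrapper criterion: kNN accuracy on the test split for one feature subset.

    SMOTE is applied to the training split only, so no synthetic point
    is built from test data.
    """
    (train_X, train_y), (test_X, test_y) = train, test
    bal_X, bal_y = smote(project(train_X, features), train_y, minority)
    hits = sum(knn_predict(bal_X, bal_y, x) == t
               for x, t in zip(project(test_X, features), test_y))
    return hits / len(test_y)


def forward_selection(n_features, score):
    """Greedy forward search: add the best feature until the score stops improving."""
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(score(selected + [f]), f) for f in candidates]
        top_score, top_f = max(scored, key=lambda s: s[0])
        if top_score <= best:
            break
        selected.append(top_f)
        best = top_score
    return selected, best


def toy_data(n_major, n_minor, rng):
    """Hypothetical data: feature 0 separates the classes, features 1-2 are noise."""
    X, y = [], []
    for label, n in ((0, n_major), (1, n_minor)):
        for _ in range(n):
            X.append([rng.uniform(0, 1) + 5 * label,
                      rng.uniform(0, 1), rng.uniform(0, 1)])
            y.append(label)
    return X, y


rng = random.Random(42)
train, test = toy_data(30, 8, rng), toy_data(10, 4, rng)
selected, best = forward_selection(
    3, lambda feats: holdout_accuracy(feats, train, test, minority=1))
print(selected, best)
```

Because the toy classes are separated on feature 0 by construction, the search should stop after selecting that single feature. On real biomarker data the criterion would more plausibly be averaged over cross-validation folds, and the random undersampling mentioned in the abstract could be swapped in for `smote` without changing the search.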
License
Copyright
Authors of articles published in the journal Trends in Computational and Applied Mathematics retain the copyright of their work. The journal uses Creative Commons Attribution (CC-BY) in published articles. The authors grant the TCAM journal the right to first publish the article.
Intellectual Property and Terms of Use
The content of the articles is the exclusive responsibility of the authors. The CC-BY license allows published articles to be reused without permission for any purpose, as long as the original work is correctly cited.
The journal encourages authors to self-archive their accepted manuscripts by publishing them on personal blogs, institutional repositories, and social media, as long as the full citation of the version on the journal's website is included.