Wrappers Feature Selection in Alzheimer's Biomarkers Using kNN and SMOTE Oversampling

Authors

  • Yuri Elias Rodrigues Federal University of Rio Grande do Sul
  • Evandro Manica Federal University of Rio Grande do Sul
  • Eduardo Rigon Zimmer Brain Institute (InsCer), Catholic University of Rio Grande do Sul; Federal University of Rio Grande do Sul
  • Tharick Ali Pascoal Translational Neuroimaging Laboratory, McGill University, Montréal
  • Sulantha Sanjeewa Mathotaarachchi Translational Neuroimaging Laboratory, McGill University, Montréal
  • Pedro Rosa-Neto Translational Neuroimaging Laboratory, McGill University, Montréal

DOI:

https://doi.org/10.5540/tema.2017.018.01.0015

Keywords:

k-nearest neighbors, SMOTE, feature selection, Alzheimer's biomarkers, classification problem

Abstract

A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention. Combining different biomarker modalities often allows an accurate diagnostic classification. In Alzheimer's disease (AD), biomarkers are indispensable for identifying cognitively normal individuals destined to develop dementia symptoms. However, studies using combinations of canonical AD biomarkers have repeatedly shown poor classification rates when differentiating between AD, mild cognitive impairment, and control individuals. Furthermore, designing classifiers to assess multiple biomarker combinations involves issues such as imbalanced classes and missing data. Since the number of biomarker combinations is large, wrappers are used to avoid multiple comparisons. Here, we compare the ability of three wrapper feature selection methods to obtain biomarker combinations that maximize classification rates. As the criterion for wrapper feature selection, we use the k-nearest neighbor classifier with two class-balancing aids: random undersampling and SMOTE. Overall, our analyses showed how biomarker combinations affect classifier accuracy and how imbalance strategies improve it. We show that non-defining and non-cognitive biomarkers yield lower accuracy than cognitive measures when classifying AD. On average, our approach surpasses the support vector machine and weighted k-nearest neighbor classifiers, reaching 94.34 ± 3.91% accuracy when reproducing class definitions.
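As a minimal sketch of the two class-balancing aids named in the abstract, the code below implements a simplified SMOTE (each synthetic minority point lies on the segment between a minority sample and one of its k nearest minority-class neighbours, the idea introduced by Chawla et al. in 2002) and random undersampling in plain Python. The function names, the Euclidean distance, the cycling over minority samples, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Simplified SMOTE: create n_synthetic points, each on the segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    if rng is None:
        rng = random.Random()
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for i in range(n_synthetic):
        idx = i % len(minority)  # cycle through the minority samples (assumption)
        base = minority[idx]
        # k nearest minority neighbours of `base`, excluding `base` itself
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Random undersampling: keep n_keep majority samples drawn without replacement."""
    if rng is None:
        rng = random.Random()
    return rng.sample(majority, n_keep)


# Toy usage: 3 minority vs 9 majority samples in two hypothetical features.
rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
majority = [[5.0 + i, 5.0] for i in range(9)]

# Option 1: oversample the minority class up to the majority size.
oversampled_minority = minority + smote(minority, n_synthetic=6, k=2, rng=rng)
# Option 2: undersample the majority class down to the minority size.
undersampled_majority = random_undersample(majority, len(minority), rng=rng)
print(len(oversampled_minority), len(undersampled_majority))  # 9 3
```

Because SMOTE interpolates rather than duplicating samples, the synthetic points stay inside the region spanned by the minority class. In a cross-validated evaluation, resampling is usually applied only to the training folds, so synthetic points never appear in the test data.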
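The abstract does not name the three wrapper methods that were compared. Purely as an illustration of how a wrapper uses a classifier as its selection criterion, the sketch below runs sequential forward selection (one standard wrapper) scored by the leave-one-out accuracy of an unweighted kNN classifier with Euclidean distance. The function names, k = 3, the stopping rule, and the toy data are hypothetical, and class balancing is omitted for brevity.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Plain (unweighted) kNN: majority vote among the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out kNN accuracy using only the given feature columns."""
    proj = [[row[f] for f in features] for row in X]
    correct = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += knn_predict(train_X, train_y, proj[i], k) == y[i]
    return correct / len(proj)


def forward_selection(X, y, k=3, max_features=None):
    """Sequential forward selection (a wrapper): greedily add the feature that most
    improves the kNN criterion and stop when no remaining feature improves it."""
    n_features = len(X[0])
    if max_features is None:
        max_features = n_features
    selected, best_score = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scores = {f: loo_accuracy(X, y, selected + [f], k) for f in candidates}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:
            break  # no candidate improves the criterion
        selected.append(f_best)
        best_score = scores[f_best]
    return selected, best_score


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.15, 0.1],
     [1.0, 0.4], [1.1, 0.8], [0.9, 0.2], [1.05, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(forward_selection(X, y, k=3))  # ([0], 1.0)
```

The greedy search evaluates at most n(n + 1)/2 feature subsets instead of all 2^n - 1 combinations, which is why wrappers of this kind scale to larger biomarker panels.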

Author Biographies

Yuri Elias Rodrigues, Federal University of Rio Grande do Sul

Student in the Graduate Program in Applied Mathematics (PPGMAP), Mathematics Institute, Federal University of Rio Grande do Sul (UFRGS)

Evandro Manica, Federal University of Rio Grande do Sul

Associate Professor at the Mathematics Institute

Eduardo Rigon Zimmer, Brain Institute (InsCer), Catholic University of Rio Grande do Sul; Federal University of Rio Grande do Sul

Postdoctoral Researcher in Biochemistry at the Brain Institute (InsCer)

Tharick Ali Pascoal, Translational Neuroimaging Laboratory, McGill University, Montréal

M.D., Neurologist at the Translational Neuroimaging Laboratory

Pedro Rosa-Neto, Translational Neuroimaging Laboratory, McGill University, Montréal

Director of the TNL, Assistant Professor at McGill University

Published

2017-05-22

How to Cite

Rodrigues, Y. E., Manica, E., Zimmer, E. R., Pascoal, T. A., Mathotaarachchi, S. S., & Rosa-Neto, P. (2017). Wrappers Feature Selection in Alzheimer’s Biomarkers Using kNN and SMOTE Oversampling. Trends in Computational and Applied Mathematics, 18(1), 15. https://doi.org/10.5540/tema.2017.018.01.0015

Issue

Vol. 18 No. 1 (2017)

Section

Original Article