Alzheimer's biomarkers

Wrappers Feature Selection in Alzheimer's Biomarkers Using kNN and SMOTE Oversampling

Yuri Elias Rodrigues, Evandro Manica, Eduardo Rigon Zimmer, Tharick Ali Pascoal, Sulantha Sanjeewa Mathotaarachchi, Pedro Rosa-Neto


A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacological responses to a therapeutic intervention. Combining different biomarker modalities often allows an accurate diagnostic classification. In Alzheimer's disease (AD), biomarkers are indispensable for identifying cognitively normal individuals destined to develop dementia symptoms. However, studies using combinations of canonical AD biomarkers have repeatedly shown poor classification rates when differentiating between AD, mild cognitive impairment, and control individuals. Furthermore, designing classifiers to assess multiple biomarker combinations raises issues such as imbalanced classes and missing data. Because the number of biomarker combinations is large, wrappers are used to avoid multiple comparisons. Here, we compare the ability of three wrapper feature selection methods to obtain biomarker combinations that maximize classification rates. As the criterion for wrapper feature selection, we use the k-nearest neighbor classifier with two balancing aids, random undersampling and SMOTE. Overall, our analyses show how biomarker combinations affect classifier accuracy and how the imbalance strategy improves it. We show that non-defining and non-cognitive biomarkers yield lower accuracy than cognitive measures when classifying AD. On average, our approach surpasses the support vector machine and the weighted k-nearest neighbors classifiers, reaching an accuracy of 94.34 ± 3.91% in reproducing class definitions.
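The general idea of a wrapper driven by a balanced k-nearest neighbor criterion can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three wrappers compared, so sequential forward selection is assumed here. The function names (`knn_predict`, `smote`, `loo_accuracy`, `forward_selection`), the leave-one-out evaluation, and the synthetic data are all hypothetical choices made for illustration. Oversampling is applied only to each training fold so that synthetic points never leak into the held-out sample.

```python
import random

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(sorted(set(votes)), key=votes.count)

def smote(X, y, minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: interpolate between a minority point and
    one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    pts = [x for x, label in zip(X, y) if label == minority]
    new = []
    for _ in range(n_new):
        p = rng.choice(pts)
        others = sorted((q for q in pts if q is not p),
                        key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))[:k]
        q = rng.choice(others)
        t = rng.random()  # position along the segment from p to q
        new.append([a + t * (b - a) for a, b in zip(p, q)])
    return X + new, y + [minority] * n_new

def loo_accuracy(X, y, features, k=3):
    """Leave-one-out kNN accuracy on a feature subset (the wrapper criterion).
    The minority class is oversampled inside each training fold only."""
    hits = 0
    for i in range(len(X)):
        tr_X = [[row[f] for f in features] for j, row in enumerate(X) if j != i]
        tr_y = [label for j, label in enumerate(y) if j != i]
        counts = {c: tr_y.count(c) for c in set(tr_y)}
        minority = min(sorted(counts), key=counts.get)
        deficit = max(counts.values()) - counts[minority]
        if deficit > 0 and counts[minority] >= 2:
            tr_X, tr_y = smote(tr_X, tr_y, minority, deficit, k, random.Random(i))
        hits += knn_predict(tr_X, tr_y, [X[i][f] for f in features], k) == y[i]
    return hits / len(X)

def forward_selection(X, y, k=3):
    """Sequential forward selection (assumed wrapper): greedily add the
    feature that most improves the criterion, stop when nothing improves."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = [(loo_accuracy(X, y, selected + [f], k), f) for f in remaining]
        score, f = max(scores, key=lambda s: s[0])
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

# Synthetic, imbalanced toy data (7 minority vs 19 majority samples).
# Feature 0 separates the classes; features 1 and 2 are uninformative.
y = [1 if i % 4 == 0 else 0 for i in range(26)]
X = [[5 * y[i] + 0.1 * (i % 3), (i * 7) % 5, (i * 3) % 7] for i in range(26)]

selected, best = forward_selection(X, y)
print(selected, best)
```

On this toy data the informative feature alone already classifies every held-out sample correctly, so the search stops after one step. With real biomarker data, missing values would need to be handled before any distance is computed, which this sketch does not attempt.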


k-nearest neighbors, SMOTE, feature selection, Alzheimer's biomarkers, classification problem





Trends in Computational and Applied Mathematics

A publication of the Brazilian Society of Applied and Computational Mathematics (SBMAC)

