Please use this identifier to cite or link to this item: https://hdl.handle.net/1959.11/29687
Title: Robust Feature Engineering for Parkinson Disease Diagnosis: New Machine Learning Techniques
Contributor(s): Wang, Max (author); Ge, Wenbo (author); Apthorp, Deborah (author); Suominen, Hanna (author)
Publication Date: 2020-07-27
Open Access: Yes
DOI: 10.2196/13611
Handle Link: https://hdl.handle.net/1959.11/29687
Abstract: Background
Parkinson disease (PD) is a common neurodegenerative disorder that affects between 7 and 10 million people worldwide. No objective test for PD currently exists, and studies suggest misdiagnosis rates of up to 34%. Machine learning (ML) presents an opportunity to improve diagnosis; however, the size and nature of data sets make it difficult to generalize the performance of ML models to real-world applications.
Objective
This study aims to consolidate prior work and introduce new techniques in feature engineering and ML for diagnosis based on vowel phonation. Additional features and ML techniques were introduced, showing major performance improvements on the large mPower vocal phonation data set.
Methods
We used 1600 randomly selected /aa/ phonation samples from the entire data set to derive rules for filtering out faulty samples from the data set. The application of these rules, along with a joint age-gender balancing filter, results in a data set of 511 PD patients and 511 controls. We calculated features on a 1.5-second window of audio, beginning at the 1-second mark, for a support vector machine. This was evaluated with 10-fold cross-validation (CV), with stratification for balancing the number of patients and controls for each CV fold.
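As a rough illustration of the evaluation setup described above, the Python sketch below cuts a 1.5-second window starting at the 1-second mark of each recording and scores a support vector machine with stratified 10-fold CV. The sampling rate, the synthetic recordings, the three placeholder features in toy_features, the RBF kernel, and the scaling step are all assumptions made for the example; none of them is taken from the study.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def extract_window(signal, sample_rate, start_s=1.0, duration_s=1.5):
    """Return the samples between start_s and start_s + duration_s."""
    start = int(round(start_s * sample_rate))
    stop = start + int(round(duration_s * sample_rate))
    if stop > len(signal):
        raise ValueError("recording is too short for the requested window")
    return signal[start:stop]


def toy_features(window):
    """Placeholder features for illustration only (not the paper's feature set)."""
    zero_crossing_rate = np.mean(np.diff(np.sign(window)) != 0)
    return np.array([window.std(), np.abs(window).mean(), zero_crossing_rate])


# Synthetic stand-in data: 50 "controls" (label 0) and 50 "patients" (label 1).
rng = np.random.default_rng(0)
sample_rate = 8000  # assumed sampling rate in Hz
n_per_class = 50
recordings = [
    rng.normal(scale=1.0 + 0.1 * label, size=3 * sample_rate)
    for label in (0, 1)
    for _ in range(n_per_class)
]
labels = np.repeat([0, 1], n_per_class)

# One feature vector per recording, computed on the 1.0 s to 2.5 s window.
X = np.vstack([toy_features(extract_window(r, sample_rate)) for r in recordings])

# Stratification keeps the patient/control ratio the same in every fold.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, labels, cv=cv, scoring="accuracy")
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

With a balanced data set such as the 511 patients and 511 controls reported here, stratified folds give each test fold a near-equal number of both classes, so per-fold accuracy can be read against a 50% chance level.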
Results
We showed that the features used in prior literature do not perform well when extrapolated to the much larger mPower data set. Owing to the natural variation in speech, the separation of patients and controls is not as simple as previously believed. We presented significant performance improvements using additional novel features (with 88.6% certainty, derived from a Bayesian correlated t test) in separating patients and controls, with accuracy exceeding 58%.
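The abstract names a Bayesian correlated t test but does not give its formula. The Python sketch below follows the common formulation in which the posterior of the mean score difference is a Student t distribution whose variance is inflated to account for overlapping training sets across CV folds, with correlation rho = 1/k for k-fold CV. This choice of formula, the function name correlated_ttest_prob, and the example differences are assumptions for illustration and may differ from the authors' exact procedure.

```python
import math
from statistics import mean, variance

from scipy import stats


def correlated_ttest_prob(diffs, n_folds):
    """Posterior probability that the mean score difference is above zero.

    diffs   : per-fold score differences (model A minus model B)
    n_folds : number of CV folds k, giving correlation rho = 1 / k
    """
    n = len(diffs)
    rho = 1.0 / n_folds
    sample_mean = mean(diffs)
    sample_var = variance(diffs)  # unbiased sample variance (n - 1 denominator)
    # Variance correction for correlated folds: (1/n + rho / (1 - rho)) * s^2
    scale = math.sqrt((1.0 / n + rho / (1.0 - rho)) * sample_var)
    # Survival function at 0 gives P(mean difference > 0) under the posterior.
    return float(stats.t.sf(0.0, df=n - 1, loc=sample_mean, scale=scale))


# Hypothetical per-fold accuracy differences from one run of 10-fold CV.
example_diffs = [0.02, 0.01, 0.03, -0.01, 0.02, 0.00, 0.04, 0.01, 0.02, 0.01]
probability = correlated_ttest_prob(example_diffs, n_folds=10)
print(f"P(A better than B) = {probability:.3f}")
```

Read this way, a figure such as the reported 88.6% is a posterior probability that one feature set outperforms the other, not a frequentist p value.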
Conclusions
The results are promising, showing the potential for ML in detecting symptoms imperceptible to a neurologist.
Publication Type: Journal Article
Source of Publication: JMIR Biomedical Engineering, 5(1), p. 1-13
Publisher: JMIR Publications, Inc
Place of Publication: Canada
ISSN: 2561-3278
Fields of Research (FoR) 2008: 110904 Neurology and Neuromuscular Diseases
170203 Knowledge Representation and Machine Learning
Fields of Research (FoR) 2020: 320905 Neurology and neuromuscular diseases
461105 Reinforcement learning
461106 Semi- and unsupervised learning
Socio-Economic Objective (SEO) 2008: 920112 Neurodegenerative Disorders Related to Ageing
Socio-Economic Objective (SEO) 2020: 200101 Diagnosis of human diseases and conditions
Peer Reviewed: Yes
HERDC Category Description: C1 Refereed Article in a Scholarly Journal
Appears in Collections: Journal Article
School of Psychology

Files in This Item:
3 files
File: openpublished/RobustApthorp2020JournalArticle.pdf
Description: Published version
Size: 577.24 kB
Format: Adobe PDF

This item is licensed under a Creative Commons License.