Year: 2020 | Volume: 10 | Issue: 1 | Pages: 60-66
A hybrid method for the diagnosis and classifying parkinson's patients based on time–frequency domain properties and K-nearest neighbor
Zayrit Soumaya1, Belhoussine Drissi Taoufiq1, Nsiri Benayad2, Benba Achraf3, Abdelkrim Ammoumou1
1 Laboratory of Industrial Engineering, Information Processing and Logistics (GITIL), Faculty of Science Ain Chok, University Hassan II - Casablanca, Morocco
2 Laboratory Research Center STIS, M2CS, Higher School of Technical Education of Rabat (ENSET), Mohammed V University in Rabat, Morocco
3 Electronic Systems Sensors and Nanobiotechnologies (E2SN), ENSET, Mohammed V University in Rabat, Morocco
Date of Submission: 06-Dec-2018
Date of Decision: 13-Jul-2019
Date of Acceptance: 07-Sep-2019
Date of Web Publication: 06-Feb-2020
Dr. Zayrit Soumaya
Faculty of Science, Ain Chok University, Hassan II – Casablanca
Source of Support: None, Conflict of Interest: None
Tremor of the hands and arms is the main symptom of Parkinson's disease (PD). However, impairment of the vocal cords also causes speech disorders, which are another reliable symptom of the disease. This article presents a diagnostic model for PD that processes the voice signal with a wavelet transform (WT) and Mel-frequency cepstral coefficients (MFCC). The proposed processing chain transforms the vocal signal with the WT, extracts the MFCC, and then classifies the recordings as sick or healthy with a K-nearest neighbor (KNN) classifier. The analysis uses a database of 18 healthy subjects and 20 patients with PD. The Daubechies mother wavelet is used to compress the vocal signal before the MFCC are extracted. For the diagnosis of PD, the KNN classifier reaches 89% accuracy when 52% of the database is used as training data; when this proportion is increased from 52% to 73%, the accuracy reaches 98.68%, which is higher than that obtained with the support-vector machine classifier. KNN is therefore effective for detecting PD. Moreover, the larger the training set, the more accurate the results.
Keywords: K-nearest neighbor, Mel-frequency cepstral coefficient, Parkinson's disease, wavelet
How to cite this article:
Soumaya Z, Taoufiq BD, Benayad N, Achraf B, Ammoumou A. A hybrid method for the diagnosis and classifying parkinson's patients based on time–frequency domain properties and K-nearest neighbor. J Med Signals Sens 2020;10:60-6
How to cite this URL:
Soumaya Z, Taoufiq BD, Benayad N, Achraf B, Ammoumou A. A hybrid method for the diagnosis and classifying parkinson's patients based on time–frequency domain properties and K-nearest neighbor. J Med Signals Sens [serial online] 2020 [cited 2020 Aug 8];10:60-6. Available from: http://www.jmssjournal.net/text.asp?2020/10/1/60/277823
Introduction
In 1817, James Parkinson described the disease that now bears his name: a neurodegenerative disease of unknown cause, characterized by the progressive destruction of a specific population of neurons.
The loss of dopamine in the midbrain induces slowness of movement, difficulty with walking and communication, trembling, and rigidity, which are the most obvious motor symptoms.
Neurological examinations and magnetic resonance imaging of the brain are employed to detect Parkinson's disease (PD). Extracting and analyzing the phonation and articulation components of speech can also provide useful guidance in detecting PD.
PD has many indicators, and vocal impairment is one of the earliest. More precisely, phonation is the main part of speech production that is affected.
Several feature-extraction methods are used for the diagnosis of PD: prediction cepstral coefficients, perceptual linear prediction (PLP), and Mel-frequency cepstral coefficients (MFCC). Focusing on the vocal signal, we are interested in the MFCC method, which is the most widely used in recognition systems. It exploits the characteristics of the human auditory system by converting the linear frequency scale into the Mel scale, and it performs a cepstral analysis by moving to the log-spectral domain. Cepstral analysis has also been applied in other contexts: Shourie used it on electroencephalogram signals recorded during visual perception and mental imagery to show the influence of artistic expertise, and Akafi et al. used it to assess hypernasality in children with cleft palate.
Other modalities can also support the diagnosis of the disease: Bazgir et al. developed a classification system for the assessment and home monitoring of tremor in patients with PD, and handwriting is another example.
Computing MFCC and PLP features requires filter banks designed according to the perceptual criteria of the human ear. The procedure involves estimating a short-time spectrum, obtained by computing the discrete Fourier transform (DFT) of the windowed speech frames, and it is important that this spectrum be estimated accurately.
Our interest, however, is focused on diagnosing PD by detecting vocal disorders. Detection of PD based on MFCC extracted from speech was first proposed by Frail et al. Shahbakhi et al. diagnosed PD using measures of fundamental frequency perturbation (jitter), amplitude perturbation (shimmer), and the fundamental frequency F0. More recent work using MFCC and PLP was carried out by Benba et al., and Upadhya et al. studied the detection of PD by extracting MFCC and PLP with the Thomson multitaper window technique. Recent studies build on the work of Belhoussine Drissi et al., which combines the wavelet transform, MFCC, and the support-vector machine (SVM) classifier. In that work, the speech signals were transformed with several types of discrete wavelet transform (DWT), MFCC were then extracted from the transformed signals, and an SVM was applied as the classifier.
The K-nearest neighbor (KNN) algorithm is among the simplest machine learning algorithms and is a robust classification method that is widely applied in real-time applications. The SVM, in contrast, is based on the use of hyperplanes to separate the classes. The shape of its decision boundary changes with the kernel, so the kernel must be chosen carefully, and choosing a good kernel requires knowledge about the data that is not always available. In addition, the computational time for training grows nonlinearly with the size of the training set. KNN relies only on distances between feature vectors, so we expect it to make fewer errors. Hence, in this article, we choose KNN as the classifier with the aim of obtaining higher accuracy.
In this work, we propose a diagnostic model for PD based on time–frequency processing of the speech signals in a database of 18 healthy subjects and 20 patients with PD. We then extract the MFCC and, finally, perform the classification with the KNN classifier. We create two training sets, the first accounting for 52% of the database and the second for 73%, and apply the proposed processing chain (wavelet, MFCC, and KNN) to the whole database.
Continuous wavelet transform
The continuous wavelet transform (CWT) was devised by the French geophysicist Morlet in 1980 to study seismic signals. Grossmann, Meyer, Mallat, and Daubechies then laid the mathematical foundations of wavelets. Since then, the WT has been used more and more in signal processing.
A wavelet is governed by two coefficients. The scale coefficient “a” produces compressed or dilated versions of the window derived from the same mother wavelet; it is inversely related to frequency. The translation coefficient “b” characterizes the displacement of the window along the time axis.
The CWT of a signal s(t) is defined by:
here ψ(t) is the mother wavelet, and ψ*(t) is the complex conjugate of ψ(t).
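For reference, the standard CWT definition that matches the symbols used here (scale a, translation b, mother wavelet ψ) can be written as below. It is offered as a hedged reconstruction of the usual textbook form, with the normalization factor 1/√|a| taken as an assumption:

```latex
\mathrm{CWT}_s(a,b) \;=\; \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} s(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\,dt, \qquad a \neq 0
```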
It should be noted that the wavelet transform gives adequate temporal resolution at high frequencies and adequate frequency resolution at low ones.
Discrete wavelet transform
The discrete wavelet transform (DWT) is the discrete version of the continuous wavelet transform (CWT). It is computed with the Mallat algorithm, which is regarded as a multiresolution analysis. This algorithm is based on a pair of filters, H (low-pass) and G (high-pass), with impulse responses h and g. Several families of wavelets are used in the literature: Haar, Beylkin, Coiflet, Daubechies, Symmlet, Vaidyanathan, Battle, and others.
In this work, we use only Daubechies wavelets.
Mel-frequency Cepstral Coefficient
MFCCs are the parameters most widely used in speech recognition systems. MFCC analysis converts the linear frequency scale into the Mel scale in order to exploit the properties of the human auditory system, which gives an effective representation of the speech signal. The process of extracting the coefficients is shown in [Figure 1].
|Figure 1: Extraction process of cepstral coefficients of the Mel-frequency cepstral coefficient|
Pre-emphasis filters the voice signal (sn, n = 1,…, N) with a first-order finite impulse response digital filter, given as follows:
where k is the pre-emphasis coefficient, which must satisfy 0.9 ≤ k ≤ 1. In this study, we fixed k at 0.97. The pre-emphasized signal is then related to the original signal by the formula below:
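A minimal Python sketch of this step, assuming the usual first-order pre-emphasis form H(z) = 1 - k z^-1, that is s'[n] = s[n] - k * s[n-1]. The function name `pre_emphasize` and the handling of the first sample are illustrative assumptions, not taken from the article.

```python
def pre_emphasize(signal, k=0.97):
    """Apply s'[n] = s[n] - k * s[n-1] to a list of samples.

    The first sample has no predecessor, so it is passed through unchanged
    (one common convention; an assumption made for this sketch).
    """
    if not 0.9 <= k <= 1.0:
        raise ValueError("k is expected to lie in [0.9, 1.0]")
    if not signal:
        return []
    out = [float(signal[0])]
    for n in range(1, len(signal)):
        out.append(signal[n] - k * signal[n - 1])
    return out


# Toy input (made-up values): a flat segment followed by a jump.
emphasized = pre_emphasize([1.0, 1.0, 1.0, 2.0])
```

With k = 0.97, flat regions of the signal are strongly attenuated while the jump is largely preserved, which is the intended boost of high-frequency content.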
The vocal signal is nonstationary, but signal processing methods assume stationary signals. To solve this problem, we segment the signal into frames of N speech samples lasting 10–30 ms, over which the voice signal is regarded as stationary. To avoid abrupt transitions from frame to frame, adjacent frames overlap.
Segmentation introduces discontinuities at the frame boundaries. To reduce them, we multiply the samples (s'n, n = 1,…, N) of each frame by a Hamming window,
where N is the number of samples in the frame.
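The framing and windowing steps can be sketched as follows, assuming the standard Hamming window w[n] = 0.54 - 0.46 cos(2πn/(N-1)) with zero-based n. The function names, the frame length, and the hop size in the example are hypothetical; the article only states that frames last 10–30 ms and overlap.

```python
import math


def hamming(N):
    """Hamming window of length N: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def frame_signal(signal, frame_len, hop):
    """Split a signal into frames of frame_len samples taken every hop samples.

    Choosing hop < frame_len makes adjacent frames overlap.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames


def windowed_frames(signal, frame_len, hop):
    """Frame the signal and multiply every frame by a Hamming window."""
    w = hamming(frame_len)
    return [[x * wn for x, wn in zip(frame, w)]
            for frame in frame_signal(signal, frame_len, hop)]


# Toy example: 10 samples, frames of 4 samples with 50% overlap.
frames = windowed_frames([1.0] * 10, frame_len=4, hop=2)
```

The window tapers each frame toward its edges, which reduces the boundary discontinuities before the Fourier transform is applied.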
The fast Fourier transform
The fast Fourier transform (FFT) converts each frame of N samples from the time domain to the frequency domain. The FFT is a fast algorithm for computing the DFT.
The DFT is defined as follows:
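As an illustration, the DFT of a frame can be computed directly from the standard definition S[k] = Σ s[n] exp(-j2πkn/N) for n = 0..N-1 (zero-based indexing is an assumption of this sketch). This naive O(N²) version only shows what an FFT routine computes more quickly; the helper name `dft` is illustrative.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of a list of samples."""
    N = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


# A constant frame has all of its energy in the k = 0 (DC) bin.
spectrum = dft([1.0, 1.0, 1.0, 1.0])
magnitudes = [abs(c) for c in spectrum]
```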
Mel filtering with a filter bank
The human ear perceives frequency on a nonlinear scale across the audible spectrum. Consequently, we convert the linear frequency scale to the Mel scale, which is approximately linear below 1000 Hz (low frequencies) and logarithmic above 1000 Hz (high frequencies).
The conversion from the linear scale to the Mel scale is given as follows:
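A short sketch of this conversion, assuming the widely used form Mel(f) = 2595 log10(1 + f/700). This is a common convention, also the one documented in the HTK Book, and is taken here as an assumption rather than as the exact expression of the article. The function names are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale (2595*log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse conversion, from Mel back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# By construction of the constants, 1000 Hz maps to roughly 1000 Mel.
mel_at_1k = hz_to_mel(1000.0)
```

The inverse function is what is typically used to place the centers of the triangular filters at equal spacing on the Mel axis.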
Logarithm/discrete cosine transform
The MFCC can be computed directly by applying the discrete cosine transform (DCT) to the logarithms of the energies at the output of a bank of M triangular filters spaced according to the Mel scale, using the following equation.
Here, mj is the logarithm of the energy obtained with triangular filter j, M is the number of filters in the bank (M = 20 in this article), and i is the index of the coefficient to be extracted.
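The DCT step can be sketched as below, assuming the HTK-style form c_i = sqrt(2/M) Σ_{j=1..M} m_j cos(πi(j - 0.5)/M). The scaling factor and the function name are assumptions of this sketch; M = 20 and the 12 coefficients follow the values quoted in the article.

```python
import math


def mfcc_from_log_energies(log_energies, num_coeffs=12):
    """Apply a DCT to the M log filter-bank energies m_1..m_M.

    Returns the coefficients c_1..c_num_coeffs, with
    c_i = sqrt(2/M) * sum_j m_j * cos(pi * i / M * (j - 0.5)).
    """
    M = len(log_energies)
    scale = math.sqrt(2.0 / M)
    return [
        scale * sum(m * math.cos(math.pi * i / M * (j - 0.5))
                    for j, m in enumerate(log_energies, start=1))
        for i in range(1, num_coeffs + 1)
    ]


# Sanity check with made-up data: a perfectly flat log spectrum carries no
# cepstral detail, so every coefficient c_1..c_12 is (numerically) zero.
flat = mfcc_from_log_energies([3.0] * 20)
```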
Because the higher-order MFCC are very small, we apply a lifter to the cepstrum in order to increase their amplitudes so that all coefficients have similar magnitudes. To achieve this, we liftered the cepstral coefficients using the following equation:
Here, L is the parameter of the lifter. In this article, we set L = 22.
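A sketch of the liftering step, assuming the HTK-style sinusoidal lifter c'_i = (1 + (L/2) sin(πi/L)) c_i with L = 22 as stated above. The exact expression is an assumption, and the function name `lifter` is illustrative.

```python
import math


def lifter(cepstra, L=22):
    """Rescale cepstral coefficients c_1, c_2, ... with a sinusoidal lifter.

    Each coefficient c_i is multiplied by 1 + (L/2) * sin(pi * i / L).
    """
    return [
        (1.0 + (L / 2.0) * math.sin(math.pi * i / L)) * c
        for i, c in enumerate(cepstra, start=1)
    ]


# With unit inputs the output is the lifter gain itself: it grows with the
# coefficient index and peaks at i = L/2 = 11, where the gain is 12.
gains = lifter([1.0] * 12)
```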
The KNN classifier follows a simple principle from statistical learning theory. In the training phase, we provide a database containing the two classes together with a label vector; this defines the feature space in which the two classes are to be separated. In the test phase, each sample to be classified is compared with the training database to find its nearest neighbor and is assigned accordingly to class 1 or class 2. The Euclidean distance is used to find the nearest neighbor in the KNN algorithm.
The Euclidean distance d(x, y) between two points x and y is calculated using Eq. 9. Here, N is the number of features, with x = (x1, x2, x3, …, xN) and y = (y1, y2, y3, …, yN).
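The distance and the classification rule can be sketched as follows, using the Euclidean distance d(x, y) = sqrt(Σ (x_i - y_i)²) over the N features. The function names, the majority vote for K > 1, and the two-dimensional toy vectors are illustrative assumptions; they are not the article's data.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_X, train_y, query, k=1):
    """Label a query vector by majority vote among its k nearest neighbors."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up two-feature training vectors standing in for voiceprints.
train_X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = ["healthy", "healthy", "PD", "PD"]
label_a = knn_predict(train_X, train_y, [0.2, 0.1], k=1)
label_b = knn_predict(train_X, train_y, [0.8, 0.9], k=3)
```

Because KNN stores the training vectors and defers all computation to query time, the training phase amounts to keeping the labeled database.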
Results
The goal of this study is to evaluate the performance of KNN. A DWT block is inserted before the MFCC extraction block to achieve a correct diagnosis of PD, as shown in [Figure 2].
We use a database that consists of recordings from 18 healthy subjects and 20 patients suffering from PD. They all utter the vowel “a.”
The DWT algorithm is based on a pair of filters, H (low-pass) and G (high-pass). The filter outputs are subsampled by a factor of 2. The high-pass filter provides the DWT coefficients, or signal details, at a given scale. The low-pass filter gives the coefficients of the approximation of the signal at the same scale. The same operation is then applied to the approximation, generating another detail and a new approximation.
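One level of this filter-and-subsample scheme, repeated on the approximation, can be sketched as below with the four-tap Daubechies db2 filters. Periodic boundary handling and the filter alignment are simplifying assumptions of this sketch; toolboxes such as MATLAB or PyWavelets may extend the signal differently, so their coefficients near the edges can differ. The function names are illustrative.

```python
import math

SQRT3 = math.sqrt(3.0)
DENOM = 4.0 * math.sqrt(2.0)
# db2 low-pass filter H and the matching high-pass filter G.
H = [(1 + SQRT3) / DENOM, (3 + SQRT3) / DENOM, (3 - SQRT3) / DENOM, (1 - SQRT3) / DENOM]
G = [H[3], -H[2], H[1], -H[0]]


def dwt_step(x):
    """One Mallat step: filter with H and G, then keep every second output."""
    N = len(x)
    if N % 2:
        raise ValueError("signal length must be even")
    approx = [sum(H[n] * x[(2 * k + n) % N] for n in range(4)) for k in range(N // 2)]
    detail = [sum(G[n] * x[(2 * k + n) % N] for n in range(4)) for k in range(N // 2)]
    return approx, detail


def approximation(x, level=3):
    """Return the approximation coefficients after `level` decomposition steps."""
    a = list(x)
    for _ in range(level):
        a, _detail = dwt_step(a)
    return a


# Three levels on a 16-sample toy signal leave a 2-sample approximation a3.
a3 = approximation([1.0] * 16, level=3)
```

Each level halves the number of samples, so keeping only a3 reduces the signal length by a factor of 8 before the MFCC stage.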
A PD diagnosis process similar to ours is applied in a previous article; the difference between the two is the classifier. In that work, the DWT gave the highest accuracy with the level 2 Daubechies wavelet at the 3rd scale. Hence, in our study, we work with the Daubechies db2 wavelet at scale 3, and we are interested only in the approximation a3 [Figure 3].
In the first phase, we transform the vocal recordings with the Daubechies wavelet. The vocal signal of a PD patient before and after applying the Daubechies wavelet is shown in [Figure 4]. [Figure 5] shows a zoomed view of the two representations of the signal.
|Figure 4: (a) Speech before the transformation. (b) Speech after being transformed through the use of wavelet|
|Figure 5: (a) A zoom of speech before the transformation. (b) A zoom of speech after being transformed through the use of wavelet|
In the second phase, we feed the a3 approximation into the MFCC block to obtain, for each patient, the first 12 MFCC using the program “Htk mfcc matlab.” These coefficients are the features on which the classification, and hence the diagnosis, relies. The MFCC are computed over numerous frames, which require significant processing time to classify and can hinder a precise result. To cope with this problem, we compute the average value over these frames to obtain the voiceprint. The 12 MFCC and the voiceprint of a healthy subject are shown in [Figure 6], and [Figure 7] shows the MFCC and the voiceprint of a patient suffering from PD.
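The averaging that produces the voiceprint can be sketched as follows: each recording yields one row of coefficients per frame, and the voiceprint is the per-coefficient mean over all frames. The function name and the toy values are illustrative assumptions.

```python
def voiceprint(mfcc_frames):
    """Average the MFCC over frames, giving one value per coefficient.

    mfcc_frames is a list of frames, each a list of coefficients
    (12 per frame in the setting described above).
    """
    if not mfcc_frames:
        raise ValueError("at least one frame is required")
    n_frames = len(mfcc_frames)
    return [sum(column) / n_frames for column in zip(*mfcc_frames)]


# Made-up example with 3 frames of 2 coefficients each.
vp = voiceprint([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
```

Each recording is thus reduced to a single short feature vector, which is what the classifier compares.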
|Figure 6: (a) Mel-frequency cepstral coefficient value of a healthy patient. (b) Voiceprint value of a healthy patient|
|Figure 7: (a) Mel-frequency cepstral coefficient value of a sick patient. (b) Voiceprint value of a sick patient|
In the third phase, we make a decision by classifying the patients. To this end, we create two training sets, one comprising 52% and the other 73% of the database. We first carry out a test on our database using the first training set (52%) and then another test using the second training set (73%).
In a classification problem, each label takes one of C possible class identities, and the task consists of assigning a test example to one of the C classes. The KNN classifier is among the most widely used procedures, and the common choice K = 1 yields the nearest neighbor classification rule.
In spite of its simplicity, KNN often performs well, mainly for large data sets.
To determine the performance of the classifier, we calculated the accuracy, sensitivity, and specificity using formulas based on the following counts:
- TP stands for true positive (correctly classified healthy subjects)
- TN stands for true negative (correctly classified PD patients)
- FP stands for false positive (incorrectly classified PD patients)
- FN stands for false negative (incorrectly classified healthy subjects).
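From these four counts, the three measures follow the usual definitions: accuracy = (TP + TN)/(TP + TN + FP + FN), sensitivity = TP/(TP + FN), and specificity = TN/(TN + FP). A minimal sketch is given below; the function name is illustrative and the counts in the example are made up, not the article's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity, and specificity from confusion counts."""
    total = tp + tn + fp + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("counts must include both positive and negative cases")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts used only to exercise the formulas.
metrics = classification_metrics(tp=9, tn=8, fp=2, fn=1)
```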
The accuracy, sensitivity, and specificity obtained when testing all the recordings with the 52% training set are shown in [Table 1]; the corresponding percentages obtained with the 73% training set when testing all the recordings (including 6 sick and 4 healthy patients) are shown in [Table 2].
Conclusion
We have presented in this article a diagnostic model for PD based on signal processing, which uses the wavelet transform and MFCC on a database of recordings of sick patients and healthy subjects pronouncing the vowel “a.” The speech signals are transformed with the Daubechies wavelet, keeping the third-scale approximation, and the 12 cepstral coefficients are then extracted by feeding this approximation into the MFCC block. To decide whether a subject is sick or healthy, we use the KNN classifier with two training sets, one comprising 52% and the other 73% of the database. With the 52% training set, we obtain an accuracy of 89%, which is higher than the accuracy obtained with the SVM classifier using the 73% training set; when the training set is increased to 73%, we obtain an accuracy of 98.68%. We conclude that enlarging the training set improves the accuracy of the classifier and that KNN is more accurate than the SVM classifier.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
Zayrit Soumaya was born in Zaouiat Cheikh, Beni Mellal, Morocco, on July 18, 1994. She received the Master's degree in Electronics, Electrotechnics, Automatic, and Industrial Computing from the Faculty of Science Ain Chok, University Hassan II - Casablanca, Morocco, in 2017. She is a research student in the Research Laboratory in Industrial Engineering, Information Processing and Logistics (GITIL), Faculty of Science Ain Chok, University Hassan II - Casablanca, Morocco. Her interests are in speech processing for detecting people with neurological disorders.
Belhoussine Drissi Taoufiq was born in Oujda, Morocco, in 1978. He received the Ph.D. degree in acoustics in 2009 from the University of Le Havre (France). Since 2011, he has been an assistant professor at the Faculty of Sciences Ain Chock, University Hassan II, Casablanca. His scientific interests lie in nondestructive testing and signal processing.
Nsiri Benayad received the MBI degree in computer sciences from Telecom Bretagne in 2005 and the Ph.D. degree in signal processing from Telecom Bretagne in 2004. He received the D.E.A. (French equivalent of the M.Sc. degree) in electronics from the Occidental Bretagne University in 2000. Currently, he is a Full Professor at the Higher School of Technical Education of Rabat (ENSET), Mohammed V University; a member of the Research Center STIS, M2CS, Mohammed V University; and an associate member of the Industrial Engineering, Data Processing and Logistics Laboratory, Hassan II University. He was a Professor in the Faculty of Sciences Ain Chock, Hassan II University. He has advised and co-advised more than 12 Ph.D. theses and contributed to more than 80 articles in regional and international conferences and journals. His research interests include, but are not restricted to, computer science, telecommunication, signal and image processing, adaptive techniques, blind deconvolution, MCMC methods, seismic data, and higher order statistics.
Benba Achraf received his Ph.D. in Electrical Engineering from ENSIAS, Mohammed V University, Rabat, Morocco, in 2017. He is currently a professor in the electrical engineering department at ENSET. He is a member of Electronic Systems Sensors and Nanobiotechnologies (E2SN) at ENSET, Mohammed V University in Rabat. His interests are in signal processing and biomedical engineering.
Abdelkrim Ammoumou received his Ph.D. degree in Control and Signal Processing from Mohamed V University, Rabat, Morocco, in 2002. Earlier, he received the DES degree in Control and Signal Processing from Mohamed V University, Rabat, Morocco, in 1991. He has been a professor of Process Control and Computer Science at EST, Hassan II University, Casablanca, since 1991. He is also responsible for the research team “Modeling, Simulation and Control of Production System.”
References
Parkinson J. An Essay on the Shaking Palsy. London: Whittingham and Rowland for Sherwood, Neely, and Jones; 1817.
Orozco-Arroyave JR, Arias-Londoño JD, Vargas-Bonilla JF, Nöth E. Perceptual analysis of speech signals from people with Parkinson's disease. In: Ferrández Vicente JM, Álvarez Sánchez JR, de la Paz López F, Toledo Moreo FJ. Editors. Natural and Artificial Models in Computation and Biology. IWINAC 2013. Lecture Notes in Computer Science. Vol. 7930, Berlin, Heidelberg; Springer; 2013b.
Benba A, Jilbab A, Hammouch A, Sandabad S. Voiceprints analysis using MFCC and SVM for detecting patients with Parkinson's disease. IEEE 1st International Conference on Electrical and Information Technologies ICEIT'2015. 2015. p. 300-4.
Rabiner LR, Schafer RW. Introduction to digital speech processing. Foundat Trends Signal Process 2007;1:1-194.
Benba A, Jilbab A, Hammouch A. Discriminating between patients with Parkinson's and neurological diseases using cepstral analysis. IEEE Trans Neural Syst Rehabil Eng 2016;24:1100-08.
Shourie N. Cepstral analysis of EEG during visual perception and mental imagery reveals the influence of artistic expertise. J Med Signals Sens 2016;6:203-17.
Akafi E, Vali M, Moradi N, Baghban K. Assessment of hypernasality for children with cleft palate based on cepstrum analysis. J Med Signals Sens 2013;3:209-15.
Bazgir O, Habibi SAH, Palma L, Pierleoni P, Nafees S. A classification system for assessment and home monitoring of tremor in patients with Parkinson's disease. J Med Signals Sens 2018;8:65-72.
Frail R, Godino-Llorente JI, Saenz-Lechon N, Osma-Ruiz V, Fredouille C. MFCC-based Remote Pathology Detection on Speech Transmitted Through The Telephone Channel. Proceedings Biosignals, Porto; 2009.
Jafari A. Classification of Parkinson's disease patients using nonlinear phonetic features and Mel-frequency cepstral analysis. Biomed Eng Appl Basis Communication 2013;25:1350001.
Shahbakhi M, Far DT, Tahami E. Speech analysis for diagnosis of Parkinson's disease using genetic algorithm and support vector machine. J Biomed Sci Eng 2014;7:147-56.
Upadhya SS, Cheeran AN, Nirmal JH. Thomson Multitaper MFCC and PLP voice features for early detection of Parkinson disease. Biomed Signal Process Control 2018;46:293-301.
Belhoussine T, Zayrit S, Nsiri B, Ammoummou A. Diagnosis of Parkinson's disease based on wavelet transform and Mel Frequency Cepstral Coefficients. Int J Adv Comput Sci Appl 2019;10:125-32.
Sakar BE, Isenkul ME, Sakar CO, Sertbas A, Gurgen F, Delil S, et al. Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J Biomed Health Inform 2013;17:828-34.
Mallat S. A Wavelet Tour of Signal Processing: The Sparse Way. 3rd ed. Orlando, FL, USA: Academic Press, Inc.; 2009.
Daubechies I. Ten Lectures on Wavelets, of CBMS-NSF Regional Conference Series in Applied Mathematics. Vol. 61. SIAM, Philadelphia, PA; 1992.
Chui CK. An Introduction to Wavelets. Boston: Academic Press; 1992.
Mallat S. Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans Pattern Anal Mach Intell 1989;11:674-93.
Hacine-Gharbi A. Selection of Relevant Acoustic Parameters for Speech Recognition. University Orléans; 2012.
Young S, Evermann G, Gales M, Hain T, Kershaw D, Liu X. The HTK Book. Trumpington St, Cambridge CB2 1PZ: Cambridge University Engineering Department; 2006.
Rabiner LR, Juang BH. Hidden Markov Models for Speech Recognition. Englewood Cliffs NJ, editors. Fundamentals of Speech Recognition. USA: Prentice Hall; 1993.
Benba A, Jilbab A, Hammouch A. Voice Analysis for Detecting Persons with Parkinson's Disease Using MFCC and VQ. The 2014 International Conference on Circuits, Systems and Signal Processing. Saint Petersburg, Russia: Saint Petersburg State Polytechnic University; September 23-25, 2014.
Bahoura M. Analysis of Respiratory Acoustic Signals: Contribution to the Automatic Detection of Sibilants by Wavelet Packages. PhD Thesis, University de Rouen. Defended on; 1999.
Benba A, Jilbab A, Hammouch A. Hybridization of best Acoustic Cues for Detecting Persons With Parkinson's Disease. 2nd World Conference on Complex System (WCCS'14). IEEE, Agadir; November 10-12, 2014.
Benba A, Jilbab A, Hammouch A. Voiceprint Analysis Using Perceptual Linear Prediction and Support Vector Machines for Detecting Persons With Parkinson's Disease. The 3rd International Conference on Health Science and Biomedical Systems (HSBS '14). Florence, Italy; November 22-24, 2014.