As the adoption of speech processing systems increases, more specific use cases are becoming apparent, one of which is whispered speech. Because applied research on whispered speech recognition is still scarce, this project focused on distinguishing whispered from normal speech, with the aim of making such a classification usable in everyday situations. Existing studies mainly rely on specific features related to energy, time, and frequency. To classify whispered and normal speech in a simple way that builds on these findings, this project used spectrograms, which capture most of these features. Support Vector Machine (SVM), k-nearest neighbours (kNN) and Decision Tree classifiers were trained and tested. The dataset contained both samples with normalised amplitude and samples without amplitude normalisation. The SVM classifier outperformed both the kNN and the Decision Tree classifiers. Since speed matters in real-life applications, future work could explore other features or classifiers, while taking the required data storage into account. Subsequently, studies should be conducted in real-life situations to establish the ecological validity of classifying whispered versus normal speech.
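The pipeline summarised above (amplitude normalisation, spectrogram features, a classifier) can be illustrated with a minimal, dependency-free Python sketch. It is not the project's implementation. It assumes peak normalisation, a Hann-windowed magnitude spectrogram with hypothetical parameters (`frame_len=64`, `hop=32`), flattened log magnitudes as the feature vector, and a kNN classifier only. The signals are synthetic toy stand-ins, not project data: a harmonic tone for normal (voiced) speech and quiet white noise for whispered speech, which lacks vocal-fold vibration.

```python
import cmath
import math
import random


def peak_normalise(signal):
    """Scale a signal so its largest absolute sample is 1.0 (assumed normalisation scheme)."""
    peak = max(abs(x) for x in signal)
    return [x / peak for x in signal] if peak > 0 else list(signal)


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Naive Hann-windowed STFT; returns one list of bin magnitudes per frame.

    frame_len and hop are illustrative values, not the project's settings.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    # DFT basis for the non-negative frequency bins 0 .. frame_len // 2.
    basis = [[cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
             for k in range(frame_len // 2 + 1)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(sum(c * b for c, b in zip(chunk, row))) for row in basis])
    return frames


def features(signal):
    """Flatten the log-magnitude spectrogram of a peak-normalised signal into one vector."""
    spec = magnitude_spectrogram(peak_normalise(signal))
    return [math.log(m + 1e-6) for frame in spec for m in frame]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training vectors closest in Euclidean distance."""
    nearest = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


# Hypothetical toy signals (512 samples, frequencies in cycles per sample).
def toy_normal(rng, n=512):
    """Voiced stand-in: a fundamental plus two harmonics and a little noise."""
    f0, amp = rng.uniform(0.03, 0.06), rng.uniform(0.5, 1.0)
    return [amp * sum(math.sin(2 * math.pi * h * f0 * t) / h for h in (1, 2, 3))
            + rng.gauss(0, 0.01) for t in range(n)]


def toy_whisper(rng, n=512):
    """Unvoiced stand-in: quiet broadband noise with no harmonic structure."""
    amp = rng.uniform(0.05, 0.2)
    return [amp * rng.gauss(0, 1) for _ in range(n)]


def evaluate(seed=0, n_train=10, n_test=5):
    """Build toy train/test sets and return kNN accuracy on the test set."""
    rng = random.Random(seed)

    def make(count):
        xs, ys = [], []
        for _ in range(count):
            xs.append(features(toy_normal(rng)))
            ys.append("normal")
            xs.append(features(toy_whisper(rng)))
            ys.append("whisper")
        return xs, ys

    train_x, train_y = make(n_train)
    test_x, test_y = make(n_test)
    correct = sum(knn_predict(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_y)


accuracy = evaluate(seed=0)
print(f"toy kNN accuracy: {accuracy:.2f}")
```

Because every sample is peak-normalised before the spectrogram is computed, the quieter toy whispers cannot be separated by loudness alone; the classifier has to rely on spectral shape (harmonic peaks versus a flat noise spectrum). In practice a library such as scikit-learn (`SVC`, `KNeighborsClassifier`, `DecisionTreeClassifier`) would replace the hand-written kNN, so that the three classifiers could be compared on the same feature vectors.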
This project was done in collaboration with Dominique Geissler and Sara Polak for the course "Speech Processing" at the University of Twente.