

Method for asynchronous analysis of a glottal source based on a two-level autoregressive model of the speech signal
https://doi.org/10.32446/0368-1025it.2024-2-55-62
Abstract
The problem of analyzing a glottal source over a short observation interval is considered. Known methods of glottal source analysis are shown to suffer from insufficient performance regardless of how the data are prepared, whether synchronously with the fundamental tone of speech sounds or asynchronously. A method for analyzing the glottal source based on a two-level autoregressive model of the speech signal is proposed, and its software implementation based on the high-speed Burg-Levinson computational procedure is described. The implementation does not require the sequence of observations to be synchronized with the fundamental tone of the speech signal and has a relatively low computational cost. Using this implementation, a full-scale experiment was set up and conducted on vowel sounds from a control speaker. The results of the experiment confirm the improved performance of the proposed method and yield its requirements on the duration of the speech signal for real-time voice analysis: the optimal duration lies in the range from 32 to 128 ms. The results can be used in the development and study of digital speech communication systems, voice control, biometrics, biomedicine, and other speech systems in which the voice characteristics of the speaker are of paramount importance.
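As a rough illustration of the kind of computation the abstract refers to, the following is a minimal sketch of a generic Burg recursion with a Levinson-style coefficient update for fitting a single autoregressive model to one frame. It is not the authors' two-level implementation: the function names `burg_ar` and `frame_length`, the 8 kHz sampling rate, and the model order are assumptions chosen only for the example.

```python
import numpy as np


def frame_length(duration_ms, fs_hz):
    """Number of samples in a frame of the given duration (hypothetical helper)."""
    return int(round(duration_ms * fs_hz / 1000.0))


def burg_ar(x, order):
    """Estimate AR coefficients of one frame with a generic Burg recursion.

    Returns (a, e), where a[0] == 1 and the prediction error is
    err[n] = x[n] + sum_{i=1..order} a[i] * x[n - i],
    and e is the estimated prediction-error power.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 0 < order < n:
        raise ValueError("order must satisfy 0 < order < len(x)")

    f = x[1:].copy()    # forward prediction errors
    b = x[:-1].copy()   # backward prediction errors (delayed by one sample)
    a = np.array([1.0])
    e = np.dot(x, x) / n

    for _ in range(order):
        denom = np.dot(f, f) + np.dot(b, b)
        if denom == 0.0:
            break  # frame is fully predicted (or all zeros); stop early
        # Reflection coefficient minimising forward + backward error power
        k = -2.0 * np.dot(f, b) / denom
        # Levinson-style update of the coefficient vector
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        e *= 1.0 - k * k
        # Update the error sequences and shrink them by one sample
        f_new = f + k * b
        b_new = b + k * f
        f = f_new[1:]
        b = b_new[:-1]

    return a, e


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate, not taken from the paper
    for ms in (32, 128):
        print(ms, "ms ->", frame_length(ms, fs), "samples")
```

At the assumed 8 kHz sampling rate, the 32 to 128 ms range stated in the abstract corresponds to frames of 256 to 1024 samples. Each Burg stage costs a few passes over the frame, which is in keeping with the low computational cost the abstract claims.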
About the Authors
V. V. Savchenko
Russian Federation
Vladimir V. Savchenko
Nizhny Novgorod
L. V. Savchenko
Russian Federation
Lyudmila V. Savchenko
Nizhny Novgorod
For citations:
Savchenko V.V., Savchenko L.V. Method for asynchronous analysis of a glottal source based on a two-level autoregressive model of the speech signal. Izmeritel'naya Tekhnika. 2024;(2):55-62. (In Russ.) https://doi.org/10.32446/0368-1025it.2024-2-55-62