
Izmeritel'naya Tekhnika


Two-stage algorithm of spectral analysis for automatic speech recognition systems

https://doi.org/10.32446/0368-1025it.2024-7-60-69

Abstract

Within a dynamically developing line of research in acoustic measurements, the problem of spectral analysis of speech signals in automatic speech recognition systems is considered. Compared with human perception of oral speech, such systems perform poorly under unfavorable speech production conditions (noise, insufficient intelligibility of speech sounds). To improve their efficiency, a two-stage algorithm for spectral analysis of speech signals is proposed. At the first stage, the speech signal undergoes parametric spectral analysis using an autoregressive model of the vocal tract of a conditional speaker. At the second stage, the obtained spectral estimate is transformed (modified) by frequency-selective amplification of the amplitudes of the main formants of the intra-period power spectrum. A software implementation of the proposed algorithm based on the high-speed computational procedure of the fast Fourier transform is described. Using the authors' software, a full-scale experiment was carried out on an additive mixture of vowel sounds of the control speaker's speech with white Gaussian noise. The experiment showed that the amplitudes of the main formants of the speech signal were amplified by 10–20 dB, with a correspondingly significant improvement in the intelligibility of speech sounds. The scope of possible application of the developed algorithm covers automatic speech recognition systems based on speech signal processing in the frequency domain, including those using artificial neural networks.
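
The abstract does not spell out the exact modification rule for the second stage, so the following is only an illustrative sketch: stage one is a standard autoregressive (LPC) spectral estimate via the Levinson-Durbin recursion and the FFT, and stage two stands in for the formant amplification with a simple exponent-based peak sharpening. The names `levinson_durbin`, `two_stage_spectrum` and the `gamma` parameter are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: AR (LPC) coefficients from autocorrelation lags r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err                  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)            # residual prediction-error power
    return a, err

def two_stage_spectrum(x, order=12, nfft=512, gamma=2.0):
    """Stage 1: AR power spectrum of a frame. Stage 2: illustrative peak-selective sharpening."""
    x = np.asarray(x, dtype=float) * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a, err = levinson_durbin(r, order)
    A = np.fft.rfft(a, nfft)                 # AR polynomial evaluated on the unit circle
    p = err / np.abs(A) ** 2                 # stage 1: autoregressive spectral estimate
    p_mod = (p / p.max()) ** gamma           # stage 2: gamma > 1 stretches the dB scale,
                                             # raising formant peaks relative to the floor
    return p, p_mod
```

With `gamma = 2`, the dB dynamic range of the spectrum doubles, so a formant standing 10 dB above the spectral floor ends up 20 dB above it, which is on the order of the 10–20 dB gain reported in the experiment; the exponent value is an illustrative choice.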

About the Authors

V. V. Savchenko
National Research University Higher School of Economics
Russian Federation

Vladimir V. Savchenko

Nizhny Novgorod



L. V. Savchenko
National Research University Higher School of Economics
Russian Federation

Lyudmila V. Savchenko

Nizhny Novgorod



References

1. Ternström S. Special issue on current trends and future directions in voice acoustics measurement. Applied Sciences, 13(6), 3514 (2023). https://doi.org/10.3390/app13063514

2. Mishra J., Sharma R. Vocal tract acoustic measurements for detection of pathological voice disorders. Journal of Circuits, Systems and Computers, 2450173 (2024). https://doi.org/10.1142/S0218126624501731

3. Li S. A., Liu Y. Y., Chen Y. C. et al. Voice interaction recognition design in real-life scenario mobile robot applications. Applied Sciences, 13(5), 3359 (2023). https://doi.org/10.3390/app13053359

4. Savchenko A. V., Savchenko V. V. Method for measurement the intensity of speech vowel sounds flow for audiovisual dialogue information systems. Measurement Techniques, 65(3), 219–226 (2022). https://doi.org/10.1007/s11018-022-02072-x

5. O’Shaughnessy D. Trends and developments in automatic speech recognition research. Computer Speech & Language, 83, 101538 (2024). https://doi.org/10.1016/j.csl.2023.101538

6. Yu D., Deng L. Automatic Speech Recognition: A Deep Learning Approach. Springer, London (2016). https://doi.org/10.1007/978-1-4471-5779-3

7. Savchenko V. V. Itakura–Saito Divergence as an element of the information theory of speech perception. Journal of Communications Technology and Electronics, 64(6), 590–596 (2019). https://doi.org/10.1134/S1064226919060093

8. Kathiresan Th., Maurer D., Suter H., Dellwo V. Formant pattern and spectral shape ambiguity in vowel synthesis: The role of fundamental frequency and formant amplitude. The Journal of the Acoustical Society of America, 143(3), 1919–1920 (2018). https://doi.org/10.1121/1.5036258

9. Fu M., Wang X., Wang J. Polynomial-decomposition-based LPC for formant estimation. IEEE Signal Processing Letters, 29, 1392–1396 (2022). https://doi.org/10.1109/LSP.2022.3181523

10. Savchenko V. V. A measure of differences in speech signals by the voice timbre. Measurement Techniques, 66(10), 803–812 (2024). https://doi.org/10.1007/s11018-024-02294-1

11. Tokuda I. The source–filter theory of speech. Oxford Research Encyclopedia of Linguistics (2021). https://doi.org/10.1093/acrefore/9780199384655.013.894

12. Kim H. S. Linear predictive coding is all-pole resonance modeling. Center for Computer Research in Music and Acoustics, Stanford University (2023). https://ccrma.stanford.edu/~hskim08/lpc/lpc.pdf

13. Butenko I., Slavnov N., Stroganov Yu., Kvasnikov A. Phonetic-acoustic database of trigrams for Russian dialects speech recognition. AIP Conference Proceedings, 2833(1) (2023). https://doi.org/10.1063/5.0151706

14. Shumway R. H., Stoffer D. S. Spectral analysis and filtering. In: Time Series Analysis and Its Applications. Springer Texts in Statistics. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52452-8_4

15. Marple S. L. Digital Spectral Analysis with Applications. 2nd ed. Dover Publications, Mineola, New York (2019).

16. Savchenko V. V., Savchenko L. V. Method for asynchronous analysis of a glottal source based on a two-level autoregressive model of the speech signal. Izmeritel’naya Tekhnika, 73(2), 55–62 (2024). (In Russ.) https://doi.org/10.32446/0368-1025it.2024-2-55-62

17. Savchenko V. V., Savchenko L. V. Method for testing stability and adjusting parameters of an autoregressive model of the vocal tract. Izmeritel’naya Tekhnika, 73(5), 54–63 (2024). (In Russ.) https://doi.org/10.32446/0368-1025it.2024-5-54-63

18. Savchenko V. V. A method for autoregression modeling of a speech signal using the envelope of the Schuster periodogram as a reference spectral sample. Journal of Communications Technology and Electronics, 68(2), 121–127 (2023). https://doi.org/10.1134/S1064226923020122

19. Savchenko V. V. Method for reduction of speech signal autoregression model for speech transmission systems on low-speed communication channels. Radioelectronics and Communications Systems, 64(11), 592–603 (2021). https://doi.org/10.3103/S0735272721110030

20. Savchenko V. V. Hybrid method of speech signals spectral analysis based on the autoregressive model and Schuster periodogram. Measurement Techniques, 66(3), 203–210 (2023). https://doi.org/10.1007/s11018-023-02211-y

21. Savchenko V. V. Improving the method for measuring the accuracy indicator of a speech signal autoregression model. Measurement Techniques, 65(10), 769–775 (2023). https://doi.org/10.1007/s11018-023-02150-8

22. Rabiner L. R., Schafer R. W. Theory and Applications of Digital Speech Processing. Prentice Hall (2010).

23. Alku P., Kadiri S. R., Gowda D. Refining a deep learning-based formant tracker using linear prediction methods. Computer Speech & Language, 81, 101515 (2023). https://doi.org/10.1016/j.csl.2023.101515

24. Kuhn K., Kersken V., Reuter B., Egger N., Zimmermann G. Measuring the accuracy of automatic speech recognition solutions. ACM Transactions on Accessible Computing, 16(4), 1–23 (2024). https://doi.org/10.1145/3636513

25. Candan C. Making linear prediction perform like maximum likelihood in Gaussian autoregressive model parameter estimation. Signal Processing, 166, 107256 (2020). https://doi.org/10.1016/j.sigpro.2019.107256

26. Borovkov A. A. Matematicheskaya statistika. Dopolnitel’nye glavy. Nauka. Fizmatlit, Moscow (1984). (In Russ.)

27. Jolad B., Khanai R. An approach for speech enhancement with dysarthric speech recognition using optimization based machine learning frameworks. International Journal of Speech Technology, 26, 287–305 (2023). https://doi.org/10.1007/s10772-023-10019-y

28. Kolbæk M., Tan Z.-H., Jensen S. H., Jensen J. On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 825–838 (2020). https://doi.org/10.1109/TASLP.2020.2968738

29. Savchenko V. V., Savchenko L. V. Method for measuring the intelligibility of speech signals in the Kullback-Leibler information metric. Measurement Techniques, 62(9), 832–839 (2019). https://doi.org/10.1007/s11018-019-01702-1

30. Feng S., Halpern B. M., Kudina O., Scharenborg O. Towards inclusive automatic speech recognition. Computer Speech & Language, 84, 101567 (2024). https://doi.org/10.1016/j.csl.2023.101567

31. Esfandiari M., Vorobyov S. A., Karimi M. New estimation methods for autoregressive process in the presence of white observation noise. Signal Processing, 171, 107480 (2020). https://doi.org/10.1016/j.sigpro.2020.107480

32. Ngo Th., Kubo R., Akagi M. Increasing speech intelligibility and naturalness in noise based on concepts of modulation spectrum and modulation transfer function. Speech Communication, 135, 11–24 (2021). https://doi.org/10.1016/j.specom.2021.09.004

33. O’Shaughnessy D. Speech enhancement – a review of modern methods. IEEE Transactions on Human-Machine Systems, 54(1), 110–120 (2024). https://doi.org/10.1109/THMS.2023.3339663

34. Gustafsson Ph. U., Laukka P., Lindholm T. Vocal characteristics of accuracy in eyewitness testimony. Speech Communication, 146, 82–92 (2023). https://doi.org/10.1016/j.specom.2022.12.001

35. Alex A., Wang L., Gastaldo P., Cavallaro A. Data augmentation for speech separation. Speech Communication, 152, 102949 (2023). https://doi.org/10.1016/j.specom.2023.05.009

36. Aldarmaki H., Ullah A., Ram S., Zaki N. Unsupervised automatic speech recognition: A review. Speech Communication, 139, 76–91 (2022). https://doi.org/10.1016/j.specom.2022.02.005

37. Shahnawazuddin S. Developing children’s ASR system under low-resource conditions using end-to-end architecture. Digital Signal Processing, 146, 104385 (2024). https://doi.org/10.1016/j.dsp.2024.104385

38. Wei S., Zou S., Liao F. A comparison on data augmentation methods based on deep learning for audio classification. Journal of Physics: Conference Series, 1453(1), 012085 (2020). https://doi.org/10.1088/1742-6596/1453/1/012085



For citations:


Savchenko V.V., Savchenko L.V. Two-stage algorithm of spectral analysis for automatic speech recognition systems. Izmeritel'naya Tekhnika. 2024;(7):60-69. (In Russ.) https://doi.org/10.32446/0368-1025it.2024-7-60-69



ISSN 0368-1025 (Print)
ISSN 2949-5237 (Online)