

Method for measuring the intensity of speech vowel sounds flow for audiovisual dialogue information systems
https://doi.org/10.32446/0368-1025it.2022-3-65-72
Abstract
In this paper, we consider audiovisual data processing and the interaction of two modalities for predicting the user's emotional state in dialogue information systems. The audio modality is used for real-time detection of emotional speech segments. As an indicator of the level of speech emotionality, we propose estimating the intensity of the flow of vowels in the user's speech signal at the input of the information system. A novel method for measuring this indicator is proposed, based on the empirical probability of the appearance of vowels in the speech signal. An example of its practical implementation in soft real time is provided. A full-scale experiment was set up and carried out using the authors' software. The advantages of the method are shown in terms of its high speed and its sensitivity to the level of emotionality in users' speech. The results are intended for developers of modern information systems with an audiovisual user interface.
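The idea of an empirical probability of vowel appearance can be illustrated with a minimal sketch. It assumes that some upstream detector (not shown here, and not specified in this abstract) has already labelled each short speech frame as vowel or non-vowel; the function name `vowel_flow_intensity`, the `window` parameter, and the sliding-window form are illustrative assumptions rather than the authors' implementation. The sketch reports, for every incoming frame, the share of vowel frames among the most recent frames.

```python
from collections import deque


def vowel_flow_intensity(frame_is_vowel, window=50):
    """Yield a running empirical probability of vowel frames.

    frame_is_vowel: iterable of truthy/falsy per-frame labels produced by
        a hypothetical vowel detector (an assumption of this sketch).
    window: number of most recent frames used for the estimate
        (hypothetical parameter, not taken from the paper).
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    recent = deque(maxlen=window)  # labels of the last `window` frames
    count = 0                      # number of vowel frames inside `recent`
    for flag in frame_is_vowel:
        if len(recent) == window:
            count -= recent[0]     # oldest label is about to be evicted
        recent.append(1 if flag else 0)
        count += recent[-1]
        # Relative frequency of vowels over the frames seen in the window.
        yield count / len(recent)


if __name__ == "__main__":
    labels = [1, 0, 1, 1]  # toy per-frame labels, not experimental data
    print(list(vowel_flow_intensity(labels, window=2)))
```

Because each update costs constant time per frame, an estimate of this kind is compatible with the soft real-time operation the abstract mentions; the choice of window length trades responsiveness against smoothness.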
Keywords
About the Authors
A. V. Savchenko
Russian Federation
Andrey V. Savchenko
Nizhniy Novgorod
V. V. Savchenko
Russian Federation
Vladimir V. Savchenko
Nizhniy Novgorod
For citations:
Savchenko A.V., Savchenko V.V. Method for measuring the intensity of speech vowel sounds flow for audiovisual dialogue information systems. Izmeritel'naya Tekhnika. 2022;(3):65-72. (In Russ.) https://doi.org/10.32446/0368-1025it.2022-3-65-72