Method of voice source coding with data compression based on the linear prediction model

V. V. Savchenko; L. V. Savchenko

doi:10.32446/10.32446/0368-1025it.2025-3-67-78

Abstract

This paper considers the problem of coding the voice source of speech with data compression based on a linear prediction model. The problem belongs to a rapidly developing area of acoustic measurement: the analysis and estimation of the parameters of the excitation signal that drives acoustic oscillations in a speaker's vocal tract. Under the criterion of minimum average power of the voice source in the speech production process, the problem reduces to real-time coding of the linear prediction error signal. A new voice coding method based on clipping the linear prediction error is developed; it does not require computationally expensive measurement of the initial phase and frequency of the fundamental tone of the speech signal. An example of its technical implementation in soft real-time mode is considered. A full-scale experiment was carried out to compare the effectiveness of the proposed method with that of the widely used discrete cosine transform method. The results show that, because data compression artifacts in the reconstructed speech signal are weakened, the developed method codes the voice source one and a half to two times more accurately, and it does not require detection of vowel sounds or pauses in the speech signal. The results will be useful for developing new and modernizing existing systems and algorithms in automatic speech processing and synthesis, mobile speech communication, artificial intelligence, and other applications of speech technology that use data compression based on the linear prediction model.
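The general idea of coding a clipped linear prediction error can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the clipping rule, the model order, or the frame length, so the center-clipping rule, the `ratio` threshold parameter, and all function names below are assumptions chosen only for illustration. The sketch estimates linear prediction coefficients for one frame by the autocorrelation method, computes the prediction error, zeroes the error samples below a threshold, and reconstructs the frame with the all-pole synthesis filter.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the Yule-Walker normal equations."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability
    return np.linalg.solve(R, r[1:])


def prediction_error(frame, a):
    """Analysis filter: e[n] = x[n] - sum_k a[k] * x[n-k], zero initial state."""
    order = len(a)
    padded = np.concatenate([np.zeros(order), frame])
    e = np.empty(len(frame))
    for n in range(len(frame)):
        past = padded[n : n + order][::-1]  # x[n-1], ..., x[n-order]
        e[n] = frame[n] - np.dot(a, past)
    return e


def center_clip(e, ratio):
    """ASSUMED clipping rule: keep only samples with |e| >= ratio * max|e|."""
    threshold = ratio * np.max(np.abs(e))
    return np.where(np.abs(e) >= threshold, e, 0.0)


def synthesize(e, a):
    """Synthesis filter: x[n] = e[n] + sum_k a[k] * x[n-k], zero initial state."""
    order = len(a)
    x = np.zeros(len(e) + order)
    for n in range(len(e)):
        past = x[n : n + order][::-1]
        x[n + order] = e[n] + np.dot(a, past)
    return x[order:]


if __name__ == "__main__":
    # Synthetic test frame (hypothetical data, not from the paper).
    rng = np.random.default_rng(0)
    t = np.arange(240)
    frame = np.sin(2 * np.pi * 0.05 * t) + 0.1 * rng.standard_normal(240)

    a = lpc_coefficients(frame, order=10)
    e = prediction_error(frame, a)
    e_clipped = center_clip(e, ratio=0.5)
    restored = synthesize(e_clipped, a)

    kept = np.count_nonzero(e_clipped)
    print(f"kept {kept} of {len(e)} error samples")
    print(f"reconstruction MSE: {np.mean((frame - restored) ** 2):.6f}")
```

In this sketch only the surviving error samples and the prediction coefficients would need to be stored or transmitted, which is where the compression would come from. With `ratio=0` nothing is discarded and the synthesis filter restores the frame exactly up to rounding error.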

Keywords

acoustic speech analysis, speech signal, vocal tract, linear prediction model, voice excitation

About the Authors

V. V. Savchenko

National Research University Higher School of Economics
Russian Federation

Vladimir V. Savchenko

Nizhny Novgorod

L. V. Savchenko

National Research University Higher School of Economics
Russian Federation

Lyudmila V. Savchenko

Nizhny Novgorod


For citations:

Savchenko V.V., Savchenko L.V. Method of voice source coding with data compression based on the linear prediction model. Izmeritel`naya Tekhnika. 2025;74(3):67-78. (In Russ.) https://doi.org/10.32446/10.32446/0368-1025it.2025-3-67-78

ISSN 0368-1025 (Print)
ISSN 2949-5237 (Online)
