

Method of voice source coding with data compression based on the linear prediction model
https://doi.org/10.32446/10.32446/0368-1025it.2025-3-67-78
Abstract
This work belongs to a rapidly developing line of research in acoustic measurements: the analysis and evaluation of the parameters of the excitation signal of acoustic oscillations in a speaker's vocal tract. It considers the problem of coding the voice source of speech with data compression based on a linear prediction model. Using the criterion of minimum average power of the voice source in the speech production process, the problem is reduced to real-time coding of the linear prediction error signal. A new voice source coding method has been developed, based on clipping the linear prediction error; it does not require computationally expensive procedures for measuring the initial phase and frequency of the fundamental tone of the speech signal. An example of its technical implementation in soft real-time mode is considered. A full-scale experiment was carried out to compare the effectiveness of the proposed method with that of the widely used discrete cosine transform method. It is shown that, because data compression artifacts in the reconstructed speech signal are weakened, the accuracy of voice source coding with the developed method is one and a half to two times higher, and there is no need to detect vowel sounds or pauses in the speech signal. The results will be useful in developing new and modernizing existing systems and algorithms for automatic speech processing and synthesis, mobile speech communication, artificial intelligence, and other applications of speech technologies with data compression based on the linear prediction model.
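To make the idea of coding the linear prediction error concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the autocorrelation (Levinson-Durbin) method for the predictor and uses center clipping, that is, zeroing residual samples below a threshold, as one plausible form of clipping; the abstract does not state the actual clipping rule. The function names, the model order and the `ratio` parameter are all hypothetical.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns a with a[0] = 1."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # degenerate (e.g. all-zero) frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def residual(frame, a):
    """Prediction error e[n] = sum_j a[j] * x[n - j] (zero initial state)."""
    return np.convolve(frame, a)[:len(frame)]

def center_clip(e, ratio):
    """ASSUMED clipping rule: zero samples with |e| <= ratio * max|e|."""
    thr = ratio * np.max(np.abs(e))
    return np.where(np.abs(e) > thr, e, 0.0)

def synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z), the inverse of residual()."""
    out = np.zeros(len(excitation))
    p = len(a) - 1
    for n in range(len(excitation)):
        acc = excitation[n]
        for j in range(1, min(p, n) + 1):
            acc -= a[j] * out[n - j]
        out[n] = acc
    return out

# Synthetic test frame (hypothetical data, not speech from the article).
rng = np.random.default_rng(0)
t = np.arange(240)
x = np.sin(2 * np.pi * 0.05 * t) + 0.1 * rng.standard_normal(240)

a = lpc_coefficients(x, order=10)
e = residual(x, a)
e_clipped = center_clip(e, ratio=0.3)
x_hat = synthesize(e_clipped, a)  # reconstruction from the clipped residual
```

In this sketch only the surviving nonzero residual samples and the predictor coefficients would need to be transmitted, which is where the data compression comes from; how the article quantizes or transmits them is not covered by the abstract.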
About the Authors
V. V. Savchenko
Russian Federation
Vladimir V. Savchenko
Nizhny Novgorod
L. V. Savchenko
Russian Federation
Lyudmila V. Savchenko
Nizhny Novgorod
For citations:
Savchenko V.V., Savchenko L.V. Method of voice source coding with data compression based on the linear prediction model. Izmeritel`naya Tekhnika. 2025;74(3):67-78. (In Russ.) https://doi.org/10.32446/10.32446/0368-1025it.2025-3-67-78