Written by Jens Berger | December 16, 2025

Quality Evaluation of Speech Coding Technologies
A comprehensive quality test was conducted to evaluate the perceived quality of various speech coding technologies under realistic conditions. The study compared current mobile network codecs with traditional low-bitrate codecs and emerging AI-based ultra-low bitrate speech coding solutions.
In the test, a set of German speech samples spoken by various speakers was processed through each codec type. A controlled listening experiment was used to assess overall speech quality, with a focus on the naturalness of the reproduced speech, including under typical transmission impairments such as packet loss and bandwidth constraints. The evaluation aimed to reflect real-world usage scenarios, including mobile calls, popular IP-based voice services and speech transmission over satellite links.
To achieve statistically meaningful results, a formal listening test was conducted in a standardized acoustic environment, following the ITU-T P.800 methodology with the Absolute Category Rating (ACR) approach. A total of 32 participants, men and women from various age groups, rated the speech samples. The test design ensured balanced demographic representation and controlled conditions to obtain reliable subjective quality scores. Participants evaluated multiple samples per codec type, and the results were statistically analyzed to identify significant differences in perceived quality.
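ACR ratings of this kind are commonly summarized as a Mean Opinion Score with a confidence interval per test condition. The following is a minimal sketch of that summary step, not the statistical procedure used in the study, which is not described here. The function name `mos_with_ci`, the condition names, and all ratings are hypothetical, and the interval uses a normal approximation rather than a Student-t quantile.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mos_with_ci(ratings, confidence=0.95):
    """Return (MOS, CI half-width) for ACR ratings on the 1..5 scale.

    Assumption: normal approximation for the confidence interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    # Two-sided quantile of the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return mean(ratings), half_width


# Hypothetical ratings for illustration only, NOT data from the study.
example_ratings = {
    "condition_a": [4, 5, 4, 4, 3, 5, 4, 4],
    "condition_b": [2, 3, 2, 1, 3, 2, 2, 3],
}

for name, ratings in example_ratings.items():
    mos, ci = mos_with_ci(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

Non-overlapping intervals are a quick visual hint that two conditions differ; a formal analysis would typically apply a significance test across all conditions.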
Key categories included:
- Modern Mobile Codecs: Including EVS and AMR-WB, which are widely deployed in LTE and 5G networks. Additionally, Opus (used in WhatsApp) and Satin (used in Microsoft Teams) were tested under real transmission conditions. These codecs offer high fidelity and robustness, especially under variable network conditions.
- Legacy Low-Bitrate Codecs: Such as MELP and LPC-10, and the amateur radio codec Codec2, representing earlier generations of strong speech compression. These codecs were originally designed for extremely bandwidth-constrained environments and are still used in specialized applications.
- Ultra-Low Bitrate AI-Based Codecs: Leveraging deep learning models for end-to-end speech representation and reconstruction. The tested codecs operate in the bitrate range of approximately 600 bit/s to 3 kbit/s. For comparison, 600 bit/s is roughly one hundredth of the well-known ISDN transmission rate (64 kbit/s) and just one fortieth of the bitrate typically used in VoLTE (24 kbit/s).
Ultra-low bitrate codecs are of particular interest for satellite-based communication systems (e.g., Non-Terrestrial Networks, NTN) in Direct-to-Cell or Direct-to-Device mode, in which smartphones receive signals directly from satellites, bandwidth is highly constrained, and latency is critical. They are also relevant in military and tactical communication scenarios, where efficient spectrum usage and resilience to transmission errors are essential.
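The bitrate comparison above can be checked with a few lines of arithmetic. This sketch uses only the bitrates quoted in the text; the 20 ms frame length is an assumption added for illustration (a common speech framing interval, not a figure from the study), and the helper names are hypothetical.

```python
# Bitrates quoted in the text, in bit/s.
ISDN_BPS = 64_000
VOLTE_BPS = 24_000
AI_CODEC_RANGE_BPS = (600, 3_000)

# Assumption: 20 ms frames, a common choice in speech coding (not from the article).
FRAME_MS = 20


def ratio(reference_bps, codec_bps):
    """How many times larger the reference bitrate is than the codec bitrate."""
    return reference_bps / codec_bps


def bits_per_frame(bitrate_bps, frame_ms=FRAME_MS):
    """Bit budget available for one frame at the given bitrate."""
    return bitrate_bps * frame_ms / 1000


for bps in AI_CODEC_RANGE_BPS:
    print(
        f"{bps} bit/s: ISDN is {ratio(ISDN_BPS, bps):.1f}x, "
        f"VoLTE is {ratio(VOLTE_BPS, bps):.1f}x, "
        f"{bits_per_frame(bps):.0f} bits per {FRAME_MS} ms frame"
    )
```

Under the 20 ms assumption, a 600 bit/s codec has 12 bits to describe each frame of speech, which illustrates how tight the budget is at the lower end of the range.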

Performance of AI-Based Codecs
The new AI-based codecs support audio bandwidths of 8 kHz (wideband) and 12 kHz (super-wideband) and demonstrate a significant leap in perceived speech quality and naturalness compared to classical low-bitrate codecs. Some AI-based solutions approached the performance level of high-quality codecs such as AMR-WB and EVS, making them promising candidates for future communication systems under strong bitrate constraints or in high network load situations. The computational complexity of these codecs was not investigated in this study. Some implementations, however, introduce only a short delay that is acceptable for real-time communication.
These codecs deliver speech that clearly sounds natural and pleasant to the listener. However, they do not always reproduce all speaker-specific characteristics with full accuracy. For example, pitch and intonation may be slightly altered, and in some cases initial phonemes or consonants may be replaced or smoothed. While this may be acceptable for everyday conversation, it can limit their applicability in scenarios requiring speaker identification, authentication, or mission-critical communication.
The following table shows some representative results of the listening experiment; the Mean Opinion Score (MOS) rates the subjectively perceived quality on a scale from 1 (bad) to 5 (excellent):

The detailed results of this evaluation, including statistical analysis, codec performance rankings, and listener feedback, were presented at the ITU-T SG12 meeting in September 2025. These insights are expected to contribute to ongoing discussions around codec standardization, the definition of “quality,” and its automated prediction, particularly in the context of future mobile and satellite communication systems.