AudioTools In Focus: "What Did He Say?" How AudioTools Server Fixes Mumbled Dialog

By Graham Tudball on Dec 15, 2025 7:00:18 PM

The problem of dialog intelligibility is nothing new and something that we’ve likely all come across on more than one occasion. Whether it is quiet dialog that forces you to continually ride the volume control on the TV remote, or actors mumbling their lines, intelligibility issues can take many forms. But how can we detect that a piece of content has issues that need addressing and, more importantly, how can we improve sections where intelligibility is a concern?

What did he say?

Another viewer vexed by the mumbles. AudioTools Server is ready with the fix.

Why Is Intelligibility A Problem?

Dialog intelligibility issues can take many forms. One major contributor can be traced back to how content, particularly movies, is mixed. Cinematic content is typically mixed with a wide dynamic range, as this lends itself to the most exciting and engaging theatrical experience. Explosions so thunderous they shake your seat, and whispers so intimate you feel drawn right into the moment, work beautifully in a controlled cinema environment, but they translate poorly to the average living room, where the wide dynamic range can leave dialog almost inaudible during quiet scenes and explosions loud enough to wake the entire street - even after loudness normalization has been applied.

The design of modern flat-screen televisions can also play a part. As TVs have become progressively thinner, the physical space available for speakers has drastically diminished, leading to the use of small, downward-firing, or rear-firing speakers that struggle to produce a full and clear sound. These compact speakers tend to lack the output capability and frequency response needed to accurately reproduce the human voice, especially in the lower mid-range where much of the energy in dialog resides. The result is often a thin, muddy, or tinny sound that further exacerbates the problem of distinguishing speech from background noise and music, regardless of how well the audio was mixed at the production stage.

The way dialog is delivered by actors also plays a significant role in how we understand the speech. The modern trend for actors to strive for a more natural-sounding performance, with increased use of regional dialects, can sometimes result in lines being delivered with less projection or clarity than perhaps was the case in the past. This "natural" delivery, while artistically desirable, can make it harder for viewers to discern what's being said, especially when combined with background music or sound effects - something that is particularly pronounced for elderly viewers or those with hearing impairment.

Quantifying Intelligibility


We know the problem. Now how do we solve it?

Beyond just being annoying, intelligibility issues can also have serious monetary implications. Disgruntled viewers are more likely to switch over to another program (or even turn the TV off), resulting in a loss in advertising revenue. They may also take to social media to air their frustrations, which is likely, in turn, to result in other potential viewers choosing not to watch. As such, identifying and addressing intelligibility issues at the earliest stage is of critical importance.

The ability to measure the loudness of the dialog component of a mix is a technology that has been around for many years, thanks to Dolby’s Dialog Intelligence algorithm. Comparing the measured dialog level with that of the overall loudness can then provide a useful indicator of mix-related intelligibility problems - particularly when combined with other loudness measurement types, such as Loudness Range (LRA) and Short Term Loudness. The importance of the relative difference between the dialog and overall loudness as an indicator of intelligibility problems was formally recognised by the EBU in the 2023 supplement to their longstanding R128 loudness specification (R128 s4), where it was given the name “loudness-to-dialog ratio.”
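To make the idea concrete, here is a minimal Python sketch of how the "loudness-to-dialog ratio" can be derived and used as a flag. It assumes the programme loudness and dialog loudness have already been measured (e.g. by a BS.1770-style meter and a dialog-gated measurement such as Dolby's Dialog Intelligence); the 5 LU flagging threshold and the function names are purely illustrative assumptions, not values taken from R128 s4 or AudioTools Server.

```python
def loudness_to_dialog_ratio(program_lufs: float, dialog_lufs: float) -> float:
    """Difference between overall programme loudness and dialog loudness, in LU.

    A large positive value means the dialog sits well below the overall
    programme level - a common indicator of intelligibility trouble.
    """
    return program_lufs - dialog_lufs


def flag_intelligibility_risk(program_lufs: float, dialog_lufs: float,
                              threshold_lu: float = 5.0):
    """Return the ratio and whether it exceeds an (illustrative) risk threshold."""
    ratio = loudness_to_dialog_ratio(program_lufs, dialog_lufs)
    return ratio, ratio > threshold_lu
```

For example, a programme measuring -23 LUFS overall with dialog at -30 LUFS yields a ratio of 7 LU, which this sketch would flag for closer inspection.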

But what about detecting intelligibility problems caused by poor vocal delivery, or where the background audio masks the dialog component in the mix? These types of issues cannot be easily detected by traditional measurement options and require a more sophisticated approach. This is where AudioTools Server’s Dialog Intelligibility Analysis module comes in. The Dialog Intelligibility Analysis module takes a modern, machine-learning driven approach to measuring how intelligible a piece of speech is by assigning a score based on how easy it is to understand. In addition to providing overall intelligibility scores, the module can also be used to highlight individual sections where listeners may struggle to make out what is being said.
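To illustrate how per-section scores might be turned into a list of problem regions, here is a small hypothetical sketch. The 0-to-1 scoring scale, the 0.6 threshold, and the one-second segment length are all assumptions made for the example; the actual Dialog Intelligibility Analysis module's scoring and reporting are not described here.

```python
def low_intelligibility_spans(scores, threshold=0.6, segment_s=1.0):
    """Given per-segment intelligibility scores (assumed 0..1, one per
    segment_s seconds), return (start_s, end_s) spans where the score
    falls below threshold, merging consecutive low-scoring segments."""
    spans = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i          # open a new low-scoring span
        elif start is not None:
            spans.append((start * segment_s, i * segment_s))
            start = None           # close the span at this segment
    if start is not None:          # span runs to the end of the content
        spans.append((start * segment_s, len(scores) * segment_s))
    return spans
```

Running this over, say, `[0.9, 0.4, 0.3, 0.8, 0.5]` would report two regions to review: seconds 1-3 and seconds 4-5.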

I Can Hear Clearly Now!


From murky as hell to clear as a bell.

Of course, identifying regions where intelligibility may be an issue is one thing, but wouldn’t it be good if you could also improve the audio quality? For intelligibility issues caused by a wide dynamic range, the perfect tool exists in AudioTools Server’s Advanced Loudness Adaptation. Advanced Loudness Adaptation analyses the audio for a wide range of loudness properties (including its “loudness-to-dialog ratio”) and then adapts it (where required) to correct any level imbalances, resulting in a more pleasing viewing experience. You can read more about how Advanced Loudness Adaptation can improve dialog-intelligibility issues here.

But what about audio where the dialog and overall loudness levels are broadly the same, or where the background audio is masking a section of dialog? This requires a different approach, one that employs another machine-learning driven technology, Dialog Extraction. 

Dialog Extraction allows us to split previously mixed content into separate “Dialog” and “Non-Dialog” components. Once separated, the two components can be treated independently before being mixed back together. This could be as simple as raising the level of the entire dialog component, or something more sophisticated, where only sections of dialog audio that fall below a certain threshold are raised. Within the AudioTools Server toolset, both the Fraunhofer Dialog+ module and our own Linear Acoustic APTO™ technology (which is part of the Advanced Loudness module) can be employed for this purpose.
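The "raise only quiet dialog" idea can be sketched in a few lines of NumPy, assuming separation has already produced dialog and background signals. The frame size, the -30 dB "too quiet" floor, the -60 dB silence gate (so pauses between lines are not amplified), and the 6 dB boost are all illustrative assumptions - this is not how Dialog+ or APTO are implemented.

```python
import numpy as np

def remix_with_dialog_boost(dialog, background, frame=1024,
                            floor_db=-30.0, gate_db=-60.0, boost_db=6.0):
    """Selective dialog lift after source separation (illustrative sketch).

    Frames whose dialog RMS sits between gate_db (to skip silence) and
    floor_db (the 'too quiet' threshold) are raised by boost_db before
    the two components are summed back together.
    """
    boosted = dialog.astype(np.float64).copy()
    gain = 10.0 ** (boost_db / 20.0)
    for start in range(0, len(boosted), frame):
        seg = boosted[start:start + frame]          # view into `boosted`
        rms = np.sqrt(np.mean(seg ** 2)) + 1e-12    # avoid log of zero
        rms_db = 20.0 * np.log10(rms)
        if gate_db < rms_db < floor_db:
            seg *= gain                             # boost quiet dialog in place
    return boosted + background.astype(np.float64)
```

A production tool would use overlapping windows, smoothed gain ramps to avoid clicks at frame boundaries, and headroom management on the remix, but the principle - touch only the dialog component, and only where it is too quiet - is the same.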

With AudioTools Server, both the identification of sections where intelligibility could be of concern and then their improvement can be handled within a single automated workflow. This makes the application an invaluable tool for anybody producing and/or delivering content, particularly when combined with its Advanced Loudness Adaptation dynamic control, resulting in fewer intelligibility issues and an audience less likely to switch off or watch something else.

To learn more about how AudioTools Server can help make intelligibility issues a thing of the past, please contact your AudioTools Server representative. We’d love to hear from you!

More Topics: Automation, audio processing software, AudioTools Server, AudioTools In Focus, 2025, Dialog Intelligibility
