Harnessing the Full Potential of Audio over IP

By Martin Dyster on Feb 8, 2017 11:00:00 AM

There is a misconception in the broadcast space that media outlets must wait for video over IP to mature before fully embracing an audio over IP (AoIP) infrastructure, but this is not the case. AoIP technologies have been stable for some time and today can provide the flexibility, reliability, and connectivity needed to deploy audio separately from video. Before the introduction of SDI video with embedded audio, video and audio had been handled separately for years. Although embedded audio seemed like a step forward, it did not reduce lip sync issues, and associated metadata is still easily separated from the audio. Since metadata is an increasingly essential part of new audio services, this becomes a major problem.

Traditional, channel-based audio looks set to be gradually replaced by object-based audio: the carriage of the individual sound elements that make up the channels, bundled with metadata. Multiple languages, emergency audio and services for the visually impaired are all competing for auxiliary space in broadcast delivery. While these expanded audio services can provide flexibility and enhanced consumer experiences for broadcast and Internet-based OTT services, and even for handheld devices, they need to be supported by the right platform to deliver as promised. AES and SMPTE have been working together on ways to enable the sub-sample accurate linking of audio over IP with video while keeping the two streams separate until final delivery. The result of their efforts on the audio side is the new AES67 standard, currently being adopted by many manufacturers.

Past & Present – Breaking Down the Boxes

To best understand where AoIP technology can take the industry, one needs to reflect on how we have been working until this point. The broadcast chain as we have known it consists of a string of devices, each assigned to complete a specialized task. Closed captioning, stills, graphics, squeeze, crawls, bugs, audio processing and video encoding are each done with hardware dedicated to that one function. Add to that countless utility products, such as frame syncs, distribution amplifiers, audio de-embedders and re-embedders, and audio and video synchronizers, and the system becomes installation intensive. Troubleshooting and maintenance are made far more complex and difficult, given the multiple points of failure.

Reducing the number and types of hardware devices used in the broadcast chain limits the points of failure and helps streamline the process, and the broadcast industry has been working to consolidate functionality in everything from video switchers to video effects and audio processors. There are two major advances driving this consolidation: the widespread use of Ethernet for file distribution, device control, and real-time video and audio delivery, and the increasing power and storage of open, reliable IT platforms. Major increases in computing power have also allowed video and audio processing to evolve from one specialized box per function to multiple functions in a single box. These advances mean broadcasters can now use fewer devices in the air chain, resulting in higher density, reduced space requirements, less AC power, reduced cooling requirements, less wiring, better system management and faster design and installation.

AES67 takes all this condensing a step further, however, bringing AoIP devices under the umbrella of interoperability. Prior to the standard, several manufacturers had their own dedicated AoIP protocols. AES67 removes those barriers completely, allowing the designer to take diverse products from different companies and create a wider ecosystem built around standards-based AoIP.

Future – Thinking “Outside the Box”

Even with these new technologies well within our grasp, many broadcasters are still not using AoIP to its fullest potential. Right now, AoIP tends to be used as a point-to-point replacement for MADI and not as the efficient and flexible distributed networked architecture that multicast routing enables. With increased channel density and interoperability comes the possibility of supporting emerging formats and rethinking how we look at audio altogether. Around the world, 5.1 channel audio is common and the broadcast of 7.1 channels and more is in the process of being standardized. Delivering immersive audio experiences to the home and providing audio objects and metadata that can place these objects in anything from stereo to 11 (or more) channels of playback is also already a well-developed idea.
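To put rough numbers on the difference between point-to-point and multicast distribution, here is a minimal Python sketch. It assumes an 8-channel stream of 24-bit audio at 48 kHz with 1 ms of audio per packet, carried as plain RTP/UDP/IPv4 over Ethernet with no VLAN tag. The class and function names are illustrative only and do not come from any standard or product.

```python
from dataclasses import dataclass

# Assumed per-packet overheads in bytes (no VLAN tag, no RTP header extensions).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
ETHERNET_OVERHEAD = 38  # 14 header + 4 FCS + 8 preamble/SFD + 12 interframe gap


@dataclass(frozen=True)
class StreamFormat:
    """Hypothetical description of one uncompressed audio stream."""
    channels: int = 8
    sample_rate_hz: int = 48_000
    bytes_per_sample: int = 3       # 24-bit linear PCM
    packet_time_us: int = 1_000     # 1 ms of audio per packet

    def samples_per_packet(self) -> int:
        return self.sample_rate_hz * self.packet_time_us // 1_000_000

    def payload_bytes(self) -> int:
        return self.samples_per_packet() * self.channels * self.bytes_per_sample

    def wire_bits_per_second(self) -> int:
        packet_bytes = (self.payload_bytes() + RTP_HEADER + UDP_HEADER
                        + IPV4_HEADER + ETHERNET_OVERHEAD)
        packets_per_second = 1_000_000 // self.packet_time_us
        return packet_bytes * 8 * packets_per_second


def sender_load_bps(fmt: StreamFormat, receivers: int, multicast: bool) -> int:
    """Bandwidth leaving the sender's network port for a given audience size."""
    # Unicast sends one copy per receiver; multicast sends a single copy
    # and lets the network switches replicate it.
    copies = 1 if multicast else receivers
    return fmt.wire_bits_per_second() * copies
```

Under these assumptions one stream occupies about 9.84 Mbit/s on the wire. Feeding ten destinations point-to-point costs the sender ten times that, while multicast keeps the sender's load at a single stream no matter how many devices subscribe.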

There are many other areas beyond increased realism or an immersive audio experience that can benefit from increasing the number of audio channels. One is live sports, where increased channels could better serve fans by simultaneously delivering home and away commentary with venue atmosphere. Another is descriptive audio for the sight impaired. Handling multiple languages and multiple levels of emergency information are additional uses. The idea is to deliver a more personalized audio experience to every consumer, regardless of how they are listening — on headphones, handheld devices or laptops, or from televisions and home theater systems with 11 or more channels.

In addition to increased audio services, separating video and AoIP streams simplifies delivery. All you need for audio distribution are network cables, since the video does not need to run through a facility to reach loudness management and audio processing devices. Another benefit of AoIP is universal contribution and access. This means that any AoIP-enabled device or computer with an AoIP driver can put audio on, and receive audio from, the network. The end result brings audio outputs to control rooms, edit bays and studios that can be used anywhere and heard anywhere. AoIP interfaces that provide GPIO, AES digital audio, analog audio and sync connections can be placed anywhere within the reach of the network, eliminating separate runs of sync, RS-422, time code, and GPI and audio cabling. On top of this, national emergency announcements, local emergency audio, and local audio cut-ins can be added to the AoIP network using interfaces at the edges of the AoIP network.
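As a rough illustration of how little a computer needs in order to receive audio from the network, the sketch below joins a multicast group and decodes the fixed RTP header of an incoming packet. It assumes the audio is carried as RTP over UDP multicast. The function names are hypothetical, and a real receiver would also need stream discovery and clock synchronization, neither of which is shown.

```python
import socket
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header defined in RFC 3550."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    # Version is the top two bits of byte 0; payload type is the low seven bits of byte 1.
    return RtpHeader(first >> 6, second & 0x7F, sequence, timestamp, ssrc)


def open_multicast_receiver(group: str, port: int,
                            interface: str = "0.0.0.0") -> socket.socket:
    """Open a UDP socket and ask the network to deliver the given multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Membership request: group address followed by the local interface address.
    mreq = socket.inet_aton(group) + socket.inet_aton(interface)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

With a placeholder group and port such as 239.69.1.1 and 5004, a listener would call open_multicast_receiver, read datagrams with recvfrom, and pass each one to parse_rtp_header. The audio samples follow the header, and any number of such listeners can subscribe to the same stream without the sender knowing about them.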

In addition to what is heard over the airwaves, AoIP makes it possible to converge subsystems that would otherwise have remained separate. The emergence of agnostic IP stream control and routing has made the management of IP audio simple and coherent. Intercom feeds are no longer tied up to their own matrix, nor is program audio routed independently for on-air and in-studio feeds. Access becomes open and devolved, flattening role-specific signal paths in the process. Everything is just “audio” — low latency, program-quality, 24-bit multi-channel sound.

The emergence of the AES67 standard for AoIP adds many possibilities for broadcast outlets beyond the distribution of audio. OTT may change how we’re viewing content, but AoIP aims to change how we hear it. With proper system configuration, it is now possible to distribute more channels than ever before. In addition, many facilities are using AoIP to become more flexible while reducing overall costs. It will be critical in the coming years to embrace a scalable technology like AoIP if we want to seamlessly accommodate emerging audio requirements and video formats. Video over IP may still be a few years away from full maturity, but we can start to pave the way for this workflow by bringing audio into the IP realm now.

Topics: Audio over IP, AoIP for Television
