Radio Thrilled the Video Star (Pt. 1/3)
By Greg Shay on Jul 9, 2020 11:13:40 AM
The Whole Origin Story of Audio Over IP - Greg Shay, CTO Telos Alliance
It is not that often that your inner monologue, history as you remember it, is of interest to others. On this occasion of being awarded a technical Emmy by the television industry for the Development of Synchronized Multi-channel Uncompressed Audio Transport Over IP Networks, I wanted to share the story, the whole story from my point of view, of how this came to be at Telos Alliance.
Any story worth reading is worth telling well, so please forgive my somewhat loose writing style and enjoy the ride, as I have over these 24 years.
So, kiddies, set your PTP way-back machine to 1996. (Oh wait..., that was before PTP was invented....!)
It was in the first conversation on the very first day I came down to Telos to meet with Frank Foti and Steve Church. Late in the day (any conversation with Steve was prone to go late), we started talking about the potential of audio networks.
"Wouldn't it be great to use that network jack that is already there on the computer," were Steve's very words. I had been doing some research work on audio networks since 1993 when I was with Spectral Synthesis, and here was the chance to put this into practice and start to move on it.
“Out-of-the-office blue-sky session to bust out some future ideas.”
(July 1997, at Punderson State Park, Newbury, Ohio. From left to right: Steve Church, Greg Shay, John Casey, Frank Foti, Kevin Nosé)
Soon after I hired on, Steve organized an out-of-the-office brainstorming meeting. My first project, Telos’ next-generation, multi-studio talk show system, was conceived at that session. (It was named the 2101, which was just our office street address, in case anyone never realized that.)
Alongside the phone system project, I studied Cobranet: how it worked, its patents, and its limitations. If we could have used Cobranet for our studio applications, we would have. But we thought the latency was too high, and it could not share the network with non-audio, non-Cobranet traffic. I attended some IEEE conferences and got a head full of AVB. If we could have bought AVB switches, and if AVB NICs had been in standard PC motherboards, we would have used AVB. But alas, the hardware for AVB just wasn’t on the market. So I started work in earnest on a new kind of audio network.
Sawing Through the Eagles
The first challenge was to get around the Cobranet patents, which were aimed at avoiding collisions so that audio over Ethernet would work on 10BaseT hubs. Steve said we should just count on 100BaseT and skip 10BaseT, which seemed bold in 1997 but was no risk in hindsight. This meant we did not need Cobranet's collision avoidance.
We settled on using standard, by-the-book RTP for audio packets. Telos had made its mark in interfacing telephony to the studio, and the big news with the telcos was the shift from POTS and ISDN to Voice over IP (VoIP). It seemed drop-dead obvious to us to take all the structure of VoIP RTP packets, crank up the sample rate and bit depth to pro audio levels, and run with it. No need to invent a new audio packet format.
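To make "crank up the sample rate and bit depth" concrete, here is a minimal sketch of what such a packet could look like. It packs the standard 12-byte RTP fixed header in front of 24-bit linear PCM samples. The payload type (96, from the dynamic range), the big-endian sample layout, the stereo channel count, and the function name are all assumptions for illustration, not a description of the actual Telos packet format.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE = 96  # assumption: a dynamic payload type, not taken from the article


def build_rtp_packet(seq, timestamp, ssrc, samples):
    """Pack a 12-byte RTP fixed header followed by 24-bit PCM samples.

    `samples` is a flat list of signed integers in 24-bit range,
    interleaved by channel (hypothetical layout for this sketch).
    """
    byte0 = RTP_VERSION << 6      # version 2, no padding, no extension, no CSRCs
    byte1 = PAYLOAD_TYPE & 0x7F   # marker bit cleared
    header = struct.pack(
        "!BBHII",
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    # Two's-complement, big-endian, 3 bytes per sample (assumed byte order).
    payload = b"".join((s & 0xFFFFFF).to_bytes(3, "big") for s in samples)
    return header + payload


# 12 sample frames of stereo silence: 24 samples, 3 bytes each.
packet = build_rtp_packet(seq=1, timestamp=12, ssrc=0x1234, samples=[0] * 24)
print(len(packet))
```

The point of the sketch is that nothing about the header changes when moving from telephone-grade audio to studio-grade audio; only the payload gets wider and faster.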
Telos had a new division in those years, AudioActive, targeted at the brand new world of web audio streaming. We were bringing our experience and know-how of the Fraunhofer MP3 codec to web streaming solutions. (Telos’ Zephyr had been the first commercial product to use MP3. And Steve was the guy who introduced a little company called Microsoft to FHG’s MP3.) So we had one foot in VoIP and one foot in streaming. The kitchen was ready to cook up AoIP for inside the studio.
For our early AoIP testing, Maciej Szlapka found that the standard QuickTime player knew about 16-bit, 48 kHz audio RTP streams, and could be coerced to play 24-bit, 48 kHz audio streams. This was key: having an existing reference player to test our early work. But the latency was huge, as buffering for streaming was designed for playing streams over the internet. What was lacking was tighter coordination of the data flow, to keep the packet size small, the buffers small, and latency down.
I figured out the formula for the minimum buffering possible based on how network switches pass packets through at hardware speed. This is what gave our AoIP design its lowest latency: start at the bottom, understand the theoretical minimums, and work upward.
A last missing piece was tight synchronization. To use small buffers, you had to have small errors in your sync time recovery. The error in time synchronization adds to network jitter and increases the required buffer size, which increases latency. We were shooting for 250 µs (12-sample packets at 48 kHz), with two packet buffers, so we wanted around 5-microsecond sync accuracy. Not so easy on a busy network.
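The arithmetic behind those numbers is easy to check. The sketch below assumes that "two packet buffers" means two packets' worth of audio held at the receiver; the constant and function names are made up for illustration.

```python
SAMPLE_RATE_HZ = 48_000
SAMPLES_PER_PACKET = 12
BUFFERED_PACKETS = 2      # assumption: two packets' worth of receive buffering
SYNC_ERROR_US = 5         # the target sync accuracy quoted in the text


def packet_duration_us(samples_per_packet, sample_rate_hz):
    """Time span of audio carried by one packet, in microseconds."""
    return samples_per_packet * 1_000_000 / sample_rate_hz


packet_us = packet_duration_us(SAMPLES_PER_PACKET, SAMPLE_RATE_HZ)
buffer_us = BUFFERED_PACKETS * packet_us
sync_share = SYNC_ERROR_US / packet_us

print(packet_us)    # 250.0
print(buffer_us)    # 500.0
print(sync_share)   # 0.02
```

Under that reading, a 5-microsecond sync error eats about 2% of one packet time, which is why the sync accuracy had to be so much tighter than the packet interval itself.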
I still remember when I woke up with the solution to the synchronization design in my mind. Yes, literally, I dreamed it up. In my previous job, I had designed software-controlled phase lock loops, and used statistical data analysis on the fly, to remove the ‘time error’ which is ‘time noise’, for video telecom equipment. When I realized how fundamental network behavior affects the sync packets, I knew what to do. A network adds a variable amount of delay. A network can never make a packet early. The errors are biased, and thus can be filtered out.
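To illustrate the idea that one-sided errors can be filtered out, here is a toy sketch. It is not the actual Telos algorithm, which the article does not spell out; it is a simple stand-in, a sliding-window minimum, that relies on the same observation: queuing can only add delay, so the smallest offset seen recently is the best estimate of the true one. The class name, window size, and microsecond units are assumptions.

```python
from collections import deque


class MinDelayFilter:
    """Toy one-sided filter: keep the smallest recent offset between
    a sync packet's send timestamp and its local receive time."""

    def __init__(self, window=64):
        # Only the most recent `window` observations are remembered.
        self._offsets = deque(maxlen=window)

    def update(self, sent_us, received_us):
        """Record one sync packet and return the current offset estimate."""
        self._offsets.append(received_us - sent_us)
        # The network can delay a packet but never deliver it early,
        # so the minimum is the least-contaminated observation.
        return min(self._offsets)


f = MinDelayFilter(window=4)
estimates = [
    f.update(0, 105),      # lightly delayed packet
    f.update(1000, 1100),  # the quickest packet so far
    f.update(2000, 2180),  # heavily delayed packet, ignored by the minimum
]
print(estimates)  # [105, 100, 100]
```

A plain average would be dragged upward by every congested packet; taking the minimum ignores them, which is the practical meaning of "the errors are biased."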
Lots of FPGA work later (the first implementations were in FPGA), audio packets were starting to fly. In our R&D lab, we had CD players on infinite loop, generating source audio for testing. For whatever reason, the song during this debugging was the Eagles' “Take It Easy”. Those who were in that room probably remember the 'buzz-saw' sound cutting through the song, as the audio got less and less distorted while we got better at moving around and syncing up the audio packets.
“Well, I’m runnin’ down the road tryin’ to loosen my ZZZZZRRRRAAAATTT….!”
“Seven women on my mind, four that wanna BBBZZZZZZSSSSSTT..!”
You get the idea... I can’t hear that song on the radio, more than 20 years later, without still hearing that saw!
“Hey, they say a clean desk is the sign of a sick mind. I’ve never, ever, had that problem!”
A new technology does not automatically take over. People have to get behind it.
I was ready to use our new audio network technology in our new 2101 multi-studio talkshow system in development, to connect the multiple studios to the Hub.
We gathered the software engineering team into a conference room and started to discuss what needed to be done.
“But you don’t have a control protocol…” the other team members said to me.
“That’s why we’re talking; you can create one,” I replied. Blank stares. Lesson learned. Innovation may start in a mind, but change happens with people and teamwork.
So Steve said to me, no, not yet, let the 2101 be more traditionally designed. Steve then expertly fostered and incubated this new technology by organizing and getting group momentum going in the right direction...