Radio Thrilled the Video Star (Pt. 2/3)
By Greg Shay on Aug 20, 2020 1:06:54 PM
The Whole Origin Story of Audio Over IP - Greg Shay, CTO The Telos Alliance
Continuing the story of the origin of Audio Over IP, we move on to Part 2. The idea to share this story stemmed from the occasion of being awarded a technical Emmy by the television industry for the Development of Synchronized Multi-channel Uncompressed Audio Transport Over IP Networks.
Out of the Buzzsaw, Into the Sawmill
Steve wrote up a whole treatise on this new audio network technology and how it would revolutionize professional audio. He gave it the working name ‘SmartLink’ and organized an offsite meeting at Sawmill Creek Resort in Sandusky, Ohio (far away from the offices in downtown Cleveland). A wider circle of about eight engineers was invited, and momentum was built. My memories of that pow-wow are of Rob Dye and myself debating the fine points of RTP, NTP wall clocks, time stamps, and sync. Another memory I’m sure many present share: in our intense concentration we worked late into the evening, only to decide to break for dinner at about 10 pm. Except this was not a hip metropolitan area of Ohio, and nothing was open. No restaurants, no take-out... no food! I actually don’t remember what we ate that night... (Scott Stiefel remembers we finally convinced a pizza place to deliver to our hungry crew.)
So now, this new AoIP cat was being shot out of the bag with some momentum. The team was leaning into the work to be done, not away from it.
On a trip to Europe, Steve came into contact with the research institute LUMII in Riga, Latvia. He hired Gints Linis to organize and carry through a ton of the work needed to flesh out the rest of the ecosystem for the new AoIP system.
Very thorough system design documents were created and implemented, which became, and remain, the infrastructure of what was to be ultimately named the ‘Livewire system,’ alongside the fundamental RTP audio streams: control, advertising and discovery, plans for organizing the different channels of audio, automatic mix-minus backfeeds, even surround (ahead of its time for radio). A ‘Reliable UDP’ protocol and the CMSG binary packed message protocol, written by Ioan Rus as part of the 2101 project, were incorporated into the inner workings of the Axia Livewire protocols.
Gints and the LUMII team created the world’s first pure AoIP low-latency mixing engine on a completely generic Intel PC platform. No DSP chips. No FPGAs. No custom hardware. This truly delivered on the promise of my first conversation with Steve in ’96: ‘using that network jack that was already there on the PC’ for professional audio.
The Kitchen Needs a New Sync
Do you remember back in the late 90s, at the beginning of the dot-com boom? Intel ran TV ads showing people in silver cleanroom suits dancing to the tune of “Play That Funky Music” by Wild Cherry (a song released by a Cleveland, Ohio record company; small world!). What Intel was trying to say, and in my opinion not very clearly, was that it had added fast math processing to its CPUs, so you didn't need DSP chips any more to do audio and video signal processing. Intel could do the jams! Get down!
"I often have said that if we could have bought AVB switches and if AVB hardware was in standard PC platforms back in those days...AoIP may have never been invented."
Steve took the leap and directed the software guys to test this for ourselves. At that time we had products with up to 13 DSP chips, but the new result was that a single advanced Pentium could indeed do the whole job, and more. With a real-time Linux operating system, we were able to process packets fast enough to hit our low-latency audio targets. However, we hit a snag: the mix engine needed sync.
Remember, this was before PTP, so there was no such thing as PTP-aware networking hardware. And critically, we wanted to use standard PC platforms. No special hardware. No special NIC cards.
This had been the nemesis of AVB. I often have said that if we could have bought AVB switches and if AVB hardware was in standard PC platforms back in those days, we (and the broadcast world of today) maybe would just have used AVB, and AoIP may have never been invented. But AVB hardware just wasn't available, and we had to figure out how to make this sync work with standard PC hardware.
So the sync work came back to me in our Cleveland office, and I worked out a method using only the PC motherboard timer chip to form a phase-locked loop that could lock to the sync packets. Every Windows-capable PC to this day, buried inside its chipset, has a fossilized version of the old Intel 8253 timer chip from the original IBM PC of circa 1981. A big difficulty was that the timer chip had really crude resolution, so we had to dither it, but finally, we got lock! Maciej and I were excited the first time we saw it come into lock on the scope!
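A software phase-locked loop of this shape can be sketched in a few lines. To be clear, this is an illustrative simulation, not the original Livewire code: it disciplines a coarse, whole-tick timer (standing in for the 8253's crude resolution) to incoming sync-packet timestamps, dithering the divisor between adjacent integers so its average period hits a fractional target. The sync interval, loop gains, and master clock offset are all assumptions for the demo.

```python
TIMER_HZ = 1_193_182           # 8253/8254 input clock on PC motherboards
SYNC_INTERVAL = 0.001          # assume a sync packet arrives every 1 ms

def dithered_divisor(ideal, acc):
    """The timer can only count whole ticks, so dither between floor and
    ceil of the ideal divisor; the running average then converges to the
    fractional value we actually need."""
    lo = int(ideal)
    acc += ideal - lo          # carry the fractional part forward
    if acc >= 1.0:
        return lo + 1, acc - 1.0
    return lo, acc

def simulate_lock(master_ppm=50.0, steps=5000, kp=0.1, ki=0.002):
    """PI phase-locked loop: compare sync-packet timestamps against the
    local dithered timer and steer the divisor until the two clocks track."""
    nominal = SYNC_INTERVAL * TIMER_HZ            # ideal ticks per interval
    master_period = SYNC_INTERVAL * (1.0 + master_ppm * 1e-6)
    local_time = master_time = 0.0
    freq_corr = acc = phase_ticks = 0.0
    for _ in range(steps):
        ideal = nominal + freq_corr + kp * phase_ticks
        div, acc = dithered_divisor(ideal, acc)
        local_time += div / TIMER_HZ              # coarse local timer advances
        master_time += master_period              # timestamp in the sync packet
        phase_ticks = (master_time - local_time) * TIMER_HZ  # phase detector
        freq_corr += ki * phase_ticks             # integral term learns the offset
    return master_time - local_time, freq_corr / nominal * 1e6

err, ppm = simulate_lock()
print(f"residual phase error: {err*1e6:+.3f} us, recovered offset: {ppm:+.1f} ppm")
```

Even with a dithered whole-tick timer, the loop pulls the residual phase error down to a fraction of a timer tick and recovers the master's frequency offset, which is the essence of getting lock out of crude hardware.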
An aside, but a great interview question for an embedded software position is to ask, “How might you debug software with an oscilloscope?” and see what answer you get or if they just look at you funny. If they actually understand what you are talking about, hire them!
The Windows That Couldn't
We were succeeding with dedicated hardware, FPGA, and real-time operating systems. But an important goal for connecting audio networking to general applications was to make it work on a standard Windows operating system PC. Internet web streaming to the PC was all the rage, but could we make this low latency linear audio work in Windows, too?
"This allowance of different stream types, intercompatible in well-defined ways, foreshadowed the way the future AES67 standard would be written."
Well, I already gave away the end of this story with the paragraph title... No, Windows is not a real-time OS. Worse, any one misbehaving application, driver, or piece of hardware in the system can inject a delay and poke holes in our low-latency audio.
So what happened when the unstoppable new audio network force met the immovable Windows object? We invented a new higher-latency subcategory of streams: larger packets, lower packet rates, and larger buffers. We realized that not every audio workflow must have the lowest latency. Many tasks could operate perfectly well with the higher latency that Windows could support. While live microphone-to-headphone latency was kept to less than three milliseconds, playing back songs and ad inserts could easily tolerate tens or even 100 milliseconds of latency and be perfectly usable.
This allowance of different stream types, intercompatible in well-defined ways, foreshadowed the way the future AES67 standard would be written: a range of different choices to fit different operating conditions, enabling new possibilities without getting stuck behind impossible barriers wherever they come up. I express this engineering maxim as “you either find how to do it, or else you prove it can’t be done and then find a way around.”
The Little Coldfire That Could
Inside our newly created Axia audio nodes, the audio packet heavy lifting was being done by an FPGA. But this new need to embrace multiple audio packet types, plus the larger buffering for the Windows-type streams (which we called “Standard Streams,” as opposed to “Live Streams”), blew the lid off the resources of our FPGA, or of any inexpensive FPGA of that time. What to do?
Maciej Szlapka came to the rescue by writing carefully optimized code for the Coldfire CPU we had in there. One by one, the seemingly impossible barriers to reaching the low latency performance came down. These were heady days of R&D, so much potential, so much at stake. I have fond memories of Maciej and me working so closely together that at one time I was holding the mouse and he was doing the typing; we were thinking that much in unison. One of the keys was optimizing the assembly code inner loop of audio data transfers in blocks, which we called ‘shovels’... a shovel full of audio. We need a bigger shovel! Kudos to that little CPU that could; we made it work.
Blowing Up Switches
We had created a monster. Eight streams at 4,000 packets per second each is 32,000 packets per second, per device. Put a dozen of these on a network, and you had close to half a million packets per second, continuous, non-stop. Another name for an AoIP device is ‘a Denial-of-Service attack in a box.’ (Warning! Don’t plug one into your business network without multicast routing enabled!)
We managed to blow up just about every switch we tested in those early years around 2002. Some just crashed; some lost their minds and started flooding all packets everywhere (a crater-inducing result, with each device suddenly receiving 500,000 packets per second, if it even fit down the wire...). What was required was a well-behaved switch that always strictly used hardware packet switching. The strict distinction between switching and routing was getting blurred in these network switches, but any reliance on software packet processing was instant death for the switch.
"It was very important that a standard IT switch vendor could be used to build the new audio over IP future."
Some would hang on, but if you power-cycled the switch, during its boot-up phase it would momentarily pass through a software packet-sniffing stage before turning on the hardware packet switching, and whammo! That poor unsuspecting CPU was hit by the audio packet freight train...
We started to worry. Worried quite a bit, actually, that our cool new technology was just ahead of its time. Since our goal was to use standard off-the-shelf IT network switches, what if the switches were not up to the challenge yet?
Saved By Cisco
In one of those precarious moments where all you can do is hold your breath, as we kept testing and breaking switches, we did find at least two vendors whose switches we could not break: Cisco and Allied Telesis. (We heard that A-T had been strong in video conferencing, which made sense, as that was a similarly traffic- and stream-heavy use case.)
It was very important that standard IT switch vendors could be used to build the new audio over IP future. This was the cornerstone of our economic argument. When we first showed our new tech at NAB, we made sure these standard switch vendors were visible and prominent. We knew that if we had had to create our own switches, or resorted to any smoke-and-mirrors workarounds, this would have put the new audio over IP into the same difficulties as many of the competing audio network technologies of that time: needing special hardware. The standard IP switch was the key.
"The 2101 era." - Greg Shay and Ioan Rus
IP Shows Its Mettle
Switch testing is actually when IP Layer 3 began to show its full worth. We had based our Livewire system on multicast addressing. The in-studio and inside-the-facility workflow we were aiming for was all about having all audio channels available everywhere, all the time, without having to centrally manage or coordinate changes to routing controls in the middle. Multicasting solved this elegantly.
But multicasting meant we had to rely on the switches to do proper multicast routing. You could not afford to flood all multicast traffic everywhere; there was much too much traffic. In fact, you had to ensure that the switches would never flood, even for a moment, or audio would be disrupted everywhere. This put a high sensitivity on the maturity of the implementation of multicast control in the switches.
We tried out the different multicast control protocols. GMRP was the Layer 2 (Ethernet) multicast protocol, and IGMP was at Layer 3 (IP). Remember, I just said we had blown up just about every switch we had tested; we only had a few left we could use.
The end of the story is that IGMP at Layer 3 was better implemented and better behaved than the Layer 2 protocol. We had come to the end of the long pier and were standing in front of the deep water, and IP Layer 3 was the bridge the IT industry had been investing in as the way forward.
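Part of what made IGMP workable is how little the endpoint has to do: a receiver simply joins a group, the kernel emits an IGMP membership report, and an IGMP-snooping switch then forwards that group's packets only to the ports that asked for them. A minimal receiver sketch in Python follows; the group address and port are hypothetical examples, not Livewire's actual assignments.

```python
import socket
import struct

GROUP = "239.192.0.1"   # hypothetical multicast group for one audio channel
PORT = 5004             # a port commonly used for RTP

def membership_request(group, interface="0.0.0.0"):
    """Pack an ip_mreq struct (group address + local interface address).
    Passing this to IP_ADD_MEMBERSHIP makes the kernel send an IGMP
    membership report, which snooping switches key on."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))

def open_receiver(group=GROUP, port=PORT):
    """Open a UDP socket and join the multicast group on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock

# On a machine with a multicast-capable interface:
# sock = open_receiver()
# data, addr = sock.recvfrom(2048)   # each datagram is one RTP audio packet
```

Dropping the socket (or IP_DROP_MEMBERSHIP) triggers an IGMP leave, and the switch stops delivering the stream to that port, which is exactly the "everything available everywhere, but only delivered where wanted" behavior the multicast design relied on.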
The future of Audio over IP was cast using the concrete of IT R&D...