In our quest for better Internet audio streaming we’re looking closer at Apple’s HTTP Live Streaming - or Apple HLS. Greg Ogonowski joins me, Kirk Harnack, to take a deep dive into how Apple HLS adaptive streaming works at the encoder end, server, and for the end user.
Kirk: This Week in Radio Tech Episode 281 is brought to you by the Axia Fusion AoIP mixing console. Fusion - where design and technology become one. By the Telos Hx6 talk show system, perfect for request line callers and serious newsmakers alike -- 6 lines and 2 digitally-clear Telos hybrids in one rack unit, the Telos Hx6. And by Lawo and the crystalCLEAR virtual radio console. crystalCLEAR is the console with a multi-touch touchscreen interface.
In our quest for better Internet audio streaming, we're taking a closer look at Apple's HLS. Greg Ogonowski joins me, Kirk Harnack, to take a deep dive into how Apple HLS adaptive streaming works at the encoder and server and for the end user.
Hey, welcome in. It's This Week in Radio Tech. I'm Kirk Harnack. Glad you can join us, whether you're joining us live on the GFQ Network or watching it later on from GFQ or YouTube or on my website, This Week in Radio Tech. We're glad you're here.
This is the show where we talk about everything, I hope, from the microphone to the light bulb at the top of the tower and all the stuff in between, which now includes . . . there are a whole lot of branches on that tree now and the biggest one probably is streaming. So, we're going to get to that here in just a minute.
I'm coming to you live from Nashville, Tennessee, from my office here. As a disclaimer, I work for the folks at the Telos Alliance. As you may discover, I actually try to make it not too obvious, the Telos Alliance makes a few products that compete with our guest's products. So, this show is about our guest and what he has to say about streaming and his products that help do that in a really effective way. So, I'm delighted to have Greg Ogonowski on the show.
Let's go ahead and bring him in. Hi, Greg. How are you doing? Glad you're here from California. Good to see you.
Greg: Hey, Kirk. Thanks for having us. We're coming to you from lovely suburban LA. Trees and streams go together.
Kirk: They do.
Greg: They do.
Kirk: Somebody needs to come across with a product called Willow because that's the kind of tree that grows next to a stream.
Greg: Yeah. I think there was a product called Willow, come to think of it. I don't remember what it was, though.
Kirk: You know, the first thing . . .
Greg: Believe me, any word you could possibly think of, there's a software package named after it.
Kirk: Yeah. It's hard to come up with something new and innovative, hence all the baby names of products on the Internet or the unpronounceable ones. So, our show is brought to you by a few sponsors and the first one that I want to tell you about, we're just going to see an ad.
I've got a friend, a business partner, actually, Larry Fuss. Larry is the driving force behind our little radio stations. We have six stations in Mississippi and two in American Samoa. Suncast is going to roll this and Larry is going to tell you about why he likes audio over IP with Axia.
Larry: We had three radio stations in one building on the other side of town and then we ended up with another radio station here and decided to consolidate everything into one facility. So, what you see here, or what you will be seeing here, are four radio stations combined into one facility. In the process of moving, we decided that we would switch over to the Axia platform. Part of the reason is because we were changing automation as well, to Rivendell, and Rivendell works very well with the Axia system.
When we decided to consolidate everything, we used to have separate control rooms for each radio station, but several of them had no live, local programming. It's all either syndicated stuff or voice tracked. So, we decided we don't need a separate control room for those stations anymore. So, we decided to consolidate.
Now we have one big, massive control room here. That's pretty much where we do everything for four radio stations. A morning show on one of them is live. We do a live talk show on the AM at midday, at one time we did. Then we do a night show on one of the other stations, all live from this room and all via the Axia.
In the old facility we had literally miles and miles of analog audio wiring, studio to studio, studio to rack, rack to the other studio. It was just lots of wiring. One whole wall of punch blocks, every bit of that is gone now because all we have between this room and the rack room is a couple of runs of cat 5 cable. There's a whole lot less wiring involved this way, which of course means a lot less time involved in setting up an Axia system than in wiring up a plant with analog consoles in every room.
We had already installed one smaller Axia console at another radio station that we own about 35 miles from here. It has worked so well that when the decision was made to move this station, or these stations, and consolidate them, we looked at all that wiring, we looked at all those punch blocks and we said, "We ain't doing that. We're going to go with Axia and eliminate all that." First of all, it made the move a whole lot simpler.
Tommy: Since we put the Axia system in, everything has run a lot smoother, which is the main thing I'm concerned about. I don't have the engineer coming in every day telling me we need to buy this part, something has gone down. With the one studio, everything is located in one room. We can go in there, all the production for the commercials is done there. The board is easy to use. Everything just works very smoothly.
One other thing that I really like about the Axia system is the flexibility. If I decide to do a Southern Miss football game, I can either automate it or I can have somebody come in and do it with pushing the buttons and getting it on the air for us.
Larry: I wouldn't do it any other way. I've got two more radio stations in another market thousands of miles from here, literally, even in a different hemisphere, as a matter of fact. We put Axia in there too after our first experience with Axia. I would never build another analog plant again.
Kirk: Thanks a lot to Axia and the folks at the Telos Alliance for sponsoring This Week in Radio Tech and thanks to Larry Fuss for that heartfelt testimonial about our Axia experience in Greenville, Mississippi.
All right. This Week in Radio Tech, it's Episode 281. Greg Ogonowski is our guest. Greg's got a company promoting his StreamS hi-fi encoders. Greg, it's great to have you back. You've been on the show a couple of times before. I'm glad you're here now. But you've got something to talk about. We've talked about streaming in the past, but there's a new kind of streaming that you're going to be telling us all about. Where should we start in this subject of adaptive streaming, especially Apple HLS? What do you think?
Greg: Well, once again, thanks an awful lot for the opportunity to appear here. It's a wonderful opportunity. Before we go any further, I just wanted to point out that speaking of the Axia audio over IP protocols, all of our encoders are fully Axia compatible. So, you install the driver and away you go with streaming. Just like the other fellow said about never doing another analog or even a digital installation again, I can't agree more. It's just the way to go. You get one crimp tool and a bunch of cat 5 and you're off.
Kirk: Well, thanks for pointing out that the Axia drivers work with your product. That's great that there's no hitch in the giddy up there. It works great.
Greg: That's correct. The only thing I haven't tried is more than one protocol at once. But I would imagine that may not go so well. That's another story. I don't think anybody needs to go there. Trust me, somebody in Europe is going to complain about that.
Kirk: That's right. We want Ravenna, AES67 and Livewire.
Greg: No, no, no. Ravenna and Livewire, that will be a piece of cake.
Greg: You guys are actually reading from the same hymnal there. No, what somebody's going to want to do one day, we're going to wake up and somebody's going to want to run Dante, they're going to want to run Big W, they're going to want to run Axia and Ravenna. Like I say, those two, I'll bet you any money those run together.
Kirk: Yeah. The networking is the same for both. But the AES67 and AVB and others, a lot of those take a different kind of clocking. They take the IEEE 1588 clocking standard.
Greg: Correct. Axia is moving in that direction anyway, so it's just a matter of changing the clocking because the RTP between Axia and Ravenna is identical.
Greg: But anyway, getting back to HLS, HLS is a very, very interesting way to stream now. It pretty much addresses every complaint that anyone ever had about streaming except the whole idea of the fundamental flaw of streaming. I know I brought this up before. Streaming is fundamentally flawed in that it doesn't have a clock mechanism.
Kirk: Oh yeah, the clock thing.
Greg: That's pretty much what separates the streaming protocols from the audio over IP protocols, such as Axia, for example. There's a clock that's sent with Axia. With streaming, there isn't. The only way it stays up is you build yourself a buffer in the player and you have to fill the buffer. Sooner or later, you've got to deal with the unfortunate mess of either a buffer underrun or a buffer overrun. There are some creative ways to deal with this with sample rate conversion, but then that has speed issues, which can or cannot be audible.
I might point out that if you have an SRC in the path, the Nielsen PPM watermark will not propagate through it. So, it destroys that. But be that as it may, streaming is what it is. It's an inexpensive way to deliver. For the most part, with crystal-controlled clocks at the source and the destination, streams will hold up for several hours.
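To put a rough number on that clock problem: two in-spec crystal oscillators can still disagree by tens of parts per million, and that mismatch alone determines how long a fixed player buffer lasts before it underruns or overruns. A sketch with illustrative figures (these numbers are not from the show):

```python
# Sketch: how long a player buffer survives a small clock mismatch between
# the encoder's sample clock and the player's playback clock.

def seconds_until_underrun(buffer_seconds, ppm_mismatch):
    """Time until a buffer of `buffer_seconds` drains (or overflows)
    when the two clocks differ by `ppm_mismatch` parts per million."""
    drift_per_second = ppm_mismatch / 1_000_000  # seconds of drift per real second
    return buffer_seconds / drift_per_second

# A 2-second buffer with clocks 100 ppm apart lasts about 5.6 hours,
# which lines up with "streams will hold up for several hours":
hours = seconds_until_underrun(2.0, 100) / 3600
print(round(hours, 1))  # ~5.6
```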
Kirk: Let's go back to something that you said at first and that is this HLS streaming, which you're going to explain pretty well to us. It solves several complaints. Let's enumerate what a couple or three of those complaints about regular streaming have been.
Greg: Well, probably the first one would be buffers. You connect to a stream and you have to keep that thing up. It's got contiguous data. The buffer will be very player dependent depending on how the player has been written. How the buffer management is dealt with will determine the robustness of the player client.
Kirk: So, I've noticed that . . . I'm sorry to jump in. I've just got so many ideas.
Greg: There's a ton of detail here.
Kirk: One of the first players I ever used was Winamp and when used with a SHOUTcast server, it did something fairly smart. It would load up its buffer as fast as possible because the server had a bunch of data stored in it. It had 20 seconds or so worth of audio streaming stored up in it. So, I was already delayed 20 seconds and then Winamp would fill up quick and start playing me right away and then once its buffer got full, it would then be streaming at the normal pace with a full buffer, most of the time a full buffer.
Kirk: But when I use . . . I have a TuneIn app on my Android phone. When I punch up one of my radio stations using just a standard SHOUTcast protocol, the TuneIn app doesn't preload the buffer, so I sit there and wait while it says 10%, 20%, 30%. So, I wait until the buffer gets full.
Greg: TuneIn hasn't gotten the idea about burst on connect. That's technically that.
Kirk: Burst on connect, there you go. I knew there was a word for that.
Greg: Yeah, and their player doesn't support it. So, you're right. You sit there and wait. Then the other popular protocol that is used is Adobe's RTMP. Now, that thing is really horrible because there's no buffer on the server at all and they offload the buffer problem exclusively to the player. So, the problem there is while you're loading the buffer in the player, you can't play because you don't get the data yet.
So, you've got to preload. So, if you want to set up a 10-second or a 20-second buffer, when you go to click to play the stream, you've got to wait for the buffer to fill and it will be that 10 or 20 seconds before it can start to deliver audio. Otherwise, the buffer mechanism won't work.
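The burst-on-connect behavior Kirk describes can be sketched as a rolling server-side buffer that gets dumped to each new client in one shot, so playback starts immediately instead of waiting the full buffer length. This is a toy illustration, not SHOUTcast's actual code; the class and the numbers are made up:

```python
# Toy "burst on connect": the server keeps the last N seconds of encoded
# audio and sends it all at once to each newly connected client.

from collections import deque

class BurstBuffer:
    def __init__(self, max_seconds, bytes_per_second):
        self.max_bytes = max_seconds * bytes_per_second
        self.chunks = deque()
        self.size = 0

    def feed(self, chunk):
        """Encoder pushes audio; old data falls off the back."""
        self.chunks.append(chunk)
        self.size += len(chunk)
        while self.size > self.max_bytes:
            old = self.chunks.popleft()
            self.size -= len(old)

    def burst(self):
        """What a newly connected client receives before live data resumes."""
        return b"".join(self.chunks)

buf = BurstBuffer(max_seconds=20, bytes_per_second=16000)  # ~128 kbps
for _ in range(30):
    buf.feed(b"\x00" * 16000)    # thirty one-second chunks
print(len(buf.burst()))          # capped at 20 s -> 320000 bytes
```

An RTMP-style server, by contrast, keeps no such buffer, which is why the player alone has to absorb the whole startup wait.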
Kirk: Just as a consumer, this is a bit of a pain. You go and you open an app or you click on a TuneIn station that you like or whatever your method is. Then you wait. It just seems like that period of ambiguity has got to cause at least a little slight bit of anxiety in anybody who's wanting to do that.
I've got, I believe it's a Grace brand commercial radio. It's on the back side of my rack right here. It's intended for use in barbershops and stores and whatever. It's kind of commercial looking. It hooks into the PA system of your store. It does sit there and just works and works and works and I like it. But my goodness, it takes forever to buffer up. I don't know what its deal is. It's got iHeartRadio in there and it's got several presets all setup. When I tune in a station, it feels like half a minute before I start getting audio.
Greg: I've spoken to the people over at Grace about that device. It's got a real underpowered processor in it, for one thing. They obviously don't support any burst on connect. It's got everything going against it.
Kirk: Okay. So, buffering and slow to start is one problem.
Greg: Buffering is like the number one problem. The next thing that's really important is, especially with the terrestrial radio stations, everybody wants to do ad insertion now.
Kirk: Oh, yeah.
Greg: So, now you need something that has accurate cueing. SHOUTcast and Icecast can't deliver that because, contrary to popular belief, the metadata is not frame accurate in SHOUTcast and Icecast. It has to succumb to what they call a metadata interval, which is a parameter that gets set on the server itself. It's expressed in bytes.
So, even though you may send the metadata right when you think you've sent it, it's sent out of band. It is not congruent to the audio bit stream. It goes around. So, it may not even go the same route. We all know what's going to happen there. There are latencies that will build up there. It goes to the server. It gets cached in the server and then it squirts out of the server in line with the audio at this metadata interval.
For example, for Icecast, the default metadata interval is 16,000 bytes. That can be a long time, especially if you're relying on this for commercial insertion and you're running a tight format and you expect these commercials to go in and out on your terms instead of its terms. It can't be achieved. I've had this conversation with a number of the commercial insertion services. That conversation has been pretty much like pushing string uphill, which is scary in itself.
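To see why that default matters, the worst-case metadata delay is just the interval divided by the audio byte rate. A quick sanity check (the helper function is hypothetical; the arithmetic is standard):

```python
# Worst-case delay before a metadata update reaches the listener, given
# Icecast's metadata interval (bytes of audio between metadata slots).

def worst_case_delay_seconds(metaint_bytes, bitrate_kbps):
    bytes_per_second = bitrate_kbps * 1000 / 8
    return metaint_bytes / bytes_per_second

print(worst_case_delay_seconds(16000, 128))  # 1.0 s at 128 kbps
print(worst_case_delay_seconds(16000, 32))   # 4.0 s at 32 kbps
```

So even the default can put an ad cue a second or more off at a typical bitrate, and several seconds off at low bitrates, before any network latency is counted.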
Kirk: Now, when you're speaking about this kind of commercial insertion, we're talking about stream splicing, right, where the commercial is getting spliced in somewhere downstream?
Greg: Yeah. There are essentially two ways to do it. One, you can do it on the encoder side, which is a more elegant way to do it. The problem there is you can only splice one stream in it. So, if you want to do anything targeted, it's got to go on the network side.
If it goes on the network side, now you've got another big house of cards. For one thing, it can't be processed through the same audio processor as everything else. So trying to get the audio texture lined up is a big problem. Trying to explain this to the Internet crowd about audio texture and the creative that's behind all of that is yet another problem.
Kirk: That's your uphill string right there too.
Greg: The first thing they tell you, "What do you mean? I can hear it." Okay. But that's pretty much the attitude you get with most everybody that's servicing the Internet. So, all that being said, that's what happens. So, you've got to stream, splice. You've got to worry about getting it spliced correctly. It's got to be done at the audio frames, a lot of detail there, a lot of detail.
Kirk: All right. I think maybe a third point of pain with traditional streaming is bitrate versus who are your listeners and where are they at and what connectivity do they have.
Greg: Correct. You're sitting here with a network like we have here and, hey, you've got 12 megabits to deal with for entertainment, if you want. But on this guy, that's another story. Oh yeah, then there's this guy. This guy is a real important thing.
Greg: That's where the rubber meets the road.
Kirk: That's a factor and I hadn't thought about that, but that's a good point.
Greg: Yeah. This is the age-old argument about MP3. MP3, to get any kind of decent fidelity, you've got to be up at 128 kbps. That's expensive. It's expensive for mediocre quality. You get to AAC and 32 kbps isn't great, but at least you have 15 kHz response with it, but you can get to 64 and get really good quality with AAC and that's half as much money. It's half as much money and you've effectively got twice the bandwidth to work with.
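The money argument is easy to quantify, since CDN egress is billed per byte delivered. A back-of-envelope calculation (decimal megabytes, illustrative only):

```python
# Bandwidth cost per listener-hour at various bitrates; this is what drives
# the "half as much money" point above.

def mb_per_listener_hour(bitrate_kbps):
    return bitrate_kbps * 1000 / 8 * 3600 / 1_000_000  # decimal megabytes

print(mb_per_listener_hour(128))  # 57.6 MB/hr (MP3 at decent quality)
print(mb_per_listener_hour(64))   # 28.8 MB/hr (AAC at comparable or better quality)
print(mb_per_listener_hour(32))   # 14.4 MB/hr (AAC, still ~15 kHz response)
```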
The other thing that you get with all of that is you get twice the reliability, or four times the reliability if you're down at 32. You see, all these guys on these networks, they tell you that they'll give you . . . I can go outside here, right outside the door, and I can show you a speed test. I can probably do it in here. It's going to be about 30 or 40 Mbps. It's going to look really good. We're up on a hill and we're actually above the beam on the closest cell site, but if I go down the hill, I'll get 80 Mbps on that phone. It's pretty incredible.
Greg: But sustained is a different story. That's where HLS comes in. HLS is a protocol that looks to a device like a webpage that you are looking at forever. It's kind of like you're looking at the webpage and clicking on something about every 10 or 20 seconds.
Kirk: Greg, you've just explained something in a really good way. This is a good elevator speech for either you or for me trying to explain this. Earlier you compared SHOUTcast or Icecast or Adobe streaming, you used the word "contiguous". We don't use that word often. We use it to describe the 48 states, right? But it's contiguous. The packets are one after another after another.
There's some time between them. But it's a constant me-you server-client connection, always there, whereas with adaptive rate streaming -- and we're going to be talking about Apple HLS -- you just said it's like a webpage. It acts like a webpage. It acts like file, little file, little file, little file.
Kirk: We'll get to this later, but there's some intelligence within the player to know which version of the website to get - the full rich version with lots of stuff or the little skinny version that still has the audio at a decent audio quality.
Greg: That is correct. Anybody who's familiar with our StreamS HiFi Radio app for iPhone, there's an interesting tool that you can get as an in-app purchase. There's a bit meter. I don't know how well this is going to work here. Let's just see if I can make this . . . There we go. You can see I've got an HLS stream. There's such a delay in the video here. It's hard for me to get this thing positioned. It's glaring.
Kirk: I know. It's tough to show stuff like that.
Greg: In any regard, the bit meter . . . okay, I guess it would help if the screen save didn't kick in. Okay. There we go. You can see the spikes there. Those are the HLS segments that come in. With a normal stream, that will just be a flat line straight across.
Kirk: Okay. That's an option that I need to purchase. I don't have that on mine. I've got your app on my big old fashioned iPad here.
Greg: Oh yeah. I'd send you a promo code if I could, but they don't give us a mechanism to send promo codes for the in-app portion of all of this. You'll have to settle for lunch the next time we see one another.
Kirk: That's fine. I don't mind putting a few dollars on that. That's pretty cool. I have recommended this. We gave away a few of these apps a couple of years ago when we first had you on, but I love this app. Do you vet? I should tell people this. This is an app where you pick radio stations or other web broadcasters and listen to them. Every one of them sounds good. How do you vet that?
Greg: We do vet them. For one thing, we don't list any MP3. It's all AAC exclusive. It's the only app that I know of that is about quality rather than quantity. So, we've got like 5,000 or 6,000 streams in there at this point and it's harder than hell to keep it all maintained because the amount of changes in stream URLs every day, it's mind-numbing, but we do our best and I urge anybody that should get the app, if you run across URLs that don't work, please, by all means email us.
We love to hear from our users. We just don't get enough feedback from dead URLs. We've got some spiders that try to deal with it, but the other problem is everybody in the world is streaming with a slightly different spin on things. So, the spiders don't always work on everybody and on and on and on.
But back to HLS, this business of being able to make these streams look like webpages. The one huge advantage from a CDN point of view, that's the content delivery network, is you don't need a streaming server to stream live anymore. We've got a webpage set up. It's at www.etherstream.net slash a couple of places, either /8 for Flash players or /HTML5 for native HTML5 support, which is very abbreviated at the moment.
But there are two browsers that have native HLS support at this point. Microsoft's Edge browser for Windows 10 -- you can punch in that URL, the direct M3U8, and it plays right in the browser -- and of course Safari on iOS and OS X. The Apple folks have done the best job of implementing everything there is to know about HLS, no surprise there. They invented it. But they support everything that you could possibly think of. Of course, iTunes also plays HLS.
Kirk: Yeah. We're going to take a quick break here for a sponsor, Greg. So, in the first half of the show, we've been talking about streaming and the issues that streaming has had over the years. You know what, Greg? Let's take another two minutes before we break and talk about the thing that I brought up about different bitrates, that if a radio station wants to stream, they kind of have to decide, okay, do we stream at a high bitrate, at a low bitrate, at a medium bitrate?
Do we stream at several bitrates and make the customer or make the listener choose which one do we stream in MP3 or AAC and make the customer choose which app is going to be associated with which stream? There's so much we either end up compromising in what we do or we put it on the listener to make a decision, neither of which is a great solution. Hasn't that been one of the problems with streaming?
Greg: That's been another problem. You are absolutely correct. HLS, as well as MPEG-DASH, by the way, HLS and MPEG-DASH are very, very closely related. They're delivered pretty much the same way, the fundamental difference there being the file formats. But be that as it may, the way this works, very quickly, is the player client will make a request to the web server, which is now the streaming server, in essence.
It makes a request to the web server, for a file in the case of HLS, it's an M3U8 file, which is what's referred to as a playlist or a manifest file. In that playlist file, it lists typically three or four streams of various bitrates. Should your player client support that selection, it can make a determination as to which stream to use on the basis of your location or your network speed.
So, one URL to describe up to four streams usually, Apple specifies three, in our encoders you can specify four and they're all synchronous. If you're at home on a good Wi-Fi network, for example, you'll pull in 256k and if you're out and about, you do 32k, for example.
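A master playlist along the lines Greg describes might look like this; the URIs, bitrates, and codec strings are illustrative, not taken from an actual StreamS encoder (`mp4a.40.2` is AAC-LC, `mp4a.40.5` is HE-AAC):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=256000,CODECS="mp4a.40.2"
hi/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=128000,CODECS="mp4a.40.2"
mid/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.5"
low/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=32000,CODECS="mp4a.40.5"
min/playlist.m3u8
```

A player that understands adaptive selection reads this once and then picks a variant based on measured throughput; a simpler player can just take the first entry.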
Kirk: We're going to explore that and how the players choose which one. You said that the streams, again, they're not really streams, they're files. But they're synchronous. So, if your player switches from one to another because the available bandwidth changed, you as the listener will probably not notice the difference. You probably won't even notice the difference in quality, although you might. We'll relate that to the way something like Netflix works in just a minute.
Hey, our show, this is This Week in Radio Tech Episode number 281. Our guest is Greg Ogonowski. We're talking specifically about implementing Apple HLS streaming. So, Greg and I have talked about the difficulties with streaming to this point for audio broadcasters and we're going to get into some of the benefits in how you get into Apple HLS.
Our show is brought to you in part by the folks at Telos and the Telos Hx6 phone system. Hey, if you are putting together any kind of content where you want talk to listeners, listener callers or you want to do interviews with guests, the Telos Hx6 is amazing. It is low-cost compared to the way you had to do this in the past. This one-rack unit box gives you inside two telephone hybrids. These are Telos' best 5th generation telephone hybrid technology.
This box takes POTS lines coming in. There is an ISDN version available too, although ISDN for the most part is going away. You can generate your POTS lines locally from SIP if you want to or if you get POTS lines from your telephone company provider, you can do that too.
I even know of people who are using things like several Vonage boxes to provide their POTS lines. There are all kinds of different ways to get good old-fashioned POTS to come into this box. At my radio stations, we're actually using a SIP to POTS converter to bring POTS lines into our Hx6 for our different radio stations.
But because you've got two telephone hybrids in there, that means you can -- the right way -- you can put two callers on the air at the same time. Maybe you've got an expert guest on line six and that's your hot line, right? Then you want to bring in listener callers and they can conference with each other. Well, the Hx6 does conferencing all by itself.
Let's say you've got an older audio console. You don't have an Axia console or you don't have a console that does automatic mix-minus. No problem. The Hx6 can take a single mix-minus from your audio console and properly cross feed the two callers, the expert guest, for example, and the listener caller on the other line. And mix in, of course, your mix-minus audio -- your announcer, your local people, your local host, their voice -- and any music played on the console along with that. So, everybody hears everybody exactly right.
Something else that's cool is that each of the two hybrids has its very own Omnia audio processing on the call. So, the callers' levels are consistent from call to call. There's even automatic equalization on each call. So, as much as possible, even the tonal timbre, the quality of each caller is relatively the same. Somebody calls in from a smartphone, somebody calls in from an old home telephone that's got the carbon granules packed together, whatever it may be. You've got consistent call to call quality. So, that's a really, really good benefit there.
Then Hx6 incorporates every single trick that Telos has learned over 30+ years as to how to keep feedback from happening. So, you can even use the Hx6 in an environment where you've got open mic and open speakers. Think the Phil Donahue talk show. Okay, some of you aren't old enough to remember the Phil Donahue talk show.
You've got a television studio. You've got a studio audience and you've got callers calling in being put on the overhead speakers and you've got a host walking around the audience with a microphone. That's open mic and open speakers. You can still carry on a conversation with a caller that way and not have feedback because of the tricks that Telos builds into the Hx6.
The Hx6 also uses the now really famous and easy to use Telos VSet 6, which looks like a telephone. It's a controller for the Hx6. Those connect over your Ethernet network. So, you can stick the Hx6 back in a closet somewhere, in your rack room. It doesn't have to be in the studio and then you can just, through your Ethernet network, your standard business network that doesn't have to be a Livewire network, you can plug in these telephones and control the system from that. It's amazingly easy to install and use. I use one of them myself at our stations.
It also, by the way, does Livewire. So, the Hx6 has your standard XLR connectors for audio in, audio out, the send to the caller, the caller's voice coming out, music on hold or as we say in our biz, program on hold, those inputs are all there. But if you want to do all that with Livewire, you can do that too.
If you're already a Livewire studio, then you have another choice. You can buy the Hx6 or you can save a few dollars and buy the iQ6. It's functionally identical except it doesn't have the XLR connectors. It only has a Livewire jack on the back of it, well, plus the phone line jacks.
Check it out if you would on the web. I love this talk show system. It works really well and the callers sound great. By the way, it does come with call screening software. It comes right with it from the folks at Broadcast Bionics. So, go to TelosAlliance.com, look at Telos and look for on air phone systems. You'll find the Telos Hx6 right there. It's sort of a cousin to the iQ6. It's just missing its XLR connectors. You don't have to pay for them. Thanks to Telos for sponsoring This Week in Radio Tech.
All right. It's Kirk Harnack along with Greg Ogonowski. Let's see Chris Tobin still hasn't been able to make it to a Skype machine yet. So, we're waiting on Chris to see if he'll be able to join us for this show.
So, Greg, we've talked about issues with streaming. Let's jump into now talking about Apple HLS. You said a moment ago and you provide a link that people can listen to some of these streams themselves and they sound like streams. You mentioned this manifest file. So, when you give a link out to your stream, you're really giving a link to this file sitting on a file server somewhere. Tell me about that file again.
Greg: Yeah. Before we get into that file, I just meant to mention, you forgot to give the 800-number. The 800-number is supposed to be in there and you're supposed to, "Call direct or collect except in Nebraska, don't call in."
Kirk: Yeah. Exactly. Call by midnight tonight, no cops.
Greg: There you go. Operators standing by.
Kirk: That's right.
Greg: You have this playlist file, the manifest file, and it has a list of streams in it. The player client makes a determination as to what it's going to do next. So, then it goes and it determines, let's say for example, that it's going to deliver you 256 kbps. So, then it gets another file, which is a variant, what they call a variant file. Inside that playlist, and this is where the heavy lifting all happens, is a list of these chunk files, which in the case of Apple HLS are ADTS AAC files. It's like a machine that just goes [makes machine noises]. That's how it works.
Kirk: That's good. That sounds like the elevator speech.
Greg: That's tech speak for how it works.
Kirk: Actually, I was trying to figure out how to explain this.
Greg: Or like this . . .
Kirk: Yeah? Okay. Have you seen that commercial where the guy is making sausage in the grocery store and he's like putting live baby pigs in the box and cranking the crank and out comes links of sausage and people are horrified?
Greg: That's how this works, actually.
Kirk: Yeah. If you didn't make the links, you'd have this continuous tube of sausage. What you've got, you spin the casing and they make a sausage link. Then the sausage link is like a file. So, it contains a certain amount of data that is a certain number of seconds or milliseconds of the audio stream.
Greg: It's usually 10, 20, 30 seconds depending on what you decide to do. We'll get to that in a moment.
Kirk: Okay. So, this variant file contains a list of the files in their numerical sequence or whatever it is.
Greg: Right. With a few other pieces of important information for the player client, the player client is going to need to know when to go get another manifest file because after it plays what's there, it's going to run out. It needs to know, "Got to get the manifest updated so we can get the next set of segments." So, it continually updates the manifest file as all of these segments or sausage links get updated on the web server.
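To make the playlist mechanics Greg describes concrete, here is a minimal sketch of a live HLS media playlist generator, in the spirit of RFC 8216. The tags are real HLS tags; the segment names, sequence number, and 10-second duration are illustrative only, not taken from any particular encoder.

```python
def build_media_playlist(first_seq, segment_names, duration=10):
    """Render a live HLS media playlist listing the current segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{duration}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",  # advances as segments roll off
    ]
    for name in segment_names:
        lines.append(f"#EXTINF:{duration}.0,")  # duration of the next segment
        lines.append(name)
    return "\n".join(lines) + "\n"

# Three 10-second segments: a client can buffer roughly 30 seconds.
print(build_media_playlist(42, ["seg42.aac", "seg43.aac", "seg44.aac"]))
```

A live client refetches this file, notices the media-sequence number has advanced, and fetches only the segments it hasn't already played.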
Greg: Now, in the case of our encoder, our encoders do an awful lot of management. Believe it or not, we've come full circle, and internally we call our encoders Project 959. For any of the geeks out there, if they put two and two together: 959, what could that possibly be? Well, how about RFC 959? It's one of the Internet's oldest protocols.
We are now streaming using FTP.
Basically, the whole HLS thing is complicated in so many ways, but in so many ways it's dumb as dirt. All it is is FTPing files, plus something to tell the player clients when to go get them and play them. But the beautiful thing is that all of this can go over anybody's HTTP infrastructure, their existing web servers and caches, all their existing investment. All of it can be used for this without having to change a thing.
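As a rough illustration of "streaming over FTP," here is a hedged sketch using Python's standard ftplib. The host, credentials, file names, and upload order are assumptions for illustration, not a description of any particular encoder.

```python
from ftplib import FTP  # RFC 959, one of the Internet's oldest protocols
import io

def push_segment(ftp, seg_name, seg_bytes, playlist_text):
    """Upload a finished segment, then the refreshed manifest."""
    # Segment first, so the manifest never references a missing file.
    ftp.storbinary(f"STOR {seg_name}", io.BytesIO(seg_bytes))
    ftp.storbinary("STOR live.m3u8", io.BytesIO(playlist_text.encode("utf-8")))

# Hypothetical usage against a placeholder ingest server:
# with FTP("ingest.example.com") as ftp:
#     ftp.login("encoder", "secret")
#     push_segment(ftp, "seg45.aac", segment_bytes, playlist_text)
```

Because everything lands as ordinary files over standard protocols, any plain web server can then serve the stream with no streaming-specific software at all.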
Here's where it gets even better. I told you we'd get back to the segment length. If you use 10 seconds for segment lengths, the files come in in these 10-second chunks, but the player will usually buffer anywhere from 30 seconds to a minute, or possibly 2 minutes. We had the advantage of being able to work with a very large client on all of this.
We even experimented with some 4-minute stuff, which literally means you can go away from the network for 4 minutes and your stream never stops. That can't happen with SHOUTcast or Icecast unless you built a huge, huge buffer into the player, and I don't know of anybody who's ever gone that far.
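The arithmetic behind that tradeoff is simple: dropout tolerance is roughly segment length times the number of buffered segments, and longer segments also mean far fewer manifest refreshes. A quick illustrative calculation (the segment counts are assumptions, not figures from the show):

```python
def buffer_seconds(num_segments, segment_len_s):
    # Roughly how long a client can keep playing through a network dropout.
    return num_segments * segment_len_s

def manifest_polls_per_hour(segment_len_s):
    # A live client refetches the playlist about once per segment period.
    return 3600 // segment_len_s

print(buffer_seconds(3, 10))         # three 10 s segments -> 30 s of cover
print(buffer_seconds(1, 240))        # one 4-minute segment -> 240 s of cover
print(manifest_polls_per_hour(10))   # -> 360 manifest refreshes per hour
print(manifest_polls_per_hour(240))  # -> 15 manifest refreshes per hour
```

The cost of long segments is latency: a listener joining a live stream may be minutes behind real time, which is why lengths are a deliberate configuration choice.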
So, as far as robustness and reliability and low cost, this protocol has basically delivered on anything anybody could possibly want. You're streaming over FTP. We also offer FTPS. You can use dumb storage, like the Amazon Cloud or Rackspace or Azure.
Kirk: So, I can use an Amazon web server or Azure or any place I can stick a file, that can be my web stream server now?
Our streams are now file-based rather than contiguous stream-based?
Greg: That is correct. Over on those sites that I gave you are some interesting little surprises. The bottom two streams, number seven is down at the moment. I'm working on that. As a matter of fact, I'm working on that today. It will be up later. But all the other ones are served from one machine. There are eight playout systems running on one computer. There are eight Orban Optimod PC natives running on that computer. There's SQL server running on it and there are 64 streaming encoders running on that one box. We call that box jocks in the box.
We've got another one in the lab that's running 16. That's another interesting beast. I'll probably have a 32 in the wind here before long. But be that as it may, the last two streams are actually streaming in surround. So, we'll be putting up a configuration page on how to configure your sound to be able to hear all six channels.
We've got one stream that is upmixed from stereo sources and the one, number seven, which is the one I'm working on today, we've got a surround playout system that's going to dish out genuine six-channel program material. So, that will end up being a live stream with six channel sources.
If you look at those surround streams, they're all pointing to RAX, RAX.Indexcom.com. If you do a reverse lookup on that from your location, you'll see that it points back to Akamai. If you ping those, you're probably in the neighborhood of 5 milliseconds no matter where you are in the world.
The advantage all this has once you get on the Akamai network, for example, is that Akamai has over 170,000 Edge servers all over the world. So, you ingest on a CDN such as Amazon AWS, glue it to Akamai, and you are literally serving the world with millisecond ping times.
Kirk: Wow. This is almost more than I can take in.
Greg: I'd love to give that a test up at the space station. I'd love to know what their ping times would be.
Kirk: Do you think one of their 170,000 is on the space station?
Greg: It probably is.
Kirk: With that many Edge servers, there must be one down the street from me.
Greg: I'm sure there is. There's one down the street from me. I can tell you that any of my buddies in LA connect to all different Edge servers. I don't connect to the same one they do. It's all different.
Kirk: All right. So, thinking about either your software, your encoders or speaking generally, what are some of the things that engineers who are setting this up need to have in mind? What modeling in their head do they need to understand about this and then how do they get their feet wet in streaming in Apple HLS?
Greg: The first order of business for a terrestrial radio station, where there's obviously a studio involved, is that if they're not lucky enough to have audio over IP such as Axia or Ravenna, they've got to get the audio into the box. We solve that problem with these guys here. This is what you call a StreamS IOdigi. It's just a real simple thing. It's bus powered. No drivers need to go in.
This works on Mac, PC, even iOS. You can plug it into your iPhone if you want digital IO. But it's a great little box. It's AES. So, it's good to go for the pro community. They're inexpensive and the new ad, as it says, is, "Need a quickie? Use a StreamS IOdigi."
Kirk: There you go.
Greg: So, you get your audio into the encoder and point this encoder to your ingest server and go on and listen. It's just that simple.
Kirk: Okay. Now, when you're setting up an encoder, maybe there's a thing in there called a segmenter. I heard you use that term when we were talking earlier today. What are some of the parameters that you feel would be standard? What are some of the bitrates, encoding algorithms, or segment lengths that you would want to set up?
Greg: All of the old HLS mechanisms that have been brought to the market so far have involved the use of another streaming server. They would ingest either an ICY, which is a SHOUTcast or Icecast stream or an RTMP stream. Now, right from the get-go, that starts with all of the baggage and all of the nasty things that all of those protocols have and then it would get translated to HLS segments.
Our encoders use HLS direct. All the heavy lifting is done in the encoder itself. So, all of the segmentation and file management is all done in the encoder. So, that means after the segments have expired in the manifest file, then the encoder goes back and cleans them out so that you don't end up with an endless bit bucket of trash. It keeps everything nice and clean. All you've got in there is just about a minute of audio or whatever you've configured your encoder to do.
Kirk: Let me see if I can get something understood. It sounds like there are a couple of ways to get this done. Earlier today, I was part of a webinar with the folks at Wowza. Their business model and their service is that you send them a good old-fashioned RTMP stream, and they offer to do the heavy lifting, as you put it, of making segmented streams at different bitrates.
As engineering purists, you and I think, okay, well, that's one way to do it. But maybe we're better off doing the, as you call it, HLS direct, having our own encoder make these files for the different bitrates and then either hosting them ourselves or shoving them off to the server in the cloud to be hosted and eliminate any transcoding.
Greg: Correct. The other advantage to doing all of the heavy lifting in the encoder is that you're able to do frame-accurate metadata, which was the other thing. You're within plus or minus 20 or 40 milliseconds, depending on whether you're using AAC or HE-AAC. But that's really, really tight.
So, you're going to be able to use that metadata for cueing. It makes ad insertion, or any other kind of content insertion, very accurate, and you're able to deliver the targeted spots that everybody wants to do right now with this technology. So, that is a great thing.
You don't have to worry about server latencies, server caches and the metadata is all in line with the audio. It travels right with it. It's the same socket connection, so it can never walk. The only place it could walk is if the player clients don't deal with it correctly. But right through the encoder and right through the web server, it can never ever walk.
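The "plus or minus 20 or 40 milliseconds" figure follows from AAC's frame structure: an AAC-LC frame carries 1024 PCM samples, and HE-AAC's SBR doubling makes the effective frame 2048 samples at the output rate, so frame-aligned metadata can only land to within one frame. A quick check at an assumed 48 kHz sample rate:

```python
def frame_ms(samples_per_frame, sample_rate_hz):
    # Duration of one codec frame in milliseconds.
    return 1000.0 * samples_per_frame / sample_rate_hz

print(round(frame_ms(1024, 48000), 1))  # AAC-LC at 48 kHz -> 21.3 ms
print(round(frame_ms(2048, 48000), 1))  # HE-AAC at 48 kHz -> 42.7 ms
```

At 44.1 kHz the numbers come out slightly larger (about 23 and 46 ms), which is why the precision is quoted as roughly 20 or 40 milliseconds rather than an exact figure.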
Kirk: You've gotten off into the weeds here a few times with really technical stuff, and I appreciate that very much. Here's what we're going to do in the few minutes we have left: we're going to talk for a few minutes, we've got one more commercial to do, and then I'd like you to come back with a tip or a website that you've found particularly helpful. Of course, the one you gave out earlier is pretty cool; if folks want to go to that, we can mention it again. But I'm going to ask you for another tip after our last commercial.
You mentioned earlier in the show, we've been talking about Apple HLS, and that does seem to be becoming more widely accepted. But there's also Microsoft Smooth Streaming, and there's MPEG-DASH that you mentioned as well. Do you want to talk about where this train is headed and who's going to be on it?
Greg: Well, obviously, Apple went first with all of this. I'm just looking for something here. Let me just look for something really quick. I want to pull it up on my screen so I can refer to it in just a bit.
Kirk: All right.
Greg: As you say, Microsoft has their Smooth Streaming. Adobe has another one called HDS.
Kirk: Of course.
Greg: Both of these have very little traction compared to HLS and MPEG-DASH. HLS was first to the party. Obviously, the Apple folks, being the media company that they are, have done a beautiful job of implementing it, promoting it, and making sure that it actually all works. One thing about the Internet that I've noticed is that there's an awful lot of show and not necessarily much go. But that is not the Apple way of doing business. They try their hardest at actually making things work. What a concept.
For those of us in professional broadcast, that really is the way to go. I know there's an awful lot of Android use out there, but if you're in the media business, get a media device. It's pretty much guaranteed to work. I just put something up a little bit earlier, for example. That site that I mentioned, the EtherStream, if you just bring that up in Safari, for example, all of the players will show up in there.
Kirk: And they'll play.
Greg: With the video delay, it's really hard to show this. Anyway, they all play in there. If you try to do that in most of the desktop browsers, that's going to be a problem, although it will work in Safari for Mac and, as I pointed out, in the Microsoft Edge browser.
Kirk: You did say that. Believe it or not, on my Android phone, well, the first time I tried to play an HLS stream in the Chrome browser on the Android phone, it said, "What app do you want me to use to play this?" I chose the . . .
Greg: If you point it to Chrome, you've got a 95% chance that it's going to do it. Chrome is coming along. Chrome for Android is a little different build than it is for the desktops. It appears that Chrome for Android has some HLS support natively built in, because some of those streams play. The reason I say some is that I can't explain it; it's buggy. For whatever reason, on the device that we tried earlier this morning, streams one and two play, five and six play, but three and four don't play, for some unexplained reason. I don't get that. Maybe it doesn't like the numbers three or four.
Kirk: Well, there's no doubt that if you want to make this work, use an Apple device. I'm just pointing out that I've actually had some good luck demonstrating the HLS streams that I made on an Android device. Now, I also demoed them on my iPad as well, just to show that it works on both.
Kirk: That's the .ts extension, right?
Greg: That's the .ts extension, correct. This was the page that I wanted to refer to that is another place that you might want to look. Over on our website, on Indexcom.com, if you go into tech, there's a thing there called HLS audio ESTS. You guys might want to look over there for some more information or if you just want to cut to the chase, Indexcom.com/whitepaper/ESTS. It goes into great detail as to what the advantages are there.
It is an Apple recommendation that you use ES for audio-only. It is much lighter on the network because there is no padding. TS has gobs and gobs of padding, so you're sending lots of zeroes, and your audience is paying for all those zeroes in more ways than just the bandwidth spend. It's not just the spend.
It's also a reliability issue, because now you've got to account for all those zeroes. Even though they don't carry any content, they still count as data, and if they get displaced, you have reliability issues. So, our encoders are all ES-based for that reason.
Kirk: Gotcha. Okay. Do you have a choice of that file extension, .es or .ts or anything else?
Greg: The file extension for transport streams is certainly .ts. For .es, which stands for elementary stream, the file extension there in the case of AAC will be .aac.
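The padding Greg objects to comes from the transport-stream container itself: TS payload travels in fixed 188-byte packets with at least a 4-byte header each, and the final packet of a payload is stuffed out to the full 188 bytes. A rough sketch of that rounding (it ignores PES and PAT/PMT overhead, which would only add more; the 500-byte frame size is illustrative):

```python
import math

TS_PACKET = 188   # fixed MPEG-TS packet size in bytes
TS_HEADER = 4     # minimum per-packet header

def ts_bytes(payload_bytes):
    # Bytes on the wire once a payload is packetized into MPEG-TS.
    packets = math.ceil(payload_bytes / (TS_PACKET - TS_HEADER))
    return packets * TS_PACKET

print(ts_bytes(500))  # a 500-byte AAC frame costs 564 TS bytes, roughly 13% more
```

An elementary stream ships just the AAC frames themselves, so none of that rounding or stuffing exists, which is the "lighter on the network" argument.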
Kirk: Okay. All right. I'm comparing what you're telling me with my own experience with the company that I work for and what we're making. So, I'm just trying to confirm that I understand this from our point of view: how does this work on your end, and are we really talking about the same thing or not?
Greg: Yeah. Anybody who's doing HLS, and there are 50,000 different ways to do HLS.
Kirk: Of course. Why wouldn't there be?
Greg: There is more than one way to do it, but for the most part, there's only one way to do it right. We had the advantage of being able to learn from the masters. I'll just leave it at that. We got the job done. It is fully 100% Apple compliant, and anybody who is streaming with Apple compliance, their elementary streams for AAC will be .aac segments.
Kirk: Gotcha. Understood. Hey, we're talking to Greg Ogonowski. He agreed to join us for our show today and talk to us about HLS streaming. It may take a while to get your head wrapped around this. I've got to suggest that if you're interested in this kind of technology or you're a broadcast engineer or a streaming engineer, you're going to need to understand this.
Right now there are a couple of big companies, Greg's company as well as the company that I work for, that make software and hardware products that do this. Get one of them, or get a demo or something, so you can begin to play and understand. The way I learn is by playing with stuff and finding out what works and what doesn't: "Oh yeah, there are five ways to do this where it doesn't work, but there are a couple of ways where it does."
So, that's who we're talking to, Greg Ogonowski. We're going to finish up with a tip and a final word in just a minute.
Our show is brought to you in part by the folks at Lawo, Lawo.com. They make audio consoles. They make an audio console just for radio broadcasters or others who are smaller outfits producing audio content. It's the crystalCLEAR console. This is an innovative, amazing little console. It uses the crystal console's one-rack unit mix engine. So, this has the audio IO on it. It's got some mic inputs, line-level inputs, AES digital inputs and some outputs too, both analog and AES digital outputs. So, those are built into the back of this thing.
Then it also has a network connection. Now, of course you can browse into it that way. In fact, that's how the clear part, that is the touchscreen surface, connects to the crystal part of it, the mix engine. But also, that Ethernet connection speaks Ravenna and it's also compatible with AES67. So, you can have your audio over IP with the crystalCLEAR console from Lawo.
Now, it also has available dual redundant power supplies there. So, if your power on one circuit fails or one of the power supplies fails, you've got the other one to keep things powered up as well.
Now, what about the clear part? That's the part where you touch. Well, if you go to the website and have a look, you can see a video there where Mike Dosch, who's the director of virtual radio projects there, he gives you a demonstration of how the crystalCLEAR radio mixing console works.
As for the user, you see this beautiful, big, multi-touch touchscreen. The console is in software, even visually. There are no hardware faders that move up and down. You do it on a touchscreen. They're designed, of course, with big, easy-to-touch faders that are easy to move up and down. Because it's multi-touch, it's very natural. If you're used to using an iPad or other multi-touch touchscreen interfaces, you already have a little bit of an idea of how this works. It's really quite natural. The buttons that you select there are big.
What's even cooler is that because everything is done in software, everything is contextual. So, you won't have to wade through menus of stuff that doesn't apply to you, because it knows, "Hey, if you want to touch the options button on a source, on a fader that is a microphone, I'm just going to give you the microphone-related options. I'm not going to make you wade through a bunch of other stuff." That's how they've designed the whole console to work.
Of course, it's got a big honking clock on it. You can keep that accurate with NTP. You can do automatic mix-minuses or automatic backfeeds to any of the sources. So, if you've got people in another room on microphones, you can send them the proper backfeed to their headphones, which means you can interrupt it and talk to them. The same goes for codecs for remote reporters or remote programming coming in, and hybrids as well: automatic mix-minus to those things too.
Check it out if you would on the web. Go to Lawo.com and look for radio products and look for the crystalCLEAR virtual radio mixing console. It is cool.
All right. Greg Ogonowski and Kirk Harnack are here with you on This Week in Radio Tech. Greg, can I put you on the spot and ask you to pass along some kind of a tip where our listeners and viewers may learn something after the show?
Greg: Well, you see this here. This is a clock. This is official StreamS swag. We have a joke here in the laboratory. The one with two clocks doesn't know what time it is.
Kirk: You mentioned this earlier on the show.
Greg: If you have two of these, then you definitely don't know what time it is. But one of the cool things about the StreamS encoders is that they also have the ability to lock not only to NTP but to PTP, if you're so lucky. It is an attempt at keeping everybody on time. What's really cool is that if somebody wanted to build a player client that was locked to GPS, then it could probably be about as good as those Axia or Ravenna protocols.
Kirk: Interesting. So, you mentioned earlier in the show that when you stream for a long period of time, it is absolutely going to happen that there will be some kind of, I don't know if you call it sample slipping or packet slipping, but one of the clocks, either the encoder's or the decoder's, is going to be faster.
Greg: Somebody's going to give. The chances of the two clocks being exactly the same frequency are slim to none, as you very well know. Crystals are very close, but anybody who's looked at the clock on their PC can vouch for how far off they can be. The time base for an audio capture, if you have an audio device that is capturing the audio, is actually not the computer's crystal but the crystal on the device itself.
In the case of something like Axia or Ravenna, the time base will be PTP-based as we move into AES67. AES67 is all PTP-based, which is IEEE 1588. So, that's the time base there. In the case of our jocks in the box, that's an interesting animal, because there is no soundcard. That assumes the playout system is on the same machine the encoders live on, so there you're at the mercy of PTP again.
But provided you can get all that locked in, you can get pretty damn close to staying on time. If you're at the mercy of a soundcard, though, especially an analog, consumer-grade one, there's no mechanism to lock it. You could use word clock, maybe, but then you've got to worry about keeping the rest of the encoder on time. There are really two time mechanisms that you've got to keep under control, and the StreamS encoders deal with both of them.
Kirk: Now, the good news is that even though we have different clocks, and it bothers engineers like you and others who really have to worry about these things, the end result is just an occasional glitch in the decoded audio when you're streaming. That glitch might be very occasional, with a long time between occurrences; it might be every few minutes, or it might be every few hours or even days.
Greg: That is correct. It all depends on how close your crystals are. You just never know. It's not going to be the same on two different devices. It is what it is.
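How occasional that glitch is falls straight out of the crystal tolerance: two clocks that differ by some parts per million slip one full second after 1,000,000/ppm seconds. The 50 ppm and 2 ppm figures below are illustrative tolerance values, not measurements from the show:

```python
def seconds_until_one_second_slip(ppm_difference):
    # Time for two clocks differing by ppm_difference to drift 1 s apart.
    return 1_000_000 / ppm_difference

print(round(seconds_until_one_second_slip(50) / 3600, 1))  # 50 ppm -> about 5.6 hours
print(round(seconds_until_one_second_slip(2) / 86400, 1))  # 2 ppm  -> about 5.8 days
```

A decoder buffer absorbs the drift until it under- or overruns, at which point a sample is dropped or repeated, which is why the glitch interval can range from minutes to days depending on the hardware.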
Kirk: Every time you talk about this, I keep thinking of, "Like sands through the hourglass, such are the days of our lives."
Greg: There it is.
Kirk: He stole it from the set, ladies and gentlemen in Hollywood. All right. We've got to go. Our show has been brought to you by the folks at Axia and also the folks at Telos and the folks at Lawo. I appreciate very much them sponsoring This Week in Radio Tech.
You've been enjoying the last hour with me, Kirk Harnack. I guess Chris Tobin must have got stuck in the subway or something. He wasn't able to make it here. I'm sad about that, but I'm sure he'll be back next year. He's still not online. I'm hoping he had a good time wherever he was and got out of whatever he was doing.
Greg Ogonowski, thank you for being with us. I hope you'll come back and join us again some other time. At some point, we could figure out how to share your screen and have you talk over it and maybe show us some setup so engineers, once they start to get their head wrapped around this, they can really start to see an implementation of HLS streaming and experience it to experience how good it can be.
We'll put the link to your webpage with those eight different streams in the show notes, so folks can just click on it and, as long as they're using a decent browser, hear them.
Greg: That would be great.
Thanks once again for having us here. We'd love to do it again. As you say, we could do some screen shares and show it all in action.
Kirk: All right. Good deal. All right, folks. Thanks a lot to Suncast for producing the show. He's always on top of the video switching and the lower thirds, keeping us on time, and whispering in my ear when I need it. Thanks also to Andrew Zarian, the founder of the GFQ Network, for making this possible. For Chris Tobin, I'm Kirk Harnack. We'll see you next week on This Week in Radio Tech. Bye-bye, everybody.