We no longer need to be together to have a shared experience, with people now consuming live content in a variety of ways. But in the process we lose a large part of what makes live events so compelling: a connection to the action. We explore how immersive audio invites us back in.
Not that long ago, when something was live on TV you had to make time to see it. You had to show up and be physically present in front of the box. What if it was never shown again?
For many people, squeezing in front of the TV is still a worthwhile social activity, but widespread connectivity means it doesn’t have to be. Today, we have unfiltered access to whatever we want, whenever we want it, wherever we are. We don’t have to rush home from work to watch the World Cup final live on TV if we can watch it on the bus on a device we keep in our pockets.
How cool is that?
The irony is, in this increasingly connected world, more and more of us are consuming media content in a totally disconnected way.
But live events are as much about feeling like we’re part of something bigger as they are about the actual content, and as we access more of our media on devices, we are increasingly disengaged both from the action and each other.
On these devices, it is often what we hear rather than what we see that provides the connection, so it’s no surprise that demand for immersive or spatial audio is growing.
Feeling the connection
According to Qualcomm’s latest State of Sound Report, spatial audio is the next ‘must-have’ feature. More than half of those surveyed claimed that “spatial audio will have an influence on their decision to buy their next pair of true wireless earbuds, and 41% said they would be willing to spend more for the feature.”
When we think of immersive audio, we think of 3D sound – and it’s everywhere, from gaming to streaming music to live broadcasting, adding value to each presentation. According to audio and recording manager Gerardo Marrone, who has spent this summer producing a series of live immersive events at London’s Kings Place: “There has always been a sensory conflict between what you see and what you hear, but the main focus of the performance no longer has to be on the stage, and can provide an intimate connection between the art and the people. The audience becomes part of the performance, not just a witness to it.”
Whatever the medium, spatial audio adds value and turns a presentation into an experience. But while more consumers are switching on to it, it’s nothing we haven’t heard before.
Binaural audio
Binaural audio is a two-channel format which dates back to 1881 and creates an immersive environment by mimicking how our ears work. Our head and outer ears filter incoming sound differently depending on where it comes from, and binaural recordings reproduce that filtering, either by capturing sound with a pair of microphones placed like ears or by applying an equivalent filter afterwards.
We locate sounds naturally using three cues. The interaural level difference is the difference in loudness between the two ears: if a sound is louder in one ear, we know the source is on that side. The interaural time difference is the gap between the sound arriving at one ear and then the other.
The third is the head-related transfer function (HRTF), which describes how the head and outer ear shape a sound before it reaches the eardrum; generic HRTFs are based on average ear and head shapes. It provides cues for the height plane as well as for position and distance. Binaural recordings are often made using dummy heads, with microphones implanted in the dummy ears. For programme production, these effects can also be created artificially on an audio workstation or mixing console using delay, panning and filtering.
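To make the first two cues concrete, here is a rough Python sketch of the "delay and panning" approach to positioning a mono source. It assumes a spherical head with an 8.75 cm radius and uses Woodworth's textbook approximation for the interaural time difference, with a simple constant-power pan standing in for the level difference. The function names are our own, and real binaural renderers apply measured HRTF filters rather than anything this crude.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C
HEAD_RADIUS = 0.0875    # metres; an assumed "average" head, not a measured one


def woodworth_itd(azimuth_deg: float) -> float:
    """Approximate interaural time difference in seconds (Woodworth's spherical-head formula).

    azimuth_deg: 0 is straight ahead, +90 fully right, -90 fully left.
    A negative result means the sound reaches the left ear first.
    """
    theta = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def delay_and_pan(mono, sample_rate, azimuth_deg):
    """Place a mono signal using only an interaural delay and a level (pan) difference.

    Hypothetical helper. Returns (left, right) sample lists of equal length.
    No HRTF filtering is applied, so there are no height or front/back cues.
    """
    az = max(-90.0, min(90.0, azimuth_deg))
    delay = round(abs(woodworth_itd(az)) * sample_rate)  # whole samples for the far ear
    # Constant-power pan law: map -90..+90 degrees onto 0..pi/2 radians
    pan = (az + 90.0) / 180.0 * (math.pi / 2)
    gain_l, gain_r = math.cos(pan), math.sin(pan)
    near = list(mono) + [0.0] * delay  # ear closer to the source hears it first
    far = [0.0] * delay + list(mono)   # ear further away hears it later
    left, right = (far, near) if az >= 0 else (near, far)
    return [s * gain_l for s in left], [s * gain_r for s in right]


# A single click placed 60 degrees to the right at 48 kHz
left, right = delay_and_pan([1.0], 48000, 60)
```

Because there is no HRTF filtering, a source placed this way can be moved left and right but not up, down or behind, which is exactly the information the third cue supplies.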
The reason we’re banging on about it is that binaural mixes are a big reason why immersive audio is so popular. Binaural is a standard two-channel delivery, so consumers only need a pair of headphones to experience it. And as Qualcomm knows, this is the preferred delivery method for millions of phone users across the planet.
The big names
Developed in 2012, Dolby Atmos has done more to popularise immersive audio than anything else and is the most recognisable encoded immersive format around. It is essentially a 5.1 or 7.1 surround mix with additional channels for speakers in the height plane. Apple Music has been an early adopter, working hard to bring spatial audio to market: millions of people have experienced it through Apple Spatial Audio, which decodes Atmos content to a binaural format across thousands of songs on the streaming service.
These implementations have helped consumers appreciate the benefits, and the content production industry is responding positively: stereo albums are being converted to spatial formats, and new releases are being mixed as immersive-first productions.
Robert Edwards is a double Bafta award-winning sound director and fellow of the Institute of Professional Sound. He has spent nearly 50 years mixing audio across live news, sports and entertainment and has been working with multichannel audio since the introduction of 5.1 surround sound in the mid-2000s.
“Apple Music is a great example of immersive integration,” he says. “For a lot of music the default delivery is an Apple lossless mix in Atmos because it does sound incredible. You can have a seriously good listening experience and the music world is embracing that.
“Classical music really does benefit from the space that is created in the music, but a great stereo mix is paramount as it still has to sound good on the radio. Equally, people are enjoying a value-added experience.”
Fireworks
Edwards understands the value of spatial audio better than most. In 2021 and 2022, he presented millions of viewers with full live, immersive coverage of the opening and closing ceremonies of the Tokyo Summer and Beijing Winter Olympic Games. In fact, as a testament to the format’s popularity, those Games provided immersive coverage of more than 100 different events, with audio in the height channels as well as 5.1 surround.
“You’ve got to have a plan for immersive,” says Edwards. “You can’t just roll up with 12 microphones and expect to get something out of it. It is more subtle than that and needs more care.
“My mixing philosophy was to divide the stadium into levels. The field of play was a 5.1 surround mix at ground level, but we had another plane at the same height as the PA speakers, and because big events like these often have firework displays, there was a third level to provide extra oomph for people listening in an immersive world.”
What if we’re not there?
This highlights one of the challenges of where we are on the road to full spatial audio: how sound engineers cater for all listeners. Despite the popularity of 3D soundbars, which bounce height channels off the living room ceiling, and spatial renderers, which create binaural mixes for earbuds, we are all at different points on that journey – and we are not all there yet.
The vast majority of people are still living in a two-channel stereo world, and not everything is delivered through a streaming service that will automatically create a spatial mix for you.
“How your fantastic multichannel mix ends up when it is downmixed on TV speakers for the majority of people when they are sat on their sofa is an age-old problem, and a lot of effort goes into ensuring the downmix still sounds good for the majority of viewers,” says Edwards.
“In an Atmos presentation, you are often adding information into the height channels, which doesn’t necessarily contribute in a positive way to what you’re doing at the base level, so there’s a balance to be found. If the mix is a totally dedicated Atmos mix and you know all your immersive channels are staying as discrete channels, then that issue is lessened. But you must be aware how much colouration these extra layers add to any downmix most viewers will be experiencing.
“What you have collected in an immersive environment should also have a positive impact to the downmixes. With something like the fireworks, it was vital to make sure that this top layer was not just a Dolby Atmos effect. While promoting the fireworks for the Atmos world in the top layer above your head, I had to be mindful of how to integrate it so it could be heard in the main mix for those listening in stereo.”
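Edwards’ point about the top layer can be illustrated with a simplified fold-down. The Python sketch below uses the −3 dB (about 0.707) centre and surround coefficients from the ITU-R BS.775 5.1-to-stereo downmix and adds the four height channels with a gain we have simply assumed. The channel names and the height_gain parameter are illustrative, not any broadcaster’s or Dolby’s actual downmix settings.

```python
import math

# About 0.707: the centre/surround coefficient in the ITU-R BS.775 stereo downmix
MINUS_3DB = 1 / math.sqrt(2)


def downmix_to_stereo(frame, height_gain=MINUS_3DB):
    """Fold one sample frame of a 5.1.4 bed into a two-channel Lo/Ro pair.

    frame: dict keyed L, R, C, LFE, Ls, Rs plus the (hypothetically named)
    height channels Ltf, Rtf, Ltr, Rtr. height_gain is an assumed value;
    real encoders and broadcasters choose their own.
    """
    lo = (frame["L"] + MINUS_3DB * (frame["C"] + frame["Ls"])
          + height_gain * (frame["Ltf"] + frame["Ltr"]))
    ro = (frame["R"] + MINUS_3DB * (frame["C"] + frame["Rs"])
          + height_gain * (frame["Rtf"] + frame["Rtr"]))
    # LFE is commonly left out of a stereo downmix
    return lo, ro


silence = dict.fromkeys(["L", "R", "C", "LFE", "Ls", "Rs", "Ltf", "Rtf", "Ltr", "Rtr"], 0.0)
fireworks_overhead = {**silence, "Ltf": 0.5, "Rtf": 0.5}
print(downmix_to_stereo(fireworks_overhead))                   # about 0.354 in each channel
print(downmix_to_stereo(fireworks_overhead, height_gain=0.0))  # (0.0, 0.0): the fireworks vanish
```

If the height gain is set to zero, fireworks that exist only in the top layer disappear entirely from the stereo mix, which is why they also need a place in the main mix. A real downmixer would also normalise or limit the result, since these sums can exceed full scale.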
Channels vs objects
The way we have traditionally experienced audio in live broadcast is channel-based, where each audio channel is mixed to a specific loudspeaker; two channels are fed into two speakers for stereo, six channels for 5.1 surround, 12 channels for 7.1.4 immersive and so on.
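The layout names themselves encode the channel count: the first figure is the number of main (ear-level) speakers, the second the LFE channel, and the optional third the height speakers. A trivial Python sketch, with a function name of our own choosing:

```python
def channel_count(layout: str) -> int:
    """Total speaker feeds in a channel-based layout written as 'main.lfe' or 'main.lfe.height'."""
    return sum(int(part) for part in layout.split("."))


for layout in ("2.0", "5.1", "7.1.4"):
    print(layout, channel_count(layout))  # 2, 6 and 12 respectively
```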
Object-based spatial audio is different in that it treats some components as independent objects. An object can be anything from a single commentator mic to a direct stadium PA feed or the fireworks. Each object has associated metadata describing it, such as its level and its position in space, and immersive scenes are built from multiple objects, each with its own place in the scene.
Receivers on consumer devices use the information in the metadata to reproduce the soundscape as the mixer intended, and create the downmix accordingly to meet the listening requirements of the consumer equipment.
In this way, formats like Dolby Atmos can serve 3D soundbars as well as stereo or binaural playback; it is the metadata that contains all the information needed to produce every mix.
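A minimal Python sketch shows how the same object metadata can drive a mix without speaker feeds being fixed in advance. It renders only to stereo, using a level in decibels and a constant-power pan. The AudioObject structure and render_stereo function are hypothetical and far simpler than real object metadata such as ADM; a real renderer would target whatever layout the consumer device actually has.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical object: mono audio plus the metadata a renderer needs."""
    name: str           # e.g. "commentary", "crowd", "fireworks"
    samples: list       # mono audio for this object
    gain_db: float      # level metadata
    azimuth_deg: float  # position metadata: -90 hard left, 0 centre, +90 hard right


def render_stereo(objects):
    """Mix objects to two channels using only their metadata (constant-power pan)."""
    length = max((len(o.samples) for o in objects), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for obj in objects:
        gain = 10 ** (obj.gain_db / 20)  # dB to linear amplitude
        az = max(-90.0, min(90.0, obj.azimuth_deg))
        pan = (az + 90.0) / 180.0 * (math.pi / 2)
        gl, gr = gain * math.cos(pan), gain * math.sin(pan)
        for i, s in enumerate(obj.samples):
            left[i] += gl * s
            right[i] += gr * s
    return left, right


scene = [
    AudioObject("commentary", [1.0], 0.0, 0.0),
    AudioObject("crowd", [1.0, 1.0], -6.0, -90.0),
]
left, right = render_stereo(scene)
```

Changing an object’s gain_db or azimuth_deg changes the rendered result without touching the audio, which is what makes the same content usable across different playback systems.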
Getting personal
Objects unlock opportunities for broadcasters to create more immersion as the embedded metadata also provides the ability for end users to personalise their listening experience.
It enables viewers to change the level of enabled objects such as crowd noise or commentary, and while we are not quite there yet, broadcasters are close. AI is being trialled to create multiple audio mixes in real time, and the BBC is trialling personalised mixes from events like Eurovision, which Edwards was also involved in.
“At Eurovision, metadata enabled the BBC to generate an immersive mix from the feeds I was sending them, but also offer a choice of commentary. It was an opportunity to look at real-time metadata manipulation where consumers can choose how to listen to the show. In the future, this will extend to other languages.
“Viewers will have access to select not only which audio they want to listen to but also where that commentary might fit in the sound field. You could choose to have the commentator as if they were sat alongside you, in the front and centre as normal, or even behind you. You could even turn it off. It goes beyond just immersive; it’s taking what’s being generated within the immersive environment and using it in a different way.”
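In metadata terms, the personalisation Edwards describes means changing the description of an object rather than the audio itself. The Python sketch below applies a viewer’s preferences to a list of object descriptions. The ObjectMetadata fields and the personalise function are hypothetical, and the azimuth convention (0 for front centre, positive to the right, 180 for directly behind) is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ObjectMetadata:
    """Hypothetical per-object metadata; the audio itself is not included."""
    name: str
    gain_db: float
    azimuth_deg: float  # assumed: 0 = front centre, positive = right, 180 = behind
    enabled: bool = True


def personalise(scene, prefs):
    """Return a copy of the scene with a viewer's overrides applied.

    prefs maps object name -> dict of field overrides. Objects the viewer
    has not mentioned keep the broadcaster's defaults.
    """
    return [replace(obj, **prefs.get(obj.name, {})) for obj in scene]


default_scene = [
    ObjectMetadata("commentary", 0.0, 0.0),
    ObjectMetadata("crowd", -3.0, 0.0),
]
# Commentator moved behind the listener, crowd turned down
mine = personalise(default_scene, {"commentary": {"azimuth_deg": 180.0},
                                   "crowd": {"gain_db": -9.0}})
# Commentary switched off entirely
no_commentary = personalise(default_scene, {"commentary": {"enabled": False}})
```

Leaving the broadcaster’s defaults in place for any object the viewer has not asked about keeps the intended mix as the starting point.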
Going over the top
Ultimately, as much as consumers are waking up to the benefits, we’re still waiting for the tipping point. Consumer buy-in is there, open standards for describing immersive content such as the Audio Definition Model (ADM) are proven, and consumer equipment that enables spatial listening is everywhere.
Content providers are also looking at ways to increase the value proposition for their customers, with Netflix already offering its stereo customers programming that creates spatialised experiences. Since June 2022, the streaming platform has been using Sennheiser’s Ambeo 2-Channel Spatial Audio renderer on more than 700 presentations to create what Sennheiser calls an ‘enhanced two-channel mix’ from an immersive signal.
According to Edwards, streaming is where immersive audio is most likely to find a home.
“While Sky’s standard production format for delivery is 5.1 and Dolby Atmos for premier sports, ITV has no requirement for any programming to be anything more than stereo, and other terrestrial broadcasters who have experimented with multiple-channel formats have pulled away because of delivery issues,” he suggests.
“However, the Eurovision 5.1 multichannel mix is available on the BBC’s streaming platforms and YouTube in that same format. The market for this technology isn’t generally terrestrial; it is being led by the streaming services.
“It’s going to come from the commercialisation of the product. Rather like Apple has done with its music service, somebody will take ownership and say we’re going to be a channel that’s going to do UHD and immersive delivery, rather like Sky have done for the Premier League, because those two things will go side by side.
“Broadcasters have got to have a business plan for it. In the meantime, we’re all happy to experiment, to dabble creatively around and learn about it, because at some stage somebody in a suit will make it a selling point. And then all of a sudden they’ll ask where the content is.
“And we can all go, yeah, we can do that – and it becomes the next big thing.”