Generative AI: Creating the impossible

Live team
Apr 30

Generative AI has gotten so good that it’s now being deployed in real time. With the technology already used to digitally de-age Eminem, Frank Sinatra, Tom Hanks and Robin Wright, we explore its limitless applications and impact on live entertainment

Words Katie Kasperson

Naysayers aside, generative artificial intelligence holds incredible creative power. Until recently, GenAI was typically producing images and videos that looked unnatural and sometimes even downright strange. The technology has generally been easy to spot and hard to overlook.

Then deepfakes entered the discussion with visuals that were eerily realistic – so much so that they convinced viewers of their authenticity. This kicked up ethical debates and legal disputes, with celebrities like Taylor Swift and Scarlett Johansson speaking out against the non-consensual use of their likenesses (a protest not at all new to the production industry; Crispin Glover sued the producers of Back to the Future Part II over the same issue, though AI was not involved in his case).

But what about when GenAI is used ethically, and tastefully too? In the past few years, entertainment companies have been harnessing the technology for creative means after securing actors’ and artists’ consent. Robert Zemeckis’ latest film Here stars Tom Hanks and Robin Wright, yet despite a three-decade difference, they appear as if fresh out of Forrest Gump. At the 2024 MTV Video Music Awards, Eminem performed Houdini alongside his younger, ‘Slimmer’ self, bleach-blonde buzz cut and all. Snoop Dogg, Dr Dre, Frank Sinatra and Sammy Davis Jr all appear in the same commercial for Still GIN. It’s not sorcery and it’s not CGI; it’s generative AI – and it’s working in real time.

Big ideas

With offices in Los Angeles, London and elsewhere around the globe, Metaphysic is pushing the envelope on what’s possible with GenAI. A few years ago, Here was just an idea – and a relatively far-fetched one at that. Back then, it would’ve taken years to age and de-age its stars using conventional methods. But now, it takes ‘nanoseconds’, according to Ed Ulbrich, chief content officer and president of production at Metaphysic.

“During the pandemic, a friend of mine named Kevin Baillie, who is Robert Zemeckis’ VFX supervisor, turned me on to the deepfake Tom Cruise TikToks,” Ulbrich recalls. “I’d seen deepfakes before, but when a bunch of us VFX guys saw that, we were all looking at it and going ‘holy shit’. Long story short, that was the genesis of Metaphysic.”

The company debuted its photoreal GenAI technology in 2022, when it was used to revive Elvis Presley for a performance on America’s Got Talent. It placed the King’s face on top of tribute singer Emilio Santoro’s during the live broadcast, and audiences could hardly believe their eyes.

To train its models, Metaphysic scours the Earth for source data – in an entirely ethical manner, of course. For deceased actors and artists, this means getting the green light from their estates, as well as film studios’ permission to use existing footage. “It is unforgiving,” states Ulbrich. “There’s no margin for error. It’s a great responsibility to bring someone like that back, and we take it very seriously. It’s with the blessings of estates every step of the way.” For vocal performances, Metaphysic also uses GenAI to ‘recreate their actual voices, as opposed to using impersonators’.

In an instant

Here has been Metaphysic’s big project from the start. “A bunch of old guys – and I’ll include myself in that category – are harnessing the most advanced, cutting-edge technology in the world to make a traditional live-action movie. It’s just actors talking, it’s a drama. It’s not superheroes throwing buildings at each other,” Ulbrich says half-jokingly.

“Zemeckis had the idea of using this technology. He thought this could get the movie made because doing it with CGI instead would cost tens of millions of dollars. It’s throughout the entire movie, and would have been like composing music by one note a day, multiplied by four characters. It would have been a very long, expensive, tedious process just for people talking on camera,” he explains, having been an early proponent of CGI in the nineties. The team behind Here initially contacted both VFX studios and AI companies, testing whether the available technology could support Zemeckis’ vision.

“At the end of the day, only one test – Metaphysic’s – was successful,” states Ulbrich, who was working elsewhere at the time. “I see the test; it’s Tom Hanks delivering a line from the movie. Then they show the same footage again, and he’s 20 years old, delivering the same line. Then they tell me they’re doing it live, in real time. I’m like ‘wait, what?’”

This instantaneous element has serious implications, creating avenues for live performances and productions on top of traditional, pre-recorded ones. Of course, the Elvis AGT moment had already happened, proving that live GenAI was possible in entertainment, but Here was the first time Ulbrich had witnessed it with his own eyes. “It blew my mind,” he admits. “I could take the best CG team and the best company on Earth – hundreds of people with limitless sums of money – and we couldn’t deliver the same thing. It’s a completely unique technology for production.”

A model with a good memory

Besides saving time, live GenAI also boasts another benefit: “It’s always photoreal,” claims Ulbrich. Trained on ‘trillions of pixels’, the AI model ‘starts on the other side of the uncanny valley’ as opposed to CGI. “Things that are hard to do in CGI, like eyes, mouths, lip sync and emotionality, are easy for us. It’s all built on photography.

“We, as humans, have seen other humans since birth, so we’re all experts in the human face. When it’s weird and wrong,” Ulbrich claims, “we know it.”

When working with late talent, data procurement options are limited by what’s physically possible – that is, Frank Sinatra and Sammy Davis Jr can’t show up for a photoshoot. When training the model for Here, however, Hanks, Wright, Paul Bettany and Kelly Reilly (who play Hanks’ parents) were able to visit the studio, where Metaphysic could record them for ‘30 or 40 minutes’ while they moved around the room and spoke. “We are training large neural models, which are able to memorise lots of things about images at scale and then recreate them,” shares Ulbrich.

But these models have to be pretty well fed to believably recreate a person’s likeness – and that’s where additional footage comes in. With Hanks, it was relatively easy. “Tom Hanks is probably one of the most documented humans in the 20th century,” argues Ulbrich, “just by the sheer number of films he’s starred in and other appearances. He’s a very well-documented person – there is an incredible portfolio of him. That’s the source data.

“We need high-quality data, which means high-resolution footage, and we need a variety of lighting conditions,” Ulbrich continues. “We want to see that subject in as many different scenarios as possible, shot many different ways, and with different lenses. We’re not just training on their face, we’re training on light, how it behaves on their face and on how their face articulates.

“AI doesn’t see shots; it sees pixels,” Ulbrich clarifies. “We get it to analyse and memorise the trillions of pixels that make up Tom Hanks’ face in the case of Here. We can then point that AI at an input, and in this case our input is a live performance from actual actors.”

In filmmaking, digital de-aging doesn’t need to be instantaneous (it hasn’t been, up until this point), but it definitely doesn’t hurt. “During shooting, we can look up and see Tom and Robin in their 50s and 60s, while seeing them at 20 years old on the playback monitor. For me, that was jaw-dropping,” Ulbrich remembers. “I can’t forget this now that I’ve seen it for myself.”

Guess who’s back?

We have covered the nuances of the uncanny valley, but what’s most uncanny is how well the tech actually seems to work, especially in truly live scenarios. Seeing a de-aged actor in a film is one thing – it’s something that could’ve been done in post – but seeing an early-2000s Eminem (real name Marshall Mathers III) or resuscitated Elvis on live TV is another.

“We’re seeing a lot of other companies out there developing their tech,” says Ulbrich, “but only testing it in a controlled environment. It may look cool as a demo, but hasn’t yet been battle-tested in real production scenarios.”

Because Metaphysic works on both pre-recorded and live productions, the company can use one to battle-test the other – which is exactly what they did for Eminem. The rapper’s Houdini music video sees him confront his younger self (aka his alter ego Slim Shady) to the lyrics, “I sometimes wonder what the old me’d say/If he could see the way shit is today.” For this, Metaphysic created the Shady face and placed it on a body double during filming.

Then, at the 2024 MTV Video Music Awards, they took it one step further, bringing Shady back once more for the live performance. While in-person attendees saw a body double on stage, they were able to watch Slim Shady on the big screens, just as audiences at home could on the broadcast.

“Once we create a likeness – whether it’s Tom Hanks, Marshall Mathers or whoever – once that exists you can use it for anything. You can use it for music videos, TV commercials, during your tour; even in a major motion picture,” states Ulbrich, noting that applications of this technology go beyond what he himself could imagine. Now Eminem actually owns his likeness, which he is free to use again on future projects – including live entertainment like tour shows.

Into the unknown

Historically, VFX has been wildly expensive and downright unfeasible for low-budget productions. GenAI is changing all that, offering ‘nimble and accessible’ tools to all sorts of projects, big or small. “Projects with more humble budgets can easily afford this, whereas before they couldn’t. It unlocks lots of creative ideas that were previously far too expensive to realise,” argues Ulbrich.

There’s a general fear that AI will replace human labour and cost creatives their jobs. For instance, Zemeckis could have cast several actors to play the main characters of Here through the years, but this could have made the film feel slightly disjointed. Alternatively, he could have swung in the opposite direction, taking a Richard Linklater approach and making Here a decades-long project.

By using GenAI instead, Zemeckis saves time and money while maintaining creative continuity. “We can make more of these projects now because the costs are coming down,” Ulbrich adds. “A lot of the people who are doing our AI work come from traditional digital effects or other areas because they pick up this stuff so quickly. And in many ways, it’s more intuitive than the old tools. It’s easier to understand this once you see it, but it just makes sense.

“I always think that fear isn’t an option. I imagine that the people who invented lanterns would have been very angry with Thomas Edison. The horse-and-buggy people probably didn’t care for Henry Ford either. Here we are,” Ulbrich says matter-of-factly. “We see enormous opportunity for people and their careers – and there are people harnessing this technology in a way that’s inspired, to benefit their creative vision.”

In addition to GenAI, artificial intelligence is making its presence felt in a myriad of ways across pro AV too.

Find out more about Metaphysic at metaphysic.ai