What this futuristic Olympics video says about the state of generative AI

With the Paris Olympic and Paralympic Games over, the 2028 Summer Olympics in Los Angeles still seem far away. Even so, the prospect of watching the Games in his hometown got Josh Kahn, a filmmaker in sports entertainment who has created content for LeBron James and the Chicago Bulls, thinking even further ahead: what would a Los Angeles Olympics look like in the year 3028?

It’s the perfect kind of creative exercise for AI video generation, which rose to prominence with the debut of OpenAI’s Sora earlier this year. By typing prompts into generators like Runway or Synthesia, users can produce high-definition videos within minutes. The process is fast and cheap, and presents few technical hurdles compared with traditional techniques such as CGI or animation. Even though not every frame is perfect – distortions like six-fingered hands or disappearing objects are common – there are, at least in theory, several commercial applications. Advertising agencies, businesses, and content creators can use the technology to make videos quickly and at low cost.

Kahn, who has been experimenting with AI video tools for some time, used the latest version of Runway to imagine what the Olympics of the future might look like, feeding the model a new prompt for each shot. The video is just over a minute long and features stunning aerial views of a futuristic Los Angeles where sea levels have risen dramatically, squeezing the city against the coast. A football stadium sits atop a skyscraper, while a dome in the middle of the harbor houses beach volleyball courts.

The video is less a roadmap for the city than a demonstration of what is now possible with generative AI.

“We were watching the Olympics and saw the care that goes into the cultural narrative of the host city,” says Kahn. “There is a culture of imagination and storytelling in Los Angeles that has, in some ways, set the tone for the rest of the world. Wouldn’t it be amazing if we could show what the Olympics would be like if they came back to LA in a thousand years?”

More than anything, the video shows the potential generative technology holds for creators, but it also reveals what is holding it back. While Kahn declined to share his prompts for the scenes or say how many attempts each shot took, he cautioned that anyone hoping to create good content with AI should be comfortable with trial and error. A particular challenge in his futuristic project was getting the AI model to think outside the box architecturally: a stadium floating on water, for example, is not something most generative models have encountered often in their training data.
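Kahn’s workflow isn’t documented in detail, but the trial-and-error loop he describes can be pictured as a simple script: submit a prompt for a shot, review the clip, and resubmit until one is usable. The sketch below assumes a hypothetical text-to-video HTTP API; the endpoint, parameter names, and response field are illustrative placeholders, not Runway’s actual interface, and the prompt is invented.

```python
import time
import requests

# Hypothetical text-to-video endpoint; real services expose their own
# APIs with different names, parameters, and authentication schemes.
API_URL = "https://api.example-video-gen.com/v1/generate"
API_KEY = "YOUR_API_KEY"


def generate_shot(prompt: str, max_attempts: int = 20) -> str | None:
    """Submit the same shot prompt repeatedly until a clip is accepted.

    Mirrors the trial-and-error workflow described in the article:
    most attempts are discarded, and a human reviews each result.
    """
    for attempt in range(1, max_attempts + 1):
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"prompt": prompt, "duration_seconds": 5, "resolution": "1080p"},
            timeout=300,
        )
        response.raise_for_status()
        clip_url = response.json()["clip_url"]  # assumed response field

        print(f"Attempt {attempt}: {clip_url}")
        if input("Keep this clip? [y/N] ").strip().lower() == "y":
            return clip_url
        time.sleep(1)  # brief pause before re-prompting
    return None


# One prompt per shot, as in Kahn's project (the prompt is illustrative).
shot = generate_shot(
    "Aerial view of Los Angeles in the year 3028, sea levels risen, "
    "a floating stadium anchored in the harbor, golden-hour light"
)
```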

With each shot requiring a new set of prompts, it is also tricky to maintain a sense of continuity across a video: it is hard for a generative model to keep the color palette, the angle of the sun, and the shapes of the buildings consistent from one shot to the next. The video also contains no close-ups of people, which Kahn says AI models still tend to struggle to create.

“Currently, these technologies work better at large scales than in more subtle human interactions,” he says. For this reason, Kahn imagines that the first cinematic applications of AI-generated video could be for wide shots of landscapes or crowds.

AI video expert Alex Mashrabov, who last year left his role as director of generative AI at Snap to found a new AI video generation company called Higgsfield AI, agrees about AI video’s current flaws. He also points out that dialogue-heavy content is difficult to create with AI, since it depends on subtle facial expressions and body language.

Some content creators may be reluctant to adopt this type of video simply because of the time it takes to iterate on prompts until they get the result they want.

“Typically the success rate is one in 20,” says Mashrabov, but it’s not uncommon to need 50 or 100 attempts.

However, for many purposes this is enough. Mashrabov says he has seen a surge in AI-generated video ads from large online retailers like Temu. In goods-producing countries such as China, video generators are in high demand for quickly making eye-catching ads for specific products. And even though an AI model may need many prompts to produce a usable ad, filming one with real people, cameras, and equipment can be a hundred times more expensive. Applications like this could be the first large-scale use of AI-generated video as the technology gradually improves, he says.

“Although I think this is a very long road, I am very confident that there is lower-hanging fruit,” says Mashrabov. “Today, we are discovering the genres that generative AI is already good at.”

(Source: MIT Technology Review)