GAME OVER! OpenAI’s SORA just won the text-to-video race

The inevitability of script-to-screen technology is closer than ever.

OpenAI released test footage and announced, “Introducing Sora, our text-to-video model. All the clips in this video were generated directly by Sora without modification. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.”

See openai.com/sora for more.

“Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”

“The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.”

See the Technical Research.

Beyond text-to-video, “Sora can also be prompted with other inputs, such as pre-existing images or video. This capability enables Sora to perform a wide range of image and video editing tasks — creating perfectly looping video, animating static images, extending videos forwards or backwards in time, etc.”

Sora can even replace the whole background in a video: “Diffusion models have enabled a plethora of methods for editing images and videos from text prompts…. One of these methods, SDEdit,32… enables Sora to transform the styles and environments of input videos zero-shot.”

My take: this is powerful stuff! Workers in media industries might want to start thinking about diversifying their skills….