Sora by OpenAI: principle of operation, video examples and comparison with Runway

23:47
13.12.2024
NeyroEntuziast
3570

After 10 months of rumors, leaks and speculative tweets from various AI enthusiasts, OpenAI has finally unveiled its long-awaited AI model for video generation - Sora.

A few days ago, the AI community was already anticipating the release of Sora, and I was one of those who constantly refreshed the Sora.com website, like a desperate sneakerhead trying to purchase limited edition models.

If you cannot access the site, check the list of supported countries here. You may need to use a VPN if your country is not on the list.

Honestly, I expected very little from Sora. It has been almost 10 months since we heard any serious news about a video model from OpenAI, and with new names like Kling AI, Runway Gen-3, and Hailou AI gaining traction, I thought Sora would be late to the party.

But I must say, after seeing some of the results shared by people, I am impressed. It might indeed give these competitors a run for their money.

However, just because a few early samples impressed me doesn't mean I'm ready to buy. There are many things that matter when it comes to AI video creation tools:

Supported input media (text, image, and video)
Generation speed
Output quality (resolution, consistency, and length)
Editing controls (expansion, trimming, merging, etc.)
Pricing

These are the points I focused on. I want to understand if Sora has enough capabilities to justify its use and ultimately if it's worth paying for.

What is Sora?

If you're hearing about it for the first time, Sora is an AI video generation tool from OpenAI that can create short clips from text, images, and even other videos.

At the beginning of this year, OpenAI introduced a preview version of Sora, showcasing its achievements in "world modeling" - essentially training the model to understand and represent aspects of the physical world.

Now Sora Turbo, a faster and more advanced version, is being released as a standalone product available to ChatGPT Plus and Pro users on Sora.com.

How Sora Works

Using the Sora Video Editor, you can create videos up to 20 seconds long while maintaining good visual quality and following your prompts.

Video Examples

Now let's look at some videos generated by Sora and my thoughts on each of them.

Prompt: A serene and otherworldly scene unfolds with high-resolution digital clarity, capturing a minimalist beach at twilight where fashionably dressed figures recline on floating chairs woven from delicate reeds, hovering just above the gentle waves; the ocean emits ethereal smoke particles that rise and form abstract patterns in the air; bioluminescent plankton in the water glow softly, illuminating the scene with a magical light as the chairs drift effortlessly along the shoreline.

The smoke and water splashes look incredibly detailed. I also like how the objects behave as they float on the water. It seems that Sora knows how objects should move and interact in a physical environment.

Prompt: Whales Soaring Skyward. (Subtle Remix): make the video more vibrant and the butterflies more colorful.

This is a challenging task because it requires not just making the video look realistic, but also making it beautiful and vibrant. Sora managed to handle this task.

The butterflies look more colorful, and the whole scene seems more magical, like something out of a fairy tale. It doesn't have to obey the physics of the real world, but it still looks professional and impressive.

Prompt: Japanese Winter Market (Storyboard)

Generating people in AI videos is always difficult. Even here, I notice some strange details - the nails look odd, and the limbs may be misaligned. Most existing models struggle with human anatomy, and Sora is no exception. It's not perfect, but I see some improvements compared to what I've seen before.

Key Features of Sora

One thing that catches my attention is the interface that OpenAI has developed for Sora. They didn't just give us a prompt input field and stop there. They introduced a variety of features: storyboard tool, Remix, Recut, Loop, Blend, and style presets.

Remix. Changing the scene by switching backgrounds, replacing objects, or adding and removing elements.
Recut. The Recut feature allows you to shorten and lengthen video frames.
Storyboard. Frame-by-frame video layout for better control over complex sequences.
Loop and Blend. Loop creates continuous animation, while Blend smoothly merges two clips into one.
Style Presets. Instantly apply a predefined visual style, such as papercraft style, without endlessly editing prompts.

How Sora Compares to Competitors

Right now, I'm stuck waiting for stable access. Due to the hype, the servers are overloaded, so new users can't create videos at all. If you see this error message, you're not alone.

Sora by OpenAI: how it works, video examples, and comparison with Runway

Once I get access, I plan to compare Sora with Runway Gen-3 and Kling using the same prompts and see which results are more consistent and realistic.
In the meantime, let's try comparing a pre-made Sora video with Runway Gen-3. Here's an example of a video created with Sora:

Prompt: a family of grizzly bears sit at a table, dining on salmon sashimi with chopsticks.

Video generated by Sora

This 5-second video clip shown below is generated in Runway.

Video generated by Runway Gen-3 Alpha

From what I see, Sora's result seems more lively. The bears are not just sitting still. The camera moved, and there was a sense of activity in the scene. The Runway Gen-3 version, on the other hand, looks more static, almost like a slightly animated image.

I also need to find out if the claimed "speed" is true. The early research model was slow and expensive to use. Sora Turbo should be significantly faster, but "fast" in AI terms can still mean a few minutes of waiting for a 20-second clip.

Will this speed improve over time? Possibly.

Prices for Sora

If you want to use Sora, it is currently tied to specific ChatGPT subscription plans:

ChatGPT Plus ($20/month)

You will be able to generate up to 50 priority videos (1,000 credits) per month, create videos with a resolution of up to 720p and a duration of up to 5 seconds.

ChatGPT Pro ($200/month)

You will be able to generate 500 priority videos (10,000 credits) per month, an unlimited number of videos with a resolution of up to 1080p and a duration of up to 20 seconds. You can also run up to 5 generations simultaneously, and the downloadable videos will be watermark-free.

Many people, including myself, are currently waiting for stable access to Sora. Server capacity issues are clearly frustrating, and social media is full of users who cannot register or generate videos at all.

Nevertheless, the initial results of Sora are impressive. The realism and consistency are excellent, and it is interesting to imagine what this could mean for creative people, filmmakers, and anyone who wants to bring their ideas to life through video.

But these same capabilities also raise serious concerns. With such realistic videos, we must acknowledge the risk of misuse: fake news, fraud, and deepfakes.

It is curious that tech blogger Marques Brownlee noticed: when he asked Sora to generate a video with a tech reviewer, there was a suspiciously green plant on the table, very similar to the one that can often be seen in his own videos.

Sora from OpenAI: how it works, video examples and comparison with Runway

Was it a strange coincidence, or did the OpenAI training data include his work and repeat these details? It's hard to say for sure, but it highlights the uncertainty associated with how these models are trained and what they might accidentally discover.

In any case, if they manage to establish access, maintain quality, and create a community that will responsibly use the tool, Sora could replace Kling or Runway as the main tool for creating AI-generated videos among users. For now, I refrain from drawing any final conclusions until the hype dies down, the servers work properly, and I get a chance to really test it.