Neural networks for video generation: a brief overview of Veo 3
Early morning, a quiet street, and walking towards you is a grandmother in a headscarf, holding... a rhinoceros on a leash.
Yes, this year Google finally decided to erase the line between "shooting a movie" and "writing a prompt." The new version of its generator, Veo 3, is no longer a joke or an experiment but a very serious statement. This is not a funny ten-second clip but full-fledged cinema: with lighting, sound, camera movement, and even elements of acting.
Today we will explore what Veo 3 is truly capable of and, most importantly, try to "shoot" our own video with voiceover and atmosphere.
Stay tuned, it will be interesting!
Where can you use Veo 3?
Let’s start with a bit of bad news. Veo 3 is officially available only in the USA and about seventy other countries. Russia, Belarus, the UK, and most of Europe are not on the list. To legally become one of the lucky users, you need an American IP address, a local phone number, and an account with an active subscription.
But it's not all bad. Besides the subscription, there are several working workarounds that let you test the neural network.
One of these solutions is BotHub! It's a Russian platform that combines everything: from text and image generation to working with video, documents, and code. There's no need to set up a VPN, hunt for workarounds, or register in ten different places. This is where we will test the model.
When you register via the BotHub link, you get 100,000 caps. Grab your bonus and start creating!
What kind of beast is Veo 3?
Google unveiled Veo 3 at I/O 2025. If earlier video generators looked like a school project made of cardboard and PVA glue, we now have a beast that behaves as if it graduated from VGIK. Picture quality has grown to 4K: you can now see the character's eyelashes fluttering in the wind. And the best part is that the model has finally learned to listen. You write "grandmother with a rhinoceros" and you get exactly a grandmother with a rhinoceros, not a grandfather with a hippopotamus and a random cat on the side.
Earlier, the company introduced models such as Imagen Video, Phenaki, VideoPoet, and Lumiere, as well as Veo and Veo 2.
But the real show begins when Veo 3 opens its mouth: it can now generate dialogue and voiceovers. And that's not all: music and effects appear on their own, be it eerie violins, barking dogs, or rustling grass. You don't edit the sound, it just exists. For me personally, this is the most magical part: only yesterday it seemed that neural networks would barely manage the image, while the voice remained the territory of actors. And now one prompt gives you the actor, the director, and the composer. DeepMind has clearly had a hand in this: last year they promised to teach neural networks to turn video into sound, and it seems they have kept their word.
How to create videos?
Text to Video. Creation using a text prompt
Frames to Video. Upload or generate images to use as starting and ending frames
Ingredients to Video. Upload or create images to use as object references
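For readers who prefer code, Google also exposes the text-to-video mode through its Gemini API. Below is a minimal sketch assuming the google-genai Python SDK and a preview model ID along the lines of "veo-3.0-generate-preview"; exact identifiers and availability change over time, so treat this as an illustration rather than a recipe.

```python
# Minimal text-to-video sketch using the google-genai SDK.
# Assumptions (check the current docs): GEMINI_API_KEY is set in the
# environment and the preview model ID below is available to your account.
import time

from google import genai

client = genai.Client()  # picks up the API key from the environment

# Start an asynchronous generation job from a plain text prompt.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",  # assumed/preview model ID
    prompt=(
        "A grandmother in a headscarf walks a rhinoceros on a leash "
        "down a quiet morning street, cinematic wide shot"
    ),
)

# Generation is long-running: poll the operation until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the first generated clip to disk.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("grandmother_rhino.mp4")
```

On BotHub, of course, you can skip this setup entirely and just type the prompt.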
Let’s evaluate a few results!
Quite a realistic video that would work well in a Coca-Cola advertisement.
And here we have a full action video with a squirrel.
Finally, let’s watch a video comparison with some competitors.
Interesting fact: every Veo 3 video carries a SynthID watermark that is invisible to the human eye. It is used to identify AI-generated content and is part of the company's policy against misinformation.
Let's look at a real-world example of the model being used in actual advertising practice.
The company Jellyfish, a leader in digital marketing and part of The Brandtech Group, has begun actively using Veo in its AI platform Pencil. The technology is already being applied in advertising campaigns, and recently it found a more unusual application: together with Japan Airlines, the team launched a project to create in-flight entertainment generated by artificial intelligence.
As part of the partnership, the first videos created entirely with Veo 3 were presented. This demonstrates that generative video is no longer just a toy for user experiments but is gradually turning into a full-fledged tool for advertising and media. Let's take a look!
It looks pretty impressive, right?
Write the prompt correctly!
1. Who is in the frame?
Describe the character(s):
Age, gender
Appearance (hair color, clothing, details)
Emotions or mood
Example: A young woman with short red hair, wearing a grey hoodie, looking anxious.
2. What is happening?
Formulate the main action — short and clear.
Movements
Gestures
Interaction with objects or people
Example: She walks slowly across a rainy street, then picks up a ringing phone.
3. Where is it happening?
Specify the environment to set the atmosphere:
Location
Time of day or weather
Details of interior/exterior
Example: At a crowded metro station, in a foggy forest at dawn, inside a dimly lit diner.
4. Light, sound, and emotions
Set the mood through the atmosphere:
Lighting
Background music or noise
Dialogue (if needed)
Emotional tone
Example: Soft piano music in the background, the sound of rain and distant traffic, she says: “I’m not coming back.”
5. Camera angle and movement
Veo 3 understands cinematic techniques well:
Shot type (close-up, wide shot, drone shot)
Movement (slow zoom in, shaky handheld, panning shot)
Example: The camera slowly zooms in, shaky handheld close-up.
6. Visual style
Add genre or aesthetic.
Realism or cartoonishness
Specific genre or era
Comparison with movies/animation
Example: In the style of a 90s action movie, anime aesthetic, film noir lighting.
Final principle
Build the prompt like a short script, including: characters → action → location → atmosphere → angle → style.
This will give the neural network a clear structure, ensuring a more "cinematic" result!
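To make the structure stick, here is a tiny illustrative Python helper that assembles the six parts into a single prompt string. Everything in it (the class name, the field names) is invented for this example; Veo itself only ever sees the final text.

```python
# Illustrative prompt builder: the six-part structure from above,
# assembled into a single prompt string. All names are invented
# for the example; the model just receives the rendered text.
from dataclasses import dataclass

@dataclass
class VideoPrompt:
    characters: str   # 1. who is in the frame
    action: str       # 2. what is happening
    location: str     # 3. where it is happening
    atmosphere: str   # 4. light, sound, and emotions
    camera: str       # 5. camera angle and movement
    style: str        # 6. visual style

    def render(self) -> str:
        # Order matters: characters -> action -> location ->
        # atmosphere -> camera -> style, like a short script.
        return ". ".join([
            self.characters, self.action, self.location,
            self.atmosphere, self.camera, self.style,
        ]) + "."

prompt = VideoPrompt(
    characters="A young woman with short red hair, wearing a grey hoodie, looking anxious",
    action="She walks slowly across a rainy street, then picks up a ringing phone",
    location="A dimly lit diner district at night",
    atmosphere="Soft piano music, the sound of rain and distant traffic",
    camera="Slow zoom in, shaky handheld close-up",
    style="Film noir lighting",
)
print(prompt.render())
```

The point is not the code but the ordering: fixing the order of the six layers keeps you from forgetting one when prompts get long.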
Let's make our own video!
Let's bring the grandmother from our cover to life. We start with the prompt, and a prompt is all we will use!
Grandmother-action
Scene description: An elderly woman in a floral headscarf and oversized sunglasses, confidently walking through a typical Soviet courtyard with cracked asphalt and rusty playground swings. She is holding two leashes: on the left a giant brown bear, on the right a heavy gray rhinoceros. Both animals wear hanging wooden signs — the rhino’s says “Review”, the bear’s says “of Veo 3”. The mood is epic and surreal.
Visual details: cinematic wide shot, morning light, dramatic atmosphere, in the style of a 90s action movie trailer.
Camera: slow dolly zoom forward.
Sound: tense orchestral music with heavy drums, animals breathing, distant echo of city sounds.
Let's check the result:
It turned out quite well! The only question is the leash in the right hand: at first it blends in a bit with the bear's fur.
Summarizing
+ | —
4K graphics with good object physics, accurate matching with prompts | Cannot control sound or subtitle creation
Lip sync with words and great sound accompaniment | High likelihood of artifacts that are not easy to remove
Good variability for creating material | Expensive
In conclusion, with Veo 3 Google has taken a step that is hard to call just an update. It is no longer a toy for enthusiasts but a tool that can comfortably compete with professional studios. Given the automatic voiceover, built-in effects, and nearly cinematic rendering, it is no wonder that many are already calling Veo 3 the new standard in video generation.
Of course, the service is still region-locked and requires workarounds to access, but the fact remains: we are witnessing cinema and artificial intelligence finally shaking hands. And it seems Google intends to hold this standard for a long time!
Thank you for reading! Share your experience of creating videos with neural networks in the comments. Perhaps you have a favorite service. We’d love to hear about your work!