Neural networks for video generation: a brief overview of Veo 3
Early morning, a quiet street, and walking towards you is a grandmother in a headscarf, holding... a rhinoceros on a leash.
Yes, this year Google finally decided to erase the line between "shooting a movie" and "writing a prompt." The new version of its generator, Veo 3, is no longer a joke or an experiment but a very serious statement. This is not a funny ten-second clip but full-fledged cinema: with lighting, sound, camera movement, and even elements of acting.
Today we will explore what Veo 3 is truly capable of and, most importantly, try to "shoot" our own video with voiceover and atmosphere.
Stay tuned, it will be interesting!
Where can you use Veo 3?
Let’s start with a bit of bad news. Veo 3 is officially available only in the USA and about seventy other countries. Russia, Belarus, the UK, and most of Europe are not on the list. To legally become one of the lucky users, you need an American IP address, a local phone number, and an account with an active subscription.
But it's not all bad. Besides the subscription, there are several working workarounds that let you test the neural network.
One of these solutions is BotHub! It's a Russian platform that combines everything: from text and image generation to working with video, documents, and code. There's no need to set up a VPN, hunt for workarounds, or register in ten different places. This is where we will test the model.
When you register via the BotHub link, you get 100,000 caps. Grab your bonus and start creating!
What kind of beast is Veo 3?
Google unveiled Veo 3 at I/O 2025. If earlier video generators looked like a school project made of cardboard and PVA glue, we now have a beast that behaves as if it graduated from VGIK. Picture quality has grown to 4K: you can now see the character's eyelashes fluttering in the wind. And the best part is that the model has finally learned to listen. You write "grandmother with a rhinoceros" and you get exactly a grandmother with a rhinoceros, not a grandfather with a hippopotamus and a random cat on the side.
Earlier, the company introduced models such as Imagen Video, Phenaki, VideoPoet, and Lumiere, as well as Veo and Veo 2.
But the real show begins when Veo 3 opens its mouth: it can now generate dialogue and voiceovers. And that's not all: music and effects appear on their own, be it eerie violins, barking dogs, or rustling grass. You don't edit the sound, it just exists. For me personally, this is the most magical part: only yesterday it seemed that neural networks would barely manage the image, while the voice remained the territory of actors. And now one prompt gives you the actor, the director, and the composer. DeepMind has clearly had a hand in this: last year they promised to teach neural networks to turn video into sound, and it seems they have kept their word.
How to create videos?
Text to Video. Creation using a text prompt
Frames to Video. Upload or generate images to use as starting and ending frames
Ingredients to Video. Upload or create images to use as object references
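For readers who prefer code, Google also exposes the text-to-video mode through its Gemini API. Below is a minimal sketch assuming the google-genai Python SDK and a preview model ID along the lines of "veo-3.0-generate-preview"; exact identifiers and availability change over time, so treat this as an illustration rather than a recipe.

```python
# Minimal text-to-video sketch using the google-genai SDK.
# Assumptions (check the current docs): GEMINI_API_KEY is set in the
# environment and the preview model ID below is available to your account.
import time

from google import genai

client = genai.Client()  # picks up the API key from the environment

# Start an asynchronous generation job from a plain text prompt.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",  # assumed/preview model ID
    prompt=(
        "A grandmother in a headscarf walks a rhinoceros on a leash "
        "down a quiet morning street, cinematic wide shot"
    ),
)

# Generation is long-running: poll the operation until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

# Download the first generated clip to disk.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("grandmother_rhino.mp4")
```

On BotHub, of course, you can skip this setup entirely and just type the prompt.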
Let’s evaluate a few results!
Quite a realistic video that would work well in a Coca-Cola advertisement.
And here we have a full action video with a squirrel.
Finally, let’s watch a video comparison with some competitors.
Interesting fact: every Veo 3 video carries a SynthID watermark that is invisible to the human eye. It is used to identify AI-generated content and is part of the company's policy against misinformation.
Let's look at a real-world example of the model being used in actual advertising practice.
The company Jellyfish, a leader in digital marketing and part of The Brandtech Group, has begun actively using Veo in its AI platform Pencil. The technology is already being applied in advertising campaigns, and recently it found a more unusual application: together with Japan Airlines, the team launched a project to create in-flight entertainment generated by artificial intelligence.
As part of the partnership, the first videos created entirely with Veo 3 were presented. This demonstrates that generative video is no longer just a toy for user experiments but is gradually turning into a full-fledged tool for advertising and media. Let's take a look!
It looks pretty impressive, right?
Write the prompt correctly!
1. Who is in the frame?
Describe the character(s):
Age, gender
Appearance (hair color, clothing, details)
Emotions or mood
Example: A young woman with short red hair, wearing a grey hoodie, looking anxious.
2. What is happening?
Formulate the main action — short and clear.
Movements
Gestures
Interaction with objects or people
Example: She walks slowly across a rainy street, then picks up a ringing phone.
3. Where is it happening?
Specify the environment to set the atmosphere:
Location
Time of day or weather
Details of interior/exterior
Example: At a crowded metro station, in a foggy forest at dawn, inside a dimly lit diner.
4. Light, sound, and emotions
Set the mood through the atmosphere:
Lighting
Background music or noise
Dialogue (if needed)
Emotional tone
Example: Soft piano music in the background, the sound of rain and distant traffic, she says: “I’m not coming back.”
5. Camera angle and movement
Veo 3 understands cinematic techniques well:
Shot type (close-up, wide shot, drone shot)
Movement (slow zoom in, shaky handheld, panning shot)
Example: The camera slowly zooms in, shaky handheld close-up.
6. Visual style
Add genre or aesthetic.
Realism or cartoonishness
Specific genre or era
Comparison with movies/animation
Example: In the style of a 90s action movie, anime aesthetic, film noir lighting.
Final principle
Build the prompt like a short script, including: characters → action → location → atmosphere → angle → style.
This will give the neural network a clear structure, ensuring a more "cinematic" result!
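To make the structure stick, here is a tiny illustrative Python helper that assembles the six parts into a single prompt string. Everything in it (the class name, the field names) is invented for this example; Veo itself only ever sees the final text.

```python
# Illustrative prompt builder: the six-part structure from above,
# assembled into a single prompt string. All names are invented
# for the example; the model just receives the rendered text.
from dataclasses import dataclass

@dataclass
class VideoPrompt:
    characters: str   # 1. who is in the frame
    action: str       # 2. what is happening
    location: str     # 3. where it is happening
    atmosphere: str   # 4. light, sound, and emotions
    camera: str       # 5. camera angle and movement
    style: str        # 6. visual style

    def render(self) -> str:
        # Order matters: characters -> action -> location ->
        # atmosphere -> camera -> style, like a short script.
        return ". ".join([
            self.characters, self.action, self.location,
            self.atmosphere, self.camera, self.style,
        ]) + "."

prompt = VideoPrompt(
    characters="A young woman with short red hair, wearing a grey hoodie, looking anxious",
    action="She walks slowly across a rainy street, then picks up a ringing phone",
    location="A dimly lit diner district at night",
    atmosphere="Soft piano music, the sound of rain and distant traffic",
    camera="Slow zoom in, shaky handheld close-up",
    style="Film noir lighting",
)
print(prompt.render())
```

The point is not the code but the ordering: fixing the order of the six layers keeps you from forgetting one when prompts get long.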
Let's make our own video!
Let's bring the grandmother from our cover to life. We start with the prompt, and a prompt is all we will use!
Grandmother-action
Scene description: An elderly woman in a floral headscarf and oversized sunglasses, confidently walking through a typical Soviet courtyard with cracked asphalt and rusty playground swings. She is holding two leashes: on the left a giant brown bear, on the right a heavy gray rhinoceros. Both animals wear hanging wooden signs — the rhino’s says “Review”, the bear’s says “of Veo 3”. The mood is epic and surreal.
Visual details: cinematic wide shot, morning light, dramatic atmosphere, in the style of a 90s action movie trailer.
Camera: slow dolly zoom forward.
Sound: tense orchestral music with heavy drums, animals breathing, distant echo of city sounds.
Let's check the result:
It turned out quite well! The only question is the leash in the right hand: at first it blends in a bit with the bear's fur.
Summarizing
+ | —
4K graphics with good object physics, accurate matching with prompts | Cannot control sound or subtitle creation
Lip sync with words and great sound accompaniment | High likelihood of artifacts that are not easy to remove
Good variability for creating material | Expensive
In conclusion, with Veo 3 Google has taken a step that is hard to call just an update. It is no longer a toy for enthusiasts but a tool that can comfortably compete with professional studios. Given the automatic voiceover, built-in effects, and nearly cinematic rendering, it is no wonder that many are already calling Veo 3 the new standard in video generation.
Of course, the service is still region-locked and requires workarounds to access, but the fact remains: we are witnessing cinema and artificial intelligence finally shaking hands. And it seems Google intends to hold this standard for a long time!
Thank you for reading! Share your experience of creating videos with neural networks in the comments. Perhaps you have a favorite service. We’d love to hear about your work!