Fab Tool, or the experience of creating a complex generative video

I am a big fan of the French electronic musician Franck Hueso, better known under the pseudonym Carpenter Brut, and as a low-budget video author I have long dreamed of creating a remake of his cult video Fab Tool. The clip is quite modest in production, yet it has always struck me as incredibly expressive precisely because of its special atmosphere, one that a remake could recreate or even enhance. In our age of generative everything, literally anything gets remade, so sooner or later the plan was bound to be realized. The path to the desired result, however, turned out to be far thornier than I had initially imagined.

This story began a year and a half ago, in the summer of 2023, when the fifth version of MidJourney was released, Gen-2 from Runway became available, and the generative bots Pika and Suno appeared on Discord. Thus a full set of tools for creating AI videos took shape, in which the picture, the animation, and the music would all be created by neural networks.

As someone who, besides being passionate about electronic music, has been involved in various low-budget video projects for many years, I was initially very interested in the prospect that new-generation neural networks would let me release book trailers, music videos, and other small creative projects more often, more cheaply, and faster.

However, my rosy expectations quickly shattered against technical limitations. The tools were raw, and the first results were far from ideal. Many ideas simply could not be realized without excessive creative compromises. For example, a long-planned book trailer for Efremov's "The Hour of the Bull" has so far gone no further than a few early versions of the illustrations:

Returning to Fab Tool was out of the question for the time being, so I decided to focus on creating static images first and wait for video generators to mature. In particular, I actively studied the capabilities of MidJourney, since its competitors were still too limited for my purposes. Obtaining images had always been a more acute problem for me than animation, which could always be handled in editing software. I did have pictures, thanks to Alex Andreev and Maria O'Toole, but I needed many more, and as a non-commercial author my budgets were always minimal. Here is an example of my earlier work that I relied on:

Thus began my two visual neurogenerative projects: the web comic "Teetotaler Detective", about Jonathan T. Maddox, the world's only teetotaler detective, and the illustrated novel "Gat".

The comic let me learn to work in a single graphic style (importantly, a non-photorealistic one) and, as far as possible, to maintain a unity of imagery and atmosphere. "Gat", in turn, provided a variety of images and techniques suitable for future animation, which remains a difficult task for all existing tools.

The comic culminated in the publication of its first volume, in three chapters, which took six months to create. The final volume ran to 60 pages, assembled in Photoshop from more than 200 image generations. I laid out the captions, speech bubbles, frames, and dialogue by hand, since MidJourney still struggles with text, and I deliberately avoided other tools so as not to overcomplicate production. Here is an example of the resulting spreads:

The experience was quite challenging, though very exciting. For this project I even created a Telegram channel of the same name, where I published intermediate results and demos, consulted with subscribers, and generally got the kind of feedback that is useful in any creative process.

In parallel, "Gat" was growing with generated images. Each of the 40 parts of the novel was accompanied by "official" illustrations and an additional gallery of "unpublished", which I also published on my telegram and on the AT platform in the additional materials section.

As a result, I ended up with about a hundred images ready for animation. But even with them, Pika and Gen-2 could offer only modest effects like fades and pans; a significant share of the creative tasks remained out of reach.

Soon, however, two revolutions took place in image-to-video. Although the loudly announced Sora model from OpenAI remained unavailable to the public for a year and ultimately produced rather mediocre results, the Gen-3 model from Runway and Luma Dream Machine 1.5 made a real breakthrough in the summer of 2024. These tools used 3D modeling to animate images and showed impressive results from the start of their public betas: it became possible to model complex scenes with advanced camera work and with plasticity of objects and characters. So I immediately started creating videos from my images. The first was the book trailer for "Gat":

In addition to the more advanced animation, I used neurogenerative music for the first time, created with the Udio model, which by then had outpaced its competitors in many respects. The result exceeded my expectations: 100+ fragments were created in just a couple of weekends, and all it required was registering a few free accounts, thanks to the democratic pricing policy of the folks at Luma.

The time had come to take on something more serious. The book trailers for "The Non-Drinker" or "The Hour of the Bull" that suggested themselves as the next projects no longer seemed so groundbreaking; on the contrary, everything indicated they might turn out too simple given the new capabilities of the generative tools. So I decided to challenge myself and took an old idea off the shelf. In October of last year, I started working on Fab Tool.

Visual references in the form of screenshots from the original video gradually turned into two hundred newly generated images, and then into a comparable number of animated clips, since the success rate of Luma's takes had become quite high by then. These fragments had to be backed by a new sound remix that would, ideally, not fall short of the original in power and sound quality. After that came the painstaking process of post-production and editing in Adobe Premiere.

And so, a year and a half after I started exploring this area, my video-generative magnum opus was complete:

As you can see, the complex transitions of the original, its atmosphere of hopelessness, and its pathos are preserved here. I also tried to maintain the opposition of the three color atmospheres while bringing the picture as close as possible to photorealistic standards, which the original obviously does not aim for, thereby getting a chance to surpass it in at least one respect.

Did I manage to achieve at least some of my creative and technical goals in the end? Undoubtedly. The process of creating the video became dramatically faster, and you can judge the atmosphere and picture quality for yourself. Personally, I was extremely satisfied with the result. Do we still have to make allowances for the rawness of the technology and the minimal budget when watching? To some extent, yes, but not entirely: the result even exceeded my initial expectations, both the artistic ones and those concerning the tools.

Of course, a full-fledged "neural network movie" will not be made this year or next; all such projects are still experimental, regardless of budget. However, the progress I have observed over the past year and a half, and the incredible pace at which generative neural networks are developing, will undoubtedly transform the video and film production market in the most radical way in the near future, and they are already changing it before our eyes. As they say, here's to new premieres and new breakthroughs!