Google Lumiere – Text-to-video AI with various features

Not only did OpenAI recently introduce its video AI "Sora"; Google also released a video AI: Lumiere. With Google Lumiere, text prompts can be converted into videos, but numerous other functions are available as well. For example, images can be turned into the desired animation using text commands. Individual areas of an image can also be marked and animated, such as the smoke of a locomotive. Stylized video and animation creation can take the graphic style of a reference image and create prompt-based moving images in that style. Finally, content in existing videos can be changed – such as clothing, surfaces, and structures.

Google Lumiere – AI model with “Space-Time U-Net” architecture

I won't even begin to pretend to understand how such complex artificial intelligences work. However, both on Google's presentation page (on GitHub) and in the associated research paper, there is talk of a "Space-Time U-Net" architecture, or STUNet for short. If you want to know more, you can consult the sources mentioned.

Lumiere is based on a diffusion model that performs spatial and temporal down- and up-sampling, ultimately generating a low-resolution video with all of its frames at once. This is intended to set the Google AI apart from models that first create keyframes that lie far apart in time and then try to fill the gaps between them – possibly failing to output a realistic-looking video that way.
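The space-time down-/up-sampling idea can be illustrated with a toy example. The following sketch is only an assumption-laden illustration based on the paper's description – not Google's actual code: a video is treated as a (time, height, width) tensor that is compressed in both the temporal and spatial dimensions, processed in that small representation, and expanded back, so all frames are handled in one pass.

```python
import numpy as np

# Toy illustration of space-time down-/up-sampling (NOT Google's code):
# the whole clip is pooled over time AND space, then expanded back,
# so every frame exists in the compressed representation at once.

def downsample(video, t=2, s=2):
    """Average-pool a (T, H, W) video by factor t in time and s in space."""
    T, H, W = video.shape
    v = video[: T - T % t, : H - H % s, : W - W % s]  # crop to multiples
    return v.reshape(T // t, t, H // s, s, W // s, s).mean(axis=(1, 3, 5))

def upsample(video, t=2, s=2):
    """Nearest-neighbor expansion back to the original resolution."""
    return video.repeat(t, axis=0).repeat(s, axis=1).repeat(s, axis=2)

video = np.arange(8 * 4 * 4, dtype=float).reshape(8, 4, 4)  # 8 frames, 4x4
small = downsample(video)      # shape (4, 2, 2): compressed space-time cube
restored = upsample(small)     # shape (8, 4, 4): all frames in one pass
print(small.shape, restored.shape)
```

In the real model, a learned U-Net denoises in this compressed space-time representation instead of simple pooling; the sketch only shows why no keyframe interpolation step is needed.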

Create new videos from text commands

Google Lumiere can perform a variety of tasks. The most impressive is probably the creation of videos from simple text commands, so-called prompts. Even short descriptions of the desired scene are enough. However, the results can vary greatly depending on the text command.

Change the style and structures of a video

Existing videos, such as those you have recorded yourself, can also be heavily modified. The image content (people, animals, objects, etc.) can be rebuilt from wooden blocks or Lego bricks, folded from paper as origami, or assembled from flowers. The movements of the original are largely retained.

Create videos from images

If you give the Lumiere AI a single image and describe the desired scene with a short prompt, it can create a video from it. Whether it's a car driving along a beach, a giraffe eating grass, or a sailboat on a lake – there are plenty of examples showing how the AI performs. The results are not quite perfect and are (still) recognizable as an AI product.

Animate individual image sections

If you want the fire to flicker in a photo of a campfire, Google Lumiere can make that happen too. The movements of a butterfly can likewise be simulated – using just a photo of the animal. As mentioned at the beginning, this also works with the smoke of a locomotive. And a lake is also shown in the Lumiere demonstration; after AI processing, its water forms waves.

Expand video or replace missing areas

If a distracting object appears in the foreground of a recorded video, or if the framing was chosen poorly, this should no longer be a problem with Google Lumiere. By analyzing the existing video material, missing image content can be calculated and added to match the existing footage – so-called inpainting.
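The core idea of inpainting is simple to sketch: a mask marks the region to be filled, known pixels are kept, and only the masked pixels are replaced by generated content. The following is a minimal illustrative sketch of that blending step (an assumption for clarity – Lumiere's real pipeline does this inside a diffusion model, consistently across frames):

```python
import numpy as np

# Minimal sketch of the inpainting blending step (illustrative only):
# a binary mask marks the region to fill; known pixels stay untouched,
# masked pixels are taken from the generated content.

def inpaint(frame, mask, generated):
    """Keep frame where mask == 0, use generated content where mask == 1."""
    return np.where(mask == 1, generated, frame)

frame = np.full((4, 4), 5.0)                 # existing video frame
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1  # region to replace
generated = np.full((4, 4), 9.0)             # stand-in for model output
result = inpaint(frame, mask, generated)
print(result)
```

For video, the same masking is applied per frame, with the model additionally keeping the filled region temporally consistent – which is the hard part the toy example leaves out.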

Video editing with inserting new objects or structures

Google also shows how existing video files can be edited with Lumiere. For example, a woman's dress was marked and then redefined via prompt. A green-and-white dress with sleeves was transformed at times into a gold dress, at times a black dress, at times a white-and-red striped dress – including the removal of the sleeves. In other examples, birds were outfitted with crowns, sunglasses, scarves, bathrobes, and the like.

Stylized creation of image and video content

As already mentioned, images can be used to specify a certain style for the images or videos to be created. A lot is possible, from monochrome pixel graphics to colorful stickers to shiny golden 3D models. So with Google Lumiere you could basically adopt different art styles, film or video game designs and more for your own ideas.

Creative opportunities and deepfake risks of generative AI

Like any generative AI, whether text, image, audio or video creation, Google Lumiere offers not only creative opportunities but also social, political and economic risks. The risk that the individual tools will be misused to spread misinformation and/or for criminal purposes is not just theoretical. It has been evident in various deepfake examples for years.

Finally, the Lumiere presentation linked above also states: "… we believe that it is crucial to develop and use tools to detect bias and malicious uses in order to ensure safe and fair use." But this stance alone will not be enough. It remains to be seen whether Google Lumiere, OpenAI's Sora, and the like will turn out to be safe tools.

Did you like the article and did the instructions on the blog help you? Then I would be happy if you would support the blog via a Steady membership.

