Video generation 2023-12-22

VideoPoet by Google

Transforming language models into video generators.
Generated by ChatGPT

VideoPoet, from Google Research, is a video generation model notable for producing large, interesting, and high-fidelity motions.

VideoPoet turns an autoregressive language model into a high-quality video generator. It uses the MAGVIT V2 video tokenizer and the SoundStream audio tokenizer to transform images, video, and audio clips of variable length into a sequence of discrete codes in a unified vocabulary.
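To make the idea of a unified vocabulary concrete, here is a minimal sketch in Python. It assumes each modality has its own codebook and that the shared vocabulary is built by giving each codebook a disjoint range of ids. The codebook sizes and the function names (`build_offsets`, `to_unified`, `from_unified`) are hypothetical and chosen for illustration; the listing does not describe VideoPoet's actual vocabulary layout.

```python
# Hypothetical codebook sizes, for illustration only.
MODALITY_SIZES = {"text": 1000, "video": 4096, "audio": 1024}


def build_offsets(sizes):
    """Assign each modality a disjoint id range in one shared vocabulary."""
    offsets, start = {}, 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets, start


OFFSETS, VOCAB_SIZE = build_offsets(MODALITY_SIZES)


def to_unified(modality, local_ids):
    """Map tokenizer-local ids to ids in the shared vocabulary."""
    size = MODALITY_SIZES[modality]
    for i in local_ids:
        if not 0 <= i < size:
            raise ValueError(f"id {i} is outside the {modality} codebook")
    return [OFFSETS[modality] + i for i in local_ids]


def from_unified(token):
    """Recover (modality, local id) from a shared-vocabulary id."""
    for name, start in OFFSETS.items():
        if start <= token < start + MODALITY_SIZES[name]:
            return name, token - start
    raise ValueError(f"token {token} is outside the unified vocabulary")


# One mixed sequence: two text codes, two video codes, one audio code.
sequence = (
    to_unified("text", [5, 17])
    + to_unified("video", [0, 4095])
    + to_unified("audio", [3])
)
```

Because the ranges do not overlap, a single language model can read and emit any modality from one token stream, and `from_unified` can route each generated id back to the right decoder.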

These codes are compatible with text-based language models, which allows integration with other modalities such as text. An autoregressive language model, contained within this tool, learns across video, image, audio, and text modalities to predict the next video or audio token in the sequence.
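Autoregressive prediction of the next token can be sketched as a simple loop. The example below assumes greedy decoding and uses a toy scoring function in place of a trained model; `generate`, `toy_scores`, and the 8-token vocabulary are hypothetical stand-ins, not part of VideoPoet.

```python
from typing import Callable, List, Optional, Sequence


def generate(
    next_token_scores: Callable[[Sequence[int]], List[float]],
    prompt: Sequence[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> List[int]:
    """Greedy autoregressive decoding: each new token is chosen from
    scores conditioned on every token produced so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        # Pick the highest-scoring id (greedy; real systems often sample).
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return tokens


def toy_scores(tokens: Sequence[int]) -> List[float]:
    """Stand-in for a trained model: favours (last token + 1) mod 8."""
    vocab_size = 8
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


result = generate(toy_scores, prompt=[2], max_new_tokens=4)
```

In a real system the prompt would hold the conditioning tokens (for example text or an input image) and the generated ids would be passed to the video or audio tokenizer's decoder.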

It further combines multimodal generative learning objectives into the training framework, such as text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylization, and video-to-audio.
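One common way to train a single model on several objectives is to prefix each example with a task marker and compute the loss only on the target tokens. The sketch below illustrates that pattern under stated assumptions: the task-token scheme, the `format_example` helper, and the loss mask are hypothetical and are not taken from the VideoPoet description above.

```python
# Task names come from the description; the ids assigned to them are made up.
TASKS = [
    "text-to-video",
    "text-to-image",
    "image-to-video",
    "video frame continuation",
    "video inpainting and outpainting",
    "video stylization",
    "video-to-audio",
]
TASK_TOKEN = {name: i for i, name in enumerate(TASKS)}
# Content tokens are shifted so they never collide with task tokens.
CONTENT_OFFSET = len(TASKS)


def format_example(task, condition, target):
    """Build one training sequence: [task token] + condition + target.

    Returns the sequence and a loss mask that is 1 only on target
    positions, so the model is trained to predict the output given
    the task and the conditioning input.
    """
    prefix = [TASK_TOKEN[task]] + [CONTENT_OFFSET + t for t in condition]
    outputs = [CONTENT_OFFSET + t for t in target]
    return prefix + outputs, [0] * len(prefix) + [1] * len(outputs)


seq, mask = format_example("image-to-video", condition=[10, 11], target=[20, 21, 22])
```

With examples from every task formatted this way, batches can mix objectives freely, and the same next-token loss covers all of them.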

VideoPoet can generate videos in square or portrait orientation to suit short-form content. It can also generate audio from a video input.

With its ability to multitask across a variety of video-centric inputs and outputs, VideoPoet illustrates how language models can synthesize and edit videos with desirable temporal consistency.


VideoPoet by Google was manually vetted by our editorial team and was first featured on December 23rd 2023.


Pros and Cons

Pros

High-fidelity motions
MAGVIT V2 video tokenizer
SoundStream audio tokenizer
Transforms variable length clips
Sequence of discrete codes
Integration with text modalities
Predicts next video/audio token
Combines multimodal generative learning
Generates square and portrait videos
Supports audio generation
Desirable temporal consistency
Text-to-Video capability
Image-to-Video capability
Video Inpainting
Video Outpainting
Video Stylization
Video-to-Audio capability
High-quality video generator
Multitasking on video-centric inputs/outputs
Maintains object identity preservation
Long video generation capabilities
Interactive video editing capabilities
Controllable camera motions
Zero-shot video generation
Controllable video motions
Audio matching for input video
Zero-shot controllable camera motions
Allows for stylization
Applies visual styles and effects
Capable of text-to-audio

Cons

Limited orientation
Unpredictable output
No real-time editing
Complex setup
Dependent on Google resources
Limited to Google's vocab
Requires large data
No user guides
Limited generations
No multilingual support


What is VideoPoet?
How does VideoPoet generate videos using language models?
What is the role of MAGVIT V2 video tokenizer in VideoPoet?
How does SoundStream audio tokenizer contribute to VideoPoet functionality?
Can VideoPoet generate both video and audio?
What formats or orientations are supported by VideoPoet?
Can you edit videos with VideoPoet?
How does VideoPoet ensure temporal consistency in videos?
What are some examples of multimodal learning objectives in VideoPoet's training framework?
How is the autoregressive functionality used in VideoPoet?
What is the process of converting text-to-video in VideoPoet?
Can you use VideoPoet to convert image-to-video?
Can you change the style of a video using VideoPoet?
How does text-to-audio feature work in VideoPoet?
What is the aspect ratio of videos generated through VideoPoet?
What is the video generation process in VideoPoet?
What is the operational sequence of VideoPoet?
How does VideoPoet handle multiple video-centric inputs and outputs?
What kind of languages does VideoPoet support?
What is the output format of VideoPoet generated videos and audios?

