ByteDance joins the AI video craze with the announcement of Boximator.
AI video has been the hottest trend in the AI world recently. OpenAI’s Sora has been the talk of the town, but did you know that ByteDance also announced an AI tool that can turn an image into a video? It’s called Boximator.
The term “boximator” is a combination of the words “box” and “animator.” It is a new method for adding fine-grained motion control to video diffusion models in a flexible, user-friendly way.
Boximator uses two types of constraints: hard boxes for precise object delineation and soft boxes for more flexible motion paths. Users select objects in a frame and define their motion in subsequent frames, which gives them much finer control over the generated video than a text prompt alone.
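To make the two constraint types concrete, here is a minimal Python sketch of how they could be represented and checked. It is not ByteDance's code: the `BoxConstraint` class, the `satisfies` helper, the tolerance value, and the reading of a soft box as "a region the object must stay inside" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# (x0, y0, x1, y1) in normalized image coordinates, top-left to bottom-right.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class BoxConstraint:
    """Hypothetical container for one box drawn by the user."""
    object_id: int  # which selected object the box refers to
    frame: int      # frame index the constraint applies to
    box: Box
    hard: bool      # True = hard box, False = soft box


def satisfies(constraint: BoxConstraint, predicted: Box, tol: float = 0.02) -> bool:
    """Check a predicted object box against a constraint.

    Assumption: a hard box must be matched almost exactly (within `tol`),
    while a soft box only requires the object to lie inside the region.
    """
    if constraint.hard:
        return all(abs(c - p) <= tol for c, p in zip(constraint.box, predicted))
    x0, y0, x1, y1 = constraint.box
    px0, py0, px1, py1 = predicted
    return x0 <= px0 and y0 <= py0 and px1 <= x1 and py1 <= y1


# Pin the object precisely in the first frame, then let it end up
# anywhere inside a larger region at frame 15.
constraints = [
    BoxConstraint(object_id=0, frame=0, box=(0.10, 0.40, 0.30, 0.80), hard=True),
    BoxConstraint(object_id=0, frame=15, box=(0.50, 0.30, 0.95, 0.90), hard=False),
]
```

In this sketch, a hard box in the first frame says exactly which object is meant, and a soft box in a later frame describes roughly where it should end up without dictating its exact size or position.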
Boximator is trained in multiple stages so that the model learns to follow the box constraints while preserving the base model’s knowledge. According to the creators, extensive experiments and user studies show that the resulting videos maintain or improve on the video quality of the base models.
The illustration above gives an overview of the control module: a new self-attention layer is added to every spatial attention block, between the spatial self-attention and the spatial cross-attention.
During training, all the original model parameters are frozen.
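The layout described above can be sketched in plain Python. This is only a schematic of the layer ordering and of which parts are trained, not a working diffusion model: the `Layer` class, the layer names, and the `add_control_layer` helper are hypothetical, and the sketch assumes that the newly added layer is the only trainable part of the block.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical stand-in for a network layer: a name plus a trainable flag."""
    name: str
    trainable: bool


def add_control_layer(block: List[Layer]) -> List[Layer]:
    """Freeze every original layer and insert a new, trainable
    self-attention layer right before the spatial cross-attention."""
    frozen = [Layer(layer.name, trainable=False) for layer in block]
    names = [layer.name for layer in frozen]
    idx = names.index("spatial_cross_attention")
    control = Layer("control_self_attention", trainable=True)
    return frozen[:idx] + [control] + frozen[idx:]


# One spatial attention block of the base model (names are illustrative).
base_block = [
    Layer("spatial_self_attention", trainable=True),
    Layer("spatial_cross_attention", trainable=True),
]

controlled = add_control_layer(base_block)
for layer in controlled:
    print(layer.name, "trainable" if layer.trainable else "frozen")
```

Because the original layers are never updated in this setup, whatever the base model already knows about generating video is left untouched, and only the added layer has to learn how to respond to boxes.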
The creators compared Boximator against two competitors, Runway Gen-2 and Pika 1.0, using a few cherry-picked examples.
Prompt: “The adorable pika turns to the camera.”
Prompt: “The wind blows a woman’s umbrella away, rainy day.”
Prompt: “Very close view of a beautiful girl closing up her eyes.”
The demo website of Boximator is currently under development and should be available in two to three months. For now, you can explore the set of examples showcased in the demo section of the announcement page.
If you really want to try Boximator, you can email the creators at wangjiawei.424@bytedance.com, send them the input image and the text prompt, and then they will reply with the generated video.
As video synthesis continues to be a hot topic in the AI world, technologies like Boximator mark a significant step forward. With ByteDance’s backing, it is plausible that Boximator could eventually be integrated into platforms like TikTok, bringing more sophisticated video editing capabilities to the fingertips of millions.
However, it is important to be aware of the risks associated with video synthesis technology. Deepfakes, which are videos that have been manipulated to make it appear as if someone is saying or doing something they never did, can be used to spread misinformation and propaganda.
It is important to consume online media responsibly and to be critical of the information you see.
Software engineer, writer, solopreneur