Vidi2

New Video Gen 4
Released: November 24, 2025

Overview

Vidi2 is ByteDance’s second-generation large multimodal video model for understanding and creation. It adds fine-grained spatio-temporal grounding, long-video retrieval, and video question answering, so it can find both the right time ranges and the right object boxes from natural-language queries.

Description

Vidi2 extends the original Vidi models into a full video reasoning system that handles temporal retrieval, spatio-temporal grounding, and video QA in one framework. Given a text query, it locates the relevant time segments and outputs bounding boxes for the described objects, which is useful for plot or character tracking, automatic multi-view switching, and composition-aware reframing and cropping. The team also introduces the VUE-STG and VUE-TR-V2 benchmarks, which feature long videos, noun-phrase-style queries, carefully annotated timestamps and boxes, and refined vIoU and tIoU metrics. The team reports state-of-the-art performance on these tasks compared with both proprietary and open models of similar scale.
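To make the two metric names concrete, here is a minimal Python sketch of the commonly used baseline definitions of temporal IoU (tIoU) and spatio-temporal video IoU (vIoU). It is an illustration only: the refined variants used in the Vidi2 benchmarks are not described on this page and may differ, and the function names, the (start, end) interval format, and the frame-to-box dictionaries are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]            # (start_sec, end_sec), assumed format
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed format


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals so total length is not double counted."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def tiou(pred: List[Interval], gt: List[Interval]) -> float:
    """Temporal IoU between two sets of time ranges (baseline definition)."""
    p, g = merge(pred), merge(gt)
    inter = sum(
        max(0.0, min(pe, ge) - max(ps, gs)) for ps, pe in p for gs, ge in g
    )
    union = sum(e - s for s, e in p) + sum(e - s for s, e in g) - inter
    return inter / union if union > 0 else 0.0


def box_iou(a: Box, b: Box) -> float:
    """Standard IoU between two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def viou(pred: Dict[int, Box], gt: Dict[int, Box]) -> float:
    """Baseline vIoU: sum box IoU over frames present in both the prediction
    and the ground truth, divided by the number of frames present in either."""
    shared = pred.keys() & gt.keys()
    either = pred.keys() | gt.keys()
    if not either:
        return 0.0
    return sum(box_iou(pred[f], gt[f]) for f in shared) / len(either)
```

For example, tiou([(0, 10)], [(5, 15)]) compares a predicted 0 to 10 second range against an annotated 5 to 15 second range: the overlap is 5 seconds and the union is 15 seconds, so the score is 1/3. A prediction is penalized by vIoU both for boxes that are misplaced and for frames that are predicted but not annotated, or annotated but not predicted.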

About ByteDance

ByteDance is a multinational technology company known for its content platforms, including TikTok and Douyin.

Industry: Internet
Company Size: 10001+
Location: Beijing, CN

Last updated: December 2, 2025