video-SALMONN 2
Overview
video-SALMONN 2 is an audio-visual large language model from Tsinghua University and ByteDance that takes both video frames and audio as input to generate detailed captions and answer questions. It reports state-of-the-art results on many audio-visual QA and video understanding benchmarks.
About ByteDance
ByteDance is a multinational technology company known for its content platforms, including TikTok and Douyin.
Tools using video-SALMONN 2
No tools found for this model yet.
