
Marlin 2B

Marlin-2B is a 2B-parameter video VLM from NemoStation, released under Apache 2.0. It is designed to answer two practical video-analysis questions: what is happening in a video, and when it happens. The model produces structured scene and event captions with second-level timestamps, and it can resolve natural-language event queries into start-end time spans. It is fine-tuned from Qwen3.5-2B, supports video-text-to-text use, and exposes developer-friendly caption and find modes for captioning and event localization.
New Multimodal Gen 3
Released: May 14, 2026

Overview

Marlin-2B is NemoStation’s open-source 2B video-language model for dense video captioning and natural-language temporal grounding.
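To make the two tasks concrete, here is a minimal sketch of how timestamped captions and start-end spans might be represented and compared on the consuming side. It does not show Marlin-2B's actual API or output schema, which this page does not specify. The names TimeSpan, EventCaption, format_timestamp, and temporal_iou are hypothetical, and the sample captions are made up for illustration. Temporal intersection-over-union is a common way to score a predicted span against a reference span in temporal grounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeSpan:
    """A start-end interval in seconds (hypothetical type, not Marlin-2B's schema)."""
    start_s: float
    end_s: float

    def __post_init__(self) -> None:
        # Reject negative starts and spans that end before they begin.
        if self.start_s < 0 or self.end_s < self.start_s:
            raise ValueError(f"invalid span: {self.start_s}-{self.end_s}")


@dataclass(frozen=True)
class EventCaption:
    """One caption tied to a time span (hypothetical type)."""
    span: TimeSpan
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, truncating to whole seconds."""
    whole = int(seconds)
    return f"{whole // 60:02d}:{whole % 60:02d}"


def temporal_iou(a: TimeSpan, b: TimeSpan) -> float:
    """Intersection-over-union of two time spans; 0.0 if they do not overlap."""
    intersection = max(0.0, min(a.end_s, b.end_s) - max(a.start_s, b.start_s))
    union = (a.end_s - a.start_s) + (b.end_s - b.start_s) - intersection
    return intersection / union if union > 0 else 0.0


# Made-up sample data standing in for caption-style output.
captions = [
    EventCaption(TimeSpan(0, 4), "A person enters the kitchen."),
    EventCaption(TimeSpan(4, 12), "They fill a kettle at the sink."),
]
for c in captions:
    print(f"[{format_timestamp(c.span.start_s)}-{format_timestamp(c.span.end_s)}] {c.text}")

# Made-up predicted and reference spans standing in for a find-style query.
predicted = TimeSpan(4, 10)
reference = TimeSpan(4, 12)
print(temporal_iou(predicted, reference))
```

Keeping spans as plain seconds makes it straightforward to compare a localized event against a reference annotation or to index into the source video.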

About NemoStation

NemoStation is a video AI research and product lab focused on building small, grounded video understanding models that convert video data into structured, machine-readable information. Its flagship product is Marlin-2B, a 2B-parameter video VLM fine-tuned from Qwen3.5-2B that produces dense scene and event captions with timestamps and resolves natural-language temporal queries. The model is state-of-the-art in its weight class on CaReBench, DREAM-1K, and TimeLens-Bench. NemoStation also produces CaReBench, a video captioning benchmark.

Industry: Artificial Intelligence

Tools using Marlin 2B

No tools found for this model yet.

Last updated: May 20, 2026