---
license: apache-2.0
library_name: transformers
pipeline_tag: video-text-to-text
base_model: Qwen/Qwen3-8B
language:
- en
tags:
- video-understanding
- long-video-understanding
- agentic-llm
- video-question-answering
- vision-language-model
- grpo
- reinforcement-learning
- icml-2026
---
# 🎬 VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority

🤗 HuggingFace model: CewEhao/VideoSEAL_8B · 💻 Code: Echochef/VideoSEAL

## 📖 Introduction
This is the official model card for VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority (ICML 2026).
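Per the Hub's auto-generated usage snippet, the checkpoint loads like any Transformers causal LM (note that the first call downloads roughly 8B parameters of weights):

```python
# Load model directly from the Hub (downloads weights on first use).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CewEhao/VideoSEAL_8B")
model = AutoModelForCausalLM.from_pretrained("CewEhao/VideoSEAL_8B")
```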
VideoSEAL provides offline build utilities for long video indexing:

- OCR subtitles (SRT) → OCR captions + (optional) embeddings
- Clip captions (VLM) → clip captions + (optional) embeddings
- Merge into a unified semantic index under `indexes/semantic/<video_id>/`
- (Optional) generate a global `full_story.txt` summary
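The merge step above can be sketched roughly as follows. This is a minimal illustration assuming a hypothetical record schema of `{"start", "end", "text"}`; the actual on-disk format under `indexes/semantic/<video_id>/` may differ:

```python
import json
from pathlib import Path

def merge_semantic_index(ocr_captions, clip_captions, out_dir):
    """Merge OCR-caption and clip-caption records into one time-sorted index.

    Each record is assumed to be {"start": float, "end": float, "text": str};
    the real pipeline's schema may differ.
    """
    records = (
        [{**r, "source": "ocr"} for r in ocr_captions]
        + [{**r, "source": "clip"} for r in clip_captions]
    )
    # Sort by timestamp so downstream retrieval can scan chronologically.
    records.sort(key=lambda r: (r["start"], r["end"]))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "index.jsonl").write_text(
        "\n".join(json.dumps(r) for r in records) + "\n"
    )
    return records
```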
## 📦 Layout

- 🧰 Shell entrypoints: `scripts/`
- 🐍 Python package: `videoseal/`
- ✅ Tests: `test/`
- 🧩 OCR toolchain (vendored): `third_party/video-subtitle-extractor/`
## ⚙️ Configuration

- Defaults live in the scripts under `scripts/`.
- Put real API keys/endpoints in your shell environment / job launcher.
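Since the scripts read their credentials from the environment, a launcher can fail fast when keys are missing. A small sketch of that pattern, using the four variable names from the offline-build example below:

```python
import os

# API keys the offline build expects in the shell environment.
REQUIRED_KEYS = [
    "MLLM_API_KEY",
    "EMBEDDING_API_KEY",
    "AGENT_LLM_API_KEY",
    "VISUAL_INSPECT_API_KEY",
]

def check_api_keys(env=os.environ):
    """Return the names of required API keys that are missing or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```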
## 🏗️ Run offline build

```bash
cd /path/to/VideoSEAL
export MLLM_API_KEY="sk_your_api_key"
export EMBEDDING_API_KEY="sk_your_api_key"
export AGENT_LLM_API_KEY="sk_your_api_key"
export VISUAL_INSPECT_API_KEY="sk_your_api_key"

VIDEO=/path/to/video.mp4 BENCHMARK=LVBench ./scripts/run_offline_build.sh
```
## ✅ Run tests

```bash
/root/miniconda3/envs/rllm/bin/python -m unittest discover -s test -v
```
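Tests under `test/` are picked up by standard `unittest` discovery, so a module there just needs to subclass `unittest.TestCase`. A hypothetical example (not one of the repo's actual tests):

```python
import unittest

class TestSemanticIndex(unittest.TestCase):
    """Illustrative test case; the repo's real tests live under test/."""

    def test_records_sorted_by_start(self):
        records = [{"start": 2.0}, {"start": 0.5}]
        records.sort(key=lambda r: r["start"])
        self.assertEqual([r["start"] for r in records], [0.5, 2.0])
```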
## 🏋️ GRPO training (video tool workflow)

This repo vendors a minimal copy of the `rllm/` + `verl/` Python packages (under the repo root) so that the video tool-agent GRPO workflow runs without an extra repo checkout.
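At its core, GRPO scores each rollout against the other rollouts sampled for the same prompt. A sketch of that group-relative advantage, independent of the `rllm`/`verl` implementation details:

```python
from statistics import mean, stdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: (r - mean(group)) / (std(group) + eps).

    group_rewards holds the scalar rewards of all rollouts sampled for
    one prompt; eps guards against a zero-variance group.
    """
    mu = mean(group_rewards)
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

Rollouts rewarded above the group mean get positive advantages and are reinforced; those below are penalized, with no learned value baseline needed.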
## 🧪 Training environment (conda)

```bash
conda create -n videoseal python=3.12 -y
conda activate videoseal
pip install vllm==0.11.0

cd rllm
pip install -e .
cd ../verl
pip install -e .
```
## 🚀 Launcher

```bash
scripts/train/run_video_workflow_grpo.sh
```
## 🧩 Example

```bash
cd /path/to/VideoSEAL
# Export real API keys/endpoints in your environment before launching.
TRAIN_PARQUET='["/path/to/train.parquet"]' \
VAL_PARQUET='/path/to/val.parquet' \
MODEL_PATH='Qwen/Qwen3-8B' \
./scripts/train/run_video_workflow_grpo.sh train
```
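Note that `TRAIN_PARQUET` is a JSON list while `VAL_PARQUET` is a bare path. A sketch of how a launcher might normalize both forms to a list of paths (the actual parsing lives in the scripts and may differ):

```python
import json

def parse_parquet_env(value):
    """Accept either a JSON list of parquet paths or a single bare path."""
    value = value.strip()
    if value.startswith("["):
        paths = json.loads(value)
        if not isinstance(paths, list):
            raise ValueError(f"expected a JSON list, got: {value!r}")
        return [str(p) for p in paths]
    return [value]
```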
## 🔍 Quick checks

```bash
./scripts/train/run_video_workflow_grpo.sh test-reward
pytest -q tests/rewards/test_video_reward_tool_env_integration.py
```
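The `test-reward` mode exercises the reward path in isolation. As a toy illustration of the kind of check involved (not the repo's actual reward function), a normalized exact-match answer reward can be sanity-checked like this:

```python
def exact_match_reward(prediction, reference):
    """Toy reward: 1.0 iff the whitespace/case-normalized answers match.

    Illustrative only; the real video tool-env reward is defined in the repo.
    """
    norm = lambda s: " ".join(s.lower().split())
    return 1.0 if norm(prediction) == norm(reference) else 0.0
```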