Nguyen Quang Trung
ngqtrung
Organizations
videorl
Collections
Qwen3-VL-8B RLVR — Models (v1)
Qwen3-VL-8B GRPO RLVR checkpoints from a token-dropout exploration study. OMR: ppexplore is the winner (0.714). Video: a dead heat at ~0.485.
Qwen3-VL-8B RLVR — Datasets (v1)
Curated SFT + GRPO RL datasets (video MC-QA, OMR math-image, OpenMMReasoner-RL, Vero) for Qwen3-VL-8B post-training.
VMAR — Raw & Source
Raw multi-style distilled traces, the pre-distillation template seed, and the real-audio eval set (never used in training).
VMAR — Train-Ready (SFT + RL + Eval)
Train-ready VMAR datasets: teacher-distilled SFT corpus, RL prompt set, and the curated in-loop eval benchmark.