Qwen3-VL-8B RLVR — Models (v1). Collection of Qwen3-VL-8B GRPO RLVR checkpoints from a token-dropout exploration study. OMR: ppexplore is the winner (0.714); video: dead heat at ~0.485. • 6 items • Updated 3 days ago
Qwen3-VL-8B RLVR — Datasets (v1). Collection of curated SFT and GRPO RL datasets (video MC-QA, OMR math-image, OpenMMReasoner-RL, Vero) for Qwen3-VL-8B post-training. • 5 items • Updated about 2 hours ago