Spec-o3-7B
Model Summary
Spec-o3-7B is a tool-augmented vision-language agent for astronomer-aligned spectral inspection and candidate vetting. It follows an interactive review workflow: start from a global spectrum view, iteratively request localized wavelength-window re-visualizations via a spectral visualization tool, and update hypotheses through interleaved multimodal reasoning before producing a final judgment.
- Backbone: Qwen2.5-VL-7B-Instruct
- Inputs: Text + spectrum visualizations (global view plus tool-rendered zoomed views)
- Core capability: Multi-turn inspection with tool calls for evidence localization
Intended Use
- Primary: Human-in-the-loop assistance for spectral inspection and rare-object candidate vetting, producing (1) a final decision and (2) an auditable inspection trace grounded in localized spectral evidence.
- Suitable settings: Research, offline analysis, and expert triage pipelines.
- Not intended for: Fully automated catalog publication without expert verification, safety-critical decision-making, or uses outside spectroscopic analysis.
How It Works (Tool-Augmented Inspection)
Spec-o3 alternates between:
- Reasoning about what spectral evidence is needed, and
- Tool calls that request re-rendered views for specific wavelength intervals.
A typical tool call uses JSON arguments like:
{"label": "Zoom on Hβ region", "wavelength_range": [4800, 4920]}
Recommended Prompt / Output Format
To reproduce the intended behavior, use a structured format:
- Provide a clear inspection/vetting instruction and the initial spectrum visualization.
- Allow tool calls during inference (one per turn).
- Expect a final decision accompanied by a concise evidence summary.
If you prefer not to expose verbose reasoning traces, you can post-process outputs to retain only the final answer and brief justification.
Training Overview
Spec-o3-7B uses a two-stage post-training recipe:
- Cold-start SFT: supervised fine-tuning on ~1k expert-approved spectral inspection trajectories with tool usage.
- Outcome-based RL (GRPO): reinforcement learning on label-only inspection tasks to improve decision quality, stabilize tool usage, and strengthen evidence localization.
High-level notes:
- Tool-rendered outputs are loss-masked to discourage memorization of images.
- RL uses group-wise rollouts (e.g., 8 rollouts) and an outcome reward emphasizing correctness and format compliance.
Evaluation (Reported)
- SpecVI-Bench (macro-average F1 across inspection tasks): 76.5
- Cross-Survey (transfer to SDSS/DESI matched spectra, average F1): SDSS 81.1, DESI 77.4 (reference on LAMOST subset: 81.5)
- Cross-Task (transfer to unseen inspection categories on LAMOST, average F1): 76.4
Limitations
- The released checkpoints are evaluated on a limited set of inspection tasks and do not cover all astrophysical classes or all observational conditions encountered in production pipelines.
- Real-world vetting often requires external cross-matching and additional modalities (photometry, imaging, time-domain evidence) beyond spectrum-only inspection.
- Extending to new surveys or new target categories may still require expert demonstration data for cold start and careful validation.
- The model does not yet provide production-grade uncertainty handling (e.g., abstention, calibration, or risk-aware triage) out of the box.
Usage (Conceptual)
A typical integration loop:
- Render an initial full-range spectrum image.
- Run the model with the task prompt + image.
- If a tool call is emitted, render the requested wavelength window and feed the new image back.
- Repeat until a final decision is produced or a tool-call budget is reached.
Citation
@misc{Jia2026SpecO3,
author = {Minghui Jia and Qichao Zhang and Ali Luo and Linjing Li and Shuo Ye and Hailing Lu and Wen Hou and Dongbin Zhao},
title = {Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection},
eprint = {2601.06498},
archivePrefix= {arXiv},
primaryClass = {cs.CL},
year = {2026},
url = {https://arxiv.org/abs/2601.06498},
doi = {10.48550/arXiv.2601.06498}
}
- Downloads last month
- 17