Spec-o3-7B

Model Summary

Spec-o3-7B is a tool-augmented vision-language agent for astronomer-aligned spectral inspection and candidate vetting. It follows an interactive review workflow: start from a global spectrum view, iteratively request localized wavelength-window re-visualizations via a spectral visualization tool, and update hypotheses through interleaved multimodal reasoning before producing a final judgment.

Backbone: Qwen2.5-VL-7B-Instruct
Inputs: Text + spectrum visualizations (global view plus tool-rendered zoomed views)
Core capability: Multi-turn inspection with tool calls for evidence localization

Intended Use

Primary: Human-in-the-loop assistance for spectral inspection and rare-object candidate vetting, producing (1) a final decision and (2) an auditable inspection trace grounded in localized spectral evidence.
Suitable settings: Research, offline analysis, and expert triage pipelines.
Not intended for: Fully automated catalog publication without expert verification, safety-critical decision-making, or uses outside spectroscopic analysis.

How It Works (Tool-Augmented Inspection)

Spec-o3 alternates between:

Reasoning about what spectral evidence is needed, and
Tool calls that request re-rendered views for specific wavelength intervals.

A typical tool call uses JSON arguments like:

{"label": "Zoom on Hβ region", "wavelength_range": [4800, 4920]}

Recommended Prompt / Output Format

To reproduce the intended behavior, use a structured format:

Provide a clear inspection/vetting instruction and the initial spectrum visualization.
Allow tool calls during inference (one per turn).
Expect a final decision accompanied by a concise evidence summary.

If you prefer not to expose verbose reasoning traces, you can post-process outputs to retain only the final answer and brief justification.

Training Overview

Spec-o3-7B uses a two-stage post-training recipe:

Cold-start SFT: supervised fine-tuning on ~1k expert-approved spectral inspection trajectories with tool usage.
Outcome-based RL (GRPO): reinforcement learning on label-only inspection tasks to improve decision quality, stabilize tool usage, and strengthen evidence localization.

High-level notes:

Tool-rendered outputs are loss-masked to discourage memorization of images.
RL uses group-wise rollouts (e.g., 8 rollouts) and an outcome reward emphasizing correctness and format compliance.

Evaluation (Reported)

SpecVI-Bench (macro-average F1 across inspection tasks): 76.5
Cross-Survey (transfer to SDSS/DESI matched spectra, average F1): SDSS 81.1, DESI 77.4 (reference on LAMOST subset: 81.5)
Cross-Task (transfer to unseen inspection categories on LAMOST, average F1): 76.4

Limitations

The released checkpoints are evaluated on a limited set of inspection tasks and do not cover all astrophysical classes or all observational conditions encountered in production pipelines.
Real-world vetting often requires external cross-matching and additional modalities (photometry, imaging, time-domain evidence) beyond spectrum-only inspection.
Extending to new surveys or new target categories may still require expert demonstration data for cold start and careful validation.
The model does not yet provide production-grade uncertainty handling (e.g., abstention, calibration, or risk-aware triage) out of the box.

Usage (Conceptual)

A typical integration loop:

Render an initial full-range spectrum image.
Run the model with the task prompt + image.
If a tool call is emitted, render the requested wavelength window and feed the new image back.
Repeat until a final decision is produced or a tool-call budget is reached.

Citation

@misc{Jia2026SpecO3,
  author       = {Minghui Jia and Qichao Zhang and Ali Luo and Linjing Li and Shuo Ye and Hailing Lu and Wen Hou and Dongbin Zhao},
  title        = {Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection},
  eprint       = {2601.06498},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  year         = {2026},
  url          = {https://arxiv.org/abs/2601.06498},
  doi          = {10.48550/arXiv.2601.06498}
}