Spec-o3-3B

Model Summary

Spec-o3-3B is a tool-augmented vision-language agent for astronomer-aligned spectral inspection and candidate vetting. It is designed to follow an interactive workflow that resembles expert review: inspect an initial full-spectrum view, iteratively request localized wavelength-window re-visualizations via a lightweight spectral visualization tool, and update hypotheses through interleaved multimodal reasoning before producing a final decision.

Backbone: Qwen2.5-VL-3B-Instruct
Inputs: Text + spectrum visualizations (global view plus tool-rendered zoomed views)
Core capability: Multi-turn inspection with tool calls for evidence localization

Intended Use

Primary: Human-in-the-loop assistance for spectral inspection and rare-object candidate vetting, providing (1) a final judgment and (2) an auditable inspection trace grounded in localized spectral evidence.
Suitable settings: Research, offline analysis, and expert triage pipelines.
Not intended for: Fully automated catalog publication without expert verification, safety-critical decision-making, or uses outside spectroscopic analysis.

How It Works (Tool-Augmented Inspection)

Spec-o3 alternates between:

Reasoning about what spectral evidence is needed, and
Tool calls that request re-rendered views for specific wavelength intervals.

A typical tool call uses JSON arguments like:

{"label": "Zoom on Hα region", "wavelength_range": [6500, 6600]}

Recommended Prompt / Output Format

To reproduce the intended behavior, use a structured format:

A clear task instruction (e.g., “inspect the spectrum and decide whether the candidate should be accepted”).
Allow tool calls during inference (one per turn).
Expect a final decision with a short justification.

If you prefer not to expose verbose reasoning traces in production, you can post-process outputs to retain only the final answer and brief evidence summary.

Training Overview

Spec-o3-3B uses a two-stage post-training recipe:

Cold-start SFT: supervised fine-tuning on ~1k expert-approved spectral inspection trajectories with tool usage.
Outcome-based RL (GRPO): reinforcement learning on label-only inspection tasks to improve decision quality, stabilize tool usage, and strengthen evidence localization.

High-level notes:

Tool-rendered outputs are loss-masked to discourage memorization of images.
RL uses group-wise rollouts (e.g., 8 rollouts) and an outcome reward emphasizing correctness and format compliance.

Evaluation (Reported)

SpecVI-Bench (macro-average F1 across inspection tasks): 73.3
Cross-Survey (transfer to SDSS/DESI matched spectra, average F1): SDSS 77.3, DESI 73.6 (reference on LAMOST subset: 79.8)
Cross-Task (transfer to unseen inspection categories on LAMOST, average F1): 74.4

Limitations

The released checkpoints are evaluated on a limited set of inspection tasks and do not cover all astrophysical classes or all observational conditions encountered in production pipelines.
Real-world vetting often requires external cross-matching and additional modalities (photometry, imaging, time-domain evidence) beyond spectrum-only inspection.
Extending to new surveys or new target categories may still require expert demonstration data for cold start and careful validation.
The model does not yet provide production-grade uncertainty handling (e.g., abstention, calibration, or risk-aware triage) out of the box.

Usage (Conceptual)

A typical integration loop:

Render an initial full-range spectrum image.
Run the model with the task prompt + image.
If a tool call is emitted, render the requested wavelength window and feed the new image back.
Repeat until a final decision is produced or a tool-call budget is reached.

Citation

@misc{Jia2026SpecO3,
  author       = {Minghui Jia and Qichao Zhang and Ali Luo and Linjing Li and Shuo Ye and Hailing Lu and Wen Hou and Dongbin Zhao},
  title        = {Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection},
  eprint       = {2601.06498},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  year         = {2026},
  url          = {https://arxiv.org/abs/2601.06498},
  doi          = {10.48550/arXiv.2601.06498}
}