Spec-o3-3B

Model Summary

Spec-o3-3B is a tool-augmented vision-language agent for astronomer-aligned spectral inspection and candidate vetting. It is designed to follow an interactive workflow that resembles expert review: inspect an initial full-spectrum view, iteratively request localized wavelength-window re-visualizations via a lightweight spectral visualization tool, and update hypotheses through interleaved multimodal reasoning before producing a final decision.

  • Backbone: Qwen2.5-VL-3B-Instruct
  • Inputs: Text + spectrum visualizations (global view plus tool-rendered zoomed views)
  • Core capability: Multi-turn inspection with tool calls for evidence localization

Intended Use

  • Primary: Human-in-the-loop assistance for spectral inspection and rare-object candidate vetting, providing (1) a final judgment and (2) an auditable inspection trace grounded in localized spectral evidence.
  • Suitable settings: Research, offline analysis, and expert triage pipelines.
  • Not intended for: Fully automated catalog publication without expert verification, safety-critical decision-making, or uses outside spectroscopic analysis.

How It Works (Tool-Augmented Inspection)

Spec-o3-3B alternates between:

  1. Reasoning about what spectral evidence is needed, and
  2. Tool calls that request re-rendered views for specific wavelength intervals.

A typical tool call uses JSON arguments like:

{"label": "Zoom on Hα region", "wavelength_range": [6500, 6600]}
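
On the host side, arguments like these need to be parsed and checked before anything is rendered. The following is a minimal sketch, not the project's own tool implementation: the function name `parse_tool_call` and the coverage limits passed to it are hypothetical, and the window is assumed to use the same units as the spectrum's wavelength axis.

```python
import json

def parse_tool_call(raw: str, spectrum_min: float, spectrum_max: float):
    """Parse tool-call JSON arguments and return (label, lo, hi).

    Hypothetical helper: only the "label" and "wavelength_range" keys
    are taken from the example arguments; everything else is assumed.
    """
    args = json.loads(raw)
    label = str(args.get("label", ""))
    lo, hi = (float(v) for v in args["wavelength_range"])
    if lo >= hi:
        raise ValueError(f"empty or reversed window: [{lo}, {hi}]")
    # Clip the request to the wavelengths the spectrum actually covers.
    lo, hi = max(lo, spectrum_min), min(hi, spectrum_max)
    if lo >= hi:
        raise ValueError("requested window lies outside the spectrum")
    return label, lo, hi

raw_call = '{"label": "Zoom on Hα region", "wavelength_range": [6500, 6600]}'
# 3700.0 and 9000.0 are placeholder coverage limits, not survey values.
label, lo, hi = parse_tool_call(raw_call, 3700.0, 9000.0)
print(label, lo, hi)
```

Rejecting reversed or out-of-range windows early keeps a malformed request from producing an empty plot that the model would then try to interpret.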

Recommended Prompt / Output Format

To reproduce the intended behavior, structure inference as follows:

  • Give a clear task instruction (e.g., “inspect the spectrum and decide whether the candidate should be accepted”).
  • Allow tool calls during inference (one per turn).
  • Expect a final decision with a short justification.

If you prefer not to expose verbose reasoning traces in production, you can post-process outputs to retain only the final answer and a brief evidence summary.
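
One way to do this post-processing is sketched below. The tag names `<think>` and `<answer>` are assumptions made for illustration; this card does not specify the model's output markup, so adapt the patterns to whatever delimiters your outputs actually contain.

```python
import re

def strip_reasoning(output: str) -> str:
    """Keep only the final answer from a model output.

    Assumes (hypothetically) that reasoning is wrapped in <think>...</think>
    and the final decision in <answer>...</answer>.
    """
    # Drop every reasoning span, including multi-line ones.
    without_trace = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL)
    match = re.search(r"<answer>(.*?)</answer>", without_trace, flags=re.DOTALL)
    # Fall back to the trace-free text if no answer tag is present.
    return (match.group(1) if match else without_trace).strip()

raw = "<think>Check the H-alpha window first.</think><answer>Accept: emission near 6563.</answer>"
print(strip_reasoning(raw))
```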

Training Overview

Spec-o3-3B uses a two-stage post-training recipe:

  1. Cold-start SFT: supervised fine-tuning on ~1k expert-approved spectral inspection trajectories with tool usage.
  2. Outcome-based RL (GRPO): reinforcement learning on label-only inspection tasks to improve decision quality, stabilize tool usage, and strengthen evidence localization.

High-level notes:

  • Tokens from tool-rendered outputs are masked out of the training loss to discourage memorization of images.
  • RL uses group-wise rollouts (e.g., 8 rollouts) and an outcome reward emphasizing correctness and format compliance.
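
To illustrate the group-wise idea, the sketch below computes group-relative advantages in the usual GRPO style: each rollout's reward is standardized against the other rollouts for the same prompt. It is a generic illustration, not the training code for this model; the reward values are made up, and the `eps` constant is an assumption.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps (assumed value) avoids division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Illustrative outcome rewards for a group of 8 rollouts (1.0 = correct).
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

When every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal, which is why varied outcomes within a group matter.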

Evaluation (Reported)

  • SpecVI-Bench (macro-average F1 across inspection tasks): 73.3
  • Cross-Survey (transfer to SDSS/DESI matched spectra, average F1): SDSS 77.3, DESI 73.6 (reference on LAMOST subset: 79.8)
  • Cross-Task (transfer to unseen inspection categories on LAMOST, average F1): 74.4
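
For readers unfamiliar with the metric, macro-averaging gives each task equal weight: compute F1 per task, then take the unweighted mean. The sketch below assumes binary labels per task and uses toy data that is unrelated to the reported numbers; the exact evaluation protocol is defined in the paper, not here.

```python
def f1_score(y_true, y_pred):
    """Binary F1 with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(per_task):
    """Unweighted mean of per-task F1; per_task maps name -> (y_true, y_pred)."""
    scores = [f1_score(t, p) for t, p in per_task.values()]
    return sum(scores) / len(scores)

# Toy labels for two hypothetical tasks (not real evaluation data).
toy = {
    "task_a": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "task_b": ([1, 0, 1, 0], [1, 0, 1, 0]),
}
print(round(macro_f1(toy), 4))
```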

Limitations

  • The released checkpoints are evaluated on a limited set of inspection tasks and do not cover all astrophysical classes or all observational conditions encountered in production pipelines.
  • Real-world vetting often requires external cross-matching and additional modalities (photometry, imaging, time-domain evidence) beyond spectrum-only inspection.
  • Extending to new surveys or new target categories may still require expert demonstration data for cold start and careful validation.
  • The model does not yet provide production-grade uncertainty handling (e.g., abstention, calibration, or risk-aware triage) out of the box.

Usage (Conceptual)

A typical integration loop:

  1. Render an initial full-range spectrum image.
  2. Run the model with the task prompt + image.
  3. If a tool call is emitted, render the requested wavelength window and feed the new image back.
  4. Repeat until a final decision is produced or a tool-call budget is reached.
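
The loop above can be sketched as follows. This is a framework-agnostic outline rather than the released inference code: `render` and `model_step` are hypothetical callables you would supply, and the reply dictionary with `"tool_call"` or `"final"` keys is an assumed interface, not the model's native output format.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Window = Optional[Tuple[float, float]]  # None means the full-range view

def run_inspection(
    render: Callable[[Window], Any],
    model_step: Callable[[str, List[Any]], Dict[str, Any]],
    task_prompt: str,
    max_tool_calls: int = 4,  # assumed budget, tune for your pipeline
) -> Dict[str, Any]:
    images = [render(None)]                       # 1. initial full-range view
    calls = 0
    while True:
        reply = model_step(task_prompt, images)   # 2. run the model
        if "final" in reply:                      # 4. final decision produced
            return {"decision": reply["final"], "tool_calls": calls}
        if calls >= max_tool_calls:               # 4. tool-call budget reached
            return {"decision": None, "tool_calls": calls}
        lo, hi = reply["tool_call"]["wavelength_range"]
        images.append(render((lo, hi)))           # 3. render and feed back
        calls += 1

# Stub renderer and model, used only to exercise the control flow.
def fake_render(window):
    return f"image:{window}"

def fake_model(prompt, images):
    if len(images) < 2:
        return {"tool_call": {"label": "Zoom", "wavelength_range": [6500, 6600]}}
    return {"final": "accept"}

result = run_inspection(fake_render, fake_model, "Inspect the spectrum.")
print(result)
```

Returning `None` as the decision when the budget runs out lets the caller route undecided spectra to an expert instead of forcing a verdict.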

Citation

@misc{Jia2026SpecO3,
  author       = {Minghui Jia and Qichao Zhang and Ali Luo and Linjing Li and Shuo Ye and Hailing Lu and Wen Hou and Dongbin Zhao},
  title        = {Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection},
  eprint       = {2601.06498},
  archivePrefix= {arXiv},
  primaryClass = {cs.CL},
  year         = {2026},
  url          = {https://arxiv.org/abs/2601.06498},
  doi          = {10.48550/arXiv.2601.06498}
}