ORCA — Llama-3.2-3B-Instruct (Multinomial, seed 99)

ORCA (Open-ended Response Correctness Assessment) scores the correctness of open-ended audio QA responses. Given a question, reference answer, candidate answer, and an LLM-generated rationale, it outputs a correctness score in [0, 1] and an uncertainty estimate.
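The Model details table lists a multinomial log-likelihood loss over a 5-class Likert scale. One way such a 5-way distribution could yield a score in [0, 1], an uncertainty estimate, and a training loss is sketched below. Several parts of this sketch are illustrative assumptions and are not ORCA's documented method: the linear class-to-score mapping, the use of the standard deviation as the uncertainty, and all function names. See the paper and repository for the actual formulation.

```python
import math
from typing import List, Sequence, Tuple

# ASSUMPTION: Likert class k (1..5) maps linearly to (k - 1) / 4 in [0, 1].
# This mapping is illustrative; ORCA's actual scoring rule may differ.
LIKERT_SCORES: List[float] = [0.0, 0.25, 0.5, 0.75, 1.0]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the five class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_and_uncertainty(probs: Sequence[float]) -> Tuple[float, float]:
    """Expected score and its standard deviation under the class distribution.

    ASSUMPTION: the standard deviation stands in for the uncertainty estimate.
    """
    mean = sum(p * s for p, s in zip(probs, LIKERT_SCORES))
    var = sum(p * (s - mean) ** 2 for p, s in zip(probs, LIKERT_SCORES))
    return mean, math.sqrt(var)


def multinomial_nll(probs: Sequence[float], counts: Sequence[int]) -> float:
    """Negative multinomial log-likelihood of per-class annotator vote counts.

    The multinomial coefficient is omitted because it does not depend on the model.
    """
    return -sum(n * math.log(p) for p, n in zip(probs, counts) if n > 0)


if __name__ == "__main__":
    probs = softmax([-2.0, -1.0, 0.0, 1.5, 2.0])
    score, uncertainty = score_and_uncertainty(probs)
    print(f"score={score:.3f} uncertainty={uncertainty:.3f}")
    # Hypothetical votes: one annotator chose class 4, two chose class 5.
    print(f"nll={multinomial_nll(probs, [0, 0, 0, 1, 2]):.3f}")
```

When the annotators disagree, fitting a full distribution lets the spread of the prediction reflect that disagreement. A single-label cross-entropy would discard it.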

Paper: ORCA: Open-ended Response Correctness Assessment for Audio Question Answering — accepted to TACL 2026
Code & usage: github.com/BUTSpeechFIT/ORCA
Training data: BUT-FIT/orca-audio-qa-annotations

Model details

Property	Value
Base model	`meta-llama/Llama-3.2-3B-Instruct`
LoRA rank / alpha	128 / 128
Loss function	Multinomial log-likelihood (5-class Likert)
Training seed	99
Training curriculum	Stage 1 (synthetic) → Stage 2 (LLM-judge) → Stage 3 (human)
Precision	bfloat16
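The LoRA setting of rank 128 and alpha 128 determines the scale of the low-rank update and the number of trainable parameters it adds. The sketch below works through that arithmetic. It assumes standard LoRA, where the update is (alpha / r) · B·A. The choice of target modules is hypothetical because the card does not state which modules ORCA adapts. The shapes used are the attention projections of one Llama-3.2-3B decoder layer.

```python
from typing import Dict, Tuple

# Values from the Model details table.
RANK = 128
ALPHA = 128


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scale applied to the low-rank update B @ A (alpha / r).

    ASSUMPTION: plain LoRA; rank-stabilized LoRA would use alpha / sqrt(r).
    """
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted d_out x d_in weight: A (r x d_in) + B (d_out x r)."""
    return rank * (d_in + d_out)


# HYPOTHETICAL target modules: attention projections of one Llama-3.2-3B layer
# (hidden size 3072; 8 KV heads x head dim 128 = 1024 for k/v). Shapes are (d_in, d_out).
ATTN_SHAPES: Dict[str, Tuple[int, int]] = {
    "q_proj": (3072, 3072),
    "k_proj": (3072, 1024),
    "v_proj": (3072, 1024),
    "o_proj": (3072, 3072),
}


def lora_params_per_layer(shapes: Dict[str, Tuple[int, int]], rank: int) -> int:
    """Sum LoRA parameters over the (d_in, d_out) shapes of the adapted modules."""
    return sum(lora_params(d_in, d_out, rank) for d_in, d_out in shapes.values())


if __name__ == "__main__":
    print("scaling:", lora_scaling(ALPHA, RANK))
    per_layer = lora_params_per_layer(ATTN_SHAPES, RANK)
    # Llama-3.2-3B has 28 decoder layers.
    print("per layer:", per_layer, "| 28 layers:", 28 * per_layer)
```

Because alpha equals the rank, the update is added with a scale factor of 1.0. Under a fixed alpha, by contrast, the effective scale would shrink as the rank grows.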

Quick start

pip install git+https://github.com/BUTSpeechFIT/ORCA.git
hf download BUT-FIT/orca-llama-3.2-3b-it-multinomial --local-dir orca-llama-3b
orca-infer --model_path orca-llama-3b/model --data_jsonl your_data.jsonl --output_dir results/
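`orca-infer` reads its inputs from the file passed via `--data_jsonl`. The sketch below shows one way to prepare such a file from the four inputs named in the introduction: question, reference answer, candidate answer, and rationale. The key names and the example record are hypothetical placeholders. Check the repository for the schema that `orca-infer` actually expects before relying on them.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List

# HYPOTHETICAL keys: replace with the field names documented in the ORCA repository.
EXAMPLE_RECORDS: List[Dict[str, str]] = [
    {
        "question": "What instrument plays the melody in the clip?",
        "reference_answer": "A violin.",
        "candidate_answer": "The melody is played on a violin.",
        "rationale": "The candidate names the same instrument as the reference.",
    },
]


def write_jsonl(records: Iterable[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line (UTF-8, non-ASCII kept as-is)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> List[Dict[str, str]]:
    """Read a JSONL file back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    write_jsonl(EXAMPLE_RECORDS, "your_data.jsonl")
    print(read_jsonl("your_data.jsonl"))
```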

See the repository for full usage, evaluation scripts, and the download_and_infer.py convenience script.

Citation

@article{sedlacek-etal-2026-orca,
  title={ORCA: Open-ended Response Correctness Assessment for Audio Question Answering},
  author={Sedl\'{a}\v{c}ek, \v{S}imon and Barahona, Sara and Bola\~{n}os, Cecilia and
          Herrera-Alarc\'{o}n, Laura and Udupa, Sathvik and L\'{o}pez, Fernando and
          Ferner, Allison and Lozano-Diez, Alicia and Yusuf, Bolaji and Kesiraju, Santosh and
          Duraiswami, Ramani and \v{C}ernock\'{y}, Jan},
  journal={Transactions of the Association for Computational Linguistics},
  note={Accepted},
  year={2026},
  url={https://arxiv.org/abs/2512.09066}
}

License

MIT License. See the repository LICENSE for details.
