arxiv:2512.07608

Metric-Fair Prompting: Treating Similar Samples Similarly

Published on Dec 8, 2025

AI-generated summary

Metric-Fair Prompting improves the fairness and accuracy of large language models in medical question answering by enforcing similarity constraints on similar questions.

Abstract

We introduce Metric-Fair Prompting, a fairness-aware prompting framework that guides large language models (LLMs) to make decisions under metric-fairness constraints. In the application of multiple-choice medical question answering, each (question, option) pair is treated as a binary instance with label +1 (correct) or -1 (incorrect). To promote individual fairness, treating similar instances similarly, we compute question similarity using NLP embeddings and solve items in joint pairs of similar questions rather than in isolation. The prompt enforces a global decision protocol: extract decisive clinical features, map each (question, option) pair to a score f(x) that acts as a confidence, and impose a Lipschitz-style constraint so that similar inputs receive similar scores and, hence, consistent outputs. Evaluated on the MedQA (US) benchmark, Metric-Fair Prompting is shown to improve performance over standard single-item prompting, demonstrating that fairness-guided, confidence-oriented reasoning can enhance LLM accuracy on high-stakes clinical multiple-choice questions.
