arxiv:2509.16462

Intrinsic Meets Extrinsic Fairness: Assessing the Downstream Impact of Bias Mitigation in Large Language Models

Published on Sep 19, 2025

Authors:

Abstract

Intrinsic bias mitigation through concept unlearning in large language models reduces gender bias by up to 94.9% while improving downstream task fairness metrics by up to 82% without sacrificing accuracy.

AI-generated summary

Large Language Models (LLMs) exhibit socio-economic biases that can propagate into downstream tasks. While prior studies have questioned whether intrinsic bias in LLMs affects fairness at the downstream task level, this work empirically investigates the connection. We present a unified evaluation framework to compare intrinsic bias mitigation via concept unlearning with extrinsic bias mitigation via counterfactual data augmentation (CDA). We examine this relationship through real-world financial classification tasks, including salary prediction, employment status, and creditworthiness assessment. Using three open-source LLMs, we evaluate models both as frozen embedding extractors and as fine-tuned classifiers. Our results show that intrinsic bias mitigation through unlearning reduces intrinsic gender bias by up to 94.9%, while also improving downstream task fairness metrics, such as demographic parity by up to 82%, without compromising accuracy. Our framework offers practical guidance on where mitigation efforts can be most effective and highlights the importance of applying early-stage mitigation before downstream deployment.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2509.16462

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2509.16462 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2509.16462 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2509.16462 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.