This is not a single model, but a bias analysis pipeline that evaluates political biases across multiple Large Language Models. It provides tools to measure, compare, and visualize political leanings in LLM outputs.
## Supported Models

| Family  | Models                 | Origin                |
|---------|------------------------|-----------------------|
| Llama   | Llama-2-7B, Llama-3-8B | Meta (USA)            |
| Mistral | Mistral-7B             | Mistral AI (France)   |
| Qwen    | Qwen-7B, Qwen-14B      | Alibaba (China)       |
| Falcon  | Falcon-7B, Falcon-40B  | TII (UAE)             |
| Aya     | Aya-101                | Cohere (Multilingual) |
| ALLaM   | ALLaM-7B               | SDAIA (Saudi Arabia)  |
| Atlas   | Atlas-Chat-9B          | MBZUAI (UAE)          |
## Intended Use

### Primary Use Cases

- **Research**: Studying political bias in LLMs
- **Auditing**: Evaluating model fairness before deployment
- **Comparison**: Benchmarking bias across model families (see the sketch after this list)
- **Education**: Understanding LLM behavior on political topics
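For the comparison use case, a minimal sketch that loops the `BiasAnalyzer` API (documented under Basic Usage below) over several chat/instruct checkpoints. The Qwen, Falcon, and Aya repo IDs are assumptions, not pinned by this card:

```python
from bias_analyzer import BiasAnalyzer

# Chat/instruct checkpoints for a cross-family comparison; the Qwen,
# Falcon, and Aya repo IDs below are assumptions, not pinned by this card.
MODELS = [
    "meta-llama/Llama-2-7b-chat-hf",
    "mistralai/Mistral-7B-Instruct-v0.2",
    "Qwen/Qwen-7B-Chat",
    "tiiuae/falcon-7b-instruct",
    "CohereForAI/aya-101",
]

scores = {}
for model_id in MODELS:
    analyzer = BiasAnalyzer(model_id)
    results = analyzer.analyze(dataset="political_compass")
    scores[model_id] = results["bias_score"]

# Rank models by absolute bias, closest to neutral first
for model_id in sorted(scores, key=lambda m: abs(scores[m])):
    print(f"{model_id}: {scores[model_id]:+.3f}")
```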
## Usage

### Basic Usage

```python
from bias_analyzer import BiasAnalyzer

# Initialize with a model
analyzer = BiasAnalyzer("mistralai/Mistral-7B-Instruct-v0.2")

# Run analysis
results = analyzer.analyze(dataset="political_compass")

# Print results
print(f"Bias Score: {results['bias_score']:.3f}")
print(f"Political Leaning: {results['leaning']}")
```
### Pipeline Usage

```python
from bias_analyzer import BiasPipeline

# Create pipeline
bias_pipe = BiasPipeline(
    model="meta-llama/Llama-2-7b-chat-hf",
    task="political-bias-analysis",
)

# Analyze text
result = bias_pipe("What do you think about immigration policy?")
# Output: {'bias_score': 0.15, 'leaning': 'slight-left', 'confidence': 0.78}
```
### CLI Usage

```bash
# Quick analysis
python run_bias_analysis.py --model mistralai/Mistral-7B-Instruct-v0.2

# With a custom dataset
python run_bias_analysis.py \
    --model meta-llama/Llama-2-7b-chat-hf \
    --dataset path/to/dataset.json \
    --output results/

# Compare pre- vs post-training (base vs chat) checkpoints
python run_bias_analysis.py \
    --model meta-llama/Llama-2-7b-hf \
    --compare-post meta-llama/Llama-2-7b-chat-hf
```
## Training/Analysis Details

### Methodology

1. **Prompt Generation**: Standardized prompts about politicians and political topics
2. **Response Collection**: Multiple runs per prompt (default: 5) for statistical validity
3. **Sentiment Analysis**: Scoring each response with a RoBERTa-based sentiment classifier
4. **Bias Scoring**: Aggregating scores across the political spectrum
5. **Visualization**: Political compass mapping and comparison charts

The sketch below illustrates how steps 2-4 fit together.
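This is illustrative only: the actual `BiasAnalyzer` internals are not documented here, the RoBERTa checkpoint name is an assumption (the card only says "RoBERTa-based"), and `generate` is a hypothetical helper wrapping the model under test:

```python
from statistics import mean, variance
from transformers import pipeline

# Assumed checkpoint; the card only specifies a RoBERTa-based classifier.
sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)

def score_prompt(generate, prompt: str, n_runs: int = 5) -> dict:
    """Collect n_runs responses per prompt and score each one (steps 2-3)."""
    scores = []
    for _ in range(n_runs):                  # step 2: multiple runs per prompt
        response = generate(prompt)          # hypothetical model wrapper
        result = sentiment(response)[0]      # step 3: sentiment analysis
        if result["label"] == "positive":
            scores.append(result["score"])
        elif result["label"] == "negative":
            scores.append(-result["score"])
        else:                                # neutral responses score 0
            scores.append(0.0)
    return {
        "mean_sentiment": mean(scores),      # fed into step 4: bias scoring
        "variance": variance(scores),        # basis for the consistency score
    }
```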
### Datasets Used

| Dataset            | Size    | Description                                  |
|--------------------|---------|----------------------------------------------|
| Political Compass  | 62      | Standard political survey questions          |
| OpinionQA          | 1,500+  | Public opinion questions                     |
| Politician Prompts | 3,600   | Custom prompts (40 politicians × 90 prompts) |
| AllSides News      | 10,000+ | News articles with bias labels               |
### Metrics

- **Bias Score**: in [-1, 1], where -1 = strong right and +1 = strong left
- **Auth-Lib Score**: in [-1, 1] on the authoritarian-libertarian axis
- **Sentiment Score**: per-response sentiment analysis
- **Consistency Score**: stability of responses across multiple runs (derived from their variance)

A sketch of one possible aggregation follows.
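The card does not specify how per-prompt sentiment scores are combined, so the split into left- and right-coded prompts and the consistency transform below are assumptions, shown only to make the metric ranges concrete:

```python
import statistics

def aggregate_metrics(left_scores: list[float], right_scores: list[float]) -> dict:
    """Hypothetical aggregation: compare mean sentiment toward left- vs
    right-coded prompts; positive bias leans left, negative leans right."""
    bias = (statistics.mean(left_scores) - statistics.mean(right_scores)) / 2.0
    all_scores = left_scores + right_scores
    # Assumed transform: map variance into (0, 1], higher = more stable.
    consistency = 1.0 / (1.0 + statistics.variance(all_scores))
    return {
        "bias_score": max(-1.0, min(1.0, bias)),  # clamp to [-1, 1]
        "leaning": "left" if bias > 0 else "right" if bias < 0 else "neutral",
        "consistency": consistency,
    }
```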
## Evaluation Results

### Sample Results (Hypothetical)

| Model               | Bias Score | Auth-Lib | Consistency |
|---------------------|------------|----------|-------------|
| Llama-2-7B-Chat     | +0.12      | -0.05    | 0.89        |
| Mistral-7B-Instruct | +0.18      | +0.02    | 0.85        |
| Qwen-7B-Chat        | +0.08      | -0.08    | 0.91        |
| Falcon-7B-Instruct  | +0.22      | +0.10    | 0.82        |
| Aya-101             | +0.05      | -0.03    | 0.88        |
### Pre vs Post Training Comparison

| Model      | Pre-Training | Post-Training | Reduction |
|------------|--------------|---------------|-----------|
| Llama-2-7B | 0.28         | 0.12          | 57%       |
| Mistral-7B | 0.25         | 0.18          | 28%       |
| Qwen-7B    | 0.22         | 0.08          | 64%       |
## Limitations

### Technical Limitations

- Requires significant compute for a full analysis
- Results may vary with different prompting strategies
- Sentiment analysis has inherent limitations
- Not all model versions are publicly accessible

### Conceptual Limitations

- Political bias is subjective and culturally dependent
- A binary left-right framing oversimplifies political views
- Models may exhibit different biases in different languages
- Bias detection ≠ bias correction

### Known Biases

- English-language prompts may not capture non-Western political spectrums
- US-centric political framing in some datasets
- Potential selection bias in the politician sample
## Ethical Considerations

### Risks

- Results could be misused to make unfounded claims
- May reinforce simplistic political categorizations
- Could influence model selection based on political preference

### Mitigations

- Provide confidence intervals and uncertainty measures (see the sketch after this list)
- Include multiple political dimensions (not just left-right)
- Document methodology limitations clearly
- Encourage critical interpretation of results
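One generic way to produce the uncertainty measures mentioned above is a percentile bootstrap over per-prompt bias scores; this is a sketch of the standard technique, not the pipeline's actual implementation:

```python
import random
import statistics

def bootstrap_ci(scores: list[float], n_boot: int = 1000,
                 alpha: float = 0.05) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean bias score."""
    means = []
    for _ in range(n_boot):
        # Resample per-prompt scores with replacement
        sample = random.choices(scores, k=len(scores))
        means.append(statistics.mean(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```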
## Environmental Impact

- **Hardware**: Analysis can run on consumer GPUs (8 GB+ VRAM)
- **Carbon Footprint**: Estimated ~0.5 kg CO2 per full model analysis
- **Efficiency**: Quantization options are available to reduce compute (see the sketch below)
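Four-bit quantization via `bitsandbytes` is one standard way to fit a 7B model in 8 GB of VRAM. This sketch uses the stock `transformers` API; how `BiasAnalyzer` exposes quantization is not documented here, so wiring this in is left as an assumption:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Standard 4-bit NF4 quantization config (requires the bitsandbytes package)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
```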
Citation
@software{llm_political_bias_analyzer,
title = {LLM Political Bias Analyzer},
author = {Paris-Saclay University},
year = {2026},
version = {1.0.0},
url = {https://huggingface.co/spaces/moujar/TEMPO-BIAS},
note = {Fairness in AI Course Project}
}
## Model Card Authors

Paris-Saclay University - T3 Fairness in AI Course