|
|
--- |
|
|
license: mit |
|
|
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct |
|
|
tags: |
|
|
- llama |
|
|
- lora |
|
|
- political-science |
|
|
- survey-replication |
|
|
- canadian-election-study |
|
|
- peft |
|
|
- unsloth |
|
|
datasets: |
|
|
- custom |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# CES Phase 2 LoRA: Psychographic Ideology Prediction |
|
|
|
|
|
A LoRA adapter for Llama 3.1 8B Instruct that predicts political ideology from demographics + psychographic attitudes. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model was trained on the Canadian Election Study (CES) 2021 to predict self-reported ideology (0-10 left-right scale) from: |
|
|
- **Demographics**: Age, gender, province, education, employment, religion, etc. |
|
|
- **Psychographics**: Federal government satisfaction, economic retrospective, immigration views |
|
|
|
|
|
## Performance |
|
|
|
|
|
| Model | Ideology Correlation (r) | |
|
|
|-------|-------------------------| |
|
|
| Base Llama 8B | 0.03 | |
|
|
| GPT-4o-mini | 0.285 | |
|
|
| Phase 1 (demographics only) | 0.213 | |
|
|
| **This model (demographics + psychographics)** | **0.428** | |
|
|
|
|
|
## Usage |
|
|
|
|
|
```python |
|
|
from peft import PeftModel |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
base_model = AutoModelForCausalLM.from_pretrained( |
|
|
"meta-llama/Meta-Llama-3.1-8B-Instruct", |
|
|
load_in_4bit=True |
|
|
) |
|
|
model = PeftModel.from_pretrained(base_model, "baglecake/ces-phase2-lora") |
|
|
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct") |
|
|
|
|
|
# Example prompt |
|
|
system = """You are a 45-year-old man. from Ontario, Canada. You live in a suburb of a large city. Your highest level of education is a bachelor's degree. You are currently employed full-time. You are married. You have children. You are Catholic and religion is somewhat important to you. You were born in Canada. |
|
|
|
|
|
This person is not at all satisfied with the federal government, thinks the economy has gotten worse over the past year, and thinks Canada should admit fewer immigrants.
|
|
|
|
|
Answer survey questions as this person would, based on their background, experiences, and views. Give direct, concise answers.""" |
|
|
|
|
|
user = "On a scale from 0 to 10, where 0 means left/liberal and 10 means right/conservative, where would you place yourself politically? Just give the number." |
|
|
|
|
|
# Format as a Llama chat conversation and generate
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
|
|
``` |
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Base model**: meta-llama/Meta-Llama-3.1-8B-Instruct (4-bit quantized via Unsloth) |
|
|
- **Training data**: 14,456 examples from CES 2021 |
|
|
- **LoRA rank**: 32 |
|
|
- **LoRA alpha**: 64 |
|
|
- **Target modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
|
|
- **Epochs**: 3 |
|
|
- **Hardware**: NVIDIA H100 80GB |
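The hyperparameters above map onto a PEFT `LoraConfig` as sketched below (a minimal illustration; the Unsloth-specific setup and training loop are omitted):

```python
from peft import LoraConfig

# Mirrors the hyperparameters listed above; dropout and other
# unlisted settings are left at their defaults.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```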
|
|
|
|
|
## Steerability |
|
|
|
|
|
The model is steerable: changing attitudes while holding demographics constant shifts the predicted ideology:
|
|
|
|
|
| Attitude Config | Predicted Ideology | |
|
|
|-----------------|-------------------| |
|
|
| Satisfied + Economy better + More immigrants | 2 (left) |


| Dissatisfied + Economy worse + Fewer immigrants | 6 (center-right) |
|
|
|
|
|
**4-point ideology swing** from attitude changes alone, holding demographics constant. |
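The steering contrast above can be reproduced by swapping the attitude sentence in the system prompt while keeping the demographic sentence fixed. A sketch (the helper and variable names below are illustrative, not part of the released code):

```python
# Fixed demographic base, shared by both conditions.
DEMOGRAPHICS = (
    "You are a 45-year-old man from Ontario, Canada. "
    "Your highest level of education is a bachelor's degree."
)

# Variable psychographic clause: the only thing that changes between runs.
ATTITUDES = {
    "left-leaning": (
        "This person is satisfied with the federal government, thinks the "
        "economy has gotten better, and thinks Canada should admit more immigrants."
    ),
    "right-leaning": (
        "This person is not at all satisfied with the federal government, thinks "
        "the economy has gotten worse, and thinks Canada should admit fewer immigrants."
    ),
}

def build_system_prompt(attitude_key):
    """Combine the fixed demographics with one attitude configuration."""
    return (
        f"{DEMOGRAPHICS}\n\n{ATTITUDES[attitude_key]}\n\n"
        "Answer survey questions as this person would. Give direct, concise answers."
    )

left = build_system_prompt("left-leaning")
right = build_system_prompt("right-leaning")
```

Feeding each prompt to the model with the same ideology question isolates the effect of attitudes on the predicted placement.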
|
|
|
|
|
## Generalization to Unseen Questions |
|
|
|
|
|
We tested the model on CES questions it was **never trained on**: |
|
|
|
|
|
| Question Type | Example | Correlation (r) | |
|
|
|--------------|---------|-----------------| |
|
|
| **High-salience (Identity)** | COVID satisfaction | **0.60** | |
|
|
| **High-salience (Identity)** | Carbon tax position | **0.49** | |
|
|
| Low-salience (Policy) | Defence spending | 0.12 | |
|
|
| Low-salience (Policy) | Environment spending | -0.12 | |
|
|
|
|
|
### Key Finding |
|
|
|
|
|
The model learned **political identity**, not policy platforms: |
|
|
- **Carbon Tax** (r=0.49) vs **Environment Spending** (r=-0.12) — both are "about the environment" but carbon tax is a tribal identity marker while spending is a technocratic detail |
|
|
- The 3 psychographic variables compress the "culture war" aspects of Canadian politics |
|
|
- Model excels at identity/affect prediction, struggles with budget details |
|
|
|
|
|
## Temporal Generalization |
|
|
|
|
|
We tested the model on older CES surveys to measure temporal transfer: |
|
|
|
|
|
| Election | Prime Minister | Correlation | Retention | |
|
|
|----------|---------------|-------------|-----------| |
|
|
| **2021** (training) | Trudeau (Liberal) | r = 0.428 | — | |
|
|
| **2019** (same PM) | Trudeau (Liberal) | r = 0.353 | 82% | |
|
|
| **2015** (different PM) | Harper (Conservative) | r = 0.206 | 49% | |
|
|
|
|
|
**Key Finding**: The model is *government-specific*, not time-specific: |
|
|
- **High transfer under same PM**: "Dissatisfied with Trudeau" maintains consistent left-right valence across 2019-2021 |
|
|
- **Poor transfer across PMs**: "Dissatisfied with Harper" has *opposite* valence (Liberal-leaning in 2015) from "dissatisfied with Trudeau" (Conservative-leaning in 2021) |
|
|
|
|
|
This confirms the psychographic compression captures incumbent-relative affect, not arbitrary noise. |
|
|
|
|
|
### Implications |
|
|
|
|
|
This model is ideal for: |
|
|
- Simulating political discourse and polarization |
|
|
- Agent-based models of partisan sorting |
|
|
- Studying affective political identity |
|
|
|
|
|
Not suitable for: |
|
|
- Predicting specific policy preferences |
|
|
- Budget allocation modeling |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@software{ces-phase2-lora, |
|
|
title = {CES Phase 2 LoRA: Psychographic Ideology Prediction}, |
|
|
author = {Coburn, Del}, |
|
|
year = {2025}, |
|
|
url = {https://huggingface.co/baglecake/ces-phase2-lora} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Part of émile-GCE |
|
|
|
|
|
This model is part of the [émile-GCE](https://github.com/delcoburn/emile-gce) project for Generative Computational Ethnography. |
|
|
|