|
|
--- |
|
|
license: mit |
|
|
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct |
|
|
tags: |
|
|
- llama |
|
|
- lora |
|
|
- political-science |
|
|
- survey-replication |
|
|
- canadian-election-study |
|
|
- peft |
|
|
- unsloth |
|
|
datasets: |
|
|
- custom |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# CES Phase 2 LoRA: Psychographic Ideology Prediction |
|
|
|
|
|
A LoRA adapter for Llama 3.1 8B Instruct that predicts political ideology from demographics + psychographic attitudes. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model was trained on the Canadian Election Study (CES) 2021 to predict self-reported ideology (0-10 left-right scale) from: |
|
|
- **Demographics**: Age, gender, province, education, employment, religion, etc. |
|
|
- **Psychographics**: Federal government satisfaction, economic retrospective, immigration views |
|
|
|
|
|
## Performance |
|
|
|
|
|
| Model | Ideology Correlation (r) | |
|
|
|-------|-------------------------| |
|
|
| Base Llama 8B | 0.03 | |
|
|
| GPT-4o-mini | 0.285 | |
|
|
| Phase 1 (demographics only) | 0.213 | |
|
|
| **This model (demographics + psychographics)** | **0.428** | |
|
|
|
|
|
## Usage |
|
|
|
|
|
```python |
|
|
from peft import PeftModel |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
|
|
base_model = AutoModelForCausalLM.from_pretrained( |
|
|
"meta-llama/Meta-Llama-3.1-8B-Instruct", |
|
|
load_in_4bit=True |
|
|
) |
|
|
model = PeftModel.from_pretrained(base_model, "baglecake/ces-phase2-lora") |
|
|
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct") |
|
|
|
|
|
# Example prompt |
|
|
system = """You are a 45-year-old man. from Ontario, Canada. You live in a suburb of a large city. Your highest level of education is a bachelor's degree. You are currently employed full-time. You are married. You have children. You are Catholic and religion is somewhat important to you. You were born in Canada. |
|
|
|
|
|
This person is not at all satisfied with the federal government, thinks the economy has gotten worse over the past year, and thinks Canada should admit fewer immigrants.
|
|
|
|
|
Answer survey questions as this person would, based on their background, experiences, and views. Give direct, concise answers.""" |
|
|
|
|
|
user = "On a scale from 0 to 10, where 0 means left/liberal and 10 means right/conservative, where would you place yourself politically? Just give the number." |
|
|
|
|
|
# Format as a Llama chat conversation and generate
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
|
|
``` |
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Base model**: meta-llama/Meta-Llama-3.1-8B-Instruct (4-bit quantized via Unsloth) |
|
|
- **Training data**: 14,456 examples from CES 2021 |
|
|
- **LoRA rank**: 32 |
|
|
- **LoRA alpha**: 64 |
|
|
- **Target modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
|
|
- **Epochs**: 3 |
|
|
- **Hardware**: NVIDIA H100 80GB |
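The hyperparameters above map onto a PEFT `LoraConfig` as sketched below (a minimal illustration; the Unsloth-specific setup and training loop are omitted):

```python
from peft import LoraConfig

# Mirrors the hyperparameters listed above; dropout and other
# unlisted settings are left at their defaults.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```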
|
|
|
|
|
## Steerability |
|
|
|
|
|
The model is steerable: changing attitudes while holding demographics constant shifts the predicted ideology:
|
|
|
|
|
| Attitude Config | Predicted Ideology | |
|
|
|-----------------|-------------------| |
|
|
| Satisfied + Economy better + More immigrants | 2 (left) |


| Dissatisfied + Economy worse + Fewer immigrants | 6 (center-right) |
|
|
|
|
|
**4-point ideology swing** from attitude changes alone, holding demographics constant. |
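The steering contrast above can be reproduced by swapping the attitude sentence in the system prompt while keeping the demographic sentence fixed. A sketch (the helper and variable names below are illustrative, not part of the released code):

```python
# Fixed demographic base, shared by both conditions.
DEMOGRAPHICS = (
    "You are a 45-year-old man from Ontario, Canada. "
    "Your highest level of education is a bachelor's degree."
)

# Variable psychographic clause: the only thing that changes between runs.
ATTITUDES = {
    "left-leaning": (
        "This person is satisfied with the federal government, thinks the "
        "economy has gotten better, and thinks Canada should admit more immigrants."
    ),
    "right-leaning": (
        "This person is not at all satisfied with the federal government, thinks "
        "the economy has gotten worse, and thinks Canada should admit fewer immigrants."
    ),
}

def build_system_prompt(attitude_key):
    """Combine the fixed demographics with one attitude configuration."""
    return (
        f"{DEMOGRAPHICS}\n\n{ATTITUDES[attitude_key]}\n\n"
        "Answer survey questions as this person would. Give direct, concise answers."
    )

left = build_system_prompt("left-leaning")
right = build_system_prompt("right-leaning")
```

Feeding each prompt to the model with the same ideology question isolates the effect of attitudes on the predicted placement.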
|
|
|
|
|
## Generalization to Unseen Questions |
|
|
|
|
|
We tested the model on CES questions it was **never trained on**: |
|
|
|
|
|
| Question Type | Example | Correlation (r) | |
|
|
|--------------|---------|-----------------| |
|
|
| **High-salience (Identity)** | COVID satisfaction | **0.60** | |
|
|
| **High-salience (Identity)** | Carbon tax position | **0.49** | |
|
|
| Low-salience (Policy) | Defence spending | 0.12 | |
|
|
| Low-salience (Policy) | Environment spending | -0.12 | |
|
|
|
|
|
### Key Finding |
|
|
|
|
|
The model learned **political identity**, not policy platforms: |
|
|
- **Carbon Tax** (r=0.49) vs **Environment Spending** (r=-0.12) — both are "about the environment" but carbon tax is a tribal identity marker while spending is a technocratic detail |
|
|
- The 3 psychographic variables compress the "culture war" aspects of Canadian politics |
|
|
- Model excels at identity/affect prediction, struggles with budget details |
|
|
|
|
|
## Temporal Generalization |
|
|
|
|
|
We tested the model on older CES surveys to measure temporal transfer: |
|
|
|
|
|
| Election | Prime Minister | Correlation | Retention | |
|
|
|----------|---------------|-------------|-----------| |
|
|
| **2021** (training) | Trudeau (Liberal) | r = 0.428 | — | |
|
|
| **2019** (same PM) | Trudeau (Liberal) | r = 0.353 | 82% | |
|
|
| **2015** (different PM) | Harper (Conservative) | r = 0.206 | 49% | |
|
|
|
|
|
**Key Finding**: The model is *government-specific*, not time-specific: |
|
|
- **High transfer under same PM**: "Dissatisfied with Trudeau" maintains consistent left-right valence across 2019-2021 |
|
|
- **Poor transfer across PMs**: "Dissatisfied with Harper" has *opposite* valence (Liberal-leaning in 2015) from "dissatisfied with Trudeau" (Conservative-leaning in 2021) |
|
|
|
|
|
This confirms the psychographic compression captures incumbent-relative affect, not arbitrary noise. |
|
|
|
|
|
### Implications |
|
|
|
|
|
This model is ideal for: |
|
|
- Simulating political discourse and polarization |
|
|
- Agent-based models of partisan sorting |
|
|
- Studying affective political identity |
|
|
|
|
|
Not suitable for: |
|
|
- Predicting specific policy preferences |
|
|
- Budget allocation modeling |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@software{ces-phase2-lora, |
|
|
title = {CES Phase 2 LoRA: Psychographic Ideology Prediction}, |
|
|
author = {Coburn, Del}, |
|
|
year = {2025}, |
|
|
url = {https://huggingface.co/baglecake/ces-phase2-lora} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Part of émile-GCE |
|
|
|
|
|
This model is part of the [émile-GCE](https://github.com/delcoburn/emile-gce) project for Generative Computational Ethnography. |
|
|
|