ces-phase3b-lora / README.md

Upload README.md with huggingface_hub

348b544 verified 2 months ago

4.62 kB

	---
	license: mit
	base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
	tags:
	- llama
	- lora
	- political-science
	- survey-replication
	- canadian-election-study
	- peft
	- unsloth
	datasets:
	- custom
	language:
	- en
	pipeline_tag: text-generation
	---

	# CES Phase 3B LoRA: With Party Identification

	A LoRA adapter for Llama 3.1 8B Instruct that predicts political ideology using party identification in addition to leader ratings and policy positions.

	For most use cases, prefer [Phase 3A](https://huggingface.co/baglecake/ces-phase3a-lora) instead — this model exists to demonstrate that party ID is redundant.

	## Model Description

	This model was trained on the Canadian Election Study (CES) 2021 to predict self-reported ideology (0-10 left-right scale) from:

	- Demographics: Age, gender, province, education, employment, religion, marital status, urban/rural, born in Canada
	- Leader Thermometers: Ratings (0-100) of Justin Trudeau, Erin O'Toole, and Jagmeet Singh
	- Wedge Issues: Positions on carbon tax, energy/pipelines, and medical assistance in dying (MAID)
	- Government Satisfaction: Overall satisfaction with federal government
	- Party Identification: "I usually think of myself as a Liberal/Conservative/NDP..." (ONLY IN THIS MODEL)

	## Performance

	\| Model \| Inputs \| Correlation (r) \|
	\|-------\|--------\|-----------------\|
	\| Phase 2 \| Demographics + 3 psychographics \| 0.428 \|
	\| Phase 3A \| + Leader thermometers + wedge issues \| 0.560 \|
	\| Phase 3B (this model) \| + Party ID \| 0.574 \|

	Partisan Delta = 0.014 — Party ID adds only 1.4% improvement.

	## Why Phase 3A is Preferred

	We trained this model (Phase 3B) specifically to test whether party identification adds predictive value beyond substantive attitudes. It doesn't.

	The null result is the finding:
	- Party identity is redundant — it's already encoded in how people feel about leaders and their policy positions
	- Canadian ideology is substantive, not tribal — people's "team" reflects their actual views
	- Adding party ID is "cheating" — you're just asking people their ideology with extra steps

	Phase 3B exists for reproducibility and to demonstrate this null result empirically.

	## Usage

	```python
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer

	base_model = AutoModelForCausalLM.from_pretrained(
	"meta-llama/Meta-Llama-3.1-8B-Instruct",
	load_in_4bit=True
	)
	model = PeftModel.from_pretrained(base_model, "baglecake/ces-phase3b-lora")
	tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

	# Example prompt (note: includes party ID)
	system = """You are a 45-year-old man from Ontario, Canada. You live in a suburb of a large city. Your highest level of education is a bachelor's degree. You are currently employed full-time. You are married. You have children. You are Catholic. You were born in Canada.

	Political Profile:
	Leader Ratings: Justin Trudeau: 25/100, Erin O'Toole: 70/100, Jagmeet Singh: 30/100.
	Views: Strongly disagrees that the federal government should continue the carbon tax; strongly agrees that the government should do more to help the energy sector/pipelines.
	Overall Satisfaction: Is not at all satisfied with the federal government.
	Party ID: Generally thinks of themselves as a Conservative.

	Answer survey questions as this person would, based on their background and detailed political profile."""

	user = "On a scale from 0 to 10, where 0 means left/liberal and 10 means right/conservative, where would you place yourself politically? Just give the number."
	```

	## Training Details

	- Base model: meta-llama/Meta-Llama-3.1-8B-Instruct (4-bit quantized via Unsloth)
	- Training data: 14,455 examples from CES 2021
	- LoRA rank: 32
	- LoRA alpha: 64
	- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
	- Epochs: 3
	- Hardware: NVIDIA A100 40GB (Colab Pro)

	## Limitations

	1. Narrow task: Model only outputs ideology numbers (0-10).
	2. Canadian-specific: Trained on CES 2021 under Trudeau government.
	3. Leader-specific: Uses 2021 leader names.
	4. Redundant information: Party ID doesn't add meaningful predictive value over Phase 3A.

	## Citation

	```bibtex
	@software{ces-phase3-lora,
	title = {CES Phase 3 LoRA: Leader Affect and Policy Prediction},
	author = {Coburn, Del},
	year = {2025},
	url = {https://huggingface.co/baglecake/ces-phase3a-lora}
	}
	```

	## Part of emile-GCE

	This model is part of the [emile-GCE](https://github.com/delcoburn/emile-gce) project for Generative Computational Ethnography.