---
library_name: transformers
pipeline_tag: summarization
---

# Populism Detection & Summarization

This checkpoint is a BART-based, LoRA-fine-tuned model that does two things:

1. Summarizes party press releases (and, when relevant, explains where populist framing appears), and
2. Classifies whether the text contains populist language (`Is_Populist` ∈ {0, 1}).

The weights here are the merged LoRA result; no adapters are required.

The model was trained on ~10k official party press releases from 12 countries (Italy, Sweden, Switzerland, Netherlands, Germany, Denmark, Spain, UK, Austria, Poland, Ireland, France) that were labeled and summarized via a Palantir AIP Ontology step using GPT-4o.

## Model Details

- Pretrained model: `facebook/bart-base` (seq2seq), fine-tuned with LoRA and then merged.
- Instruction framing: two task prefixes:
  - Summarize: `summarize: <original_text>`
  - Classify: `classify_populism: <original_text>` → the model outputs `0` or `1` (or you can argmax over the first decoder-step logits for the tokens "0" vs. "1").
- Tokenization: BART's subword tokenizer (byte-pair encoding).
- Input processing: text is truncated to 1024 tokens; summaries are capped at 128 tokens.
- Output generation (summarization): beam search (typically 5 beams), a mild length penalty, and blocked repeated n-grams to reduce redundancy.

Key parameters:

- Max input length: 1024 tokens (fits long releases while controlling memory).
- Max target length: 128 tokens (concise summaries with good coverage).
- Beam search: ~5 beams (balances quality and speed).
- Classification decoding: read the first generated token (`0`/`1`), or take the first-step logits for a deterministic argmax.

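The deterministic-argmax option reduces to a softmax over just two logits at the first decoder step. A library-free sketch of that arithmetic (the logit values here are made up for illustration):

```python
import math

def prob_populist(logit_0: float, logit_1: float) -> float:
    """Softmax restricted to the '0' and '1' token logits."""
    e0, e1 = math.exp(logit_0), math.exp(logit_1)
    return e1 / (e0 + e1)

# Hypothetical first-step logits for the "0" and "1" tokens
p = prob_populist(1.2, 3.4)
label = 1 if p >= 0.5 else 0
```

Because only the two label tokens compete, the score behaves like a probability that can be compared against a tunable threshold.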
Generation process (high level):

1. Input tokenization: convert the text to subwords and build the encoder input.
2. Beam search (summarize): explore multiple candidate sequences and pick the most probable one.
3. Output decoding: map token IDs back to text, skipping special tokens.

Model Hub: tdickson17/Populism_detection

Repository: https://github.com/tcdickson/Populism.git

## Training Details

Data collection:
Press releases were scraped from official party websites to capture formal statements and policy messaging. A Palantir AIP Ontology step (powered by GPT-4o) produced:

- `Is_Populist` (binary): whether the text exhibits populist framing (e.g., "people vs. elites," anti-institutional rhetoric).
- Summaries/explanations: concise abstracts; when populism is present, the text explains where and how it appears.

Preprocessing:
HTML/boilerplate removal, normalization, and formatting into pairs:

- Input: the original release text (title optional at inference)
- Targets: (a) the abstractive summary/explanation, (b) the binary label

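The pairing step can be sketched as follows; the tag-stripping regex and field names here are illustrative assumptions, not the actual pipeline:

```python
import re

def clean(raw_html: str) -> str:
    """Strip HTML tags and collapse whitespace (minimal normalization)."""
    text = re.sub(r"<[^>]+>", " ", raw_html)
    return re.sub(r"\s+", " ", text).strip()

def make_pairs(release_html: str, summary: str, is_populist: int) -> list:
    """Build the two (input, target) training examples for one press release."""
    text = clean(release_html)
    return [
        {"input": f"summarize: {text}", "target": summary},
        {"input": f"classify_populism: {text}", "target": str(is_populist)},
    ]
```

Each release thus yields one example per task, sharing the same cleaned source text.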
Training objective:
Supervised fine-tuning on two joint tasks:

- Abstractive summarization (seq2seq cross-entropy)
- Binary classification (decoded as `0`/`1` via the same seq2seq head)

Training strategy:

- Base: `facebook/bart-base`
- Method: LoRA on the attention/FFN blocks (r=16, α=32, dropout=0.05), then merged into the base weights.
- Decoding: beam search for summaries; argmax or short generation for labels.
- Evaluation signals: ROUGE for summaries; accuracy/precision/recall/F1 for classification.

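Merging folds the trained low-rank update into the frozen base matrices, which is why the published checkpoint needs no adapters at inference time. A toy numerical sketch of the merge (shapes and values are illustrative only; the actual run used r=16, α=32 on BART's attention/FFN projections):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4              # toy hidden size, LoRA rank, LoRA alpha
W = rng.normal(size=(d, d))        # frozen base weight
A = rng.normal(size=(r, d))        # trained LoRA down-projection
B = rng.normal(size=(d, r))        # trained LoRA up-projection

# Merge: W' = W + (alpha / r) * B @ A
W_merged = W + (alpha / r) * (B @ A)

# The merged matrix reproduces base-plus-adapter outputs exactly
x = rng.normal(size=(d,))
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * (B @ (A @ x)))
```

Since the update is absorbed into `W`, inference cost is identical to the unadapted base model.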
This setup lets one checkpoint handle both analysis (the populism flag) and explanation (the summary) with simple instruction prefixes.

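ROUGE requires a reference implementation, but the classification metrics follow directly from a confusion matrix. A dependency-free sketch of how precision/recall/F1 for the populist class are computed (labels here are made up):

```python
def prf1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (populist) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

prec, rec, f1 = prf1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```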
## Usage

Install the dependencies:

```bash
pip install torch transformers
```

Then run:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "tdickson17/Populism_detection"
device = "cuda" if torch.cuda.is_available() else "cpu"

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).to(device).eval()

MAX_SRC, MAX_SUM = 1024, 128
DEC_START = model.config.decoder_start_token_id
ID0 = tok("0", add_special_tokens=False)["input_ids"][0]
ID1 = tok("1", add_special_tokens=False)["input_ids"][0]

THRESHOLD = 0.5  # raise for higher precision, lower for higher recall
POSITIVE_MSG = "This text DOES contain populist sentiment.\n"
NEGATIVE_MSG = "Populist sentiment is NOT detected in this text.\n"

GEN_SUM = dict(
    do_sample=False, num_beams=5,
    max_new_tokens=MAX_SUM, min_new_tokens=16,
    length_penalty=1.1, no_repeat_ngram_size=3,
)

@torch.no_grad()
def summarize(text: str) -> str:
    enc = tok("summarize: " + text, return_tensors="pt",
              truncation=True, max_length=MAX_SRC).to(device)
    out = model.generate(**enc, **GEN_SUM)
    s = tok.decode(out[0], skip_special_tokens=True).strip()
    if s.lower().startswith("summarize:"):
        s = s.split(":", 1)[1].strip()
    return s

@torch.no_grad()
def classify_populism_prob(text: str) -> float:
    """Probability that the text is populist, from the first decoder step."""
    enc = tok("classify_populism: " + text, return_tensors="pt",
              truncation=True, max_length=MAX_SRC).to(device)
    dec_inp = torch.tensor([[DEC_START]], device=device)
    logits = model(**enc, decoder_input_ids=dec_inp, use_cache=False).logits[:, -1, :]

    # Softmax over just the "0" and "1" token logits
    two = torch.stack([logits[:, ID0], logits[:, ID1]], dim=-1)
    return torch.softmax(two, dim=-1)[0, 1].item()

def classify_populism_label(text: str, threshold: float = THRESHOLD,
                            include_probability: bool = True) -> str:
    p1 = classify_populism_prob(text)
    msg = POSITIVE_MSG if p1 >= threshold else NEGATIVE_MSG
    return f"{msg} Confidence={p1:.3f}" if include_probability else msg

# Example
text = """<Insert Text here>"""
print(classify_populism_label(text))
print("\nSummary:\n", summarize(text))
```

## Citation

```bibtex
@article{dickson2024going,
  title={Going against the grain: Climate change as a wedge issue for the radical right},
  author={Dickson, Zachary P and Hobolt, Sara B},
  journal={Comparative Political Studies},
  year={2024},
  publisher={SAGE Publications Sage CA: Los Angeles, CA}
}
```