---
library_name: transformers
pipeline_tag: summarization
---
# Populism Detection & Summarization

This checkpoint is a BART-based, LoRA-fine-tuned model that does two things:

1. Summarizes party press releases (and, when relevant, explains where populist framing appears), and
2. Classifies whether the text contains populist language (`Is_Populist` ∈ {0,1}).

The weights here are the merged LoRA result; no adapters are required.

The model was trained on ~10k official party press releases from 12 countries (Italy, Sweden, Switzerland, Netherlands, Germany, Denmark, Spain, UK, Austria, Poland, Ireland, France) that were labeled and summarized via a Palantir AIP Ontology step using GPT-4o.

## Model Details

- Pretrained Model: `facebook/bart-base` (seq2seq), fine-tuned with LoRA and then merged.
- Instruction Framing: two prefixes:
  - Summarize: `summarize: <original_text>`
  - Classify: `classify_populism: <original_text>` → the model outputs `0` or `1` (or you can argmax over the first decoder-step logits for the tokens "0" vs. "1").
- Tokenization: BART's subword tokenizer (byte-pair encoding).
- Input Processing: text is truncated to 1024 tokens; summaries are capped at 128 tokens.
- Output Generation (summarization): beam search (typically 5 beams), a mild length penalty, and no-repeat bigrams to reduce redundancy.

Key Parameters:

- Max Input Length: 1024 tokens → fits long releases while controlling memory.
- Max Target Length: 128 tokens → concise summaries with good coverage.
- Beam Search: ~5 beams → balances quality and speed.
- Classification Decoding: read the first generated token (`0`/`1`), or take the first-step logits for a deterministic argmax.
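The deterministic decoding reduces to a two-way softmax over the logits of the tokens "0" and "1" at the first decoder step. A minimal sketch of that arithmetic, with made-up logit values for illustration:

```python
import math

def two_way_softmax(logit_0: float, logit_1: float) -> float:
    """Probability of the positive class from the logits of tokens "0" and "1"."""
    m = max(logit_0, logit_1)  # subtract the max for numerical stability
    e0, e1 = math.exp(logit_0 - m), math.exp(logit_1 - m)
    return e1 / (e0 + e1)

# Illustrative first-step logits for tokens "0" and "1"
p1 = two_way_softmax(1.2, 3.4)
label = 1 if p1 >= 0.5 else 0
```

Restricting the softmax to those two token ids makes the score interpretable as a class probability even though the head ranges over the full vocabulary.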

Generation Process (high level):

1. Input Tokenization: convert the text to subwords and build the encoder input.
2. Beam Search (summarize): explore multiple candidate sequences and pick the most probable one.
3. Output Decoding: map token IDs back to text, skipping special tokens.
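The beam-search step can be illustrated with a toy next-token table. This is not BART's decoder, just a sketch of the search procedure over cumulative log-probabilities:

```python
import math

# Toy next-token probability table: last token -> {next token: probability}
NEXT = {
    "<s>":   {"the": 0.6, "a": 0.4},
    "the":   {"party": 0.7, "press": 0.3},
    "a":     {"press": 0.8, "party": 0.2},
    "party": {"</s>": 1.0},
    "press": {"</s>": 1.0},
}

def beam_search(num_beams: int = 2, max_len: int = 4):
    beams = [(["<s>"], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == "</s>":
                candidates.append((tokens, score))  # finished beams carry over
                continue
            for tok, p in NEXT[tokens[-1]].items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # keep only the highest-scoring beams
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]

print(beam_search())
```

Greedy decoding would commit to the single best token at each step; beam search keeps `num_beams` partial hypotheses alive and can recover a globally better sequence.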

Model Hub: tdickson17/Populism_detection

Repository: https://github.com/tcdickson/Populism.git

## Training Details

Data Collection:
Press releases were scraped from official party websites to capture formal statements and policy messaging. A Palantir AIP Ontology step (powered by GPT-4o) produced:

- `Is_Populist` (binary) → whether the text exhibits populist framing (e.g., "people vs. elites," anti-institutional rhetoric).
- Summaries/Explanations → concise abstracts; when populism is present, the text explains where and how it appears.

Preprocessing:
HTML/boilerplate removal, normalization, and formatting into pairs:

- Input: the original release text (title optional at inference)
- Targets: (a) an abstractive summary/explanation, (b) the binary label
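The exact cleaning pipeline is not published; a minimal stdlib sketch of the kind of HTML/boilerplate stripping and whitespace normalization described above might look like this (the `target_label` value is a hypothetical annotation):

```python
import html
import re

def clean_release(raw: str) -> str:
    """Strip HTML tags, unescape entities, and normalize whitespace."""
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", raw,
                  flags=re.DOTALL | re.IGNORECASE)  # drop embedded scripts/styles
    text = re.sub(r"<[^>]+>", " ", text)            # drop remaining tags
    text = html.unescape(text)                      # &amp; -> &, etc.
    return re.sub(r"\s+", " ", text).strip()        # collapse whitespace

# One training pair in the shape described above
pair = {
    "input": clean_release("<p>The people &amp; the elites</p>"),
    "target_label": 1,  # hypothetical Is_Populist annotation
}
```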

Training Objective:
Supervised fine-tuning for joint tasks:

- Abstractive summarization (seq2seq cross-entropy)
- Binary classification (decoded as `0`/`1` via the same seq2seq head)

Training Strategy:

- Base: `facebook/bart-base`
- Method: LoRA on the attention/FFN blocks (r=16, α=32, dropout=0.05), then merged into the base model.
- Decoding: beam search for summaries; argmax or short generation for labels.
- Evaluation signals: ROUGE for summaries; accuracy/precision/recall/F1 for classification.
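For the classification side, the reported signals can be computed directly from predicted and gold labels; a minimal sketch without external metric libraries:

```python
def binary_prf(gold, pred):
    """Precision, recall, and F1 for the positive (populist = 1) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels: one populist release is missed (a false negative)
p, r, f1 = binary_prf([1, 0, 1, 1], [1, 0, 0, 1])
```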

This setup lets one checkpoint handle both analysis (populism flag) and explanation (summary) with simple instruction prefixes.
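Assuming the Hugging Face `peft` library, the training setup described above corresponds roughly to a config fragment like the following. This is a sketch, not the released training script; in particular, `target_modules` is an assumption (typical BART attention projection names):

```python
from peft import LoraConfig, TaskType

# Hyperparameters as stated above; target_modules is an assumption
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
# After training: wrap the base model with get_peft_model(base, lora_cfg),
# fine-tune, then call merge_and_unload() to produce the merged checkpoint.
```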

## Usage:

Install the dependencies:

```bash
pip install torch transformers
```

Then run:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "tdickson17/Populism_detection"
device = "cuda" if torch.cuda.is_available() else "cpu"

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).to(device).eval()

MAX_SRC, MAX_SUM = 1024, 128
DEC_START = model.config.decoder_start_token_id
ID0 = tok("0", add_special_tokens=False)["input_ids"][0]
ID1 = tok("1", add_special_tokens=False)["input_ids"][0]

THRESHOLD = 0.5  # raise for higher precision, lower for higher recall
POSITIVE_MSG = "This text DOES contain populist sentiment.\n"
NEGATIVE_MSG = "Populist sentiment is NOT detected in this text.\n"

GEN_SUM = dict(
    do_sample=False, num_beams=5,
    max_new_tokens=MAX_SUM, min_new_tokens=16,
    length_penalty=1.1, no_repeat_ngram_size=3
)

@torch.no_grad()
def summarize(text: str) -> str:
    enc = tok("summarize: " + text, return_tensors="pt",
              truncation=True, max_length=MAX_SRC).to(device)
    out = model.generate(**enc, **GEN_SUM)
    s = tok.decode(out[0], skip_special_tokens=True).strip()
    if s.lower().startswith("summarize:"):  # strip an echoed prefix, if any
        s = s.split(":", 1)[1].strip()
    return s

@torch.no_grad()
def classify_populism_prob(text: str) -> float:
    enc = tok("classify_populism: " + text, return_tensors="pt",
              truncation=True, max_length=MAX_SRC).to(device)
    dec_inp = torch.tensor([[DEC_START]], device=device)
    logits = model(**enc, decoder_input_ids=dec_inp, use_cache=False).logits[:, -1, :]
    # softmax over the logits of tokens "0" and "1" only
    two = torch.stack([logits[:, ID0], logits[:, ID1]], dim=-1)
    return torch.softmax(two, dim=-1)[0, 1].item()

def classify_populism_label(text: str, threshold: float = THRESHOLD, include_probability: bool = True) -> str:
    p1 = classify_populism_prob(text)
    msg = POSITIVE_MSG if p1 >= threshold else NEGATIVE_MSG
    return f"{msg} Confidence={p1:.1%}" if include_probability else msg

# Example
text = """<Insert Text here>"""
print(classify_populism_label(text))
print("\nSummary:\n", summarize(text))
```



## Citation:

```bibtex
@article{dickson2024going,
  title={Going against the grain: Climate change as a wedge issue for the radical right},
  author={Dickson, Zachary P and Hobolt, Sara B},
  journal={Comparative Political Studies},
  year={2024},
  publisher={SAGE Publications Sage CA: Los Angeles, CA}
}
```