tdickson17 committed
Commit 0dcd4c8 · verified · 1 Parent(s): 4598311

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +97 -0
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ library_name: transformers
+ pipeline_tag: summarization
+ ---
+ # Populism Detection & Summarization
+
+ This checkpoint is a BART-based, LoRA-fine-tuned model that does two things:
+
+ Summarizes party press releases (and, when relevant, explains where populist framing appears), and
+
+ Classifies whether the text contains populist language (Is_Populist ∈ {0,1}).
+
+ The weights here are the merged LoRA result, so no adapters are required.
+
+ The model was trained on ~10k official party press releases from 12 countries (Italy, Sweden, Switzerland, the Netherlands, Germany, Denmark, Spain, the UK, Austria, Poland, Ireland, France). The releases were labeled and summarized via a Palantir AIP Ontology step using GPT-4o.
+
+ ## Model Details
+
+ Pretrained Model: facebook/bart-base (seq2seq), fine-tuned with LoRA and then merged.
+ Instruction Framing: Two prefixes:
+
+ Summarize: summarize: <original_text>
+
+ Classify: classify_populism: <original_text> → the model outputs 0 or 1 (alternatively, take the argmax over the first decoder step's logits for the tokens “0” and “1”).
+
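The two instruction prefixes can be applied with a small helper. This is a minimal sketch, not code from the repository: the `PREFIXES` mapping and the `build_prompt` name are hypothetical, and the whitespace normalization is an assumption rather than a documented preprocessing step.

```python
# Task prefixes as listed in the README above.
PREFIXES = {
    "summarize": "summarize: ",
    "classify": "classify_populism: ",
}


def build_prompt(task: str, text: str) -> str:
    """Prepend the instruction prefix for `task` to the raw release text."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PREFIXES)}")
    # Assumption: collapse runs of whitespace left over from scraping.
    return PREFIXES[task] + " ".join(text.split())


prompt = build_prompt("classify", "The elite\n  ignores ordinary people.")
```

The same text can be passed through both prefixes to obtain a label and a summary from the one checkpoint.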
+ Tokenization: BART’s subword tokenizer (Byte-Pair Encoding).
+
+ Input Processing: Text is truncated to 1024 tokens; summaries are capped at 128 tokens.
+
+ Output Generation (summarization): beam search (typically 5 beams), a mild length penalty, and no repeated bigrams to reduce redundancy.
+
+ Key Parameters:
+
+ Max Input Length: 1024 tokens, which fits long releases while bounding memory use.
+
+ Max Target Length: 128 tokens, which keeps summaries concise with good coverage.
+
+ Beam Search: ~5 beams, balancing quality and speed.
+
+ Classification Decoding: read the first generated token (0/1), or take the first-step logits for a deterministic argmax.
+
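The logit-based classification decoding amounts to comparing two numbers. The sketch below assumes the logits for the tokens "0" and "1" have already been extracted from the decoder output; how to obtain them (the token IDs, and which decoder position carries the digit, since BART targets usually begin with a start token) is not specified in this README and should be checked against the tokenizer. The function name `label_from_logits` is hypothetical.

```python
import math


def label_from_logits(logit_0: float, logit_1: float) -> tuple[int, float]:
    """Return (label, probability of label 1) from the two candidate logits.

    Only the logits of the tokens "0" and "1" are compared, so the
    probability is a softmax restricted to those two tokens.
    """
    # A two-way softmax equals a sigmoid of the logit difference.
    p_populist = 1.0 / (1.0 + math.exp(logit_0 - logit_1))
    # Deterministic argmax; ties fall back to the non-populist label.
    label = 1 if logit_1 > logit_0 else 0
    return label, p_populist


label, p_populist = label_from_logits(-1.0, 3.0)
```

Compared with reading the generated token, this route also yields a score that can be thresholded at a value other than 0.5.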
+ Generation Process (high level):
+
+ Input Tokenization: Convert the text to subwords and build the encoder input.
+
+ Beam Search (summarize): Explore multiple candidate sequences and keep the most probable one.
+
+ Output Decoding: Map token IDs back to text, skipping special tokens.
+
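The three generation steps map onto a short `transformers` call sequence. The following is a sketch under stated assumptions, not the authors' inference script: it assumes the Hub repository ships tokenizer files alongside the merged weights, and the `length_penalty` value is a placeholder because the README only says "mild". The `summarize` function and `GEN_KWARGS` names are hypothetical.

```python
MODEL_ID = "tdickson17/Populism_detection"  # Model Hub name from this README

# Decoding settings described above.
GEN_KWARGS = {
    "num_beams": 5,             # beam search width
    "no_repeat_ngram_size": 2,  # block repeated bigrams
    "length_penalty": 1.0,      # assumed value; the README only says "mild"
    "max_length": 128,          # summary cap in tokens
}


def summarize(text: str) -> str:
    # Imported here so the settings above can be inspected without
    # the heavy dependencies installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # 1. Input tokenization, truncated to the 1024-token input limit.
    inputs = tokenizer(
        "summarize: " + text,
        max_length=1024,
        truncation=True,
        return_tensors="pt",
    )
    # 2. Beam search over candidate sequences.
    output_ids = model.generate(**inputs, **GEN_KWARGS)
    # 3. Map token IDs back to text, skipping special tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(release_text)` downloads the checkpoint on first use; for batch work, loading the tokenizer and model once outside the function would avoid repeated initialization.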
+ Model Hub: tdickson17/Populism_detection
+
+ Repository: https://github.com/tcdickson/Populism.git
+
+ ## Training Details
+
+ Data Collection:
+ Press releases were scraped from official party websites to capture formal statements and policy messaging. A Palantir AIP Ontology step (powered by GPT-4o) produced:
+
+ Is_Populist (binary): whether the text exhibits populist framing (e.g., “people vs. elites,” anti-institutional rhetoric).
+
+ Summaries/Explanations: concise abstracts; when populism is present, the text explains where and how it appears.
+
+ Preprocessing:
+ HTML/boilerplate removal, normalization, and formatting into pairs:
+
+ Input: original release text (title optional at inference)
+
+ Targets: (a) abstract summary/explanation, (b) binary label
+
+ Training Objective:
+ Supervised fine-tuning on two joint tasks:
+
+ Abstractive summarization (seq2seq cross-entropy)
+
+ Binary classification (decoded 0/1 via the same seq2seq head)
+
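Because both tasks share one seq2seq head, each labeled release can be expanded into two text-to-text training pairs. The sketch below illustrates that idea; the record field names (`text`, `summary`, `is_populist`) and the `make_examples` helper are hypothetical, not taken from the training code.

```python
def make_examples(record: dict) -> list[tuple[str, str]]:
    """Turn one labeled release into two (input, target) training pairs."""
    text = record["text"]
    return [
        # Summarization pair: the target is the summary/explanation.
        ("summarize: " + text, record["summary"]),
        # Classification pair: the target is the label rendered as "0" or "1".
        ("classify_populism: " + text, str(int(record["is_populist"]))),
    ]


pairs = make_examples(
    {
        "text": "The party calls for lower energy taxes.",
        "summary": "The party proposes cutting energy taxes.",
        "is_populist": False,
    }
)
```

Both pairs are then trained with the same cross-entropy loss over target tokens, which is what lets a single checkpoint serve both prefixes.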
+ Training Strategy:
+
+ Base: facebook/bart-base
+
+ Method: LoRA on attention/FFN blocks (r=16, α=32, dropout=0.05), then merged into the base model.
+
+ Decoding: beam search for summaries; argmax or short generation for labels.
+
+ Evaluation signals: ROUGE for summaries; Accuracy/Precision/Recall/F1 for classification.
+
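Merging a LoRA adapter folds the low-rank update into the base weight, which is why no adapter files are needed at inference. The sketch below shows the standard LoRA formulation, W + (α / r) · B · A, on tiny nested-list matrices; it is an illustration only, and whether the authors used any LoRA variant with different scaling is not stated. The helper names are hypothetical.

```python
def lora_scale(r: int, alpha: float) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def merge_lora(W, A, B, r, alpha):
    """Return W + (alpha / r) * (B @ A) for matrices given as nested lists.

    Shapes: W is d_out x d_in, B is d_out x r, A is r x d_in.
    """
    scale = lora_scale(r, alpha)
    merged = []
    for i, row in enumerate(W):
        merged.append([
            w_ij + scale * sum(B[i][k] * A[k][j] for k in range(r))
            for j, w_ij in enumerate(row)
        ])
    return merged


# Toy rank-1 example; the README's settings are r=16 and alpha=32.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[3.0, 4.0]]
merged = merge_lora(W, A, B, r=1, alpha=2.0)
```

With the README's r=16 and α=32 the scaling factor is 2.0, the same as in the toy example.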
+ This setup lets one checkpoint handle both analysis (populism flag) and explanation (summary) with simple instruction prefixes.
+
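The classification metrics named under Training Strategy can be computed from predicted and gold labels as follows. This is a generic sketch, not the project's evaluation script; it treats label 1 (populist) as the positive class, and the `binary_metrics` name is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Precision and recall are worth reporting alongside accuracy because populist releases are likely a minority class, where accuracy alone can look high for a model that rarely predicts 1.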
+ ## Citation:
+
+ @article{dickson2024going,
+ title={Going against the grain: Climate change as a wedge issue for the radical right},
+ author={Dickson, Zachary P and Hobolt, Sara B},
+ journal={Comparative Political Studies},
+ year={2024},
+ publisher={SAGE Publications Sage CA: Los Angeles, CA}
+ }