---
license: afl-3.0
widget:
- text: >-
    To ask the Secretary of State for Energy and Climate Change what estimate he
    has made of the proportion of carbon dioxide emissions arising in the UK
    attributable to burning.
  example_title: English (UK House of Commons Question)
- text: >-
    To ask the Scottish Government what action it is taking to ensure that women
    who are prescribed sodium valproate are (a) adequately counselled regarding
    the risks of taking the drug while pregnant and (b) supported to plan their
    pregnancies in order to minimise the risk of foetal abnormalities.
  example_title: English (Scottish Parliamentary Question)
tags:
- CAP
- politics
- issues
- agenda
- multilingual
- science
- comparative agendas project
---

A multilingual BERT base (uncased) model trained to predict [CAP issue codes](https://www.comparativeagendas.net/pages/master-codebook).

The model was trained on 120,000 assorted political documents, mostly from the [Comparative Agendas Project](https://www.comparativeagendas.net/).

# Countries
- Italy
- Sweden
- France
- Switzerland
- Poland
- Netherlands
- Germany
- Denmark
- Spain
- UK
- Austria
- Ireland


# LABELS USED IN TRAINING

- Model labels -> CAP labels:
- {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0, 8: 9.0, 9: 10.0, 10: 12.0, 11: 13.0, 12: 14.0, 13: 15.0, 14: 16.0, 15: 17.0, 16: 18.0, 17: 19.0, 18: 20.0, 19: 23.0}

- Model labels -> CAP issues:
- {0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture', 4: 'labour', 5: 'education', 6: 'environment', 7: 'energy', 8: 'immigration', 9: 'transportation', 10: 'law_crime', 11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce', 14: 'defense', 15: 'technology', 16: 'foreign_trade', 17: 'international_affairs', 18: 'government_operations', 19: 'culture'}
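
The two mappings above can be combined to turn a raw prediction into a CAP code and issue name. The following is a minimal sketch, not part of the released model: it assumes the pipeline returns labels in the default `LABEL_<index>` form (which depends on the model's configuration), and the helper name `decode_prediction` and the example prediction are hypothetical.

```python
import re

# Model label index -> CAP code, copied from the mapping above (stored as integers).
MODEL_TO_CAP_CODE = {
    0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
    10: 12, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19,
    18: 20, 19: 23,
}

# Model label index -> CAP issue name, copied from the mapping above.
MODEL_TO_CAP_ISSUE = {
    0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture',
    4: 'labour', 5: 'education', 6: 'environment', 7: 'energy',
    8: 'immigration', 9: 'transportation', 10: 'law_crime',
    11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce',
    14: 'defense', 15: 'technology', 16: 'foreign_trade',
    17: 'international_affairs', 18: 'government_operations', 19: 'culture',
}


def decode_prediction(prediction):
    """Translate one prediction dict into CAP terms.

    Assumes the label looks like 'LABEL_7'; raises ValueError otherwise.
    """
    match = re.fullmatch(r"LABEL_(\d+)", prediction["label"])
    if match is None:
        raise ValueError(f"Unexpected label format: {prediction['label']!r}")
    index = int(match.group(1))
    return {
        "cap_code": MODEL_TO_CAP_CODE[index],
        "cap_issue": MODEL_TO_CAP_ISSUE[index],
        "score": prediction["score"],
    }


# Hypothetical prediction, shaped like a text-classification pipeline result.
example = {"label": "LABEL_7", "score": 0.98}
print(decode_prediction(example))
```

If the model configuration defines its own label names, the regular expression would need to be adapted accordingly.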

# Validation
 
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.72 | 0.83 | 0.77 | 211 |
| 1 | 0.82 | 0.77 | 0.79 | 242 |
| 2 | 0.82 | 0.86 | 0.84 | 251 |
| 3 | 0.92 | 0.89 | 0.90 | 228 |
| 4 | 0.81 | 0.85 | 0.83 | 220 |
| 5 | 0.90 | 0.93 | 0.91 | 244 |
| 6 | 0.87 | 0.87 | 0.87 | 230 |
| 7 | 0.92 | 0.88 | 0.90 | 251 |
| 8 | 0.94 | 0.90 | 0.92 | 237 |
| 9 | 0.87 | 0.88 | 0.87 | 263 |
| 10 | 0.70 | 0.88 | 0.78 | 189 |
| 11 | 0.90 | 0.81 | 0.85 | 248 |
| 12 | 0.87 | 0.90 | 0.88 | 222 |
| 13 | 0.76 | 0.72 | 0.74 | 255 |
| 14 | 0.84 | 0.84 | 0.84 | 241 |
| 15 | 0.92 | 0.79 | 0.85 | 276 |
| 16 | 0.95 | 0.90 | 0.92 | 258 |
| 17 | 0.71 | 0.82 | 0.76 | 200 |
| 18 | 0.77 | 0.73 | 0.75 | 215 |
| 19 | 0.92 | 0.91 | 0.92 | 239 |
| Accuracy | | | 0.85 | 4720 |
| Macro Avg | 0.85 | 0.85 | 0.85 | 4720 |
| Weighted Avg | 0.85 | 0.85 | 0.85 | 4720 |
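
As a reading aid, the F1-score column is the harmonic mean of precision and recall. The sketch below recomputes it for three classes using values copied from the table; the function name `f1_score` is illustrative only. Because the table's precision and recall are already rounded to two decimals, a recomputed value can occasionally differ from the reported one in the last digit.

```python
# (precision, recall, reported F1) for three classes, copied from the table above.
rows = {
    0: (0.72, 0.83, 0.77),
    10: (0.70, 0.88, 0.78),
    16: (0.95, 0.90, 0.92),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for label, (precision, recall, reported) in rows.items():
    recomputed = round(f1_score(precision, recall), 2)
    print(f"class {label}: recomputed F1 = {recomputed}, reported F1 = {reported}")
```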




```python
import torch
from transformers import AutoModelForSequenceClassification
from transformers import TextClassificationPipeline, AutoTokenizer

mp = 'z-dickson/CAP_multilingual'
model = AutoModelForSequenceClassification.from_pretrained(mp)
tokenizer = AutoTokenizer.from_pretrained(mp)

# Use the first GPU if one is available, otherwise fall back to CPU (-1).
device = 0 if torch.cuda.is_available() else -1
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model, device=device)

# Adjacent string literals are joined into one sentence, so no stray
# backslashes or newlines end up in the model input.
classifier(
    "To ask the Secretary of State for Energy and Climate "
    "Change what estimate he has made of the proportion of carbon "
    "dioxide emissions arising in the UK attributable to burning."
)
```