---
license: afl-3.0
widget:
  - text: >-
      To ask the Secretary of State for Energy and Climate Change what estimate he
      has made of the proportion of carbon dioxide emissions arising in the UK
      attributable to burning.
    example_title: English (UK House of Commons Question)
  - text: >-
      To ask the Scottish Government what action it is taking to ensure that women
      who are prescribed sodium valproate are (a) adequately counselled regarding
      the risks of taking the drug while pregnant and (b) supported to plan their
      pregnancies in order to minimise the risk of foetal abnormalities.
    example_title: English (Scottish Parliamentary Question)
tags:
  - CAP
  - politics
  - issues
  - agenda
  - multilingual
  - science
  - comparative agendas project
---
A multilingual BERT base model (multilingual, uncased) trained to predict [CAP issue codes](https://www.comparativeagendas.net/pages/master-codebook).
The model was trained on 120,000 assorted political documents, mostly from the [Comparative Agendas Project](https://www.comparativeagendas.net/).
# Countries
- Italy
- Sweden
- France
- Switzerland
- Poland
- Netherlands
- Germany
- Denmark
- Spain
- UK
- Austria
- Ireland
# Labels used in training
- Model labels -> CAP labels:
  `{0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0, 8: 9.0, 9: 10.0, 10: 12.0, 11: 13.0, 12: 14.0, 13: 15.0, 14: 16.0, 15: 17.0, 16: 18.0, 17: 19.0, 18: 20.0, 19: 23.0}`
- Model labels -> CAP issues:
  `{0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture', 4: 'labour', 5: 'education', 6: 'environment', 7: 'energy', 8: 'immigration', 9: 'transportation', 10: 'law_crime', 11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce', 14: 'defense', 15: 'technology', 16: 'foreign_trade', 17: 'international_affairs', 18: 'government_operations', 19: 'culture'}`
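
The label mappings above can be applied to model output with a small lookup. The sketch below is illustrative only: `to_cap` and `parse_label` are hypothetical helper names, and the `LABEL_<n>` string format is an assumption about what the pipeline returns when no custom label names are stored in the model config. Check the actual output before relying on it.

```python
# Index-to-code and index-to-issue mappings copied from this model card.
# The card lists the CAP codes as floats (1.0, 2.0, ...); integers are used here.
CAP_CODES = {
    0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
    10: 12, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19,
    18: 20, 19: 23,
}
CAP_ISSUES = {
    0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture',
    4: 'labour', 5: 'education', 6: 'environment', 7: 'energy',
    8: 'immigration', 9: 'transportation', 10: 'law_crime',
    11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce',
    14: 'defense', 15: 'technology', 16: 'foreign_trade',
    17: 'international_affairs', 18: 'government_operations', 19: 'culture',
}


def parse_label(label):
    """Return the integer model label from an int or a string such as 'LABEL_7'.

    The 'LABEL_<n>' format is an assumption, not something this card states.
    """
    if isinstance(label, int):
        return label
    return int(str(label).rsplit("_", 1)[-1])


def to_cap(label):
    """Map a model label to a (CAP code, CAP issue name) pair."""
    idx = parse_label(label)
    return CAP_CODES[idx], CAP_ISSUES[idx]


print(to_cap("LABEL_7"))  # (8, 'energy')
```

Note that the CAP codes are not contiguous (11, 21 and 22 do not appear in the mapping), so the model label cannot be converted to a CAP code by simply adding one.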
# Validation
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.72 | 0.83 | 0.77 | 211 |
| 1 | 0.82 | 0.77 | 0.79 | 242 |
| 2 | 0.82 | 0.86 | 0.84 | 251 |
| 3 | 0.92 | 0.89 | 0.90 | 228 |
| 4 | 0.81 | 0.85 | 0.83 | 220 |
| 5 | 0.90 | 0.93 | 0.91 | 244 |
| 6 | 0.87 | 0.87 | 0.87 | 230 |
| 7 | 0.92 | 0.88 | 0.90 | 251 |
| 8 | 0.94 | 0.90 | 0.92 | 237 |
| 9 | 0.87 | 0.88 | 0.87 | 263 |
| 10 | 0.70 | 0.88 | 0.78 | 189 |
| 11 | 0.90 | 0.81 | 0.85 | 248 |
| 12 | 0.87 | 0.90 | 0.88 | 222 |
| 13 | 0.76 | 0.72 | 0.74 | 255 |
| 14 | 0.84 | 0.84 | 0.84 | 241 |
| 15 | 0.92 | 0.79 | 0.85 | 276 |
| 16 | 0.95 | 0.90 | 0.92 | 258 |
| 17 | 0.71 | 0.82 | 0.76 | 200 |
| 18 | 0.77 | 0.73 | 0.75 | 215 |
| 19 | 0.92 | 0.91 | 0.92 | 239 |
| Accuracy | | | 0.85 | 4720 |
| Macro Avg | 0.85 | 0.85 | 0.85 | 4720 |
| Weighted Avg | 0.85 | 0.85 | 0.85 | 4720 |
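
To make the summary rows easier to read, the sketch below recomputes them from the per-class rows of the table. It assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted class mean, weighted average using support as weights); the card does not say how the report was generated. Because the table values are rounded to two decimals, the recomputed figures only match to within rounding.

```python
# (precision, recall, f1, support) for classes 0-19, copied from the table above.
ROWS = [
    (0.72, 0.83, 0.77, 211), (0.82, 0.77, 0.79, 242), (0.82, 0.86, 0.84, 251),
    (0.92, 0.89, 0.90, 228), (0.81, 0.85, 0.83, 220), (0.90, 0.93, 0.91, 244),
    (0.87, 0.87, 0.87, 230), (0.92, 0.88, 0.90, 251), (0.94, 0.90, 0.92, 237),
    (0.87, 0.88, 0.87, 263), (0.70, 0.88, 0.78, 189), (0.90, 0.81, 0.85, 248),
    (0.87, 0.90, 0.88, 222), (0.76, 0.72, 0.74, 255), (0.84, 0.84, 0.84, 241),
    (0.92, 0.79, 0.85, 276), (0.95, 0.90, 0.92, 258), (0.71, 0.82, 0.76, 200),
    (0.77, 0.73, 0.75, 215), (0.92, 0.91, 0.92, 239),
]


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total_support = sum(s for _, _, _, s in ROWS)

# Macro average: every class counts equally, regardless of its support.
macro_precision = sum(p for p, _, _, _ in ROWS) / len(ROWS)
macro_recall = sum(r for _, r, _, _ in ROWS) / len(ROWS)

# Weighted average: each class is weighted by its support. For recall this is
# the fraction of all examples classified correctly, i.e. the accuracy.
weighted_recall = sum(r * s for _, r, _, s in ROWS) / total_support

# Largest difference between the reported F1 and the one recomputed from the
# rounded precision and recall; small differences are rounding effects.
max_f1_gap = max(abs(f1(p, r) - f) for p, r, f, _ in ROWS)

print(total_support)  # 4720
print(macro_precision, macro_recall, weighted_recall, max_f1_gap)
```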
# Usage
```python
import torch
from transformers import AutoModelForSequenceClassification
from transformers import TextClassificationPipeline, AutoTokenizer

mp = 'z-dickson/CAP_multilingual'
model = AutoModelForSequenceClassification.from_pretrained(mp)
tokenizer = AutoTokenizer.from_pretrained(mp)

# Use the first GPU when one is available; otherwise run on CPU (device=-1).
device = 0 if torch.cuda.is_available() else -1
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model, device=device)

# Adjacent string literals are joined, so the input contains no stray
# backslashes or line breaks.
text = (
    "To ask the Secretary of State for Energy and Climate "
    "Change what estimate he has made of the proportion of carbon "
    "dioxide emissions arising in the UK attributable to burning."
)
print(classifier(text))
```
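
For more than one document, the pipeline can be called with a list of texts. The following is a minimal sketch, not part of the original card: `classify_batch` is a hypothetical helper, `classifier` is expected to be a pipeline object like the one built above, and the 512-token limit is an assumption based on the usual BERT base input size.

```python
def classify_batch(classifier, texts, max_length=512):
    """Classify several documents in one call, truncating over-long inputs.

    `classifier` is any callable with the text-classification pipeline
    interface: it takes a list of strings and returns one dict per input
    with 'label' and 'score' keys.
    """
    texts = list(texts)
    # truncation and max_length are passed through to the tokenizer, so
    # documents longer than the model's input size are cut rather than failing.
    results = classifier(texts, truncation=True, max_length=max_length)
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]
```

Calling `classify_batch(classifier, ["first question ...", "second question ..."])` would return one `(text, label, score)` tuple per input, where the label is whatever string the pipeline reports for the predicted class.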