---
license: afl-3.0
widget:
  - text: >-
      To ask the Secretary of State for Energy and Climate Change what estimate he
      has made of the proportion of carbon dioxide emissions arising in the UK
      attributable to burning.
    example_title: English (UK House of Commons Question)
  - text: >-
      To ask the Scottish Government what action it is taking to ensure that women
      who are prescribed sodium valproate are (a) adequately counselled regarding
      the risks of taking the drug while pregnant and (b) supported to plan their
      pregnancies in order to minimise the risk of foetal abnormalities.
    example_title: English (Scottish Parliamentary Question)
tags:
  - CAP
  - politics
  - issues
  - agenda
  - multilingual
  - science
  - comparative agendas project
---

A multilingual BERT base model (multilingual uncased) trained to predict [CAP issue codes](https://www.comparativeagendas.net/pages/master-codebook) from text documents such as speeches, press releases, social media messages, news articles, bills, and laws.

The model was trained on 120,000 assorted political documents, mostly from the [Comparative Agendas Project](https://www.comparativeagendas.net/).

# Countries:
- Italy
- Sweden
- France
- Switzerland
- Poland
- Netherlands
- Germany
- Denmark
- Spain
- UK
- Austria
- Ireland

# LABELS USED IN TRAINING
- Model labels -> CAP labels:
  - `{0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0, 5: 6.0, 6: 7.0, 7: 8.0, 8: 9.0, 9: 10.0, 10: 12.0, 11: 13.0, 12: 14.0, 13: 15.0, 14: 16.0, 15: 17.0, 16: 18.0, 17: 19.0, 18: 20.0, 19: 23.0}`
- Model labels -> CAP issues:
  - `{0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture', 4: 'labour', 5: 'education', 6: 'environment', 7: 'energy', 8: 'immigration', 9: 'transportation', 10: 'law_crime', 11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce', 14: 'defense', 15: 'technology', 16: 'foreign_trade', 17: 'international_affairs', 18: 'government_operations', 19: 'culture'}`
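
The two dictionaries above can be combined into a small lookup helper. The following is a minimal sketch, not part of the model itself: the helper name `decode_label` is hypothetical, the CAP codes are stored as integers rather than the floats listed above, and the `"LABEL_<n>"` string format is an assumption about how a Transformers pipeline may name the classes when the model config carries no custom label names.

```python
import re

# Model label index -> CAP major topic code (from the list above, as integers).
MODEL_TO_CAP_CODE = {
    0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
    10: 12, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19,
    18: 20, 19: 23,
}

# Model label index -> CAP issue name (from the list above).
MODEL_TO_CAP_ISSUE = {
    0: 'macroeconomics', 1: 'civil_rights', 2: 'healthcare', 3: 'agriculture',
    4: 'labour', 5: 'education', 6: 'environment', 7: 'energy',
    8: 'immigration', 9: 'transportation', 10: 'law_crime',
    11: 'social_welfare', 12: 'housing', 13: 'domestic_commerce',
    14: 'defense', 15: 'technology', 16: 'foreign_trade',
    17: 'international_affairs', 18: 'government_operations', 19: 'culture',
}


def decode_label(label):
    """Return (CAP code, issue name) for a model label.

    Accepts an integer index or a string that ends in the index,
    such as "LABEL_7" (assumed format, see the note above).
    """
    if isinstance(label, int):
        index = label
    else:
        match = re.search(r"(\d+)$", str(label))
        if match is None:
            raise ValueError(f"cannot find a class index in {label!r}")
        index = int(match.group(1))
    return MODEL_TO_CAP_CODE[index], MODEL_TO_CAP_ISSUE[index]


print(decode_label(7))           # (8, 'energy')
print(decode_label("LABEL_19"))  # (23, 'culture')
```

Note that the CAP codes are not contiguous: code 11 is unused and the mapping jumps from 20 to 23, so the model index should never be treated as the CAP code directly.
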
# Validation
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.72 | 0.83 | 0.77 | 211 |
| 1 | 0.82 | 0.77 | 0.79 | 242 |
| 2 | 0.82 | 0.86 | 0.84 | 251 |
| 3 | 0.92 | 0.89 | 0.90 | 228 |
| 4 | 0.81 | 0.85 | 0.83 | 220 |
| 5 | 0.90 | 0.93 | 0.91 | 244 |
| 6 | 0.87 | 0.87 | 0.87 | 230 |
| 7 | 0.92 | 0.88 | 0.90 | 251 |
| 8 | 0.94 | 0.90 | 0.92 | 237 |
| 9 | 0.87 | 0.88 | 0.87 | 263 |
| 10 | 0.70 | 0.88 | 0.78 | 189 |
| 11 | 0.90 | 0.81 | 0.85 | 248 |
| 12 | 0.87 | 0.90 | 0.88 | 222 |
| 13 | 0.76 | 0.72 | 0.74 | 255 |
| 14 | 0.84 | 0.84 | 0.84 | 241 |
| 15 | 0.92 | 0.79 | 0.85 | 276 |
| 16 | 0.95 | 0.90 | 0.92 | 258 |
| 17 | 0.71 | 0.82 | 0.76 | 200 |
| 18 | 0.77 | 0.73 | 0.75 | 215 |
| 19 | 0.92 | 0.91 | 0.92 | 239 |
| Accuracy | | | 0.85 | 4720 |
| Macro Avg | 0.85 | 0.85 | 0.85 | 4720 |
| Weighted Avg | 0.85 | 0.85 | 0.85 | 4720 |
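
As a sanity check on the summary rows, the macro and weighted F1 averages can be recomputed from the per-class rows. This is a rough sketch that uses only the rounded values printed in the table, so the results can differ from the reported 0.85 in the second decimal place; the variable names are illustrative.

```python
# (F1-score, support) for classes 0..19, copied from the validation table.
rows = [
    (0.77, 211), (0.79, 242), (0.84, 251), (0.90, 228), (0.83, 220),
    (0.91, 244), (0.87, 230), (0.90, 251), (0.92, 237), (0.87, 263),
    (0.78, 189), (0.85, 248), (0.88, 222), (0.74, 255), (0.84, 241),
    (0.85, 276), (0.92, 258), (0.76, 200), (0.75, 215), (0.92, 239),
]

total_support = sum(support for _, support in rows)

# Macro average: every class counts equally.
macro_f1 = sum(f1 for f1, _ in rows) / len(rows)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in rows) / total_support

print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.3f}")
print(f"weighted F1:   {weighted_f1:.3f}")
```

The total support matches the 4720 shown in the summary rows, and both averages land within 0.01 of the reported value. Because class sizes are fairly balanced (189 to 276 documents), the macro and weighted averages are nearly identical.
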

```python
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TextClassificationPipeline,
)

mp = 'z-dickson/CAP_multilingual'
model = AutoModelForSequenceClassification.from_pretrained(mp)
tokenizer = AutoTokenizer.from_pretrained(mp)

# Use the first GPU when one is available, otherwise fall back to CPU (-1).
device = 0 if torch.cuda.is_available() else -1
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model, device=device)

# Adjacent string literals are joined, so no stray backslashes reach the model.
text = (
    "To ask the Secretary of State for Energy and Climate "
    "Change what estimate he has made of the proportion of carbon "
    "dioxide emissions arising in the UK attributable to burning."
)
print(classifier(text))
```