---
license: afl-3.0
widget:
- text: >-
    To ask the Secretary of State for Energy and Climate Change what estimate he
    has made of the proportion of carbon dioxide emissions arising in the UK
    attributable to burning.
  example_title: English (UK House of Commons Question)
- text: >-
    To ask the Scottish Government what action it is taking to ensure that women
    who are prescribed sodium valproate are (a) adequately counselled regarding
    the risks of taking the drug while pregnant and (b) supported to plan their
    pregnancies in order to minimise the risk of foetal abnormalities.
  example_title: English (Scottish Parliamentary Question)
tags:
- CAP
- politics
- issues
- agenda
- multilingual
- science
- comparative agendas project
---

A multilingual BERT base (uncased) model trained to predict [CAP issue codes](https://www.comparativeagendas.net/pages/master-codebook) from political texts such as speeches, press releases, social media posts, news articles, bills, and laws.

The model was trained on 120,000 political documents of various types, most of them from the [Comparative Agendas Project](https://www.comparativeagendas.net/).

# Countries
- Italy
- Sweden
- France
- Switzerland
- Poland
- Netherlands
- Germany
- Denmark
- Spain
- UK
- Austria
- Ireland


# Labels used in training

Mapping from model labels to CAP labels and CAP issues:

| Model label | CAP label | CAP issue |
|---|---|---|
| 0 | 1 | macroeconomics |
| 1 | 2 | civil_rights |
| 2 | 3 | healthcare |
| 3 | 4 | agriculture |
| 4 | 5 | labour |
| 5 | 6 | education |
| 6 | 7 | environment |
| 7 | 8 | energy |
| 8 | 9 | immigration |
| 9 | 10 | transportation |
| 10 | 12 | law_crime |
| 11 | 13 | social_welfare |
| 12 | 14 | housing |
| 13 | 15 | domestic_commerce |
| 14 | 16 | defense |
| 15 | 17 | technology |
| 16 | 18 | foreign_trade |
| 17 | 19 | international_affairs |
| 18 | 20 | government_operations |
| 19 | 23 | culture |
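To turn pipeline output into CAP codes, apply the mapping above. The sketch below is minimal and rests on one assumption: the checkpoint's config uses the default `transformers` label names of the form `LABEL_<n>`. Confirm this with `model.config.id2label` before relying on it. The helper names `label_to_index` and `to_cap` are hypothetical.

```python
from __future__ import annotations

import re

# Copied from the label mapping in this card: model label -> CAP label.
MODEL_TO_CAP_CODE = {
    0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10,
    10: 12, 11: 13, 12: 14, 13: 15, 14: 16, 15: 17, 16: 18, 17: 19, 18: 20, 19: 23,
}

# Copied from the same mapping: model label -> CAP issue name.
MODEL_TO_ISSUE = {
    0: "macroeconomics", 1: "civil_rights", 2: "healthcare", 3: "agriculture",
    4: "labour", 5: "education", 6: "environment", 7: "energy",
    8: "immigration", 9: "transportation", 10: "law_crime", 11: "social_welfare",
    12: "housing", 13: "domestic_commerce", 14: "defense", 15: "technology",
    16: "foreign_trade", 17: "international_affairs", 18: "government_operations",
    19: "culture",
}

# Assumption: labels look like "LABEL_7" (transformers' default naming).
_LABEL_RE = re.compile(r"LABEL_(\d+)")


def label_to_index(label: str | int) -> int:
    """Hypothetical helper: parse a pipeline label such as 'LABEL_7' into 7."""
    if isinstance(label, int):
        return label
    match = _LABEL_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return int(match.group(1))


def to_cap(label: str | int) -> tuple[int, str]:
    """Hypothetical helper: return (CAP label, CAP issue) for a model label."""
    idx = label_to_index(label)
    return MODEL_TO_CAP_CODE[idx], MODEL_TO_ISSUE[idx]


print(to_cap("LABEL_7"))  # (8, 'energy')
```

If `id2label` turns out to hold issue names instead, the regex step no longer applies. In that case the two dictionaries can be inverted instead.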

# Validation

Per-class results on the validation set (4,720 documents), by model label:

| Model label | Issue | Precision | Recall | F1-score | Support |
|---|---|---|---|---|---|
| 0 | macroeconomics | 0.72 | 0.83 | 0.77 | 211 |
| 1 | civil_rights | 0.82 | 0.77 | 0.79 | 242 |
| 2 | healthcare | 0.82 | 0.86 | 0.84 | 251 |
| 3 | agriculture | 0.92 | 0.89 | 0.90 | 228 |
| 4 | labour | 0.81 | 0.85 | 0.83 | 220 |
| 5 | education | 0.90 | 0.93 | 0.91 | 244 |
| 6 | environment | 0.87 | 0.87 | 0.87 | 230 |
| 7 | energy | 0.92 | 0.88 | 0.90 | 251 |
| 8 | immigration | 0.94 | 0.90 | 0.92 | 237 |
| 9 | transportation | 0.87 | 0.88 | 0.87 | 263 |
| 10 | law_crime | 0.70 | 0.88 | 0.78 | 189 |
| 11 | social_welfare | 0.90 | 0.81 | 0.85 | 248 |
| 12 | housing | 0.87 | 0.90 | 0.88 | 222 |
| 13 | domestic_commerce | 0.76 | 0.72 | 0.74 | 255 |
| 14 | defense | 0.84 | 0.84 | 0.84 | 241 |
| 15 | technology | 0.92 | 0.79 | 0.85 | 276 |
| 16 | foreign_trade | 0.95 | 0.90 | 0.92 | 258 |
| 17 | international_affairs | 0.71 | 0.82 | 0.76 | 200 |
| 18 | government_operations | 0.77 | 0.73 | 0.75 | 215 |
| 19 | culture | 0.92 | 0.91 | 0.92 | 239 |
| Accuracy | | | | 0.85 | 4720 |
| Macro avg | | 0.85 | 0.85 | 0.85 | 4720 |
| Weighted avg | | 0.85 | 0.85 | 0.85 | 4720 |
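The summary rows appear to follow the standard definitions used by scikit-learn's `classification_report`:

- F1 is the harmonic mean of precision and recall.
- The macro average is the unweighted mean over the 20 classes.
- The weighted average weights each class by its support.

The sketch below recomputes these averages from the rounded per-class values in the table. The names `ROWS`, `macro_avg` and `weighted_avg` are illustrative only. Because the inputs are rounded to two decimals, the results match the reported 0.85 only to within rounding.

```python
# Per-class rows copied from the validation table:
# (precision, recall, f1, support), in model-label order 0..19.
ROWS = [
    (0.72, 0.83, 0.77, 211), (0.82, 0.77, 0.79, 242),
    (0.82, 0.86, 0.84, 251), (0.92, 0.89, 0.90, 228),
    (0.81, 0.85, 0.83, 220), (0.90, 0.93, 0.91, 244),
    (0.87, 0.87, 0.87, 230), (0.92, 0.88, 0.90, 251),
    (0.94, 0.90, 0.92, 237), (0.87, 0.88, 0.87, 263),
    (0.70, 0.88, 0.78, 189), (0.90, 0.81, 0.85, 248),
    (0.87, 0.90, 0.88, 222), (0.76, 0.72, 0.74, 255),
    (0.84, 0.84, 0.84, 241), (0.92, 0.79, 0.85, 276),
    (0.95, 0.90, 0.92, 258), (0.71, 0.82, 0.76, 200),
    (0.77, 0.73, 0.75, 215), (0.92, 0.91, 0.92, 239),
]
PRECISION, RECALL, F1, SUPPORT = 0, 1, 2, 3


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def macro_avg(col: int) -> float:
    """Unweighted mean of one column over all classes."""
    return sum(row[col] for row in ROWS) / len(ROWS)


def weighted_avg(col: int) -> float:
    """Mean of one column, weighting each class by its support."""
    total = sum(row[SUPPORT] for row in ROWS)
    return sum(row[col] * row[SUPPORT] for row in ROWS) / total


print(sum(row[SUPPORT] for row in ROWS))  # 4720
print(f"macro F1 = {macro_avg(F1):.3f}, weighted F1 = {weighted_avg(F1):.3f}")
```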




# Usage

The model can be loaded and run with the `transformers` library:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TextClassificationPipeline,
)

mp = "z-dickson/CAP_multilingual"
model = AutoModelForSequenceClassification.from_pretrained(mp)
tokenizer = AutoTokenizer.from_pretrained(mp)

# device=0 runs on the first GPU; use device=-1 to run on the CPU instead.
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model, device=0)

# Adjacent string literals are joined, so no backslashes end up in the input text.
classifier(
    "To ask the Secretary of State for Energy and Climate Change what estimate "
    "he has made of the proportion of carbon dioxide emissions arising in the UK "
    "attributable to burning."
)
```