Our models are intended for academic projects and academic research only. If you are not affiliated with an academic institution, please contact us at huggingface [at] poltextlab [dot] com. If we cannot clearly determine your academic affiliation and use case from your form data, your request may be rejected. Please allow a few business days for us to review access requests manually.

xlm-roberta-large-pooled-cap-minor-v5

This model was fine-tuned on multilingual, multi-domain data labeled with the minor topic codes of the Comparative Agendas Project (CAP) master codebook (https://www.comparativeagendas.net/pages/master-codebook).

How to use the model

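from transformers import AutoTokenizer, pipeline

# The classifier reuses the tokenizer of the xlm-roberta-large base model
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

# The repository is gated, so pass a Hugging Face access token with read permission
pipe = pipeline(
    model="poltextlab/xlm-roberta-large-pooled-cap-minor-v5-e8checkpoint",
    task="text-classification",
    tokenizer=tokenizer,
    use_fast=False,
    token="<your_hf_read_only_token>"
)

# Returns a list with the predicted label and its score
text = "<text_to_classify>"
pipe(text)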
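
The text-classification pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. The following is a minimal sketch of how such output could be post-processed. The sample predictions, the label strings, the helper name `split_by_confidence`, and the 0.5 threshold are all illustrative assumptions, not values produced by this model.

```python
from typing import List, Tuple


def split_by_confidence(
    predictions: List[dict], threshold: float = 0.5
) -> Tuple[List[dict], List[dict]]:
    """Separate predictions into confident and uncertain ones by score."""
    confident = [p for p in predictions if p["score"] >= threshold]
    uncertain = [p for p in predictions if p["score"] < threshold]
    return confident, uncertain


# Hypothetical predictions in the shape the pipeline returns;
# the label strings and scores are placeholders, not real model output.
sample = [
    {"label": "105", "score": 0.91},
    {"label": "999", "score": 0.42},
]

confident, uncertain = split_by_confidence(sample)
print(len(confident), "confident,", len(uncertain), "uncertain")
```

Low-scoring predictions could then be routed to manual coding; a suitable threshold depends on the corpus and would need to be chosen empirically.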

Classification Report

Overall Performance:

  • Accuracy: N/A
  • Macro Avg: Precision: 0.68, Recall: 0.61, F1-score: 0.63
  • Weighted Avg: Precision: 0.81, Recall: 0.82, F1-score: 0.81
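
The gap between the macro and weighted averages comes from class imbalance: the macro average treats every class equally, while the weighted average scales each class by its support. The sketch below illustrates this with three rows copied from the per-class table in this report. It is a toy calculation on a subset, so its results will not match the overall figures above.

```python
# (label, F1-score, support) rows copied from the per-class table
rows = [
    ("(199) Macroeconomics - Other", 0.00, 15),
    ("(1707) Technology - Broadcast", 0.90, 1716),
    ("(999) No Policy Content", 0.96, 89092),
]


def macro_f1(rows):
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1 for _, f1, _ in rows) / len(rows)


def weighted_f1(rows):
    """Mean of the per-class F1-scores, weighted by class support."""
    total_support = sum(support for _, _, support in rows)
    return sum(f1 * support for _, f1, support in rows) / total_support


macro = macro_f1(rows)
weighted = weighted_f1(rows)
print(f"macro={macro:.2f} weighted={weighted:.2f}")
```

Because "No Policy Content" accounts for most of the support, it pulls the weighted average towards its own score, while rare classes with low F1 drag the macro average down.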

Per-Language Weighted F1-Scores:

  • hu: 0.88 (on 204 labels)
  • pt: 0.86 (on 146 labels)
  • en: 0.79 (on 213 labels)
  • it: 0.56 (on 193 labels)
  • es: 0.52 (on 187 labels)
  • fr: 0.50 (on 192 labels)
  • da: 0.48 (on 130 labels)

Per-Class Metrics:

Label Precision Recall F1-score Support
(100) Macroeconomics – General 0.64 0.67 0.65 2378
(101) Macroeconomics – Interest Rates 0.71 0.61 0.66 313
(103) Macroeconomics – Unemployment Rate 0.59 0.52 0.55 454
(104) Macroeconomics – Monetary Policy 0.72 0.68 0.7 645
(105) Macroeconomics – National Budget 0.75 0.81 0.78 5394
(107) Macroeconomics – Tax Code 0.74 0.79 0.77 3152
(108) Macroeconomics – Industrial Policy 0.65 0.65 0.65 871
(110) Macroeconomics – Price Control 0.62 0.48 0.54 160
(199) Macroeconomics – Other 0 0 0 15
(200) Civil Rights – General 0.59 0.59 0.59 796
(201) Civil Rights – Minority Discrimination 0.74 0.72 0.73 723
(202) Civil Rights – Gender Discrimination 0.75 0.75 0.75 554
(204) Civil Rights – Age Discrimination 0.64 0.33 0.44 54
(205) Civil Rights – Handicap Discrimination 0.74 0.66 0.69 152
(206) Civil Rights – Voting Rights 0.71 0.65 0.68 284
(207) Civil Rights – Freedom of Speech 0.72 0.67 0.69 389
(208) Civil Rights – Right to Privacy 0.71 0.62 0.66 561
(209) Civil Rights – Anti-Government 0.71 0.37 0.48 123
(299) Civil Rights – Other 0.46 0.12 0.19 51
(300) Health – General 0.61 0.6 0.6 824
(301) Health – Health Care Reform 0.49 0.46 0.47 556
(302) Health – Insurance 0.64 0.61 0.62 728
(321) Health – Drug Industry 0.65 0.62 0.64 228
(322) Health – Medical Facilities 0.77 0.69 0.73 422
(323) Health – Insurance Providers 0.77 0.61 0.68 285
(324) Health – Medical Liability 0.78 0.68 0.73 158
(325) Health – Manpower 0.66 0.69 0.67 360
(331) Health – Disease Prevention 0.79 0.78 0.78 783
(332) Health – Infants and Children 0.73 0.73 0.73 242
(333) Health – Mental Health 0.83 0.81 0.82 218
(334) Health – Long-term Care 0.74 0.71 0.72 264
(335) Health – Drug Coverage and Cost 0.78 0.78 0.78 207
(341) Health – Tobacco Abuse 0.84 0.85 0.85 232
(342) Health – Drug and Alcohol Abuse 0.75 0.64 0.69 286
(398) Health – R&D 0.72 0.75 0.73 244
(399) Health – Other 0.5 0.14 0.22 78
(400) Agriculture – General 0.67 0.67 0.67 1116
(401) Agriculture – Trade 0.52 0.5 0.51 325
(402) Agriculture – Subsidies to Farmers 0.71 0.68 0.7 776
(403) Agriculture – Food Inspection & Safety 0.69 0.7 0.69 319
(404) Agriculture – Food Marketing & Promotion 0.64 0.58 0.61 133
(405) Agriculture – Animal and Crop Disease 0.83 0.78 0.81 327
(408) Agriculture – Fisheries & Fishing 0.84 0.74 0.78 207
(498) Agriculture – R&D 0.69 0.75 0.72 111
(499) Agriculture – Other 0.63 0.44 0.52 218
(500) Labor – General 0.67 0.61 0.64 1076
(501) Labor – Worker Safety 0.74 0.71 0.73 259
(502) Labor – Employment Training 0.63 0.61 0.62 777
(503) Labor – Employee Benefits 0.68 0.63 0.65 918
(504) Labor – Labor Unions 0.69 0.66 0.67 742
(505) Labor – Fair Labor Standards 0.59 0.59 0.59 556
(506) Labor – Youth Employment 0.64 0.68 0.66 324
(529) Labor – Migrant and Seasonal 0.63 0.53 0.57 150
(599) Labor – Other 0.7 0.12 0.2 59
(600) Education – General 0.63 0.58 0.61 811
(601) Education – Higher 0.85 0.84 0.84 1836
(602) Education – Elementary & Secondary 0.77 0.8 0.78 2400
(603) Education – Underprivileged 0.51 0.46 0.48 163
(604) Education – Vocational 0.76 0.76 0.76 468
(606) Education – Special 0.73 0.76 0.74 123
(607) Education – Excellence 0.57 0.54 0.55 275
(698) Education – R&D 0.74 0.25 0.37 56
(699) Education – Other 0.72 0.24 0.36 115
(700) Environment – General 0.72 0.73 0.72 979
(701) Environment – Drinking Water 0.7 0.7 0.7 322
(703) Environment – Waste Disposal 0.84 0.82 0.83 367
(704) Environment – Hazardous Waste 0.68 0.75 0.72 208
(705) Environment – Air Pollution 0.78 0.8 0.79 645
(707) Environment – Recycling 0.81 0.89 0.85 63
(708) Environment – Indoor Hazards 0.69 0.67 0.68 49
(709) Environment – Species & Forest 0.76 0.79 0.77 476
(711) Environment – Land and Water Conservation 0.55 0.38 0.45 173
(798) Environment – R&D 0.53 0.2 0.3 44
(799) Environment – Other 0.6 0.06 0.11 51
(800) Energy – General 0.76 0.74 0.75 802
(801) Energy – Nuclear 0.79 0.72 0.75 289
(802) Energy – Electricity 0.72 0.66 0.69 328
(803) Energy – Natural Gas & Oil 0.75 0.77 0.76 646
(805) Energy – Coal 0.73 0.73 0.73 133
(806) Energy – Alternative & Renewable 0.74 0.76 0.75 270
(807) Energy – Conservation 0.65 0.69 0.67 185
(898) Energy – R&D 0.7 0.29 0.41 55
(899) Energy – Other 0 0 0 23
(900) Immigration – Immigration 0.76 0.74 0.75 1411
(999) No Policy Content 0.94 0.97 0.96 89092
(1000) Transportation – General 0.68 0.66 0.67 488
(1001) Transportation – Mass 0.64 0.6 0.62 298
(1002) Transportation – Highways 0.86 0.84 0.85 1715
(1003) Transportation – Air Travel 0.8 0.8 0.8 360
(1005) Transportation – Railroad Travel 0.79 0.8 0.79 539
(1007) Transportation – Maritime 0.79 0.81 0.8 516
(1010) Transportation – Infrastructure 0.56 0.51 0.54 220
(1098) Transportation – R&D 0 0 0 16
(1099) Transportation – Other 0.88 0.58 0.7 52
(1200) Law and Crime – General 0.59 0.56 0.58 714
(1201) Law and Crime – Agencies 0.78 0.79 0.79 1949
(1202) Law and Crime – White Collar Crime 0.56 0.58 0.57 402
(1203) Law and Crime – Illegal Drugs 0.76 0.79 0.78 384
(1204) Law and Crime – Court Administration 0.75 0.74 0.75 1601
(1205) Law and Crime – Prisons 0.79 0.75 0.77 426
(1206) Law and Crime – Juvenile Crime 0.69 0.71 0.7 173
(1207) Law and Crime – Child Abuse 0.77 0.74 0.76 273
(1208) Law and Crime – Family Issues 0.77 0.75 0.76 866
(1210) Law and Crime – Criminal & Civil Code 0.73 0.7 0.71 1480
(1211) Law and Crime – Crime Control 0.61 0.47 0.54 417
(1227) Law and Crime – Police 0.61 0.57 0.59 124
(1299) Law and Crime – Other 0.7 0.38 0.49 165
(1300) Social Welfare – General 0.66 0.62 0.64 1366
(1302) Social Welfare – Low-Income Assistance 0.63 0.63 0.63 589
(1303) Social Welfare – Elderly Assistance 0.74 0.72 0.73 971
(1304) Social Welfare – Disabled Assistance 0.65 0.68 0.66 336
(1305) Social Welfare – Volunteer Associations 0.67 0.65 0.66 422
(1308) Social Welfare – Child Care 0.72 0.77 0.74 351
(1399) Social Welfare – Other 0.5 0.02 0.05 41
(1400) Housing – General 0.69 0.68 0.68 1015
(1401) Housing – Community Development 0.47 0.42 0.44 396
(1403) Housing – Urban Development 0.55 0.54 0.55 329
(1404) Housing – Rural Housing 0 0 0 14
(1405) Housing – Rural Development 0.76 0.66 0.71 486
(1406) Housing – Low-Income Assistance 0.62 0.6 0.61 359
(1407) Housing – Veterans 0.82 0.74 0.78 42
(1408) Housing – Elderly 0.73 0.28 0.4 40
(1409) Housing – Homeless 0.68 0.74 0.71 90
(1498) Housing – R&D 0 0 0 0
(1499) Housing – Other 0.78 0.76 0.77 229
(1500) Domestic Commerce – General 0.62 0.64 0.63 1030
(1501) Domestic Commerce – Banking 0.69 0.65 0.67 745
(1502) Domestic Commerce – Securities & Commodities 0.56 0.57 0.57 282
(1504) Domestic Commerce – Consumer Finance 0.7 0.63 0.66 260
(1505) Domestic Commerce – Insurance Regulation 0.87 0.83 0.85 339
(1507) Domestic Commerce – Bankruptcy 0.66 0.65 0.66 148
(1520) Domestic Commerce – Corporate Management 0.62 0.58 0.6 636
(1521) Domestic Commerce – Small Businesses 0.75 0.73 0.74 651
(1522) Domestic Commerce – Copyrights and Patents 0.83 0.82 0.82 256
(1523) Domestic Commerce – Disaster Relief 0.79 0.75 0.77 421
(1524) Domestic Commerce – Tourism 0.73 0.77 0.75 210
(1525) Domestic Commerce – Consumer Safety 0.75 0.73 0.74 543
(1526) Domestic Commerce – Sports Regulation 0.84 0.86 0.85 794
(1598) Domestic Commerce – R&D 0 0 0 1
(1599) Domestic Commerce – Other 0.84 0.8 0.82 598
(1600) Defense – General 0.68 0.64 0.66 1165
(1602) Defense – Alliances 0.68 0.64 0.66 730
(1603) Defense – Intelligence 0.7 0.66 0.68 289
(1604) Defense – Readiness 0.72 0.73 0.72 564
(1605) Defense – Nuclear Arms 0.81 0.78 0.8 793
(1606) Defense – Military Aid 0.73 0.56 0.63 95
(1608) Defense – Personnel Issues 0.75 0.78 0.77 1284
(1610) Defense – Procurement 0.67 0.71 0.69 211
(1611) Defense – Installations & Land 0.6 0.64 0.62 160
(1612) Defense – Reserve Forces 0.6 0.59 0.59 68
(1614) Defense – Hazardous Waste 0 0 0 12
(1615) Defense – Civil 0.73 0.71 0.72 359
(1616) Defense – Civilian Personnel 0.86 0.28 0.42 65
(1617) Defense – Contractors 0.66 0.53 0.59 79
(1619) Defense – Foreign Operations 0.67 0.64 0.65 866
(1620) Defense – Claims against Military 0.72 0.67 0.69 174
(1698) Defense – R&D 0.63 0.37 0.47 46
(1699) Defense – Other 0.56 0.37 0.44 117
(1700) Technology – General 0.53 0.54 0.53 343
(1701) Technology – Space 0.82 0.86 0.84 129
(1704) Technology – Commercial Use of Space 0.88 0.44 0.58 32
(1705) Technology – Science Transfer 0.63 0.46 0.53 99
(1706) Technology – Telecommunications 0.7 0.74 0.72 345
(1707) Technology – Broadcast 0.9 0.9 0.9 1716
(1708) Technology – Weather Forecasting 0.72 0.77 0.74 62
(1709) Technology – Computers 0.72 0.73 0.73 241
(1798) Technology – R&D 0.74 0.79 0.76 638
(1799) Technology – Other 0.86 0.14 0.24 44
(1800) Foreign Trade – General 0.58 0.6 0.59 375
(1802) Foreign Trade – Trade Agreements 0.75 0.65 0.69 634
(1803) Foreign Trade – Exports 0.72 0.66 0.69 173
(1804) Foreign Trade – Private Investments 0.62 0.6 0.61 196
(1806) Foreign Trade – Competitiveness 0.63 0.62 0.62 329
(1807) Foreign Trade – Tariff & Imports 0.77 0.7 0.73 280
(1808) Foreign Trade – Exchange Rates 0.65 0.39 0.49 79
(1899) Foreign Trade – Other 0 0 0 8
(1900) International Affairs – General 0.71 0.77 0.74 2791
(1901) International Affairs – Foreign Aid 0.69 0.63 0.66 610
(1902) International Affairs – Resources Exploitation 0.62 0.32 0.42 151
(1905) International Affairs – Developing Countries 0.56 0.57 0.56 234
(1906) International Affairs – International Finance 0.64 0.63 0.63 483
(1910) International Affairs – Western Europe 0.68 0.68 0.68 1355
(1921) International Affairs – Specific Country 0.8 0.82 0.81 3664
(1925) International Affairs – Human Rights 0.7 0.7 0.7 505
(1926) International Affairs – Organizations 0.7 0.69 0.7 488
(1927) International Affairs – Terrorism 0.66 0.62 0.64 472
(1929) International Affairs – Diplomats 0.61 0.59 0.6 396
(1999) International Affairs – Other 0.65 0.18 0.28 72
(2000) Government Operations – General 0.67 0.6 0.63 1553
(2001) Government Operations – Intergovernmental Relations 0.66 0.66 0.66 2047
(2002) Government Operations – Bureaucracy 0.61 0.55 0.57 1247
(2003) Government Operations – Postal Service 0.85 0.87 0.86 269
(2004) Government Operations – Employees 0.75 0.74 0.74 1352
(2005) Government Operations – Appointments 0.76 0.78 0.77 540
(2006) Government Operations – Currency 0.71 0.6 0.65 179
(2007) Government Operations – Procurement & Contractors 0.74 0.62 0.67 577
(2008) Government Operations – Property Management 0.73 0.74 0.73 712
(2009) Government Operations – Tax Administration 0.74 0.62 0.67 238
(2010) Government Operations – Scandals 0.66 0.53 0.59 289
(2011) Government Operations – Branch Relations 0.68 0.67 0.68 1919
(2012) Government Operations – Political Campaigns 0.76 0.72 0.74 1821
(2013) Government Operations – Census & Statistics 0.89 0.84 0.87 81
(2014) Government Operations – Capital City 0.85 0.77 0.81 377
(2015) Government Operations – Claims against the Government 0.65 0.7 0.67 746
(2030) Government Operations – National Holidays 0.69 0.66 0.68 296
(2099) Government Operations – Other 0.51 0.35 0.41 323
(2100) Public Lands – General 0.61 0.32 0.42 158
(2101) Public Lands – National Parks 0.77 0.79 0.78 864
(2102) Public Lands – Indigenous Affairs 0.92 0.91 0.92 429
(2103) Public Lands – Public Lands 0.7 0.71 0.71 1137
(2104) Public Lands – Water Resources 0.77 0.75 0.76 760
(2105) Public Lands – Dependencies & Territories 0.66 0.64 0.65 307
(2199) Public Lands – Other 0.56 0.45 0.5 11
(2300) Culture – General 0.75 0.72 0.74 1363
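
In the table above, each minor code is prefixed by its major topic number: codes 100 to 199 belong to Macroeconomics, 1200 to 1299 to Law and Crime, and so on, with 999 reserved for "No Policy Content". A minimal sketch of recovering the major topic from a minor code is shown below. The function name `major_topic` is hypothetical, and the mapping is derived only from the labels listed in this report.

```python
# Major topic names as they appear in the per-class table
MAJOR_TOPICS = {
    1: "Macroeconomics", 2: "Civil Rights", 3: "Health", 4: "Agriculture",
    5: "Labor", 6: "Education", 7: "Environment", 8: "Energy",
    9: "Immigration", 10: "Transportation", 12: "Law and Crime",
    13: "Social Welfare", 14: "Housing", 15: "Domestic Commerce",
    16: "Defense", 17: "Technology", 18: "Foreign Trade",
    19: "International Affairs", 20: "Government Operations",
    21: "Public Lands", 23: "Culture",
}

NO_POLICY_CONTENT = 999  # special code that does not follow the prefix rule


def major_topic(minor_code: int) -> str:
    """Return the major topic name for a CAP minor topic code."""
    if minor_code == NO_POLICY_CONTENT:
        return "No Policy Content"
    # Dropping the last two digits leaves the major topic number
    return MAJOR_TOPICS[minor_code // 100]


print(major_topic(105))   # National Budget falls under Macroeconomics
print(major_topic(1201))  # Agencies falls under Law and Crime
```

Aggregating predictions to the major topic level this way can be useful when minor-level scores for a rare subtopic are too low for the intended analysis.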

Inference platform

This model is used by the CAP Babel Machine, a free, open-source natural language processing tool designed to simplify and speed up comparative research projects.

Cooperation

Model performance can be significantly improved by extending our training sets. We appreciate every submission of CAP-coded corpora (of any domain and language) at poltextlab{at}poltextlab{dot}com or by using the CAP Babel Machine.

Debugging and issues

This architecture uses the sentencepiece tokenizer. To run the model with a transformers version earlier than 4.27, you need to install sentencepiece manually.
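
A minimal sketch of checking this condition before loading the model is given below. The helper names are hypothetical, and the 4.27 cut-off is taken from the note above rather than verified against the transformers changelog.

```python
import importlib.util


def needs_manual_sentencepiece(transformers_version: str) -> bool:
    """True if the given transformers version is earlier than 4.27."""
    major, minor = (int(part) for part in transformers_version.split(".")[:2])
    return (major, minor) < (4, 27)


def sentencepiece_installed() -> bool:
    """True if the sentencepiece package can be imported."""
    return importlib.util.find_spec("sentencepiece") is not None


# Example with a hard-coded version string; in practice one could pass
# transformers.__version__ instead.
if needs_manual_sentencepiece("4.26.1") and not sentencepiece_installed():
    print("Install the tokenizer backend first: pip install sentencepiece")
```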
