This is a transformers model trained on the U.S. Comparative Agendas Project (CAP) dataset, annotated with a top-level taxonomy covering 20 policy areas, as well as an "Others" category for non-policy-related text. The model is designed to identify policy and non-policy issues in political discourse.
This model was trained specifically for the additional analyses presented in the paper cited below.
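As a minimal usage sketch, the model can presumably be loaded through the standard `transformers` text-classification pipeline. The repository id below is a hypothetical placeholder, not the model's real id, and the helper names `load_classifier`, `top_label`, and `classify` are introduced here for illustration only. The sketch also assumes the model's config maps class ids to readable label names such as those in the performance table; if it does not, the pipeline returns generic names like `LABEL_0`.

```python
from typing import List

# Hypothetical placeholder: replace with this model's actual Hub repository id.
MODEL_ID = "your-namespace/your-cap-topic-model"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (needs `transformers` and a backend such as PyTorch)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def top_label(scores: List[dict]) -> str:
    """Return the label with the highest score from a list of {'label', 'score'} dicts."""
    return max(scores, key=lambda item: item["score"])["label"]


def classify(texts: List[str], classifier) -> List[str]:
    """Predict one label per text, requesting scores for all labels from the pipeline."""
    # top_k=None asks for every label's score; truncation guards against over-long inputs.
    results = classifier(texts, top_k=None, truncation=True)
    return [top_label(scores) for scores in results]


# Example (downloads the model, so it is left commented out):
# clf = load_classifier()
# print(classify(["We must lower the cost of prescription drugs."], clf))
```

Keeping `top_label` separate from the pipeline call makes it easy to swap in a different decision rule, for example a score threshold below which a text is assigned to "Others".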
Model performance
The model's performance on the unseen test set is as follows:
| Label | F1 score |
|---|---|
| Macroeconomics | 0.8303 |
| Civil rights | 0.7676 |
| Health | 0.8886 |
| Agriculture | 0.8439 |
| Labor | 0.7818 |
| Education | 0.9005 |
| Environment | 0.8481 |
| Energy | 0.8629 |
| Immigration | 0.8682 |
| Transportation | 0.8731 |
| Law and crime | 0.8207 |
| Social welfare | 0.7957 |
| Housing | 0.8462 |
| Domestic commerce | 0.8421 |
| Defense | 0.8627 |
| Technology | 0.8333 |
| Foreign trade | 0.8269 |
| International affairs | 0.8907 |
| Government operations | 0.8777 |
| Public lands | 0.8758 |
| Others | 0.6543 |
| Macro average | 0.8573 |
Citation
If you find this model useful for your work, please cite:
@article{aroyehun2025computational,
title={Computational analysis of US congressional speeches reveals a shift from evidence to intuition},
author={Aroyehun, Segun T and Simchon, Almog and Carrella, Fabio and Lasser, Jana and Lewandowsky, Stephan and Garcia, David},
journal={Nature Human Behaviour},
year={2025},
doi={10.1038/s41562-025-02136-2},
url={https://doi.org/10.1038/s41562-025-02136-2}
}