This is a Hugging Face Transformers text-classification model trained on the U.S. Comparative Agendas Project (CAP) dataset. The data are annotated with the CAP top-level taxonomy of 20 policy areas, plus an "Others" category for text that is not about policy. The model is designed to classify political discourse into these 21 categories. It separates policy from non-policy text and identifies which policy area is addressed.

This model was trained specifically for the additional analyses presented in the paper cited below.
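Below is a minimal usage sketch, not an official snippet. It assumes the model loads with the standard Transformers `text-classification` pipeline. `MODEL_ID` is a hypothetical placeholder for this model's Hub identifier. The helper `tag_policy` is also hypothetical and assumes that the predicted label strings match the names in the performance table, with "Others" marking non-policy text.

```python
# Minimal usage sketch. MODEL_ID is a hypothetical placeholder, and the label
# strings are ASSUMED to match the names in the performance table.
from typing import Dict, List

MODEL_ID = "<hub-model-id>"  # placeholder: substitute this model's Hub ID
NON_POLICY_LABEL = "Others"  # assumed name of the non-policy category


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Dict]:
    """Classify texts with the Transformers text-classification pipeline."""
    # Imported lazily so tag_policy can be used without transformers installed.
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_id)
    # Returns one {"label": ..., "score": ...} dict per input text;
    # truncation=True cuts inputs longer than the model's maximum length.
    return clf(texts, truncation=True)


def tag_policy(predictions: List[Dict]) -> List[Dict]:
    """Add an "is_policy" flag: True unless the label is the non-policy category."""
    return [
        {**pred, "is_policy": pred["label"] != NON_POLICY_LABEL}
        for pred in predictions
    ]
```

For example, `tag_policy(classify(["We must reform the tax code.", "I yield back."]))` would label each speech and flag whether it was assigned to a policy area. Keeping the pipeline import inside `classify` lets the label-handling helper run on stored predictions without loading the model.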

Model performance

Performance on a held-out (unseen) test set is as follows:

Label	F1 score
Macroeconomics	0.8303
Civil rights	0.7676
Health	0.8886
Agriculture	0.8439
Labor	0.7818
Education	0.9005
Environment	0.8481
Energy	0.8629
Immigration	0.8682
Transportation	0.8731
Law and crime	0.8207
Social welfare	0.7957
Housing	0.8462
Domestic commerce	0.8421
Defense	0.8627
Technology	0.8333
Foreign trade	0.8269
International affairs	0.8907
Government operations	0.8777
Public lands	0.8758
Others	0.6543
Macro average	0.8573
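For each label, the F1 score is the harmonic mean of precision and recall, equivalently 2·TP / (2·TP + FP + FN). A macro average conventionally weights every label equally, regardless of how frequent it is. The card does not state how its figures were computed. The sketch below therefore shows only the conventional definitions, using the standard library and hypothetical function names. Note that the plain mean of the 21 per-label scores listed above is about 0.838, so the reported 0.8573 was presumably obtained in a way the card does not specify.

```python
# Sketch of conventional per-label F1 and macro-averaged F1 (an assumption;
# not necessarily the evaluation code used for the table above).
from collections import Counter
from typing import Dict, List


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 for each label seen in either the gold or the predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted label gets a false positive
            fn[gold] += 1  # gold label gets a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-label F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

With gold labels `["Health", "Health", "Energy", "Others"]` and predictions `["Health", "Energy", "Energy", "Health"]`, the per-label scores are 0.5 (Health), 2/3 (Energy) and 0 (Others), giving a macro F1 of 7/18. Because every label counts equally, a weak category such as "Others" pulls the macro average down more than it would a frequency-weighted average.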

Citation

If you find this model useful for your work, please cite:

@article{aroyehun2025computational,
  title={Computational analysis of US congressional speeches reveals a shift from evidence to intuition},
  author={Aroyehun, Segun T and Simchon, Almog and Carrella, Fabio and Lasser, Jana and Lewandowsky, Stephan and Garcia, David},
  journal={Nature Human Behaviour},
  year={2025},
  doi={10.1038/s41562-025-02136-2},
  url={https://doi.org/10.1038/s41562-025-02136-2}  
}
