This model is fine-tuned on 4,000 paragraphs from 10-K reports to detect firms' climate change–related disclosures.
The original model, climatebert/distilroberta-base-climate-detector, had a high false positive rate. It often misclassifies descriptive language about business operations in environmentally sensitive industries as climate change–related content.
I used a 3:1:1 split for training, validation, and testing. After fine-tuning, the model's accuracy in classifying climate change–related paragraphs in 10-K reports improved from 0.759 to 0.978, and the improvement in accuracy is mainly driven by fewer false positive cases. *Note: The original model yielded zero false negatives, indicating strong capability in identifying climate change–related disclosures.
In the validation set, the number of false positives dropped significantly, from 193 to 20.
- Downloads last month
- -