---
license: mit
---



This classifier is fine-tuned from [deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on [this forecastability classification dataset](https://huggingface.co/datasets/noanabeshima/forecastability_classification) to predict whether Claude 3.7 Sonnet considers a [fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb/viewer/default/train) document 'forecastable', i.e. a useful seed for generating pastcasting questions.

The classifier reaches a ROC AUC of 0.9625. However, only ~2% of fineweb documents are considered forecastable, so precision is limited by the low base rate. Its precision/recall curves on random unseen fineweb documents look like this:


![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cdc97b3a2cecfdabed40dc/44NnVScT0QdM5ydWWxaeR.png)
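The gap between a high ROC AUC and modest precision follows from the ~2% base rate. As a rough illustration, the sketch below computes precision from a true-positive rate, a false-positive rate, and the prevalence. The operating points are hypothetical, chosen only to show the effect; they are not read off this model's curves, and `precision_at` is an illustrative helper rather than part of any library.

```python
def precision_at(tpr, fpr, prevalence):
    """Precision implied by a (TPR, FPR) operating point at a given base rate."""
    true_positives = tpr * prevalence            # fraction of all docs correctly flagged
    false_positives = fpr * (1.0 - prevalence)   # fraction of all docs wrongly flagged
    return true_positives / (true_positives + false_positives)


PREVALENCE = 0.02  # ~2% of fineweb documents are forecastable (from the text above)

# Hypothetical operating points (TPR, FPR), NOT measured values for this model.
for tpr, fpr in [(0.9, 0.1), (0.5, 0.01)]:
    p = precision_at(tpr, fpr, PREVALENCE)
    print(f"TPR={tpr:.2f} FPR={fpr:.2f} -> precision={p:.3f}")
```

Under these assumed numbers, a threshold that catches 90% of forecastable documents while flagging 10% of the rest yields a precision of only about 0.16, because the negatives outnumber the positives roughly 49 to 1.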


To load the model, use:
```
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('noanabeshima/forecastability-classifier-v1')
tokenizer = AutoTokenizer.from_pretrained('noanabeshima/forecastability-classifier-v1')
```
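
The loading snippet above uses `AutoModel`, which in `transformers` returns the encoder without a classification head. To obtain a forecastability score, one option is a sketch like the following. It assumes the checkpoint includes a sequence-classification head loadable with `AutoModelForSequenceClassification`, that class index 1 means 'forecastable' (worth checking against `model.config.id2label`), and that truncating to 512 tokens is acceptable. None of these is confirmed by this card, and `positive_probability` and `score_document` are illustrative names.

```python
import math

MODEL_ID = "noanabeshima/forecastability-classifier-v1"


def positive_probability(logits, positive_index=1):
    """Turn one document's raw logits into a probability for the positive class."""
    if len(logits) == 1:
        # Single-logit head: apply a sigmoid.
        return 1.0 / (1.0 + math.exp(-logits[0]))
    # Multi-class head: numerically stable softmax.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    return exps[positive_index] / sum(exps)


def score_document(text, positive_index=1):
    """Score one document. ASSUMES a classification head and positive_index=1."""
    # Heavy dependencies are imported lazily so the helper above stays dependency-free.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # max_length=512 is an assumed truncation length, not taken from the card.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return positive_probability(logits, positive_index)
```

Calling `score_document("some fineweb text")` would then return a value between 0 and 1. Given the low base rate, the threshold applied to that score is probably best chosen from the precision/recall curves rather than defaulting to 0.5.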