---
license: mit
---

This classifier is fine-tuned from [deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on [this forecastability classification dataset](https://huggingface.co/datasets/noanabeshima/forecastability_classification). It predicts whether Claude 3.7 Sonnet would consider a [fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb/viewer/default/train) document 'forecastable', i.e. a useful seed for generating pastcasting questions.

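The classifier has a ROC AUC of 0.9625, but only ~2% of fineweb documents are considered forecastable. Because positives are this rare, even a small false-positive rate yields many false positives relative to true positives, so the classifier's precision/recall curves on random unseen fineweb documents look like this: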

*[Figure: precision/recall curves of this classifier on random unseen fineweb documents]*

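The effect of the ~2% base rate can be checked with a little arithmetic. For base rate π, an operating point with a given recall and false-positive rate (FPR) has precision = recall·π / (recall·π + FPR·(1 − π)). The sketch below evaluates that formula; the recall and FPR values are hypothetical operating points chosen for illustration, not measurements of this classifier.

```python
def precision_at(recall: float, false_positive_rate: float, base_rate: float) -> float:
    """Precision implied by a (recall, FPR) operating point at a given base rate."""
    true_pos = recall * base_rate  # fraction of all documents that are true positives
    false_pos = false_positive_rate * (1.0 - base_rate)  # fraction that are false positives
    return true_pos / (true_pos + false_pos)


# Hypothetical operating points (not measured on this classifier), base rate ~2%.
for fpr in (0.10, 0.05, 0.01):
    p = precision_at(0.90, fpr, 0.02)
    print(f"recall=0.90 fpr={fpr:.2f} -> precision={p:.3f}")
```

With π = 0.02 and recall 0.90, precision is only about 0.27 at a 5% false-positive rate and rises to about 0.65 at 1%; the same 5% operating point on a balanced set (π = 0.5) would give about 0.95.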
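To load the model:

```python
# Requires the `transformers` library.
from transformers import AutoModel, AutoTokenizer

# Note: AutoModel returns the encoder's hidden states. If the checkpoint uses a standard
# sequence-classification head, AutoModelForSequenceClassification also loads that head.
model = AutoModel.from_pretrained('noanabeshima/forecastability-classifier-v1')
tokenizer = AutoTokenizer.from_pretrained('noanabeshima/forecastability-classifier-v1')
```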
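A minimal scoring sketch follows. It assumes the checkpoint loads with `AutoModelForSequenceClassification` (a standard sequence-classification head with two labels), that label index 1 means 'forecastable' (check `model.config.id2label`), and a 512-token truncation limit chosen for illustration. `load_classifier`, `score_documents`, and `forecastable_probs` are hypothetical helper names, not part of the released model.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "noanabeshima/forecastability-classifier-v1"
# Assumption: class index 1 is "forecastable"; verify via model.config.id2label.
FORECASTABLE_INDEX = 1


def forecastable_probs(logits: torch.Tensor) -> torch.Tensor:
    """Convert (batch, num_labels) logits into P(forecastable) per document."""
    return torch.softmax(logits, dim=-1)[:, FORECASTABLE_INDEX]


def load_classifier(model_id: str = MODEL_ID):
    """Load tokenizer and model (assumes a sequence-classification head)."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference
    return model, tokenizer


@torch.no_grad()
def score_documents(texts, model, tokenizer, max_length: int = 512):
    """Return P(forecastable) for each string in `texts`."""
    batch = tokenizer(
        texts,
        padding=True,  # pad to the longest document in the batch
        truncation=True,  # assumed limit; long fineweb documents are cut off
        max_length=max_length,
        return_tensors="pt",
    )
    logits = model(**batch).logits
    return forecastable_probs(logits).tolist()
```

For example, `model, tokenizer = load_classifier()` followed by `score_documents([doc_text], model, tokenizer)` returns one probability per document. Given the ~2% base rate, a decision threshold read off the precision/recall curve is likely more useful than a default of 0.5.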