fastText
German
LetiP committed on
Commit d8c2469 · verified · 1 Parent(s): f813a3e

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ To train Aleph-Alpha-GermanWeb-Quality-Classifier-fastText, we used an LLM-as-a-
 
  For each document, we calculated a combined educational quality score by taking the minimum over the three criteria rated by the LLM-as-a-judge. We then used these educational quality scores as the training signal for the quality classification model. The Aleph-Alpha-GermanWeb-Quality-Classifier-fastText model was tasked with distinguishing between texts with educational quality scores of one or two (“low quality”) vs. four or five (“high quality”) given the document's text.
 
- We trained Aleph-Alpha-GermanWeb-Quality-Classifier-fastText using 185,403 documents in each class. We used 95% of the data (and the remaining 5% for validation) to train a fastText model to classify between high and low quality text data. It reached 92% precision and 91.5% recall on the validation set.
+ We trained Aleph-Alpha-GermanWeb-Quality-Classifier-fastText using 185,403 documents in each class. We used 95% of the data (and the remaining 5% for validation) to train a fastText model to classify between high and low quality text data. It reached 77% precision and 77% recall on the validation set.
 
  Further details, including our LLM judging prompt, can be found in our accompanying paper (link to paper coming soon).
 
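
The labeling scheme described in the README text of this diff (minimum over three judge criteria, scores of one or two as "low quality", four or five as "high quality", and a 95%/5% train/validation split) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the sample documents, the fixed random seed, and the choice to drop documents with a combined score of three are all assumptions made for illustration; the README only implies that a score of three belongs to neither class. The `__label__` prefix is the default label marker in fastText's supervised input format.

```python
import random


def combined_score(criteria_scores):
    """Combined educational quality score: the minimum over the judge's criteria."""
    return min(criteria_scores)


def quality_label(score):
    """Map a combined score to a class name, or None if it falls in neither class."""
    if score in (1, 2):
        return "low"
    if score in (4, 5):
        return "high"
    return None  # assumption: a score of 3 is left out of training


def to_fasttext_line(label, text):
    """Format one example as '__label__<name> <text>' on a single line."""
    return f"__label__{label} {' '.join(text.split())}"


def split_train_valid(rows, valid_fraction=0.05, seed=0):
    """Shuffle and split rows into (train, validation); the seed is hypothetical."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_valid = round(len(rows) * valid_fraction)
    return rows[n_valid:], rows[:n_valid]


# Hypothetical judged documents: (text, scores for the three criteria).
judged = [
    ("Ein klar strukturierter Lehrtext.", (5, 4, 5)),
    ("Werbung!!! Jetzt kaufen", (2, 1, 3)),
    ("Mittelmäßiger Text.", (3, 4, 5)),
]

rows = []
for text, scores in judged:
    label = quality_label(combined_score(scores))
    if label is not None:
        rows.append(to_fasttext_line(label, text))

train_rows, valid_rows = split_train_valid(rows)
```

Taking the minimum means a document counts as high quality only if every criterion is rated four or five; a single low rating pulls the combined score down. The resulting lines could be written to a text file and passed to a fastText supervised training run, which is outside the scope of this sketch.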