Dataset: Cyrile/dataset-quality
Automatically assess the quality of textual data on a clear, intuitive scale, suitable for both natural language (NL) and code language (CL).

How to use Cyrile/EuroBERT-210m-Quality-CL with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="Cyrile/EuroBERT-210m-Quality-CL", trust_remote_code=True)

# Or load the model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Cyrile/EuroBERT-210m-Quality-CL", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained("Cyrile/EuroBERT-210m-Quality-CL", trust_remote_code=True)
```
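For text classification, the pipeline returns a label and a confidence score per input. A minimal sketch of filtering a corpus by predicted quality, assuming the model emits the four evaluation category names ("Harmful", "Low", "Medium", "High") as label strings — the exact strings are an assumption, so check `model.config.id2label` first:

```python
# Quality scale from lowest to highest; the exact label strings the model
# emits are an assumption based on the evaluation categories in this card.
QUALITY_SCALE = ["Harmful", "Low", "Medium", "High"]

def keep_text(prediction, min_quality="Medium"):
    """Return True if the predicted quality label meets the threshold."""
    rank = QUALITY_SCALE.index(prediction["label"])
    return rank >= QUALITY_SCALE.index(min_quality)

# Example with a mocked pipeline output of the usual text-classification shape:
print(keep_text({"label": "High", "score": 0.91}))  # True
print(keep_text({"label": "Low", "score": 0.77}))   # False
```

In practice you would apply this to real pipeline output, e.g. `[t for t, p in zip(texts, pipe(texts)) if keep_text(p)]`.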
We compare two distinct approaches:

Unified Model (NL + CL):

| Category | Global (NL + CL) | NL | CL |
|---|---|---|---|
| Harmful | 0.86 | 0.93 | 0.79 |
| Low | 0.62 | 0.81 | 0.40 |
| Medium | 0.63 | 0.78 | 0.50 |
| High | 0.77 | 0.81 | 0.74 |
| Accuracy | 0.73 | 0.83 | 0.62 |

Separate Models:

| Category | Global (NL + CL) | NL | CL |
|---|---|---|---|
| Harmful | 0.83 | 0.93 | 0.72 |
| Low | 0.64 | 0.76 | 0.53 |
| Medium | 0.63 | 0.76 | 0.52 |
| High | 0.79 | 0.81 | 0.76 |
| Accuracy | 0.73 | 0.82 | 0.63 |
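As a quick sanity check, the per-category figures in the tables above can be combined into an unweighted macro average (this assumes the rows are class-wise scores; the averaging below is illustrative, not how the Accuracy rows were computed):

```python
# Per-category global scores taken from the tables above
# (order: Harmful, Low, Medium, High).
unified = [0.86, 0.62, 0.63, 0.77]
separate = [0.83, 0.64, 0.63, 0.79]

def macro_average(scores):
    """Unweighted mean over the four quality categories."""
    return sum(scores) / len(scores)

print(round(macro_average(unified), 2))   # 0.72
print(round(macro_average(separate), 2))  # 0.72
```

On the global columns the two approaches land at essentially the same macro average, mirroring their identical global Accuracy of 0.73.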
If you use this model in your projects, please cite it or link back to it on the Hugging Face Hub.
Base model: EuroBERT/EuroBERT-210m