---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-1.5B
- Qwen/Qwen2.5-3B
task_categories:
- text-classification
language:
- en
- zh
tags:
- quality-assessment
- text-quality
- regression
pipeline_tag: text-classification
library_name: transformers
---

# Qwen2.5 Text Quality Classifier

Fine-tuned Qwen2.5-1.5B and Qwen2.5-3B models for automated text quality assessment. Each model predicts a quality score on a 0-1 scale, with a focus on educational value and mathematical intelligence.

## Model Details

- **Base Models**: Qwen2.5-1.5B / Qwen2.5-3B
- **Task**: Text Quality Regression
- **Languages**: English, Chinese
- **Training Data**: [OpenSQZ/Classifiers-Data](https://huggingface.co/datasets/OpenSQZ/Classifiers-Data)
- **Loss Function**: MSE Loss

## Performance

| Model | Test MSE Loss |
|-------|---------------|
| Qwen2.5-1.5B | 0.00226 |
| Qwen2.5-3B | 0.00209 |

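Because MSE is a squared quantity, it can be easier to read as a root-mean-square error, which is in the same units as the 0-1 score. The short sketch below does that conversion for the two values in the table above; the variable names are illustrative and not part of the released code.

```python
import math

# Test MSE values copied from the performance table.
test_mse = {"Qwen2.5-1.5B": 0.00226, "Qwen2.5-3B": 0.00209}

# RMSE = sqrt(MSE), expressed in the same units as the 0-1 quality score.
test_rmse = {name: math.sqrt(mse) for name, mse in test_mse.items()}

for name, rmse in test_rmse.items():
    print(f"{name}: RMSE = {rmse:.4f}")
```

Read this way, both models have a typical test error of just under 0.05 on the 0-1 scale.
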
## Quick Start

### Installation

```bash
pip install transformers torch
```

### Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "OpenSQZ/Qwen2.5-1.5B-Classifier"  # or Qwen2.5-3B-Quality-Classifier
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Predict quality score
text = "Linear algebra is fundamental to understanding vector spaces and matrix operations in mathematics."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

with torch.no_grad():
    outputs = model(**inputs)
    score = torch.sigmoid(outputs.logits).item()

print(f"Quality Score: {score:.3f}")  # Output: Quality Score: 0.847
```

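The `torch.sigmoid` call in the snippet above is what squashes the model's single raw logit into the 0-1 range. Below is a plain-Python sketch of that mapping. The logit values are made up to show the shape of the curve and are not real model outputs.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical logits, chosen only to illustrate the mapping.
for logit in (-2.0, 0.0, 2.0):
    print(f"logit={logit:+.1f} -> score={sigmoid(logit):.3f}")
```

A logit of zero maps to exactly 0.5, and logits of equal magnitude but opposite sign map to scores that sum to 1.
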
## Quality Score Interpretation

| Score Range | Quality Level | Use Case |
|-------------|---------------|----------|
| 0.8 - 1.0 | Excellent | Premium training data |
| 0.6 - 0.8 | Good | Standard training data |
| 0.4 - 0.6 | Average | Conditional use |
| 0.0 - 0.4 | Poor | Filter out |

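One way to apply the bands above in a data pipeline is sketched below. The helper names `quality_level` and `keep_for_training` are hypothetical, and the sample scores are made up. The table does not say which band a boundary value such as 0.8 falls into, so the sketch assumes each band includes its lower bound.

```python
def quality_level(score: float) -> str:
    """Return the quality level for a score in [0, 1].

    Assumption: each band includes its lower bound, so 0.8 is "Excellent".
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= 0.8:
        return "Excellent"
    if score >= 0.6:
        return "Good"
    if score >= 0.4:
        return "Average"
    return "Poor"


def keep_for_training(scored_texts, min_score=0.6):
    """Keep (text, score) pairs whose score is at least min_score."""
    return [(text, score) for text, score in scored_texts if score >= min_score]


# Hypothetical scores, not real model outputs.
scored = [("doc A", 0.91), ("doc B", 0.55), ("doc C", 0.12)]
for text, score in scored:
    print(text, quality_level(score))

kept = keep_for_training(scored)
```

The default `min_score=0.6` keeps the "Good" and "Excellent" bands. Raising it to 0.8 would keep only the premium tier.
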
## Model Selection

- **1.5B Model**: Faster inference; a good fit for real-time applications
- **3B Model**: Slightly lower test MSE (0.00209 vs. 0.00226); better suited to batch processing

## Limitations

- Optimized for educational and mathematical content
- May not generalize well to creative or subjective content
- Scores should be used as guidance, not absolute judgments

## Citation

```bibtex
@misc{qwen25_quality_classifier_2025,
  title={Qwen2.5 Text Quality Classifier},
  author={Chao Li and Yifan Zhang},
  year={2025},
  publisher={OpenSQZ}
}
```

## License

Apache 2.0