---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-1.5B
- Qwen/Qwen2.5-3B
task_categories:
- text-classification
language:
- en
- zh
tags:
- quality-assessment
- text-quality
- regression
pipeline_tag: text-classification
library_name: transformers
---
# Qwen2.5 Text Quality Classifier
Fine-tuned Qwen2.5-1.5B and Qwen2.5-3B models for automated text quality assessment. The models predict a quality score on a 0-1 scale, with a focus on educational value and mathematical intelligence.
## Model Details
- **Base Models**: Qwen2.5-1.5B / Qwen2.5-3B
- **Task**: Text Quality Regression
- **Languages**: English, Chinese
- **Training Data**: [OpenSQZ/Classifiers-Data](https://huggingface.co/datasets/OpenSQZ/Classifiers-Data)
- **Loss Function**: MSE Loss
## Performance
| Model | Test MSE Loss |
|-------|---------------|
| Qwen2.5-1.5B | 0.00226 |
| Qwen2.5-3B | 0.00209 |
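
MSE is measured in squared score units, so taking its square root (RMSE) gives an error figure on the same 0-1 scale as the predictions. The snippet below is a small sketch of that arithmetic using the values from the table; the variable names are illustrative and not part of any model API.

```python
import math

# Test MSE values copied from the table above
test_mse = {"Qwen2.5-1.5B": 0.00226, "Qwen2.5-3B": 0.00209}

# RMSE is the square root of MSE, expressed in the same units as the score
test_rmse = {name: math.sqrt(mse) for name, mse in test_mse.items()}

for name, rmse in test_rmse.items():
    print(f"{name}: RMSE ~ {rmse:.4f}")
```

Both models come out at a root-mean-square error of roughly 0.05 on the 0-1 scale, with the 3B model slightly lower.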
## Quick Start
### Installation
```bash
pip install transformers torch
```
### Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
# Load model and tokenizer
model_name = "OpenSQZ/Qwen2.5-1.5B-Classifier" # or Qwen2.5-3B-Quality-Classifier
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Predict quality score
text = "Linear algebra is fundamental to understanding vector spaces and matrix operations in mathematics."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    outputs = model(**inputs)
# Map the single output logit to the 0-1 quality scale
score = torch.sigmoid(outputs.logits).item()
print(f"Quality Score: {score:.3f}")  # Example output: Quality Score: 0.847
```
## Quality Score Interpretation
| Score Range | Quality Level | Use Case |
|-------------|---------------|----------|
| 0.8 - 1.0 | Excellent | Premium training data |
| 0.6 - 0.8 | Good | Standard training data |
| 0.4 - 0.6 | Average | Conditional use |
| 0.0 - 0.4 | Poor | Filter out |
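
One way to apply the table in a data pipeline is to map each score to its quality level and drop texts below a cutoff. The sketch below is a minimal illustration, not part of the released models: `quality_level` and `keep_for_training` are hypothetical helper names, and it assumes that a score sitting exactly on a boundary (for example 0.8) belongs to the higher band, since the table does not say which band owns the boundary.

```python
# Lower bounds taken from the table above.
# Assumption: a score exactly on a boundary falls into the higher band.
QUALITY_BANDS = [
    (0.8, "Excellent"),
    (0.6, "Good"),
    (0.4, "Average"),
    (0.0, "Poor"),
]


def quality_level(score: float) -> str:
    """Return the quality level for a score in [0, 1] (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for lower_bound, label in QUALITY_BANDS:
        if score >= lower_bound:
            return label
    return "Poor"  # not reached for valid scores; kept as a safe default


def keep_for_training(scored_texts, min_score=0.4):
    """Keep texts whose score is at least min_score (hypothetical helper).

    scored_texts is an iterable of (text, score) pairs.
    """
    return [text for text, score in scored_texts if score >= min_score]
```

With the default `min_score=0.4`, only the "Poor" band is filtered out; raising it to 0.6 or 0.8 keeps only the "Good" or "Excellent" bands.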
## Model Selection
- **1.5B Model**: Faster inference, good for real-time applications
- **3B Model**: Higher accuracy, better for batch processing
## Limitations
- Optimized for educational and mathematical content
- May not generalize well to creative or subjective content
- Scores should be used as guidance, not absolute judgments
## Citation
```bibtex
@misc{qwen25_quality_classifier_2025,
title={Qwen2.5 Text Quality Classifier},
author={Chao Li and Yifan Zhang},
year={2025},
publisher={OpenSQZ}
}
```
## License
Apache 2.0