---
license: apache-2.0
base_model: 
- Qwen/Qwen2.5-1.5B
- Qwen/Qwen2.5-3B
task_categories:
- text-classification
language:
- en
- zh
tags:
- quality-assessment
- text-quality
- regression
pipeline_tag: text-classification
library_name: transformers
---

# Qwen2.5 Text Quality Classifier

Fine-tuned Qwen2.5-1.5B and Qwen2.5-3B models for automated text quality assessment. Each model predicts a quality score on a 0-1 scale, with a focus on educational value and mathematical intelligence.

## Model Details

- **Base Models**: Qwen2.5-1.5B / Qwen2.5-3B  
- **Task**: Text Quality Regression
- **Languages**: English, Chinese
- **Training Data**: [OpenSQZ/Classifiers-Data](https://huggingface.co/datasets/OpenSQZ/Classifiers-Data)
- **Loss Function**: MSE Loss

## Performance

| Model | Test MSE Loss |
|-------|---------------|
| Qwen2.5-1.5B | 0.00226 |
| Qwen2.5-3B | 0.00209 |
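
MSE is measured in squared score units, so its square root (RMSE) is easier to read against the 0-1 score scale. The short sketch below only re-derives that figure from the two values in the table; it does not re-run any evaluation.

```python
import math

# Test MSE values copied from the table above.
test_mse = {"Qwen2.5-1.5B": 0.00226, "Qwen2.5-3B": 0.00209}

# RMSE is the square root of MSE and lives on the same 0-1 scale as the score.
rmse = {name: math.sqrt(mse) for name, mse in test_mse.items()}

for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.4f}")
# Qwen2.5-1.5B: RMSE = 0.0475
# Qwen2.5-3B: RMSE = 0.0457
```

Both models therefore have a root-mean-square error of a little under 0.05 on the 0-1 scale.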

## Quick Start

### Installation
```bash
pip install transformers torch
```

### Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "OpenSQZ/Qwen2.5-1.5B-Classifier"  # or Qwen2.5-3B-Quality-Classifier
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Predict quality score
text = "Linear algebra is fundamental to understanding vector spaces and matrix operations in mathematics."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)

with torch.no_grad():
    outputs = model(**inputs)
    score = torch.sigmoid(outputs.logits).item()

print(f"Quality Score: {score:.3f}")  # Example output: Quality Score: 0.847
```
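
For scoring many documents, the single-text example above can be extended to padded batches. The following is a minimal sketch, not the authors' pipeline: `score_texts` and its `batch_size` parameter are hypothetical names, it assumes the classification head emits one logit per text (as the `.item()` call above implies), and it assumes the tokenizer defines a pad token. It runs on CPU as written; move the model and inputs to a GPU if you have one.

```python
import torch


def score_texts(texts, model, tokenizer, batch_size=8, max_length=8192):
    """Return one sigmoid-squashed quality score per input text (hypothetical helper)."""
    # Sequence-classification heads need a pad token id to find the last
    # real token of each row in a padded batch. Assumption: the tokenizer has one.
    if model.config.pad_token_id is None:
        model.config.pad_token_id = tokenizer.pad_token_id

    scores = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        )
        with torch.no_grad():
            logits = model(**inputs).logits
        # Flatten (batch, 1) logits to a flat list of floats in [0, 1].
        scores.extend(torch.sigmoid(logits).reshape(-1).tolist())
    return scores
```

With `model` and `tokenizer` loaded as in the Usage example, `score_texts(["first text", "second text"], model, tokenizer)` returns a list of two floats.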

## Quality Score Interpretation

| Score Range | Quality Level | Use Case |
|-------------|---------------|----------|
| 0.8 - 1.0 | Excellent | Premium training data |
| 0.6 - 0.8 | Good | Standard training data |
| 0.4 - 0.6 | Average | Conditional use |
| 0.0 - 0.4 | Poor | Filter out |
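
The table can be applied programmatically when filtering a dataset. The sketch below is one possible mapping, with `quality_level` as a hypothetical helper name. The table lists boundary values such as 0.8 in two rows, so the code assumes each boundary belongs to the higher band; adjust this if your pipeline needs the opposite convention.

```python
def quality_level(score: float) -> tuple[str, str]:
    """Map a 0-1 score to (quality level, use case) from the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")

    # (inclusive lower bound, level, use case), checked from highest to lowest.
    # Assumption: a boundary value such as 0.8 falls in the higher band.
    bands = [
        (0.8, "Excellent", "Premium training data"),
        (0.6, "Good", "Standard training data"),
        (0.4, "Average", "Conditional use"),
    ]
    for lower, level, use_case in bands:
        if score >= lower:
            return level, use_case
    return "Poor", "Filter out"


print(quality_level(0.847))  # ('Excellent', 'Premium training data')
```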

## Model Selection

- **1.5B Model**: Faster inference, good for real-time applications
- **3B Model**: Lower test MSE (0.00209 vs. 0.00226), better for batch processing

## Limitations

- Optimized for educational and mathematical content
- May not generalize well to creative or subjective content
- Scores should be used as guidance, not absolute judgments

## Citation

```bibtex
@misc{qwen25_quality_classifier_2025,
  title={Qwen2.5 Text Quality Classifier},
  author={Chao Li and Yifan Zhang},
  year={2025},
  publisher={OpenSQZ}
}
```

## License

Apache 2.0