---
language: zh
license: apache-2.0
library_name: transformers
tags:
- text-classification
- sentiment-analysis
- chinese
- movie-review
datasets:
- utmhikari/doubanmovieshortcomments
base_model: hfl/chinese-roberta-wwm-ext
pipeline_tag: text-classification
---
# Chinese Movie Review Sentiment Classification Model (5-Star Rating)
---
## 1. Model Overview
`H-Z-Ning/Senti-RoBERTa-Mini` is a lightweight Chinese RoBERTa model fine-tuned to assign 1-to-5-star sentiment ratings to short Chinese movie reviews. Built on the HFL checkpoint `hfl/chinese-roberta-wwm-ext`, it keeps the base model's compact footprint and fast inference, making it well suited to resource-constrained deployments.
---
## 2. Model Facts
| Item | Details |
|---|---|
| Task | Chinese text classification (sentiment / star rating) |
| Labels | 5 classes (1 star – 5 stars) |
| Base model | [hfl/chinese-roberta-wwm-ext](https://huggingface.co/hfl/chinese-roberta-wwm-ext) |
| Dataset | [Kaggle: Douban Movie Short Comments (≈2 M comments)](https://www.kaggle.com/datasets/utmhikari/doubanmovieshortcomments) |
| Training framework | 🤗 transformers + Trainer |
| Language | Simplified Chinese |
| Parameters | ≈ 102 M (same as base model) |
---
## 3. Quick Start
### 3.1 Install Dependencies
```bash
pip install transformers torch
```
### 3.2 One-Line Inference
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = "H-Z-Ning/Senti-RoBERTa-Mini"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()  # disable dropout for inference

text = "这个导演真厉害。"  # "This director is really impressive."
inputs = tok(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
pred = int(torch.argmax(logits, dim=-1).item()) + 1  # class index 0..4 -> rating 1..5
print("predicted rating:", pred)
```
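The snippet above returns only the hard argmax rating. If you also want class probabilities, or a fractional "expected rating" for finer-grained ranking, softmax over the logits gives both. The sketch below is illustrative and uses a hard-coded dummy logits tensor (hypothetical values) in place of `model(**inputs).logits`, so it runs without downloading the model:

```python
import torch

# Dummy logits standing in for `model(**inputs).logits`
# (hypothetical values, shape [batch=1, num_labels=5]).
logits = torch.tensor([[-1.2, 0.3, 0.8, 2.1, 1.0]])

# Softmax turns the logits into a probability distribution over the 5 classes.
probs = torch.softmax(logits, dim=-1)

# Hard prediction: argmax class index 0..4 maps to rating 1..5.
pred = int(torch.argmax(probs, dim=-1).item()) + 1

# Soft prediction: probability-weighted expected rating, useful when you want
# to rank reviews more finely than five discrete buckets.
stars = torch.arange(1, 6, dtype=probs.dtype)
expected = float((probs * stars).sum())

print(f"hard rating: {pred}, expected rating: {expected:.2f}")
```

With real model outputs, substitute the logits from the Quick Start snippet; the mapping from class index to star rating is unchanged.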
---
## 4. Training Source Code
**[senti-roberta-mini training source code](https://www.kaggle.com/code/hzning/senti-roberta-mini)**
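The linked Kaggle notebook is the authoritative source. Purely as an illustration, a `Trainer` setup matching the hyper-parameters in the next section might look like the following sketch; `train_ds`, `val_ds`, and `compute_qwk` are hypothetical placeholders for the tokenized dataset splits and metric function, not names from the actual notebook:

```python
# Illustrative sketch only — see the linked notebook for the real training code.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "hfl/chinese-roberta-wwm-ext"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=5)

args = TrainingArguments(
    output_dir="senti-roberta-mini",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,            # linear warmup then linear decay
    lr_scheduler_type="linear",
    fp16=True,
    logging_steps=10,
    eval_strategy="epoch",       # `evaluation_strategy` in older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="qwk",  # assumes compute_qwk reports a "qwk" key
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,       # hypothetical: tokenized training split
    eval_dataset=val_ds,          # hypothetical: tokenized validation split
    compute_metrics=compute_qwk,  # hypothetical: QWK metric function
)
trainer.train()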
## 5. Training Details
| Hyper-parameter | Value |
|---|---|
| Base model | hfl/chinese-roberta-wwm-ext |
| Training framework | 🤗 transformers `Trainer` |
| Training set | 150 000 samples (randomly drawn from the ≈2 M comments) |
| Validation set | 15 000 samples (same random draw) |
| Test set | full original test set |
| Max sequence length | 256 |
| Training epochs | 3 |
| Batch size | 32 (train) / 64 (eval) |
| Learning rate | 2e-5 |
| Optimizer | AdamW |
| Weight decay | 0.01 |
| Scheduler | linear warmup (warmup_ratio=0.1) |
| Precision | FP16 |
| Best-model criterion | **Quadratic Weighted Kappa (QWK, ↑)** |
| Training time | ≈ 120 min on single P100 (FP16) |
| Logging interval | every 10 steps |
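The best-model criterion above, quadratic weighted kappa, rewards predictions close to the true rating and penalizes large misses quadratically, which suits ordinal labels like star ratings better than plain accuracy. A minimal self-contained implementation (equivalent in spirit to scikit-learn's `cohen_kappa_score(..., weights="quadratic")`; the function name here is illustrative) might look like:

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """QWK for 0-indexed ordinal labels in [0, n_classes)."""
    # Observed confusion matrix: rows = true class, cols = predicted class.
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic disagreement weights: 0 on the diagonal, growing with |i - j|.
    idx = np.arange(n_classes)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected matrix under chance agreement, scaled to the same total count.
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()
```

Perfect agreement yields 1.0; predictions that are consistently off by one star are penalized only lightly, while a 1-star/5-star confusion costs the full weight.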
---
## 6. Citation
```bibtex
@misc{senti-roberta-mini-2025,
title={Senti-RoBERTa-Mini: A Mini Chinese RoBERTa for Movie Review Rating},
author={H-Z-Ning},
year={2025},
howpublished={\url{https://huggingface.co/H-Z-Ning/Senti-RoBERTa-Mini}}
}
```
---
## 7. License
This model is released under [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0). The base model `hfl/chinese-roberta-wwm-ext` is also Apache-2.0.
---
Community contributions and feedback are welcome! If you encounter any issues, please open an [Issue](https://huggingface.co/H-Z-Ning/Senti-RoBERTa-Mini/discussions) or email the author.