# Chinese Email Classification Model

## Model Overview

This is a MobileBERT-based Chinese email classification model that sorts email content into six categories. It is optimized for mobile devices and resource-constrained environments.

## Model Information

- **Architecture**: MobileBertForSequenceClassification
- **Base model**: MobileBERT (a BERT variant optimized for mobile devices)
- **Language**: Chinese
- **Task**: Text classification
- **Number of classes**: 6 email categories

## Classification Labels

```
0: Work Email (工作邮件)
1: Personal Email (个人邮件)
2: Promotional Email (促销邮件)
3: Spam Email (垃圾邮件)
4: Notification Email (通知邮件)
5: Other Email (其他邮件)
```

## Model Performance

Evaluation results based on 3,000 training samples and 500 test samples:

- **Accuracy**: 98.2%
- **F1 score**: 98.2%
- **Precision**: 98.2%
- **Recall**: 98.2%

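The scores above are standard multi-class classification metrics. A minimal sketch of how such scores are computed with scikit-learn, shown on dummy labels (not the actual evaluation data; weighted averaging is an assumption about how the reported scores were aggregated):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Dummy labels over the 6 classes -- illustration only, not the real test set.
y_true = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3]
y_pred = [0, 1, 2, 3, 4, 5, 0, 1, 2, 5]  # one misclassification

accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average="weighted")
precision = precision_score(y_true, y_pred, average="weighted", zero_division=0)
recall = recall_score(y_true, y_pred, average="weighted")

print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")  # accuracy=0.900 on this dummy data
```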
## Model Parameters

- **Parameter count**: 24,584,966 (~24.6M)
- **Model size**: ~94 MB
- **Maximum sequence length**: 128 tokens
- **Vocabulary size**: 30,522

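The parameter count above can be reproduced by summing the element counts of a model's parameter tensors. A minimal sketch on a tiny stand-in module (for the real figure, apply the same sum to the loaded MobileBERT model):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Sum the element count of every parameter tensor in the model.
    return sum(p.numel() for p in model.parameters())

# Tiny stand-in: a Linear layer has in_features*out_features weights
# plus out_features biases -> 10*5 + 5 = 55 parameters.
layer = nn.Linear(10, 5)
print(count_parameters(layer))  # 55
```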
## Usage

### Quick Start

```python
import torch
from transformers import MobileBertForSequenceClassification, MobileBertTokenizer

# Load the model and tokenizer
model = MobileBertForSequenceClassification.from_pretrained('your-username/email-classifier-chinese')
tokenizer = MobileBertTokenizer.from_pretrained('your-username/email-classifier-chinese')

# Prediction helper
def predict_email_category(text):
    inputs = tokenizer(
        text,
        return_tensors='pt',
        truncation=True,
        padding='max_length',
        max_length=128
    )

    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(predictions, dim=-1).item()

    labels = {
        0: "工作邮件",  # Work Email
        1: "个人邮件",  # Personal Email
        2: "促销邮件",  # Promotional Email
        3: "垃圾邮件",  # Spam Email
        4: "通知邮件",  # Notification Email
        5: "其他邮件",  # Other Email
    }

    confidence = predictions[0][predicted_class].item()
    return labels[predicted_class], confidence

# Example
email_text = "恭喜您中奖了!点击链接领取奖品。"  # "Congratulations, you've won! Click the link to claim your prize."
category, confidence = predict_email_category(email_text)
print(f"Category: {category}, confidence: {confidence:.3f}")
# Example output: Category: 促销邮件, confidence: 0.920
```

### Transformers Pipeline

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="your-username/email-classifier-chinese",
    tokenizer="your-username/email-classifier-chinese"
)

result = classifier("您好,请查收今天的工作报告。")  # "Hello, please find today's work report."
print(result)
```

## Training Details

- **Hardware**: CPU
- **Epochs**: 2
- **Batch size**: 4
- **Gradient accumulation steps**: 4 (effective batch size 16)
- **Learning rate**: 3e-5
- **Optimizer**: AdamW
- **Training time**: ~10 minutes

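The hyperparameters above can be collected into a configuration; a minimal sketch with illustrative key names (not necessarily the exact ones used in training), showing how gradient accumulation determines the effective batch size:

```python
# Hypothetical configuration dict mirroring the hyperparameters above.
train_config = {
    "num_train_epochs": 2,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 3e-5,
    "optimizer": "adamw",
}

# Gradients are accumulated over 4 mini-batches before each optimizer step,
# so each weight update sees 4 * 4 = 16 examples.
effective_batch_size = (
    train_config["per_device_train_batch_size"]
    * train_config["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 16
```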
## Mobile Deployment

The model is well suited to mobile deployment:

1. **Android**: supported via PyTorch Mobile
2. **iOS**: supported via Core ML conversion
3. **Edge computing**: can run on edge devices
4. **Quantization**: supports INT8 quantization to reduce model size

See the documentation in the model repository for a detailed mobile integration guide.

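As an illustration of point 4, PyTorch's dynamic INT8 quantization can be applied to a model's linear layers. A minimal sketch on a small stand-in module (with the real model, pass the `MobileBertForSequenceClassification` instance loaded via `from_pretrained` instead):

```python
import torch
import torch.nn as nn

# Stand-in for the classifier; illustrative only.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 6))

# Replace Linear layers with versions that store weights as INT8
# and quantize activations dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

logits = quantized(torch.randn(1, 128))
print(logits.shape)  # torch.Size([1, 6])
```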
## Use Cases

- Automatic classification in email clients
- Spam filtering
- Email management systems
- Automated enterprise email processing
- Mobile email apps

## Limitations and Caveats

1. **Language**: optimized primarily for Chinese email
2. **Domain adaptation**: may require fine-tuning for specific domains
3. **Context length**: supports at most 128 tokens; longer emails are truncated
4. **Data privacy**: sensitive email content should be processed on-device

## Citation

If you use this model, please consider citing:

```bibtex
@misc{chinese-email-classifier-2024,
  title={Chinese Email Classification Model Based on MobileBERT},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  journal={Hugging Face Model Hub},
  howpublished={\url{https://huggingface.co/your-username/email-classifier-chinese}}
}
```

## License

This model is released under the Apache 2.0 license.

## Contact

For questions or suggestions, please reach out via:

- GitHub Issues: [project link]
- Email: [your email]

---

**Disclaimer**: This model is intended for research and non-commercial use only. Test and validate it thoroughly before deploying it in a production environment.