Update README.md
README.md
@@ -11,4 +11,104 @@ tags:
- text2text-generation
- onnx
- mobile
---

# Gmail Email Classifier (FLAN-T5 ONNX)

A fine-tuned FLAN-T5-small model for email classification, optimized for on-device inference in mobile apps using ONNX Runtime.

## Model Description

This model assigns each email to one of five categories and flags whether the email requires action:

| Category | Description |
|----------|-------------|
| **PERSONAL** | 1:1 human communication, social messages |
| **NEWSLETTER** | Marketing, promotions, subscribed content |
| **TRANSACTION** | Orders, receipts, payments, confirmations |
| **ALERT** | Security notices, important notifications |
| **SOCIAL** | Social network notifications, community updates |

### Output Format

```
CATEGORY | ACTION/NO_ACTION | Brief summary
```

**Example:**

```
Input: "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way..."
Output: "TRANSACTION | NO_ACTION | Order shipment confirmation for #12345"
```
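
Because the output is a single pipe-separated line, client code will usually need to split it back into fields. The following is a minimal sketch of such a parser, not part of the model repository: the `Prediction` class and `parse_prediction` function are hypothetical names, and the strict validation is an assumption about how malformed generations should be handled.

```python
from dataclasses import dataclass

# The five categories listed in the table above.
CATEGORIES = {"PERSONAL", "NEWSLETTER", "TRANSACTION", "ALERT", "SOCIAL"}


@dataclass
class Prediction:  # hypothetical container, not defined by the model repo
    category: str
    action_required: bool
    summary: str


def parse_prediction(raw: str) -> Prediction:
    """Parse 'CATEGORY | ACTION/NO_ACTION | summary' into a Prediction."""
    # Split on the first two pipes only, so a summary containing '|' survives.
    parts = [part.strip() for part in raw.split("|", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 pipe-separated fields: {raw!r}")
    category, action, summary = parts
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if action not in {"ACTION", "NO_ACTION"}:
        raise ValueError(f"unknown action flag: {action!r}")
    return Prediction(category, action == "ACTION", summary)
```

Raising on unexpected fields is one possible design choice; an app might instead fall back to a default category when the generated text does not match the format.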

## Intended Use

- **Primary:** On-device email triage in mobile apps (iOS/Android)
- **Runtime:** ONNX Runtime React Native
- **Use case:** Prioritizing the inbox, filtering noise, surfacing actionable emails
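
As an illustration of the triage use case, the sketch below orders raw model outputs so that actionable emails come first. It is an assumption-laden example: `CATEGORY_PRIORITY` is a hypothetical ranking (the model itself does not rank categories), and `triage_key` and `prioritize` are illustrative names.

```python
# Hypothetical priority order; the model card does not rank the categories.
CATEGORY_PRIORITY = ["ALERT", "PERSONAL", "TRANSACTION", "SOCIAL", "NEWSLETTER"]


def triage_key(raw_output):
    """Sort key: actionable emails first, then by the assumed category priority."""
    fields = [field.strip() for field in raw_output.split("|", 2)]
    category = fields[0]
    needs_action = len(fields) > 1 and fields[1] == "ACTION"
    if category in CATEGORY_PRIORITY:
        rank = CATEGORY_PRIORITY.index(category)
    else:
        rank = len(CATEGORY_PRIORITY)  # unknown categories sort last
    return (0 if needs_action else 1, rank)


def prioritize(outputs):
    """Order model outputs for display; sorted() is stable, so ties keep inbox order."""
    return sorted(outputs, key=triage_key)
```

For example, `prioritize` would place a `PERSONAL | ACTION | ...` line ahead of an `ALERT | NO_ACTION | ...` line, because the action flag is compared before the category.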

## Model Details

| Attribute | Value |
|-----------|-------|
| Base Model | `google/flan-t5-small` |
| Parameters | ~80M |
| Architecture | T5 Encoder-Decoder |
| ONNX Size | ~373 MB total (encoder: 141 MB, decoder: 232 MB) |
| Latency | ~79 ms (iPhone, CPU) |
| Max Sequence Length | 512 tokens |

## Training Data

- **Size:** 2,043 training / 256 validation / 255 test examples
- **Source:** Personal Gmail inboxes (anonymized)
- **Languages:** English, French
- **Labeling:** Human-annotated with a category and an action flag

## How to Use

### ONNX Runtime (React Native)

```typescript
import { InferenceSession } from 'onnxruntime-react-native';

// Load the encoder and decoder as separate sessions.
const encoder = await InferenceSession.create('encoder_model.onnx');
const decoder = await InferenceSession.create('decoder_model.onnx');

// Remaining steps (not shown): tokenize the input, run the encoder once,
// then greedy-decode with the decoder.
```
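
The snippet above leaves the greedy decoding step as a comment. A minimal sketch of that loop is shown below in Python, kept independent of any runtime so the logic is easy to port. The `next_token_logits` callback is hypothetical: in an app it would wrap a decoder run over the cached encoder output. The start and end-of-sequence ids follow the standard T5 convention (pad token id 0 as decoder start, `</s>` as id 1), and the 64-token cap is an assumed default.

```python
from typing import Callable, List, Sequence

# T5 convention: decoding starts from the pad token (id 0) and stops at </s> (id 1).
DECODER_START_ID = 0
EOS_ID = 1


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    max_new_tokens: int = 64,  # assumed cap, not taken from the model card
) -> List[int]:
    """Pick the highest-scoring next token until EOS or the length cap.

    `next_token_logits` is a hypothetical callback: given the decoder ids so
    far, it returns the vocabulary logits for the next position.
    """
    ids = [DECODER_START_ID]
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        # Greedy step: argmax over the vocabulary.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == EOS_ID:
            break
        ids.append(next_id)
    return ids[1:]  # drop the start token before detokenizing
```

The returned ids would then be passed to the tokenizer's decode step to recover the `CATEGORY | ACTION/NO_ACTION | Brief summary` string.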

### Python (Transformers)

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("ippoboi/gmail-classifier")
# AutoTokenizer can load the tokenizer.json file shipped with the repo.
tokenizer = AutoTokenizer.from_pretrained("ippoboi/gmail-classifier")

input_text = "Classify this email: Subject: Meeting tomorrow\n\nBody: Can we reschedule?"
# Truncate to the model's 512-token maximum sequence length.
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
# Leave room for the full "CATEGORY | ACTION | summary" line; 64 is an assumed cap.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Example output: "PERSONAL | ACTION | Request to reschedule meeting"
```

## Files

| File | Size | Description |
|------|------|-------------|
| `encoder_model.onnx` | 141 MB | ONNX encoder |
| `decoder_model.onnx` | 232 MB | ONNX decoder |
| `tokenizer.json` | 2.4 MB | SentencePiece tokenizer |
| `config.json` | 2 KB | Model configuration |

## Limitations

- Trained primarily on English and French emails
- May not generalize well to enterprise/corporate email patterns
- Classification accuracy depends on email content quality (plain text works better than HTML-heavy content)
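
Since plain text is preferred over HTML-heavy input, one option is to reduce an HTML body to its visible text before classification. The sketch below uses only Python's standard `html.parser` module; `html_to_text` is a hypothetical helper, and this preprocessing is an assumption rather than a documented step of the training pipeline.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse whitespace left by removed markup.
        # Note: text split by inline tags mid-word will gain a space.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML email body (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting string could then be placed in the `Body:` part of the model input in place of the raw HTML.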

## License

Apache 2.0