Commit ee2937e · verified · committed by Ippoboi · 1 Parent(s): 3db3336

Create README.md

# Gmail Email Classifier (FLAN-T5 ONNX)

A fine-tuned FLAN-T5-small model for email classification, optimized for on-device inference in mobile apps using ONNX Runtime.

## Model Description

This model assigns each email to one of five categories and flags whether the email requires action:

| Category | Description |
|----------|-------------|
| **PERSONAL** | 1:1 human communication, social messages |
| **NEWSLETTER** | Marketing, promotions, subscribed content |
| **TRANSACTION** | Orders, receipts, payments, confirmations |
| **ALERT** | Security notices, important notifications |
| **SOCIAL** | Social network notifications, community updates |

### Output Format

```
CATEGORY | ACTION/NO_ACTION | Brief summary
```

**Example:**

```
Input: "Subject: Your order has shipped\n\nBody: Your order #12345 is on its way..."
Output: "TRANSACTION | NO_ACTION | Order shipment confirmation for #12345"
```
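
Because the model emits a single pipe-delimited string, downstream code usually needs to split it back into fields. The following is a minimal sketch of such a parser, not part of the released model; the names `Classification` and `parse_output` are hypothetical, and it assumes the output follows the format above exactly.

```python
from dataclasses import dataclass

# The five categories listed in the table above.
CATEGORIES = {"PERSONAL", "NEWSLETTER", "TRANSACTION", "ALERT", "SOCIAL"}


@dataclass
class Classification:
    category: str
    action_required: bool
    summary: str


def parse_output(text: str) -> Classification:
    """Parse 'CATEGORY | ACTION/NO_ACTION | Brief summary' into fields."""
    # Split on the first two pipes only, so a summary containing '|' survives.
    parts = [part.strip() for part in text.split("|", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 pipe-separated fields, got: {text!r}")
    category, action, summary = parts[0].upper(), parts[1].upper(), parts[2]
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if action not in {"ACTION", "NO_ACTION"}:
        raise ValueError(f"unknown action flag: {action!r}")
    return Classification(category, action == "ACTION", summary)
```

A generative model can occasionally produce malformed text, so callers may want to catch `ValueError` and fall back to a default bucket rather than trust every output.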

## Intended Use

- **Primary:** On-device email triage in mobile apps (iOS/Android)
- **Runtime:** ONNX Runtime React Native
- **Use case:** Prioritizing inbox, filtering noise, surfacing actionable emails

## Model Details

| Attribute | Value |
|-----------|-------|
| Base Model | `google/flan-t5-small` |
| Parameters | ~80M |
| Architecture | T5 Encoder-Decoder |
| ONNX Size | 373 MB (encoder: 141 MB, decoder: 232 MB) |
| Latency | ~79 ms (iPhone, CPU) |
| Max Sequence | 512 tokens |

## Training Data

- **Size:** 2,043 training / 256 validation / 255 test examples
- **Source:** Personal Gmail inboxes (anonymized)
- **Languages:** English, French
- **Labeling:** Human-annotated with category + action flag

## How to Use

### ONNX Runtime (React Native)

```typescript
import { InferenceSession } from 'onnxruntime-react-native';

const encoder = await InferenceSession.create('encoder_model.onnx');
const decoder = await InferenceSession.create('decoder_model.onnx');

// Tokenize input, run encoder, greedy decode
```
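
The "greedy decode" step mentioned in the comment above is independent of the runtime, so it is sketched here in Python for brevity; the same loop translates directly to TypeScript. This is an illustrative sketch, not the author's implementation. The callback `next_token_logits` is a hypothetical stand-in for "run the decoder session on the tokens generated so far, together with the encoder output, and return the logits for the last position". The token IDs are the standard T5 values (pad/start = 0, EOS = 1); check `config.json` before relying on them.

```python
from typing import Callable, List, Sequence

PAD_TOKEN_ID = 0  # T5 uses the pad token as the decoder start token
EOS_TOKEN_ID = 1  # T5 end-of-sequence token


def greedy_decode(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    max_new_tokens: int = 64,
) -> List[int]:
    """Pick the highest-scoring token at each step until EOS or the length cap."""
    decoder_ids = [PAD_TOKEN_ID]
    for _ in range(max_new_tokens):
        # Hypothetical callback: wraps the decoder call and returns the
        # vocabulary logits for the next position.
        logits = next_token_logits(decoder_ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == EOS_TOKEN_ID:
            break
        decoder_ids.append(next_id)
    # Drop the start token; the remaining IDs go to the tokenizer's decode step.
    return decoder_ids[1:]
```

Without a key/value cache the decoder re-processes the whole prefix at every step, which is acceptable for outputs as short as the one-line format this model produces.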

### Python (Transformers)

```python
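from transformers import AutoTokenizer, T5ForConditionalGeneration

# Load the fine-tuned checkpoint and its tokenizer from the Hub.
model = T5ForConditionalGeneration.from_pretrained("ippoboi/gmail-classifier")
tokenizer = AutoTokenizer.from_pretrained("ippoboi/gmail-classifier")

input_text = "Classify this email: Subject: Meeting tomorrow\n\nBody: Can we reschedule?"
# Truncate long emails to the model's 512-token limit.
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
# Allow more output tokens than the library default so the summary is not cut off.
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Example output: "PERSONAL | ACTION | Request to reschedule meeting"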
```
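
The input string in the example above follows a fixed `Subject: ... / Body: ...` layout. A small helper can keep that layout consistent across callers. This is a sketch under assumptions: `build_prompt` is a hypothetical name, and the default prefix is simply copied from the example above; whether the model needs the prefix is not stated in this card.

```python
def build_prompt(
    subject: str,
    body: str,
    prefix: str = "Classify this email: ",
) -> str:
    """Format an email the way the examples in this README do."""
    # Collapse line breaks and repeated spaces so the subject stays on one line.
    subject = " ".join(subject.split())
    body = body.strip()
    return f"{prefix}Subject: {subject}\n\nBody: {body}"
```

For long emails, passing `truncation=True, max_length=512` to the tokenizer call keeps the encoded input within the maximum sequence length listed under Model Details.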

## Files

| File | Size | Description |
|------|------|-------------|
| `encoder_model.onnx` | 141 MB | ONNX encoder |
| `decoder_model.onnx` | 232 MB | ONNX decoder |
| `tokenizer.json` | 2.4 MB | Tokenizer (SentencePiece vocabulary in `tokenizers` JSON format) |
| `config.json` | 2 KB | Model configuration |

## Limitations

- Trained primarily on English/French emails
- May not generalize well to enterprise/corporate email patterns
- Classification accuracy depends on email content quality (plain text preferred over HTML-heavy)
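
Since plain text is preferred over HTML-heavy content, one option is to strip markup before classification. The sketch below uses only the Python standard library; `html_to_text` is a hypothetical helper and is not part of this model's preprocessing as far as this card states.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>, <style>, and <head> contents."""

    SKIP = {"script", "style", "head"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Joining with spaces can split a word that spans inline tags;
        # that trade-off is accepted here for simplicity.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def html_to_text(html: str) -> str:
    """Reduce an HTML email body to whitespace-normalized plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting text can then be used as the email body in the model input.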

## License

Apache 2.0

Files changed (1)
  1. README.md +14 -0
README.md ADDED
@@ -0,0 +1,14 @@
+ ---
+ base_model:
+ - google/flan-t5-small
+ license: apache-2.0
+ language:
+ - en
+ - fr
+ tags:
+ - classification
+ - emails
+ - text2text-generation
+ - onnx
+ - mobile
+ ---