---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-1.5B
tags:
- cellsentry
- excel
- spreadsheet
- formula-audit
- pii-detection
- data-extraction
- gguf
- mlx
- lora
- qwen2.5
pipeline_tag: text-generation
---

# CellSentry Model: Multi-Task Spreadsheet AI

A fine-tuned 1.5B-parameter model for spreadsheet intelligence tasks. Built on Qwen2.5-1.5B with LoRA, it handles three distinct tasks, selected by the system prompt (prompt routing):

- **Formula Audit**: Verify or dismiss rule engine findings in Excel formulas
- **PII Detection**: Identify sensitive data (SSN, phone, email, national IDs) in cell values
- **Data Extraction**: Extract structured fields (invoice number, date, vendor, totals) from spreadsheets

## Model Details

| Property | Value |
|----------|-------|
| Base model | [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B) |
| Fine-tuning | LoRA (rank 16, alpha 32) |
| Training | 4000 iterations, batch_size=2, lr=3e-5, AdamW |
| Quantization | 4-bit, group_size=32 (Q4_K_M for GGUF) |
| Context length | 1024 tokens |
| License | MIT |

## Available Formats

| Format | File | Size | Platform |
|--------|------|------|----------|
| **GGUF** (Q4_K_M) | `cellsentry-1.5b-v3-q4km.gguf` | ~940 MB | Windows (llama.cpp) |
| **MLX** (4-bit g32) | `cellsentry-1.5b-v3-4bit-g32/` | ~920 MB | macOS (MLX) |

> Only the GGUF format is currently uploaded. The MLX format is coming soon.

## Usage

This model is designed to be used with [CellSentry](https://github.com/almax000/cellsentry), an open-source desktop app for spreadsheet auditing. The app downloads the model automatically on first launch.

### Manual Download

```bash
# Install Hugging Face CLI
pip install huggingface-hub

# Download GGUF model
huggingface-cli download almax000/cellsentry-model cellsentry-1.5b-v3-q4km.gguf --local-dir ./models
```

### Prompt Format

The model uses the Qwen2.5 chat template with task-specific system prompts:

**Formula Audit:**
```
<|im_start|>system
You are a spreadsheet formula auditor...<|im_end|>
<|im_start|>user
{rule engine finding + cell context}<|im_end|>
<|im_start|>assistant
```

**PII Detection:**
```
<|im_start|>system
You are a PII detection specialist...<|im_end|>
<|im_start|>user
{cell values to scan}<|im_end|>
<|im_start|>assistant
```

**Data Extraction:**
```
<|im_start|>system
You are a document data extractor...<|im_end|>
<|im_start|>user
{spreadsheet content + template}<|im_end|>
<|im_start|>assistant
```
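
The three templates above differ only in the system prompt, so prompt routing can be sketched as a small lookup plus one formatting function. This is a minimal illustration, not the CellSentry app's actual code: the names `build_prompt`, `route`, and `SYSTEM_PROMPTS` are hypothetical, and the system prompts are kept truncated exactly as they appear in this card because the full text is not given here.

```python
# Special tokens of the Qwen2.5 (ChatML-style) chat template shown above.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"

# Placeholders only: the full system prompts are elided in this card,
# so the trailing "..." is kept as-is rather than guessed.
SYSTEM_PROMPTS = {
    "formula_audit": "You are a spreadsheet formula auditor...",
    "pii_detection": "You are a PII detection specialist...",
    "data_extraction": "You are a document data extractor...",
}


def build_prompt(system_prompt: str, user_content: str) -> str:
    """Format one system + user turn and open the assistant turn."""
    return (
        f"{IM_START}system\n{system_prompt}{IM_END}\n"
        f"{IM_START}user\n{user_content}{IM_END}\n"
        f"{IM_START}assistant\n"
    )


def route(task: str, user_content: str) -> str:
    """Pick the task-specific system prompt and build the full prompt."""
    if task not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return build_prompt(SYSTEM_PROMPTS[task], user_content)


if __name__ == "__main__":
    # Hypothetical user content; the real app's input layout may differ.
    print(route("pii_detection", "A2: 555-0100"))
```

The resulting string can be passed to any runtime that accepts a raw prompt, with `<|im_end|>` as the stop sequence.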

## Training

- **Method**: LoRA fine-tuning with multi-task data
- **Data**: Synthetic + real-world spreadsheet samples across all three tasks
- **Fusion**: LoRA weights fused into base model, then quantized (dequantize → fuse → re-quantize with group_size=32)
- **Key lesson**: group_size=64 loses fine-tuning quality; group_size=32 is the minimum viable floor for 1.5B models
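
To illustrate why group size can matter, here is a generic sketch of group-wise 4-bit affine quantization. It is not the MLX or GGUF quantizer used for this model, and the random weights are synthetic. Each group stores its own minimum and step size, so smaller groups usually span a narrower value range and round more finely.

```python
import random
from typing import List


def quantize_group(values: List[float], bits: int = 4) -> List[float]:
    """Quantize one group to 2**bits levels and return the dequantized values."""
    levels = (1 << bits) - 1  # 15 steps for 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels
    if scale == 0.0:
        scale = 1.0  # constant group: every code is 0
    codes = [round((v - lo) / scale) for v in values]
    return [lo + c * scale for c in codes]


def groupwise_roundtrip(weights: List[float], group_size: int, bits: int = 4) -> List[float]:
    """Quantize and dequantize weights in consecutive groups of group_size."""
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        out.extend(quantize_group(weights[start:start + group_size], bits))
    return out


def mean_abs_error(a: List[float], b: List[float]) -> float:
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a weight row; not taken from the real model.
    weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]
    for g in (32, 64):
        err = mean_abs_error(weights, groupwise_roundtrip(weights, g))
        print(f"group_size={g}: mean abs error {err:.6f}")
```

The trade-off is storage: halving the group size doubles the number of per-group scale and offset values that must be kept alongside the 4-bit codes.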

## Limitations

- Optimized for structured spreadsheet content, not general text
- 1024-token context: large spreadsheets need chunking
- PII patterns trained primarily on US and Chinese formats
- Extraction templates cover 5 document types (invoice, receipt, PO, expense, payroll)
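
One way to handle the context limit is to pack rows greedily into chunks that fit a token budget. This is a sketch under stated assumptions rather than the app's actual chunker: `chunk_rows` and `rough_token_count` are hypothetical names, the 4-characters-per-token estimate is a crude heuristic (and a poor fit for Chinese text), and the budget of 600 is an arbitrary value chosen to leave headroom for the system prompt and the model's output within 1024 tokens.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Crude estimate (assumption): about 4 characters per token."""
    return max(1, (len(text) + 3) // 4)


def chunk_rows(
    rows: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[List[str]]:
    """Greedily pack rows into chunks whose estimated size fits max_tokens.

    A single row larger than the budget is emitted as its own chunk;
    the caller decides whether to truncate or skip it.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for row in rows:
        cost = count_tokens(row)
        # Start a new chunk when adding this row would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(row)
        used += cost
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical rows serialized as text, one spreadsheet row per string.
    rows = [f"A{i}: item {i} | B{i}: {i * 10}" for i in range(1, 201)]
    chunks = chunk_rows(rows, max_tokens=600)
    print(f"{len(rows)} rows split into {len(chunks)} chunks")
```

For accurate budgets, pass a `count_tokens` function backed by the model's real tokenizer instead of the heuristic.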

## Related

- [CellSentry App](https://github.com/almax000/cellsentry): Desktop app that uses this model
- [CellSentry Website](https://cellsentry.pro): Project homepage