---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-1.5B
tags:
- cellsentry
- excel
- spreadsheet
- formula-audit
- pii-detection
- data-extraction
- gguf
- mlx
- lora
- qwen2.5
pipeline_tag: text-generation
---
# CellSentry Model: Multi-Task Spreadsheet AI
A fine-tuned 1.5B parameter model for spreadsheet intelligence tasks. Built on Qwen2.5-1.5B with LoRA, this model handles three distinct tasks through prompt routing:
- **Formula Audit**: Verify or dismiss rule engine findings in Excel formulas
- **PII Detection**: Identify sensitive data (SSN, phone, email, national IDs) in cell values
- **Data Extraction**: Extract structured fields (invoice number, date, vendor, totals) from spreadsheets
## Model Details
| Property | Value |
|----------|-------|
| Base model | [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B) |
| Fine-tuning | LoRA (rank 16, alpha 32) |
| Training | 4000 iterations, batch_size=2, lr=3e-5, AdamW |
| Quantization | 4-bit, group_size=32 (Q4_K_M for GGUF) |
| Context length | 1024 tokens |
| License | MIT |
## Available Formats
| Format | File | Size | Platform |
|--------|------|------|----------|
| **GGUF** (Q4_K_M) | `cellsentry-1.5b-v3-q4km.gguf` | ~940 MB | Windows (llama.cpp) |
| **MLX** (4-bit g32) | `cellsentry-1.5b-v3-4bit-g32/` | ~920 MB | macOS (MLX) |
> Currently only the GGUF format is uploaded. MLX format coming soon.
## Usage
This model is designed to be used with [CellSentry](https://github.com/almax000/cellsentry), an open-source desktop app for spreadsheet auditing. The app downloads the model automatically on first launch.
### Manual Download
```bash
# Install Hugging Face CLI
pip install huggingface-hub
# Download GGUF model
huggingface-cli download almax000/cellsentry-model cellsentry-1.5b-v3-q4km.gguf --local-dir ./models
```
### Prompt Format
The model uses the Qwen2.5 chat template (`<|im_start|>` / `<|im_end|>` markers) with a task-specific system prompt. The system prompt selects the task:
**Formula Audit:**
```
<|im_start|>system
You are a spreadsheet formula auditor...<|im_end|>
<|im_start|>user
{rule engine finding + cell context}<|im_end|>
<|im_start|>assistant
```
**PII Detection:**
```
<|im_start|>system
You are a PII detection specialist...<|im_end|>
<|im_start|>user
{cell values to scan}<|im_end|>
<|im_start|>assistant
```
**Data Extraction:**
```
<|im_start|>system
You are a document data extractor...<|im_end|>
<|im_start|>user
{spreadsheet content + template}<|im_end|>
<|im_start|>assistant
```
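The three templates above differ only in the system prompt, so prompt routing can be sketched as a lookup plus a string template. This is a minimal illustration, not the CellSentry app's code: the task keys and the `build_prompt` helper are hypothetical names, and the system prompts are left truncated exactly as they appear in this card.

```python
# System prompts are elided ("...") in this card, so they are kept as
# placeholders here; substitute the full prompts before real use.
SYSTEM_PROMPTS = {
    "formula_audit": "You are a spreadsheet formula auditor...",
    "pii_detection": "You are a PII detection specialist...",
    "data_extraction": "You are a document data extractor...",
}


def build_prompt(task: str, user_content: str) -> str:
    """Render a single-turn prompt in the Qwen2.5 chat format for one task."""
    if task not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPTS[task]}<|im_end|>\n"
        f"<|im_start|>user\n{user_content}<|im_end|>\n"
        "<|im_start|>assistant\n"  # left open so the model writes the reply
    )


if __name__ == "__main__":
    # The cell-value layout below is an assumed example, not a documented format.
    print(build_prompt("pii_detection", "A2: 123-45-6789"))
```

When generating, it is usually sensible to stop at `<|im_end|>` so the output ends with the assistant turn.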
## Training
- **Method**: LoRA fine-tuning with multi-task data
- **Data**: Synthetic + real-world spreadsheet samples across all three tasks
- **Fusion**: LoRA weights fused into base model, then quantized (dequantize → fuse → re-quantize with group_size=32)
- **Key lesson**: quantizing with group_size=64 loses fine-tuning quality; group_size=32 is the coarsest grouping that stayed viable for 1.5B models
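To see why the group size matters, the sketch below applies a simplified min-max affine quantizer to a weight vector, one group at a time. It is an illustration only, under stated assumptions: it is not the MLX or llama.cpp quantization code, the function names are hypothetical, and random Gaussian values stand in for real weights.

```python
import random
from typing import List


def quantize_dequantize(weights: List[float], group_size: int, bits: int = 4) -> List[float]:
    """Affine-quantize each group to `bits` bits, then map back to floats."""
    levels = (1 << bits) - 1  # 15 steps between min and max for 4-bit
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels or 1.0  # flat group: any nonzero scale works
        out.extend(lo + round((w - lo) / scale) * scale for w in group)
    return out


def mean_abs_error(a: List[float], b: List[float]) -> float:
    """Average absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in weights
    for g in (64, 32):
        err = mean_abs_error(w, quantize_dequantize(w, g))
        print(f"group_size={g}: mean abs error {err:.4f}")
```

Each group stores its own minimum and step size, so a smaller group tends to span a narrower range and round with a finer step. The cost is more per-group metadata, which is why the file grows slightly as the group size shrinks.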
## Limitations
- Optimized for structured spreadsheet content, not general text
- 1024-token context: large spreadsheets need chunking
- PII patterns trained primarily on US and Chinese formats
- Extraction templates cover 5 document types (invoice, receipt, PO, expense, payroll)
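Because of the 1024-token context, a large sheet has to be split before it is sent to the model. The sketch below shows one way to do that, assuming rows are already serialized to strings. The 4-characters-per-token estimate is a rough heuristic rather than the Qwen2.5 tokenizer, and `chunk_rows` is a hypothetical helper, not part of the CellSentry app.

```python
from typing import Iterator, List

CONTEXT_TOKENS = 1024  # context length stated in this card


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def chunk_rows(rows: List[str], budget: int) -> Iterator[List[str]]:
    """Yield consecutive groups of rows whose estimated size fits the budget.

    A single row larger than the budget is yielded alone rather than split.
    """
    chunk: List[str] = []
    used = 0
    for row in rows:
        cost = estimate_tokens(row)
        if chunk and used + cost > budget:
            yield chunk
            chunk, used = [], 0
        chunk.append(row)
        used += cost
    if chunk:
        yield chunk


if __name__ == "__main__":
    rows = [f"A{i}: value {i}" for i in range(1, 501)]
    # Reserve part of the window for the system prompt and the reply;
    # the 400-token reserve is an arbitrary example value.
    for part in chunk_rows(rows, budget=CONTEXT_TOKENS - 400):
        print(len(part), "rows")
```

For exact budgeting, replace `estimate_tokens` with a count from the model's actual tokenizer.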
## Related
- [CellSentry App](https://github.com/almax000/cellsentry): Desktop app that uses this model
- [CellSentry Website](https://cellsentry.pro): Project homepage