CellSentry Model: Multi-Task Spreadsheet AI
A fine-tuned 1.5B-parameter model for spreadsheet intelligence tasks. Built on Qwen2.5-1.5B with LoRA, this model handles three distinct tasks, routed by a task-specific system prompt:
- Formula Audit: Verify or dismiss rule-engine findings in Excel formulas
- PII Detection: Identify sensitive data (SSN, phone, email, national IDs) in cell values
- Data Extraction: Extract structured fields (invoice number, date, vendor, totals) from spreadsheets
Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B |
| Fine-tuning | LoRA (rank 16, alpha 32) |
| Training | 4000 iterations, batch_size=2, lr=3e-5, AdamW |
| Quantization | 4-bit, group_size=32 (Q4_K_M for GGUF) |
| Context length | 1024 tokens |
| License | MIT |
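Fusing a LoRA adapter folds its two low-rank matrices, B (d_out × r) and A (r × d_in), into the base weight as W' = W + (alpha / r) · B A. The sketch below is a minimal, dependency-free illustration that assumes the common alpha / r scaling convention (frameworks parameterize the scale differently, so check the one used for training); under that assumption, rank 16 and alpha 32 give a scale of 2.0. The toy layer sizes and random values are made up for the demonstration.

```python
import random

RANK = 16
ALPHA = 32
SCALE = ALPHA / RANK  # assumes the alpha / r convention -> 2.0


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add_scaled(a, b, scale=1.0):
    """Return a + scale * b element-wise for same-shaped matrices."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_lora(w, lora_b, lora_a, scale=SCALE):
    """Merge the adapter into the base weight: W' = W + scale * (B @ A)."""
    return add_scaled(w, matmul(lora_b, lora_a), scale)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_out, d_in = 32, 48  # toy sizes; real Qwen2.5-1.5B projections are much larger
w = rand_matrix(d_out, d_in, rng)        # frozen base weight
lora_b = rand_matrix(d_out, RANK, rng)   # adapter "up" matrix (d_out x r)
lora_a = rand_matrix(RANK, d_in, rng)    # adapter "down" matrix (r x d_in)
x = rand_matrix(d_in, 1, rng)            # one input column vector

# Unfused: base path plus scaled adapter path, as during training.
unfused = add_scaled(matmul(w, x), matmul(lora_b, matmul(lora_a, x)), SCALE)
# Fused: a single matmul with the merged weight.
fused = matmul(fuse_lora(w, lora_b, lora_a), x)

max_diff = max(abs(u[0] - f[0]) for u, f in zip(unfused, fused))
print(f"scale={SCALE}, max |unfused - fused| = {max_diff:.2e}")
```

The merge is an exact algebraic identity, so fusing by itself changes outputs only by floating-point rounding; precision is lost when the fused weights are re-quantized, which is where the quantization group size matters.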
Available Formats
| Format | File | Size | Platform |
|---|---|---|---|
| GGUF (Q4_K_M) | cellsentry-1.5b-v3-q4km.gguf | ~940 MB | Windows (llama.cpp) |
| MLX (4-bit g32) | cellsentry-1.5b-v3-4bit-g32/ | ~920 MB | macOS (MLX) |
Currently, only the GGUF format is uploaded; the MLX format is coming soon.
Usage
This model is designed to be used with CellSentry, an open-source desktop app for spreadsheet auditing. The app downloads the model automatically on first launch.
Manual Download
# Install Hugging Face CLI
pip install huggingface-hub
# Download GGUF model
huggingface-cli download almax000/cellsentry-model cellsentry-1.5b-v3-q4km.gguf --local-dir ./models
Prompt Format
The model uses the Qwen2.5 chat template, with a task-specific system prompt for each task:
Formula Audit:
<|im_start|>system
You are a spreadsheet formula auditor...<|im_end|>
<|im_start|>user
{rule engine finding + cell context}<|im_end|>
<|im_start|>assistant
PII Detection:
<|im_start|>system
You are a PII detection specialist...<|im_end|>
<|im_start|>user
{cell values to scan}<|im_end|>
<|im_start|>assistant
Data Extraction:
<|im_start|>system
You are a document data extractor...<|im_end|>
<|im_start|>user
{spreadsheet content + template}<|im_end|>
<|im_start|>assistant
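For use outside the app, the three templates can be rendered with a small helper. This is a sketch under stated assumptions: the task keys (`formula_audit`, `pii_detection`, `data_extraction`) and the `build_prompt` helper are hypothetical names, and the system prompts are kept truncated exactly as this card shows them, because the card elides their full text.

```python
# Truncated system prompts exactly as shown in this card; the full prompts are
# elided here and would need to come from the CellSentry app (assumption).
SYSTEM_PROMPTS = {
    "formula_audit": "You are a spreadsheet formula auditor...",
    "pii_detection": "You are a PII detection specialist...",
    "data_extraction": "You are a document data extractor...",
}


def build_prompt(task: str, user_content: str) -> str:
    """Render a single-turn Qwen2.5 chat prompt for one CellSentry task."""
    if task not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(SYSTEM_PROMPTS)}")
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPTS[task]}<|im_end|>\n"
        f"<|im_start|>user\n{user_content}<|im_end|>\n"
        "<|im_start|>assistant\n"  # the model's answer is generated after this header
    )


# Illustrative cell values (made up for the example).
prompt = build_prompt("pii_detection", "A2: 555-867-5309\nA3: jane@example.com")
print(prompt)
```

With Hugging Face `transformers`, calling `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` on the Qwen2.5 tokenizer should yield an equivalent string; the helper just avoids that dependency.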
Training
- Method: LoRA fine-tuning with multi-task data
- Data: Synthetic + real-world spreadsheet samples across all three tasks
- Fusion: LoRA weights fused into the base model, then quantized (dequantize → fuse → re-quantize with group_size=32)
- Key lesson: group_size=64 erodes the fine-tuning gains; group_size=32 is the coarsest group size that still preserves them for a 1.5B model
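The group-size effect can be illustrated with a plain min/max (affine) 4-bit quantizer that stores one scale and one offset per group of consecutive weights. This is a sketch under assumptions: the scheme is similar in spirit to MLX's affine quantization but is not the Q4_K_M k-quant format used in the GGUF file, and it runs on synthetic Gaussian weights rather than the model's own, so it shows the direction of the effect, not the quality loss observed during training.

```python
import random


def quantize_dequantize(values, group_size, bits=4):
    """Affine (min/max) group-wise quantization followed by dequantization.

    Each run of `group_size` consecutive weights gets its own scale and offset,
    so a group with a narrow value range gets a finer grid.
    """
    levels = 2 ** bits - 1  # 15 steps for 4-bit codes 0..15
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels or 1.0  # avoid division by zero for constant groups
        for v in group:
            q = round((v - lo) / scale)  # integer code in [0, levels]
            out.append(lo + q * scale)   # dequantized value
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic stand-in for weights

err_g32 = mse(weights, quantize_dequantize(weights, 32))
err_g64 = mse(weights, quantize_dequantize(weights, 64))
print(f"group_size=32 MSE: {err_g32:.3e}")
print(f"group_size=64 MSE: {err_g64:.3e}")
```

Smaller groups track local value ranges more closely, so reconstruction error drops; the cost is storage. Assuming a 16-bit scale and a 16-bit offset per group, group_size=32 adds 1 bit of overhead per weight, versus 0.5 bits for group_size=64.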
Limitations
- Optimized for structured spreadsheet content, not general text
- 1024-token context: large spreadsheets need chunking
- PII patterns trained primarily on US and Chinese formats
- Extraction templates cover 5 document types (invoice, receipt, PO, expense, payroll)
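One way to stay within the 1024-token window is to pack rows greedily into chunks and repeat the header row in each chunk, so the model keeps the column context. In this sketch, `approx_tokens` is a hypothetical ~4-characters-per-token heuristic and the 400-token reserve for the system prompt and the model's answer is an assumed figure; for exact budgets, count tokens with the Qwen2.5 tokenizer instead.

```python
from typing import Callable, List

CONTEXT_TOKENS = 1024
BUDGET = CONTEXT_TOKENS - 400  # assumed reserve for system prompt + answer


def approx_tokens(text: str) -> int:
    """Hypothetical rough estimate (~4 characters per token)."""
    return max(1, len(text) // 4)


def chunk_rows(
    header: str,
    rows: List[str],
    budget: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[List[str]]:
    """Greedily pack rows into chunks whose estimated size fits `budget`.

    The header row is repeated at the top of every chunk. A single row larger
    than the budget still gets a chunk of its own.
    """
    chunks: List[List[str]] = []
    current = [header]
    used = count_tokens(header)
    for row in rows:
        cost = count_tokens(row)
        if len(current) > 1 and used + cost > budget:
            chunks.append(current)          # flush the full chunk
            current = [header]
            used = count_tokens(header)
        current.append(row)
        used += cost
    if len(current) > 1:
        chunks.append(current)
    return chunks


# Illustrative invoice rows (made-up data).
HEADER = "Invoice No,Date,Vendor,Total"
rows = [f"INV-{i:04d},2024-01-{(i % 28) + 1:02d},Acme Corp,{i * 10.5:.2f}" for i in range(200)]
chunks = chunk_rows(HEADER, rows, BUDGET)
print(len(chunks), [len(c) - 1 for c in chunks])
```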
Related
- CellSentry App: Desktop app that uses this model
- CellSentry Website: Project homepage