CellSentry Model: Multi-Task Spreadsheet AI

A fine-tuned 1.5B-parameter model for spreadsheet intelligence tasks. Built on Qwen2.5-1.5B with LoRA, this model handles three distinct tasks through prompt routing:

  • Formula Audit: Verify or dismiss rule engine findings in Excel formulas
  • PII Detection: Identify sensitive data (SSN, phone, email, national IDs) in cell values
  • Data Extraction: Extract structured fields (invoice number, date, vendor, totals) from spreadsheets

Model Details

Property | Value
Base model | Qwen/Qwen2.5-1.5B
Fine-tuning | LoRA (rank 16, alpha 32)
Training | 4000 iterations, batch_size=2, lr=3e-5, AdamW
Quantization | 4-bit, group_size=32 (Q4_K_M for GGUF)
Context length | 1024 tokens
License | MIT

Available Formats

Format | File | Size | Platform
GGUF (Q4_K_M) | cellsentry-1.5b-v3-q4km.gguf | ~940 MB | Windows (llama.cpp)
MLX (4-bit g32) | cellsentry-1.5b-v3-4bit-g32/ | ~920 MB | macOS (MLX)

Currently only the GGUF format is uploaded. MLX format coming soon.

Usage

This model is designed to be used with CellSentry, an open-source desktop app for spreadsheet auditing. The app downloads the model automatically on first launch.

Manual Download

# Install Hugging Face CLI
pip install huggingface-hub

# Download GGUF model
huggingface-cli download almax000/cellsentry-model cellsentry-1.5b-v3-q4km.gguf --local-dir ./models

Prompt Format

The model uses the Qwen2.5 chat template with task-specific system prompts:

Formula Audit:

<|im_start|>system
You are a spreadsheet formula auditor...<|im_end|>
<|im_start|>user
{rule engine finding + cell context}<|im_end|>
<|im_start|>assistant

PII Detection:

<|im_start|>system
You are a PII detection specialist...<|im_end|>
<|im_start|>user
{cell values to scan}<|im_end|>
<|im_start|>assistant

Data Extraction:

<|im_start|>system
You are a document data extractor...<|im_end|>
<|im_start|>user
{spreadsheet content + template}<|im_end|>
<|im_start|>assistant
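
The three templates differ only in the system prompt, so prompt routing can be sketched as a lookup plus string formatting. This is a minimal illustration, not the CellSentry app's code: `SYSTEM_PROMPTS`, `build_prompt`, the task keys, and the sample cell value are hypothetical names chosen here, and the system prompts stay truncated exactly as they appear above.

```python
# Hypothetical task keys mapped to the (truncated) system prompts from this card.
# Replace the "..." strings with the full prompts used by the app.
SYSTEM_PROMPTS = {
    "formula_audit": "You are a spreadsheet formula auditor...",
    "pii_detection": "You are a PII detection specialist...",
    "data_extraction": "You are a document data extractor...",
}


def build_prompt(task: str, user_content: str) -> str:
    """Return a Qwen2.5 chat-template prompt that ends at the open assistant turn."""
    if task not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPTS[task]}<|im_end|>\n"
        f"<|im_start|>user\n{user_content}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


# Illustrative input only; the real user turn carries the cell values to scan.
prompt = build_prompt("pii_detection", "A2: 555-0100")
print(prompt)
```

The returned string can be passed to any runtime that accepts a raw prompt; generation should stop at `<|im_end|>`.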

Training

  • Method: LoRA fine-tuning with multi-task data
  • Data: Synthetic + real-world spreadsheet samples across all three tasks
  • Fusion: LoRA weights fused into base model, then quantized (dequantize → fuse → re-quantize with group_size=32)
  • Key lesson: group_size=64 loses fine-tuning quality; group_size=32 is the coarsest grouping that stays viable for 1.5B models
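
The dequantize → fuse → re-quantize flow, and why group size matters, can be illustrated with a toy example on a single weight row. This sketch assumes a simple min/max affine 4-bit scheme and random weights; it is not the actual MLX or GGUF quantizer, and all function names are hypothetical. Only the LoRA rank (16) and alpha (32) come from the table above.

```python
import random


def quantize_groups(values, group_size, bits=4):
    """Min/max affine quantization: one scale and offset per group of values."""
    levels = (1 << bits) - 1  # 15 steps for 4-bit codes
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
        codes = [round((v - lo) / scale) for v in chunk]
        groups.append((scale, lo, codes))
    return groups


def dequantize_groups(groups):
    """Map integer codes back to approximate float values."""
    return [lo + scale * c for scale, lo, codes in groups for c in codes]


def fuse_lora_row(w, A, b, alpha, rank):
    """Fuse one output row: w' = w + (alpha / rank) * (b @ A)."""
    scaling = alpha / rank
    return [
        w_j + scaling * sum(b[k] * A[k][j] for k in range(rank))
        for j, w_j in enumerate(w)
    ]


random.seed(0)
n, rank, alpha = 128, 16, 32  # rank and alpha from the card; n is arbitrary
base = [random.gauss(0, 0.02) for _ in range(n)]
A = [[random.gauss(0, 0.01) for _ in range(n)] for _ in range(rank)]
b = [random.gauss(0, 0.01) for _ in range(rank)]

# Assumption: the base row starts out stored in quantized form.
w = dequantize_groups(quantize_groups(base, 32))  # 1. dequantize
fused = fuse_lora_row(w, A, b, alpha, rank)       # 2. fuse

results = {}
for group_size in (64, 32):                       # 3. re-quantize
    groups = quantize_groups(fused, group_size)
    restored = dequantize_groups(groups)
    worst = max(abs(x - y) for x, y in zip(fused, restored))
    bound = max(scale for scale, _, _ in groups) / 2  # worst-case rounding error
    results[group_size] = (worst, bound)
    print(f"group_size={group_size}: max error {worst:.6f}, bound {bound:.6f}")
```

Each 32-value group is a subset of a 64-value group, so its min/max range and therefore its rounding-error bound can only be equal or smaller. That is one plausible reason the small LoRA deltas survive better at group_size=32, at the cost of storing twice as many scales and offsets.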

Limitations

  • Optimized for structured spreadsheet content, not general text
  • 1024 token context: large spreadsheets need chunking
  • PII patterns trained primarily on US and Chinese formats
  • Extraction templates cover 5 document types (invoice, receipt, PO, expense, payroll)
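
Chunking for the 1024-token context can be sketched as greedy packing of rows under a token budget. This is a rough illustration with stated assumptions: `estimate_tokens` uses a 4-characters-per-token heuristic rather than the Qwen tokenizer, the 768-token budget is an arbitrary choice that leaves room for the system prompt and the reply, and the function names and sample rows are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (a heuristic, not the real tokenizer)."""
    return max(1, (len(text) + 3) // 4)


def chunk_rows(rows, budget=768):
    """Pack consecutive rows into chunks whose estimated token count fits the budget.

    A single row that exceeds the budget is emitted as its own oversized chunk.
    """
    chunks, current, used = [], [], 0
    for row in rows:
        cost = estimate_tokens(row)
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(row)
        used += cost
    if current:
        chunks.append(current)
    return chunks


# Illustrative rows only; real input would be serialized cell content.
rows = [f"A{i}: value {i}" for i in range(1, 501)]
chunks = chunk_rows(rows, budget=768)
print(f"{len(rows)} rows split into {len(chunks)} chunks")
```

For accurate budgeting, swap `estimate_tokens` for a count from the model's own tokenizer.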

Related
