Lease Abstractor V7.6
A fine-tuned Qwen2.5-3B-Instruct model for extracting structured data from commercial real estate leases.
Model Description
This model reads commercial lease documents and extracts 18 standardized fields into clean JSON. It was trained on 684 labeled examples and achieves:
| Metric | Score |
|---|---|
| JSON Parse Rate | 100% |
| Schema Compliance | 100% |
| Verbatim Accuracy | 83.6% |
This model is designed for abstracting lease agreements and extracting lease terms for rent rolls, income analysis, and property valuation. Intended users include real estate appraisers, commercial brokers, asset managers, analysts, and attorneys.
Intended Use
- Primary: Commercial real estate appraisers, analysts, brokers, asset managers
- Input: Raw text from commercial lease documents (office, retail, industrial)
- Output: Structured JSON with 18 fields
Schema (18 Fields)
| Group | Fields |
|---|---|
| Parties | landlord_name, tenant_name, guarantor |
| Property | property_address, suite_unit, rentable_sqft, usable_sqft |
| Term | lease_term_months, commencement_date, expiration_date, lease_execution_date |
| Rent | initial_monthly_rent, rent_schedule, escalation_clause |
| Expenses | expense_structure, cam_description, tax_obligations, insurance_obligations |
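The card reports 100% JSON parse rate and schema compliance but does not define how compliance is checked. A minimal sketch, assuming "compliant" means the output parses as a JSON object with exactly the 18 keys in the table above (`SCHEMA_FIELDS` and `check_schema` are hypothetical names, not part of the release):

```python
import json

# Field names copied from the schema table, grouped the same way.
SCHEMA_FIELDS = {
    # Parties
    "landlord_name", "tenant_name", "guarantor",
    # Property
    "property_address", "suite_unit", "rentable_sqft", "usable_sqft",
    # Term
    "lease_term_months", "commencement_date", "expiration_date", "lease_execution_date",
    # Rent
    "initial_monthly_rent", "rent_schedule", "escalation_clause",
    # Expenses
    "expense_structure", "cam_description", "tax_obligations", "insurance_obligations",
}


def check_schema(raw: str) -> dict:
    """Parse raw model output and confirm it has exactly the 18 schema keys.

    Raises ValueError on failure (json.JSONDecodeError is a ValueError subclass).
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value is not an object")
    keys = set(data)
    missing = SCHEMA_FIELDS - keys
    extra = keys - SCHEMA_FIELDS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    return data
```

This checks key names only; it does not validate value types (string vs. list), which the card does not specify.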
How to Use
With Ollama
```bash
# Create the model
ollama create lease-v76 -f Modelfile
# Run inference
ollama run lease-v76 "Extract lease data from: [paste lease text here]"
```
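For programmatic use, the same model can be called through Ollama's local REST endpoint (`POST /api/generate` on the default port 11434). A minimal standard-library sketch, assuming the model was created as `lease-v76` above and that the server is running locally; the system prompt and sampling parameters from the Modelfile are applied by Ollama, so only the user prompt is sent:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(lease_text: str, model: str = "lease-v76") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for the model created above."""
    payload = {
        "model": model,
        "prompt": f"Extract lease data from: {lease_text}",
        "stream": False,  # return one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(lease_text: str) -> dict:
    """Send the request and parse the model's answer (requires a running Ollama server)."""
    with urllib.request.urlopen(build_request(lease_text)) as resp:
        body = json.load(resp)
    # The generated text is in the "response" field of the non-streaming reply.
    return json.loads(body["response"])
```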
Modelfile
```
FROM ./qwen2.5-3b-instruct.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
SYSTEM """You are a Legal Data Extraction Engine for commercial real estate leases.
STRICT RULES:
1. VERBATIM ONLY: Every value must be an EXACT substring from the source text
2. NO CALCULATIONS: Never compute, derive, or infer values
3. NO FORMATTING CHANGES: Preserve exact punctuation, spacing, case
4. NULL FOR MISSING: If a field is not explicitly stated, return null
5. JSON ONLY: Return raw JSON, no markdown, no explanation"""
```
With llama.cpp
```bash
./llama-cli -m qwen2.5-3b-instruct.Q4_K_M.gguf \
  --temp 0.1 --top-p 0.9 -c 8192 \
  -p "<|im_start|>system\nYou are a Legal Data Extraction Engine...<|im_end|>\n<|im_start|>user\n[LEASE TEXT]<|im_end|>\n<|im_start|>assistant\n"
```
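In bash, `\n` inside double quotes reaches `llama-cli` as a literal backslash and `n`; whether it becomes a newline depends on llama-cli's escape handling. A sketch that instead builds the ChatML prompt with real newlines, mirroring the Modelfile `TEMPLATE` and using the full system prompt from the Modelfile. The helper names are hypothetical, and only the flags already shown above are passed:

```python
import subprocess

# Full system prompt, copied from the SYSTEM block of the Modelfile.
SYSTEM_PROMPT = """You are a Legal Data Extraction Engine for commercial real estate leases.
STRICT RULES:
1. VERBATIM ONLY: Every value must be an EXACT substring from the source text
2. NO CALCULATIONS: Never compute, derive, or infer values
3. NO FORMATTING CHANGES: Preserve exact punctuation, spacing, case
4. NULL FOR MISSING: If a field is not explicitly stated, return null
5. JSON ONLY: Return raw JSON, no markdown, no explanation"""


def chatml_prompt(lease_text: str, system: str = SYSTEM_PROMPT) -> str:
    """Build the same ChatML layout the Modelfile TEMPLATE produces."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{lease_text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def llama_cli_args(prompt: str, model_path: str = "qwen2.5-3b-instruct.Q4_K_M.gguf") -> list[str]:
    """Argument list for the llama-cli call above; no shell quoting is involved."""
    return ["./llama-cli", "-m", model_path,
            "--temp", "0.1", "--top-p", "0.9", "-c", "8192",
            "-p", prompt]


def run_llama_cli(lease_text: str) -> str:
    """Run llama-cli and return its raw stdout (may include more than the JSON)."""
    result = subprocess.run(llama_cli_args(chatml_prompt(lease_text)),
                            capture_output=True, text=True, check=True)
    return result.stdout
```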
Example Output
```json
{
  "landlord_name": "ABC Properties LLC",
  "tenant_name": "Acme Corporation",
  "guarantor": null,
  "property_address": "123 Main Street, Suite 200, Philadelphia, PA 19103",
  "suite_unit": ["Suite 200"],
  "rentable_sqft": "5,000",
  "usable_sqft": null,
  "lease_term_months": "60",
  "commencement_date": ["January 1, 2024"],
  "expiration_date": "December 31, 2028",
  "lease_execution_date": "November 15, 2023",
  "initial_monthly_rent": "$12,500.00",
  "rent_schedule": [
    {"period": "Year 1", "monthly_rent": "$12,500.00", "annual_rent": "$150,000.00"},
    {"period": "Year 2", "monthly_rent": "$12,875.00", "annual_rent": "$154,500.00"}
  ],
  "escalation_clause": "Base Rent shall increase by 3% annually on each anniversary of the Commencement Date",
  "expense_structure": "Triple Net (NNN)",
  "cam_description": "Tenant shall pay its pro rata share of Common Area Maintenance",
  "tax_obligations": "Tenant shall pay its pro rata share of real estate taxes",
  "insurance_obligations": "Tenant shall maintain commercial general liability insurance"
}
```
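In the example, `suite_unit` and `commencement_date` come back as arrays while most other fields are plain strings or `null`. The card does not say which fields may take which form, so a defensive normalizer is an assumption on the consumer side. A minimal sketch (`as_list` and `first_or_none` are hypothetical helpers):

```python
import json


def as_list(value) -> list:
    """Normalize a field that may be null, a single value, or a list of values."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def first_or_none(value):
    """Return the first entry of a possibly-list field, or None if it is empty/null."""
    items = as_list(value)
    return items[0] if items else None
```

For example, `first_or_none(record["commencement_date"])` yields `"January 1, 2024"` for the output above, and the same call works unchanged if a later model version returns a bare string.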
Production Recommendations
For production use, we recommend adding a "snapper" verification layer that checks that each extracted value appears verbatim in the source text. This targets the roughly 16% gap left by the 83.6% verbatim accuracy, i.e. outputs where the model makes minor transcription errors or formatting changes.
The snapper approach:
- Model extracts candidate values
- Snapper searches source text for each value
- Values not found verbatim are nullified
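- Result: every non-null value is guaranteed to appear verbatim in the source (each value is either verified or null); this confirms the text exists in the lease, not that it was assigned to the correct field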
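The card describes the snapper's behavior but not its implementation. A minimal sketch of the steps above, assuming exact (case- and whitespace-sensitive) substring matching as the system prompt's VERBATIM rule implies; `snap` is a hypothetical name. It recurses into nested values such as `rent_schedule`, keeps the JSON structure, and nulls any leaf that cannot be verified:

```python
def snap(value, source: str):
    """Keep value only if it appears verbatim in source; recurse into lists and dicts.

    - Strings that are not exact substrings of source become None.
    - Empty strings become None (they are trivially a substring of anything).
    - Non-string scalars (numbers, booleans) become None, since the model is
      instructed to return substrings only.
    """
    if value is None:
        return None
    if isinstance(value, str):
        return value if value and value in source else None
    if isinstance(value, list):
        return [snap(v, source) for v in value]
    if isinstance(value, dict):
        return {k: snap(v, source) for k, v in value.items()}
    return None
```

Exact matching is strict by design: if the lease text came from PDF extraction with hard line breaks, a value split across lines will be nulled. Any whitespace-tolerant matching would be a deliberate relaxation of the verbatim rule.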
Training Details
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Fine-tuning: SFT with response masking (loss only on JSON output)
- Dataset: 684 labeled commercial lease examples
- Hardware: NVIDIA A40 (48GB)
- Training Time: 70 minutes (2 epochs, 172 steps)
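The card does not state the batch size, but the step count pins it down under one assumption: that each epoch runs ⌈N/b⌉ optimizer steps for N = 684 examples and effective batch size b (last partial batch kept):

$$
\frac{172\ \text{steps}}{2\ \text{epochs}} = 86\ \text{steps/epoch},
\qquad
\left\lceil \frac{684}{b} \right\rceil = 86
\iff 85 < \frac{684}{b} \le 86
\iff \frac{684}{86} \approx 7.95 \le b < \frac{684}{85} \approx 8.05
$$

so b = 8 is the only integer effective batch size consistent with 172 steps (for example, per-device batch × gradient accumulation = 8).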
- Quantization: Q4_K_M (1.8GB)
Limitations
- English only: not trained on leases in other languages
- Commercial leases: residential leases may not extract correctly
- US format: trained primarily on US commercial lease conventions
- Verbatim extraction: does not calculate, summarize, or interpret
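- Chunk size: best results when each input fits within the 8K-token (8,192) context window; with `num_ctx 8192`, that window must also hold the system prompt and the JSON output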
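Leases longer than the context window need to be split before extraction. A minimal sketch that packs paragraphs into chunks under a token budget, assuming a rough 4-characters-per-token heuristic (not the Qwen tokenizer) and a 6,000-token default that leaves headroom for the system prompt and JSON output; both numbers are illustrative assumptions. How to merge per-chunk results is not covered by this card:

```python
def chunk_text(text: str, max_tokens: int = 6000, chars_per_token: float = 4.0) -> list[str]:
    """Split text into chunks of at most ~max_tokens tokens, on paragraph boundaries.

    Token counts are approximated as len(chars) / chars_per_token.
    Paragraphs longer than the budget are hard-split by characters.
    """
    max_chars = int(max_tokens * chars_per_token)
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if not para.strip():
            continue  # skip blank paragraphs
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)  # current is non-empty here
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on blank lines keeps most lease sections intact; a tokenizer-based count would be more precise than the character heuristic.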
License
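Apache 2.0. Note that the base model is not Apache 2.0: unlike most Qwen2.5 sizes, Qwen2.5-3B-Instruct is released under the Qwen Research License, so review its terms as well.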
Citation
```bibtex
@misc{lease-abstractor-v76,
  author = {Justin Gohn},
  title = {Lease Abstractor V7.6: Fine-tuned Qwen2.5-3B for Commercial Lease Extraction},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/independentsearcher/lease-abstractor-v76}
}
```
Acknowledgments
Built with assistance from Claude (Anthropic), GPT-5.x (OpenAI), and Gemini Pro (Google) as collaborative AI advisors for brainstorming, overall design, and code review.