Lease Abstractor V7.6

A fine-tuned Qwen2.5-3B-Instruct model for extracting structured data from commercial real estate leases.

Model Description

This model reads commercial lease documents and extracts 18 standardized fields into clean JSON. It was trained on 684 labeled examples and achieves:

Metric               Score
JSON Parse Rate      100%
Schema Compliance    100%
Verbatim Accuracy    83.6%
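The card does not describe how these metrics were computed. As a hedged illustration only, the sketch below shows one plausible way to measure JSON parse rate and schema compliance over a batch of raw model outputs. It assumes that "schema compliance" means the parsed object has exactly the 18 keys listed under Schema. `EXPECTED_FIELDS` and `evaluate_outputs` are hypothetical names, not the author's evaluation code.

```python
import json

# The 18 field names from the Schema section of this card.
EXPECTED_FIELDS = {
    "landlord_name", "tenant_name", "guarantor",
    "property_address", "suite_unit", "rentable_sqft", "usable_sqft",
    "lease_term_months", "commencement_date", "expiration_date", "lease_execution_date",
    "initial_monthly_rent", "rent_schedule", "escalation_clause",
    "expense_structure", "cam_description", "tax_obligations", "insurance_obligations",
}


def evaluate_outputs(raw_outputs):
    """Return (parse_rate, schema_compliance) as fractions of all outputs.

    Assumed definitions: an output "parses" if json.loads succeeds and
    yields a JSON object; it is "schema compliant" if its keys are exactly
    EXPECTED_FIELDS.
    """
    if not raw_outputs:
        return 0.0, 0.0
    parsed = 0
    compliant = 0
    for raw in raw_outputs:
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not valid JSON at all
        if not isinstance(obj, dict):
            continue  # valid JSON, but not an object
        parsed += 1
        if set(obj) == EXPECTED_FIELDS:
            compliant += 1
    total = len(raw_outputs)
    return parsed / total, compliant / total
```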

The model is designed to abstract lease agreements, pulling out the lease terms needed for rent rolls, income analysis, and property valuation. Intended users include real estate appraisers, commercial brokers, asset managers, analysts, and attorneys.

Intended Use

  • Primary: Commercial real estate appraisers, analysts, brokers, asset managers
  • Input: Raw text from commercial lease documents (office, retail, industrial)
  • Output: Structured JSON with 18 fields

Schema (18 Fields)

Group       Fields
Parties     landlord_name, tenant_name, guarantor
Property    property_address, suite_unit, rentable_sqft, usable_sqft
Term        lease_term_months, commencement_date, expiration_date, lease_execution_date
Rent        initial_monthly_rent, rent_schedule, escalation_clause
Expenses    expense_structure, cam_description, tax_obligations, insurance_obligations
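For downstream reporting, it can help to nest the flat JSON output under the five groups above. A minimal sketch, assuming the model's output keys match the schema field names exactly; `SCHEMA_GROUPS` and `group_record` are hypothetical names used for illustration.

```python
# Field groups as listed in the Schema table of this card.
SCHEMA_GROUPS = {
    "Parties": ["landlord_name", "tenant_name", "guarantor"],
    "Property": ["property_address", "suite_unit", "rentable_sqft", "usable_sqft"],
    "Term": ["lease_term_months", "commencement_date", "expiration_date",
             "lease_execution_date"],
    "Rent": ["initial_monthly_rent", "rent_schedule", "escalation_clause"],
    "Expenses": ["expense_structure", "cam_description", "tax_obligations",
                 "insurance_obligations"],
}


def group_record(record):
    """Nest a flat extraction dict under its schema group.

    Fields missing from `record` are filled with None (matching the model's
    null-for-missing rule); keys that are not in the schema are dropped.
    """
    return {
        group: {field: record.get(field) for field in fields}
        for group, fields in SCHEMA_GROUPS.items()
    }
```

Because every group is rebuilt from `SCHEMA_GROUPS`, the grouped view always has the same 18 slots even when the model returns null for some fields.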

How to Use

With Ollama

# Create the model
ollama create lease-v76 -f Modelfile

# Run inference
ollama run lease-v76 "Extract lease data from: [paste lease text here]"
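For scripted use, Ollama also exposes a local REST API. The sketch below uses only the Python standard library and assumes the Ollama server is running at its default address (http://localhost:11434) and that the model was created as `lease-v76` with the Modelfile shown in this card. `build_payload` and `extract_lease` are hypothetical helper names, and the prompt prefix simply mirrors the CLI example above.

```python
import json
import urllib.request

# Assumed: local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(lease_text, model="lease-v76"):
    """Build a non-streaming /api/generate request body."""
    return {
        "model": model,
        "prompt": f"Extract lease data from: {lease_text}",
        "stream": False,  # return one JSON reply instead of a token stream
    }


def extract_lease(lease_text, model="lease-v76", url=OLLAMA_URL):
    """Send lease text to Ollama and parse the model's JSON answer."""
    body = json.dumps(build_payload(lease_text, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:  # POST, since data is set
        reply = json.loads(response.read().decode("utf-8"))
    # Ollama returns the generated text in the "response" field; the model is
    # instructed to emit raw JSON, so parse it (raises if the output is not JSON).
    return json.loads(reply["response"])
```

Usage, with a hypothetical input file: `fields = extract_lease(open("lease.txt").read())`. The system prompt from the Modelfile is applied by Ollama automatically, so it does not need to be sent with each request.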

Modelfile

FROM ./qwen2.5-3b-instruct.Q4_K_M.gguf

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER num_ctx 8192

SYSTEM """You are a Legal Data Extraction Engine for commercial real estate leases.

STRICT RULES:
1. VERBATIM ONLY: Every value must be an EXACT substring from the source text
2. NO CALCULATIONS: Never compute, derive, or infer values
3. NO FORMATTING CHANGES: Preserve exact punctuation, spacing, case
4. NULL FOR MISSING: If a field is not explicitly stated, return null
5. JSON ONLY: Return raw JSON, no markdown, no explanation"""

With llama.cpp

./llama-cli -m qwen2.5-3b-instruct.Q4_K_M.gguf \
  --temp 0.1 --top-p 0.9 -c 8192 \
  -p "<|im_start|>system\nYou are a Legal Data Extraction Engine...<|im_end|>\n<|im_start|>user\n[LEASE TEXT]<|im_end|>\n<|im_start|>assistant\n"
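Both the Modelfile TEMPLATE and the raw llama.cpp prompt use the ChatML layout. Lease text is long and full of quotes and line breaks, so it can be easier to build the prompt in code and write it to a file than to escape it on the command line. A minimal sketch that reproduces the same layout as the TEMPLATE above; `SYSTEM_PROMPT` is copied from the Modelfile, and `build_chatml_prompt` is a hypothetical helper.

```python
# System prompt copied from the Modelfile's SYSTEM block.
SYSTEM_PROMPT = (
    "You are a Legal Data Extraction Engine for commercial real estate leases.\n"
    "\n"
    "STRICT RULES:\n"
    "1. VERBATIM ONLY: Every value must be an EXACT substring from the source text\n"
    "2. NO CALCULATIONS: Never compute, derive, or infer values\n"
    "3. NO FORMATTING CHANGES: Preserve exact punctuation, spacing, case\n"
    "4. NULL FOR MISSING: If a field is not explicitly stated, return null\n"
    "5. JSON ONLY: Return raw JSON, no markdown, no explanation"
)


def build_chatml_prompt(lease_text, system=SYSTEM_PROMPT):
    """Return a ChatML prompt ending at the assistant turn, ready for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{lease_text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

The resulting string can be saved to a file (for example, a hypothetical `prompt.txt`) and passed to `llama-cli` with its `-f` prompt-file option in place of `-p`.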

Example Output

{
  "landlord_name": "ABC Properties LLC",
  "tenant_name": "Acme Corporation",
  "guarantor": null,
  "property_address": "123 Main Street, Suite 200, Philadelphia, PA 19103",
  "suite_unit": ["Suite 200"],
  "rentable_sqft": "5,000",
  "usable_sqft": null,
  "lease_term_months": "60",
  "commencement_date": ["January 1, 2024"],
  "expiration_date": "December 31, 2028",
  "lease_execution_date": "November 15, 2023",
  "initial_monthly_rent": "$12,500.00",
  "rent_schedule": [
    {"period": "Year 1", "monthly_rent": "$12,500.00", "annual_rent": "$150,000.00"},
    {"period": "Year 2", "monthly_rent": "$12,875.00", "annual_rent": "$154,500.00"}
  ],
  "escalation_clause": "Base Rent shall increase by 3% annually on each anniversary of the Commencement Date",
  "expense_structure": "Triple Net (NNN)",
  "cam_description": "Tenant shall pay its pro rata share of Common Area Maintenance",
  "tax_obligations": "Tenant shall pay its pro rata share of real estate taxes",
  "insurance_obligations": "Tenant shall maintain commercial general liability insurance"
}
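Because the model copies amounts verbatim and never computes them (rule 2 of the system prompt), arithmetic checks belong downstream. As a hedged example, the sketch below parses dollar strings like those in the example output and flags `rent_schedule` rows where `annual_rent` is not exactly 12 times `monthly_rent`. It assumes amounts are plain strings such as "$12,500.00". `parse_money` raises `decimal.InvalidOperation` on anything else. `parse_money` and `check_rent_schedule` are hypothetical helpers.

```python
from decimal import Decimal


def parse_money(text):
    """Convert a verbatim amount such as "$12,500.00" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def check_rent_schedule(record):
    """Return the periods whose annual rent differs from 12 x monthly rent."""
    mismatches = []
    for row in record.get("rent_schedule") or []:
        monthly = row.get("monthly_rent")
        annual = row.get("annual_rent")
        if monthly is None or annual is None:
            continue  # nothing to compare for this row
        if parse_money(monthly) * 12 != parse_money(annual):
            mismatches.append(row.get("period"))
    return mismatches
```

Applied to the example output above, both rows pass ($12,500.00 x 12 = $150,000.00 and $12,875.00 x 12 = $154,500.00). A flagged row is a prompt for review rather than proof of an error, since partial lease years or rent abatements can legitimately break the 12x relationship.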

Production Recommendations

For production use, we recommend adding a "snapper" verification layer that checks whether each extracted value appears verbatim in the source text. This addresses the remaining ~16% gap in verbatim accuracy (83.6%), where the model makes minor transcription errors or formatting changes.

The snapper approach:

  1. Model extracts candidate values
  2. Snapper searches source text for each value
  3. Values not found verbatim are nullified
  4. Result: every non-null value is verified to appear verbatim in the source text; anything that cannot be verified is null
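A minimal Python sketch of the snapper steps above. It assumes exact, case-sensitive substring matching with no whitespace normalization, so a value that differs from the source only in spacing will be nullified. `snap` and `snap_record` are hypothetical names, not code shipped with this model.

```python
def snap(value, source):
    """Keep a value only if it occurs verbatim in the source text.

    Strings use an exact substring test. Lists (e.g. suite_unit) keep only
    verified items and become None if nothing survives. Dicts (e.g.
    rent_schedule rows) are checked field by field. Any other scalar is
    nullified because it cannot be verbatim text.
    """
    if value is None:
        return None
    if isinstance(value, str):
        return value if value and value in source else None
    if isinstance(value, list):
        kept = [snap(item, source) for item in value]
        kept = [item for item in kept if item is not None]
        return kept or None
    if isinstance(value, dict):
        return {key: snap(item, source) for key, item in value.items()}
    return None


def snap_record(record, source):
    """Apply snap() to every top-level field of an extraction result."""
    return {field: snap(value, source) for field, value in record.items()}
```

A verbatim match confirms only that the string occurs somewhere in the lease, not that it was assigned to the right field (a landlord name placed in `tenant_name` would still pass), so occasional spot checks of field assignment remain worthwhile.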

Training Details

  • Base Model: Qwen/Qwen2.5-3B-Instruct
  • Fine-tuning: SFT with response masking (loss only on JSON output)
  • Dataset: 684 labeled commercial lease examples
  • Hardware: NVIDIA A40 (48GB)
  • Training Time: 70 minutes (2 epochs, 172 steps)
  • Quantization: Q4_K_M (1.8GB)
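The card states only that loss was computed on the JSON output. A common way to get this behavior in Hugging Face-style causal LM training is to set the label positions covering the system and user prompt to -100, which PyTorch's cross-entropy loss ignores by default. The sketch below illustrates that technique under the assumption that token IDs are already computed and that the prompt is a known-length prefix. It is not the author's training script, and `mask_prompt_labels` is a hypothetical name.

```python
# Label value skipped by the loss: the default ignore_index of PyTorch's
# CrossEntropyLoss, which Hugging Face causal LM models use for labels.
IGNORE_INDEX = -100


def mask_prompt_labels(input_ids, prompt_length):
    """Copy input_ids into labels, hiding the first prompt_length tokens.

    Only the response tokens (the JSON output) keep their real IDs, so only
    they contribute to the training loss.
    """
    if not 0 <= prompt_length <= len(input_ids):
        raise ValueError("prompt_length must be within the sequence")
    return [IGNORE_INDEX] * prompt_length + list(input_ids[prompt_length:])
```

For scale, 172 steps over 2 epochs is 86 steps per epoch. That is consistent with an effective batch size of 8 (ceil(684 / 8) = 86), though the card does not state the batch size.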

Limitations

  • English only: not trained on leases in other languages
  • Commercial leases: residential leases may not extract correctly
  • US format: trained primarily on US commercial lease conventions
  • Verbatim extraction: does not calculate, summarize, or interpret
  • Chunk size: best results with an 8K-token context window (num_ctx 8192, as set in the Modelfile)
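The card recommends 8K-token windows but does not say how longer leases should be split. One hedged approach is to pack paragraphs greedily into chunks under a token budget. The sketch below assumes roughly 4 characters per token, which is a common rule of thumb for English text and not a measured property of the Qwen2.5 tokenizer. It also uses a hypothetical default budget of 6,000 input tokens to leave room for the system prompt and JSON output within num_ctx 8192. `chunk_lease` is a hypothetical helper, and merging the per-chunk JSON results is not shown.

```python
def chunk_lease(text, max_tokens=6000, chars_per_token=4):
    """Greedily pack paragraphs into chunks under an approximate token budget.

    Tokens are estimated as characters / chars_per_token; for exact counts,
    use the model's own tokenizer instead.
    """
    max_chars = max_tokens * chars_per_token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if not para.strip():
            continue  # skip empty paragraphs from runs of blank lines
        # Hard-split any single paragraph that exceeds the budget on its own.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on blank lines keeps most lease sections intact, which matters for a verbatim extractor: a clause cut in half at a chunk boundary cannot be copied as an exact substring.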

License

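Apache 2.0. Note that, unlike most Qwen2.5 sizes, the base Qwen2.5-3B-Instruct model is released under the Qwen Research License rather than Apache 2.0; its terms may also govern use of this fine-tune.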

Citation

@misc{lease-abstractor-v76,
  author = {Justin Gohn},
  title = {Lease Abstractor V7.6: Fine-tuned Qwen2.5-3B for Commercial Lease Extraction},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/independentsearcher/lease-abstractor-v76}
}

Acknowledgments

Built with assistance from Claude (Anthropic), GPT-5.x (OpenAI), and Gemini Pro (Google) as collaborative AI advisors for brainstorming, overall design, and code review.

Hub Metadata

  • Repository: justingohn/lease-abstractor-v76
  • Format: GGUF (4-bit)
  • Model size: 3B params
  • Architecture: qwen2