# gemma-4-E2B-sec-extraction-GGUF-v3
A Gemma 4 2B (E2B) model fine-tuned on 7,683 instruction-tuning examples for
structured financial extraction from SEC filings, then quantized to
Q4_K_M GGUF for local inference on consumer hardware.
This is the v3 release — the third iteration of the fine-tune. Its headline improvement over the base model is a 52% reduction in hallucinated definitions, the failure mode most visible to downstream consumers of the extracted CSV data.
## Headline result: hallucination rate cut in half
Evaluated on a 100-sample held-out set (seed 42) from the
sharegpt_financial_extraction training split. Same prompts, same
temperature, same input chunks — only the model differs.
| Metric | Base (Gemma 4 2B) | v3 (this model) | Change |
|---|---|---|---|
| hallucination_phrase_rate | 0.127 | 0.061 | −52% |
| canonical_type_rate | 0.951 | 0.967 | +1.7pp |
| json_parse_rate (financial) | 1.000 | 1.000 | flat |
| symbol_compliance_rate | 0.834 | 0.835 | flat |
| avg_values_extracted | 3.21 | 3.11 | −3% (more selective) |
| metadata json_parse_rate | 1.000 | 1.000 | flat |
| metadata effective_date / parties | 1.000 | 1.000 | flat |
Hallucination phrases are hedging fragments the base model inserts into
the definition field when uncertain ("appears to constitute",
"presumably", "does not specify", etc.). Downstream institutional-data
consumers read these fields directly, so every hedging phrase is a visible
quality defect in the output CSV. v3 cuts them from 12.7% of extracted
values to 6.1%.
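The metric itself is straightforward to reproduce in spirit. Below is a minimal sketch of a hallucination-phrase rate: the fraction of extracted `definition` fields containing a hedging fragment. The phrase list here is illustrative, built from the examples above; the eval script's exact list is an assumption, not published in this card.

```python
import re

# Illustrative hedging fragments (the eval's real list may differ).
HEDGE_PHRASES = [
    "appears to constitute",
    "presumably",
    "does not specify",
]

def hallucination_phrase_rate(definitions):
    """Fraction of definition fields containing a hedging phrase."""
    if not definitions:
        return 0.0
    pattern = re.compile("|".join(map(re.escape, HEDGE_PHRASES)), re.IGNORECASE)
    flagged = sum(1 for d in definitions if pattern.search(d))
    return flagged / len(definitions)

defs = [
    "Annual base salary payable to the Executive",
    "Presumably the bonus target for fiscal 2024",
]
print(hallucination_phrase_rate(defs))  # 0.5
```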
## Live pipeline validation
The held-out eval was corroborated by an end-to-end pipeline run on 25 fresh S&P 500 filings (77 extractor chunks) that the model had never seen:
- 12 contracts assembled, 16 financial terms, 20 covenants, 11 entities — zero extractor errors, zero JSON parse failures.
- Only 8 validation drops across 77 chunks (0.10 drops/chunk):
  - 7× bare symbol (model emitted `$` or `%` with no number)
  - 1× literal `NONE` string
  - Zero hallucination-phrase drops
  - Zero alphabet-contamination drops
  - Zero template-echo drops
The failure mode the held-out eval flagged as v3's biggest win (hallucination phrases) produced zero drops in the live pipeline. The model generalizes the improvement to filings it has never seen.
## Model details
- Base model: `unsloth/gemma-4-E2B-it` (Gemma 4 2B instruction-tuned)
- Quantization: `Q4_K_M` (3.43 GB) — runs on a 6 GB GPU
- Companion file: `gemma-4-E2B-it.BF16-mmproj.gguf` (vision projector, kept for architecture completeness; SEC extraction uses the text path only)
- Fine-tuning framework: Unsloth + TRL SFTTrainer, LoRA adapters merged before GGUF export
- Chat template: `gemma` (required — using `chatml` produces NaN loss)
- Hardware compatibility: RTX 4050 Laptop (6 GB) and above
## Training data
Fine-tuned on 7,683 silver-labeled examples from `TheTokenFactory/sec-contracts-financial-extraction-instructions`, produced by running base Gemma 4 2B through a 6-stage extraction pipeline and applying 14+ validation gates. Composition:
| Pipeline | Task | Positive | Corrective |
|---|---|---|---|
| Exhibit 10 | Metadata extraction | 1,028 | 1,027 |
| Exhibit 10 | Financial term extraction | 1,434 | 1,600 |
| Exhibit 10 | Covenant extraction | 264 | 433 |
| DEF 14A | Exec metadata | 150 | 150 |
| DEF 14A | Compensation extraction | 293 | 750 |
| DEF 14A | Governance extraction | 261 | 293 |
Corrective examples include positive-corrected rows, rescued edge cases
(e.g., $3,205 on share counts repaired to 3,205 shares), and negative
examples teaching the model to emit empty output on absent signal.
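To make the positive/corrective split concrete, here is a sketch of what a pair of training rows could look like in ShareGPT-style conversation format. The field names and prompt text are assumptions for illustration; they are not the dataset's published schema.

```python
import json

# Hypothetical ShareGPT-style rows (schema assumed, not from the dataset card).
SYSTEM = "You extract financial values from SEC contract text..."

# Positive example: real financial signal, one extracted value.
positive_row = {
    "conversations": [
        {"from": "system", "value": SYSTEM},
        {"from": "human",
         "value": "The Executive shall receive a base salary of $450,000 per year."},
        {"from": "gpt",
         "value": json.dumps({"financial_values": [
             {"value": "$450,000",
              "definition": "annual base salary",
              "term_type": "salary"}]})},
    ]
}

# Negative (corrective) example: no financial signal, target is empty output.
corrective_row = {
    "conversations": [
        {"from": "system", "value": SYSTEM},
        {"from": "human",
         "value": "This Agreement shall be governed by the laws of Delaware."},
        {"from": "gpt", "value": json.dumps({"financial_values": []})},
    ]
}

print(json.loads(corrective_row["conversations"][2]["value"]))  # {'financial_values': []}
```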
## Intended use
Structured extraction of financial entities from SEC filings:
- Exhibit 10 material contracts: effective date, contracting parties, financial terms (13 canonical types), debt covenants (7 canonical types).
- DEF 14A proxy statements: Named Executive Officers, compensation values (9 canonical types), governance items (say-on-pay, clawback, peer groups).
Output is JSON. The three extraction tasks are routed to specialized system prompts — the model performs best when given one task per call.
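A minimal router for the one-task-per-call pattern might look like the sketch below. The prompt strings are abbreviated stand-ins; only the structure (one specialized system prompt per task, user message carries the chunk) reflects the card's description.

```python
# Hypothetical task router: one specialized system prompt per extraction task.
# Prompt texts here are abbreviated placeholders, not the real prompts.
SYSTEM_PROMPTS = {
    "metadata": "You extract contract metadata (effective date, parties)...",
    "financial": "You extract financial values from SEC contract text...",
    "covenant": "You extract debt covenants from SEC contract text...",
}

def build_messages(task, chunk_text):
    """Build a one-task chat payload for an OpenAI-compatible endpoint."""
    if task not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown task: {task}")
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[task]},
        {"role": "user", "content": chunk_text},
    ]
```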
## Example usage

### LM Studio (recommended for local use)
Load the Q4_K_M GGUF in LM Studio, then call the OpenAI-compatible endpoint:
```python
from openai import OpenAI

# Placeholders: substitute your real system prompt (see "Prompt format"
# below) and the contract text chunk you want to extract from.
FINANCIAL_EXTRACTION_PROMPT = "You extract financial values from SEC contract text..."
contract_chunk_text = "The Executive shall receive a base salary of $450,000 per year."

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
response = client.chat.completions.create(
    model="gemma",
    temperature=0.1,
    messages=[
        {"role": "system", "content": FINANCIAL_EXTRACTION_PROMPT},
        {"role": "user", "content": contract_chunk_text},
    ],
)
print(response.choices[0].message.content)
```
### llama.cpp

```bash
llama-cli -hf TheTokenFactory/gemma-4-E2B-sec-extraction-GGUF-v3 --jinja
```
### Ollama note
Ollama currently does not support separate mmproj files for vision
models. For SEC extraction (text only), the text GGUF alone is sufficient.
## Prompt format
Task-specific system prompts route the model to extract one entity type per call. Example for financial term extraction:
```
You extract financial values from SEC contract text. Return JSON with
this exact schema:

{"financial_values": [{"value": "...", "definition": "...", "term_type": "..."}]}

term_type must be one of: salary, bonus, severance, retirement_benefit,
equity_grant, credit_facility, loan_amount, interest_rate, fee,
threshold, purchase_price, compensation, other.

Return {"financial_values": []} if no values are present.
```
Run at temperature 0.1 for best results.
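On the consuming side, output can be parsed and gated against the canonical type list from the prompt. A minimal sketch (the validation logic is an assumption; the real pipeline applies many more gates):

```python
import json

# Canonical term types, as listed in the prompt above.
CANONICAL_TYPES = {
    "salary", "bonus", "severance", "retirement_benefit", "equity_grant",
    "credit_facility", "loan_amount", "interest_rate", "fee", "threshold",
    "purchase_price", "compensation", "other",
}

def validate_extraction(raw):
    """Parse model output; keep only rows with a canonical term_type."""
    data = json.loads(raw)  # raises on malformed JSON (json_parse_rate failure)
    return [v for v in data.get("financial_values", [])
            if v.get("term_type") in CANONICAL_TYPES]

raw = ('{"financial_values": ['
       '{"value": "$1,000,000", "definition": "term loan", "term_type": "loan_amount"},'
       '{"value": "5%", "definition": "rate", "term_type": "apr"}]}')
print(validate_extraction(raw))  # keeps only the loan_amount row
```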
## Limitations
- Bare-symbol output — v3 still occasionally emits `$` or `%` with no number attached (2 of 100 eval samples; 7 of 77 live chunks). If you are training a successor model, targeted negative examples on this failure mode would be the cheapest next improvement.
- Out-of-distribution on other filing types — v3 is fine-tuned on Exhibit 10 and DEF 14A text distributions. Performance on 10-K/10-Q MD&A prose or other filing types is not characterized and may be worse than base Gemma on those distributions.
- Silver-standard training data — labels were produced by base Gemma 2B + validation gates, not human annotators. v3 inherits any systematic biases of that pipeline.
- Language — English only.
- S&P 500 domain — training data is drawn from large-cap US equities. Small-cap or non-US filings are out of distribution.
- Temporal scope — trained on a 6-month SEC filing window; not a historical backtest.
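The bare-symbol failure mode mentioned above is cheap to detect when filtering output or mining negative training examples. One plausible gate (an illustration, not the pipeline's exact rule): a value containing `$` or `%` but no digit is bare.

```python
import re

# A value is "bare" if it contains $ or % but no digit anywhere.
# This is one plausible check, not the validation pipeline's exact rule.
BARE_SYMBOL = re.compile(r"^[^\d]*[$%][^\d]*$")

def is_bare_symbol(value: str) -> bool:
    return bool(BARE_SYMBOL.match(value.strip()))

print(is_bare_symbol("$"))         # True
print(is_bare_symbol("$450,000"))  # False
print(is_bare_symbol("5%"))        # False
```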
## Evaluation methodology

```bash
# Load the model in LM Studio (or any OpenAI-compatible endpoint), then:
python scripts/evaluate_finetune.py --label v3
# Reports saved to data/eval_reports/eval_v3.json
```
The eval script runs 100 held-out samples (seed 42) from the training distribution, measuring JSON parse rate, canonical-type rate, symbol compliance, and eight failure-mode sub-metrics including alphabet contamination, template echoes, year-as-value, par values, bare numbers, bare symbols, and hallucination phrases.
## Citation

```bibtex
@misc{thetokenfactory2026gemmav3secextraction,
  title={gemma-4-E2B-sec-extraction-GGUF-v3},
  author={TheTokenFactory},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/TheTokenFactory/gemma-4-E2B-sec-extraction-GGUF-v3}}
}
```
## License
Released under the Gemma Terms of Use. Users must comply with Google's Gemma license in addition to any terms applicable to the fine-tuning dataset.