gemma-4-E2B-sec-extraction-GGUF-v3

A Gemma 4 2B (E2B) model fine-tuned on 7,683 instruction-tuning examples for structured financial extraction from SEC filings, then quantized to Q4_K_M GGUF for local inference on consumer hardware.

This is the v3 release — the third iteration of the fine-tune. Its headline improvement over the base model is a 52% reduction in hallucinated definitions, the failure mode most visible to downstream consumers of the extracted CSV data.

Headline result: hallucination rate cut in half

Evaluated on a 100-sample held-out set (seed 42) from the sharegpt_financial_extraction training split. Same prompts, same temperature, same input chunks — only the model differs.

| Metric | Base (Gemma 4 2B) | v3 (this model) | Change |
|---|---|---|---|
| hallucination_phrase_rate | 0.127 | 0.061 | −52% |
| canonical_type_rate | 0.951 | 0.967 | +1.7 pp |
| json_parse_rate (financial) | 1.000 | 1.000 | flat |
| symbol_compliance_rate | 0.834 | 0.835 | flat |
| avg_values_extracted | 3.21 | 3.11 | −3% (more selective) |
| metadata json_parse_rate | 1.000 | 1.000 | flat |
| metadata effective_date / parties | 1.000 | 1.000 | flat |

Hallucination phrases are hedging fragments the base model inserts into the definition field when uncertain ("appears to constitute", "presumably", "does not specify", etc.). Downstream institutional-data consumers read these fields directly, so every hedging phrase is a visible quality defect in the output CSV. v3 cuts them from 12.7% of extracted values to 6.1%.
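The phrase check is simple substring matching over the definition field. A minimal sketch, assuming a hypothetical phrase list (the actual list used by the eval script is longer):

```python
# Sketch of the hallucination-phrase metric described above.
# HEDGING_PHRASES is illustrative; the real eval list is an assumption.
HEDGING_PHRASES = (
    "appears to constitute",
    "presumably",
    "does not specify",
)

def has_hedging(definition: str) -> bool:
    """True if the extracted definition contains a hedging fragment."""
    text = definition.lower()
    return any(phrase in text for phrase in HEDGING_PHRASES)

def hallucination_phrase_rate(definitions: list[str]) -> float:
    """Fraction of extracted definitions flagged as hedged."""
    if not definitions:
        return 0.0
    return sum(has_hedging(d) for d in definitions) / len(definitions)
```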

Live pipeline validation

The held-out eval was corroborated by an end-to-end pipeline run on 25 fresh S&P 500 filings (77 extractor chunks) that the model had never seen:

  • 12 contracts assembled, 16 financial terms, 20 covenants, 11 entities — zero extractor errors, zero JSON parse failures.
  • Only 8 validation drops across 77 chunks (0.10 drops/chunk).
    • 7× BARE SYMBOL (model emitted $ or % with no number)
    • 1× literal NONE string
    • Zero hallucination-phrase drops
    • Zero alphabet contamination drops
    • Zero template-echo drops

The failure mode the held-out eval flagged as v3's biggest win (hallucination phrases) produced zero drops in the live pipeline. The model generalizes the improvement to filings it has never seen.

Model details

  • Base model: unsloth/gemma-4-E2B-it (Gemma 4 2B instruction-tuned)
  • Quantization: Q4_K_M (3.43 GB) — runs on a 6 GB GPU
  • Companion file: gemma-4-E2B-it.BF16-mmproj.gguf (vision projector, kept for architecture completeness; SEC extraction uses the text path only)
  • Fine-tuning framework: Unsloth + TRL SFTTrainer, LoRA adapters merged before GGUF export
  • Chat template: gemma (required — using chatml produces NaN loss)
  • Hardware compatibility: RTX 4050 Laptop (6 GB) and above

Training data

Fine-tuned on 7,683 silver-labeled examples from TheTokenFactory/sec-contracts-financial-extraction-instructions, produced by running base Gemma 4 2B through a 6-stage extraction pipeline and applying 14+ validation gates. Composition:

| Pipeline | Task | Positive | Corrective |
|---|---|---|---|
| Exhibit 10 | Metadata extraction | 1,028 | 1,027 |
| Exhibit 10 | Financial term extraction | 1,434 | 1,600 |
| Exhibit 10 | Covenant extraction | 264 | 433 |
| DEF 14A | Exec metadata | 150 | 150 |
| DEF 14A | Compensation extraction | 293 | 750 |
| DEF 14A | Governance extraction | 261 | 293 |

Corrective examples include positive-corrected rows, rescued edge cases (e.g., $3,205 on share counts repaired to 3,205 shares), and negative examples teaching the model to emit empty output on absent signal.
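The share-count rescue can be illustrated with a small repair function. This is a sketch under assumed rules: the context heuristic (looking for "share" near the value) is hypothetical, not the pipeline's actual logic:

```python
import re

def repair_share_count(value: str, context: str) -> str:
    """Rescue a dollar sign wrongly attached to a share count,
    e.g. "$3,205" in a share-grant context becomes "3,205 shares".
    The context check is an illustrative assumption.
    """
    m = re.fullmatch(r"\$([\d,]+)", value.strip())
    if m and "share" in context.lower():
        return f"{m.group(1)} shares"
    return value
```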

Intended use

Structured extraction of financial entities from SEC filings:

  • Exhibit 10 material contracts: effective date, contracting parties, financial terms (13 canonical types), debt covenants (7 canonical types).
  • DEF 14A proxy statements: Named Executive Officers, compensation values (9 canonical types), governance items (say-on-pay, clawback, peer groups).

Output is JSON. The three extraction tasks are routed to specialized system prompts — the model performs best when given one task per call.
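Routing can be a simple dispatch table keyed by task. The prompt strings and task names below are placeholders, not the model card's actual prompts:

```python
# Hypothetical dispatch table: one specialized system prompt per task.
# Prompt bodies are truncated placeholders for illustration.
SYSTEM_PROMPTS = {
    "metadata": "You extract effective dates and contracting parties...",
    "financial": "You extract financial values from SEC contract text...",
    "covenant": "You extract debt covenants from SEC contract text...",
}

def build_messages(task: str, chunk: str) -> list[dict]:
    """One task per call: pair the input chunk with its task prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[task]},
        {"role": "user", "content": chunk},
    ]
```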

Example usage

LM Studio (recommended for local use)

Load the Q4_K_M GGUF in LM Studio, then call the OpenAI-compatible endpoint:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="gemma",
    temperature=0.1,
    messages=[
        {"role": "system", "content": FINANCIAL_EXTRACTION_PROMPT},
        {"role": "user", "content": contract_chunk_text},
    ],
)
print(response.choices[0].message.content)
```

llama.cpp

```shell
llama-cli -hf TheTokenFactory/gemma-4-E2B-sec-extraction-GGUF-v3 --jinja
```

Ollama note

Ollama currently does not support separate mmproj files for vision models. For SEC extraction (text only), the text GGUF alone is sufficient.

Prompt format

Task-specific system prompts route the model to extract one entity type per call. Example for financial term extraction:

```
You extract financial values from SEC contract text. Return JSON with
this exact schema:
{"financial_values": [{"value": "...", "definition": "...", "term_type": "..."}]}

term_type must be one of: salary, bonus, severance, retirement_benefit,
equity_grant, credit_facility, loan_amount, interest_rate, fee,
threshold, purchase_price, compensation, other.

Return {"financial_values": []} if no values are present.
```

Run at temperature 0.1 for best results.
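Downstream consumers can validate a completion against this contract before ingesting it. A minimal sketch using the schema and canonical type list from the prompt above:

```python
import json

# Canonical term types, as listed in the financial-extraction prompt.
CANONICAL_TYPES = {
    "salary", "bonus", "severance", "retirement_benefit", "equity_grant",
    "credit_facility", "loan_amount", "interest_rate", "fee",
    "threshold", "purchase_price", "compensation", "other",
}

def parse_and_check(raw: str) -> tuple[bool, float]:
    """Return (json_ok, canonical_type_rate) for one model completion."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, 0.0
    values = data.get("financial_values", [])
    if not values:
        return True, 1.0  # empty output is valid per the prompt contract
    ok = sum(v.get("term_type") in CANONICAL_TYPES for v in values)
    return True, ok / len(values)
```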

Limitations

  • Bare-symbol output — v3 still occasionally emits $ or % with no number attached (2 of 100 eval samples; 7 of 77 live chunks). If you are training a successor model, targeted negative examples on this failure mode would be the cheapest next improvement.
  • Out-of-distribution on other filing types — v3 is fine-tuned on Exhibit 10 and DEF 14A text distributions. Performance on 10-K/10-Q MD&A prose or other filing types is not characterized and may be worse than base Gemma on those distributions.
  • Silver-standard training data — labels were produced by base Gemma 2B + validation gates, not human annotators. v3 inherits any systematic biases of that pipeline.
  • Language — English only.
  • S&P 500 domain — training data is drawn from large-cap US equities. Small-cap or non-US filings are out of distribution.
  • Temporal scope — trained on a 6-month SEC filing window; not a historical backtest.

Evaluation methodology

```shell
# Load the model in LM Studio (or any OpenAI-compatible endpoint), then:
python scripts/evaluate_finetune.py --label v3
# Reports saved to data/eval_reports/eval_v3.json
```

The eval script runs 100 held-out samples (seed 42) from the training distribution, measuring JSON parse rate, canonical-type rate, symbol compliance, and eight failure-mode sub-metrics including alphabet contamination, template echoes, year-as-value, par values, bare numbers, bare symbols, and hallucination phrases.

Citation

```bibtex
@misc{thetokenfactory2026gemmav3secextraction,
  title={gemma-4-E2B-sec-extraction-GGUF-v3},
  author={TheTokenFactory},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/TheTokenFactory/gemma-4-E2B-sec-extraction-GGUF-v3}}
}
```

License

Released under the Gemma Terms of Use. Users must comply with Google's Gemma license in addition to any terms applicable to the fine-tuning dataset.

