swik-heuristic-v1 (v0.1)

Deterministic keyword-based financial sentiment classifier. Fast, interpretable, no GPU, no API key. A baseline for domain-specific financial news sentiment.

This is the Layer 1 model in swik's two-layer inference pipeline. It processes every request before any LLM call, serving both as a fast path for high-confidence cases and as a fallback when the API is unavailable.

What it does

Two-pass classification:

  1. Inversion check: matches asset-specific inversion phrases (e.g., "production cut" → BULLISH for OIL)
  2. Keyword scan: matches generic bullish/bearish keyword lists

If neither pass fires, the label is neutral.
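The two-pass flow above can be sketched as follows. This is illustrative pseudocode under assumed internals, not the actual `SwikHeuristicV1` source, and the keyword lists are abbreviated:

```python
# Illustrative sketch of the two-pass classifier; abbreviated keyword lists.
BULLISH = {"surge", "rally", "growth", "beat", "upgrade"}
BEARISH = {"crash", "plunge", "drop", "miss", "downgrade"}

def classify(text, inversions=None):
    lowered = text.lower()
    # Pass 1: asset-specific inversion phrases override everything.
    for inv in inversions or []:
        phrases = [inv["phrase"]] + inv.get("variants", [])
        if any(p in lowered for p in phrases):
            return {"label": inv["direction"].lower(),
                    "method": "inversion",
                    "inversion_applied": inv["phrase"]}
    # Pass 2: generic keyword scan (bullish checked first).
    if any(k in lowered for k in BULLISH):
        return {"label": "bullish", "method": "keyword"}
    if any(k in lowered for k in BEARISH):
        return {"label": "bearish", "method": "keyword"}
    # Neither pass fired: default to neutral.
    return {"label": "neutral", "method": "none"}
```

Note that the inversion pass runs first, so a catalog hit short-circuits the keyword scan entirely.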

Keyword Lists

Bullish (14 terms): cut, surge, rally, record high, growth, beat, upgrade, rise, gain, boost, strong, exceed, recovery, rebound

Bearish (13 terms): crash, plunge, drop, fall, miss, downgrade, warning, decline, loss, weak, below, cut guidance, layoff

Inversions: Asset-specific phrase overrides from the swik inversion catalog (125 active entries). Published separately as a dataset.
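Since each catalog entry carries an optional `variants` list, a natural preprocessing step is to flatten entries into a single phrase-to-direction lookup. A minimal sketch (the helper name is hypothetical, not part of the published package):

```python
# Hypothetical helper: expand inversion catalog entries (phrase + variants)
# into a flat phrase -> direction lookup for fast substring matching.
def build_inversion_index(entries):
    index = {}
    for entry in entries:
        direction = entry["direction"]
        for phrase in [entry["phrase"]] + entry.get("variants", []):
            index[phrase.lower()] = direction
    return index

catalog = [
    {"phrase": "coal power", "direction": "BULLISH",
     "variants": ["coal-fired power"]},
    {"phrase": "production cut", "direction": "BULLISH"},
]
index = build_inversion_index(catalog)
```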

Usage

from inference import SwikHeuristicV1

model = SwikHeuristicV1()

# Basic usage
result = model.predict("Oil surges after OPEC production cut")
# {'label': 'bullish', 'magnitude': 0.72, 'confidence': 0.45, 'method': 'keyword'}

# With inversion catalog
inversions = [
    {"phrase": "coal power", "direction": "BULLISH", "variants": ["coal-fired power"]},
    {"phrase": "production cut", "direction": "BULLISH"},
]
model_with_inv = SwikHeuristicV1(known_inversions=inversions)
result = model_with_inv.predict("Coal power demand rises as gas prices surge", security="NATGAS")
# {'label': 'bullish', ..., 'inversion_applied': 'coal power', 'method': 'inversion'}

Benchmark Results

Evaluated on the matched corpus (inference_log joined to community_labels_legacy on text_hash), 2026-03-08 to 2026-03-29.

Metric       heuristic-v1   haiku-4-5 (baseline)   haiku-4-5 (variant B)
Accuracy     98.88%         39.6%                  46.0%
F1 macro     0.981          0.309                  0.456
Neutral F1   0.992          0.506                  n/a
Bullish F1   0.970          0.231                  n/a
Bearish F1   0.981          0.189                  n/a
n (pairs)    13,966         16,141                 200 (test set)

⚠️ Important: These benchmarks are measured against AI-generated labels (Claude Haiku), not human ground truth. The high heuristic accuracy reflects agreement with the labeling model, not necessarily alignment with human judgment. Human-label benchmarks are pending.

⚠️ Known dataset bias: The companion labeled dataset is OIL-dominant: OIL accounts for ~56% of all labeled records. Model performance on other securities (especially low-volume ones) may be significantly lower than the aggregate numbers suggest. Evaluate per-security before deploying.
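A per-security evaluation along these lines can be done with a text_hash join and a groupby. The column names below are assumptions about the corpus schema, and the rows are toy data, not real results:

```python
import pandas as pd

# Hypothetical schema; real corpus columns may differ. Toy data for shape only.
preds = pd.DataFrame({
    "text_hash": ["a1", "b2", "c3", "d4"],
    "security":  ["OIL", "OIL", "NATGAS", "GOLD"],
    "label":     ["bullish", "neutral", "bearish", "bullish"],
})
labels = pd.DataFrame({
    "text_hash": ["a1", "b2", "c3", "d4"],
    "label":     ["bullish", "neutral", "bullish", "bullish"],
})

# Join predictions to labels on text_hash, then score per security.
merged = preds.merge(labels, on="text_hash", suffixes=("_pred", "_true"))
merged["correct"] = merged["label_pred"] == merged["label_true"]
per_security = merged.groupby("security")["correct"].agg(["mean", "size"])
print(per_security)
```

The `size` column matters as much as the accuracy: a 100% score on a 3-row security is not evidence of anything.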

Confidence Calibration

The heuristic outputs a fixed confidence of 0.45 for all predictions. This is intentional. Unlike the Haiku baseline, which is anti-calibrated (higher confidence → higher error rate), the heuristic makes no claim about certainty. Use it as a deterministic rule engine, not a probabilistic model.

Known Failure Modes

  1. Ambiguous generic terms: Words like "cut" appear in both bullish (supply cuts → oil bullish) and neutral contexts (budget cuts, interest rate cuts). Without the inversion catalog, these will be mislabeled.

  2. Multi-entity headlines: "Oil falls as dollar rises": the heuristic detects "falls" (bearish) but may assign it to the wrong security if entity filtering is weak.

  3. Negation blindness: "Oil did NOT surge" → misclassified as bullish. No negation handling.

  4. Language and spelling: English only. Abbreviations and misspellings not handled.

  5. Context window: Heuristic has no memory of prior sentences. Each text is classified in isolation.
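Failure mode 3 is easy to reproduce with any plain substring keyword scan (illustrative code, not the library's implementation):

```python
# Illustrative reproduction of failure mode 3: a substring keyword scan
# has no notion of negation, so "did NOT surge" still matches "surge".
BULLISH = {"surge", "rally", "gain"}

def keyword_label(text):
    lowered = text.lower()
    return "bullish" if any(k in lowered for k in BULLISH) else "neutral"

print(keyword_label("Oil did NOT surge"))  # still labeled bullish
```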

Model Weights

This model has no neural network weights. It is a deterministic rule-based system (keyword lists + inversion catalog).

  • No fine-tuning. No LoRA adapter. No PyTorch/TensorFlow required.
  • Labels in the companion dataset were generated by Claude Haiku (claude-haiku-4-5 via API), not by a local model.
  • A LoRA fine-tuned adapter is planned once the community label corpus reaches sufficient size and multi-labeler consensus.

Architecture Context

This model is Layer 1 in swik's inference pipeline:

Text Input
    ↓
[heuristic-v1]  ← this model
    ↓ layer1_score
if security ∈ [OIL, NATGAS, LNG, GOLD, EURUSD]: use heuristic output
else if relevance < threshold: use heuristic output
else:
    ↓
[claude-haiku-4-5 + inversion catalog]  ← Layer 2
    ↓
Final prediction

For OIL, NATGAS, LNG, GOLD, EURUSD: the heuristic is the final model (accuracy ~99% on these). For other securities: heuristic pre-screens, Haiku runs if relevance passes.
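The routing above can be sketched as a small dispatcher. Function names, the relevance threshold value, and the stub models are assumptions for illustration only:

```python
# Hypothetical routing sketch for the two-layer pipeline described above.
HEURISTIC_FINAL = {"OIL", "NATGAS", "LNG", "GOLD", "EURUSD"}
RELEVANCE_THRESHOLD = 0.5  # assumed value, not stated in the model card

def route(text, security, relevance, heuristic, llm):
    layer1 = heuristic(text, security)   # Layer 1 always runs first
    if security in HEURISTIC_FINAL:
        return layer1                    # heuristic is the final model
    if relevance < RELEVANCE_THRESHOLD:
        return layer1                    # pre-screen: skip the LLM call
    return llm(text, security)           # Layer 2: Haiku + inversion catalog

# Example with stub models:
heuristic = lambda t, s: {"label": "bullish", "method": "keyword"}
llm = lambda t, s: {"label": "neutral", "method": "llm"}
print(route("Oil surges", "OIL", 0.9, heuristic, llm)["method"])   # keyword
print(route("Fed minutes", "SPX", 0.9, heuristic, llm)["method"])  # llm
```

One consequence of this design: for the five listed securities the LLM is never invoked, so Layer 2 outages cannot affect them.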

Training Data

Not trained. Deterministic rule-based system. Keyword lists were derived from:

  • Manual curation of financial news vocabulary
  • Error analysis on the swik inference corpus
  • Cross-validated against community labels

Dataset

Labels used for benchmarking: polibert/swik-sentiment-labels

License

CC BY 4.0

Citation

@misc{swik_heuristic_v1_2026,
  title={swik-heuristic-v1: Domain-Specific Financial Sentiment Classifier},
  author={swik Community},
  year={2026},
  url={https://huggingface.co/polibert/swik-heuristic-v1},
  license={CC BY 4.0}
}
