Initial release: swik-heuristic-v1 keyword classifier with benchmarks
Files changed:
- README.md (+166, new file)
- inference.py (+107, new file)

README.md (ADDED):
---
license: cc-by-4.0
language:
- en
tags:
- text-classification
- sentiment-analysis
- financial-sentiment
- finance
- commodities
- domain-specific
- rule-based
- interpretable
pretty_name: swik Heuristic Sentiment v1
---

# swik-heuristic-v1 (v0.1)

**Deterministic keyword-based financial sentiment classifier.**
Fast, interpretable, no GPU, no API key. A baseline for domain-specific financial news sentiment.

This is the Layer 1 model in swik's two-layer inference pipeline. It processes every request before any LLM call — both as a fast path for high-confidence cases and as a fallback when the API is unavailable.

## What it does

Two-pass classification:
1. **Inversion check** — matches asset-specific inversion phrases (e.g., "production cut" → BULLISH for OIL)
2. **Keyword scan** — matches generic bullish/bearish keyword lists

If neither pass fires, the label is `neutral`.

## Keyword Lists

**Bullish (14 terms):** cut, surge, rally, record high, growth, beat, upgrade, rise, gain, boost, strong, exceed, recovery, rebound

**Bearish (13 terms):** crash, plunge, drop, fall, miss, downgrade, warning, decline, loss, weak, below, cut guidance, layoff

**Inversions:** Asset-specific phrase overrides from the [swik inversion catalog](https://swik.io/inversions) (125 active entries). Published separately as a dataset.

## Usage

```python
from inference import SwikHeuristicV1

model = SwikHeuristicV1()

# Basic usage
result = model.predict("Oil surges after OPEC production cut")
# {'label': 'bullish', 'magnitude': 0.72, 'confidence': 0.45, 'method': 'keyword'}

# With inversion catalog
inversions = [
    {"phrase": "coal power", "direction": "BULLISH", "variants": ["coal-fired power"]},
    {"phrase": "production cut", "direction": "BULLISH"},
]
model_with_inv = SwikHeuristicV1(known_inversions=inversions)
result = model_with_inv.predict("Coal power demand rises as gas prices surge", security="NATGAS")
# {'label': 'bullish', ..., 'inversion_applied': 'coal power', 'method': 'inversion'}
```

## Benchmark Results

Evaluated on matched corpus: inference_log vs community_labels_legacy (text_hash join), 2026-03-08 to 2026-03-29.

| Metric | heuristic-v1 | haiku-4-5 (baseline) | haiku-4-5 (variant B) |
|--------|--------------|----------------------|-----------------------|
| **Accuracy** | **98.88%** | 39.6% | 46.0% |
| **F1 macro** | **0.981** | 0.309 | 0.456 |
| Neutral F1 | 0.992 | 0.506 | — |
| Bullish F1 | 0.970 | 0.231 | — |
| Bearish F1 | 0.981 | 0.189 | — |
| n (pairs) | 13,966 | 16,141 | 200 (test set) |

> ⚠️ **Important:** These benchmarks are measured against AI-generated labels (Claude Haiku), not human ground truth. The high heuristic accuracy reflects agreement with the labeling model, not necessarily alignment with human judgment. Human-label benchmarks are pending.
>
> ⚠️ **Known dataset bias:** The companion labeled dataset is OIL-dominant — OIL accounts for ~56% of all labeled records. Model performance on other securities (especially low-volume ones) may be significantly lower than the aggregate numbers suggest. Evaluate per-security before deploying.

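The matched-corpus evaluation can be sketched as a hash join. This is an illustrative sketch only: the normalization (lowercase, collapsed whitespace) and the sha256 choice are assumptions, not the exact swik `text_hash` implementation.

```python
import hashlib

def text_hash(text: str) -> str:
    # Assumed normalization: lowercase + collapsed whitespace; sha256 is
    # illustrative -- the real hashing scheme is defined by the swik pipeline.
    norm = " ".join(text.lower().split())
    return hashlib.sha256(norm.encode("utf-8")).hexdigest()

def match_and_score(inference_log, community_labels):
    # Join model predictions to labels on the hash of the underlying text,
    # then compute accuracy over the matched pairs only.
    labels_by_hash = {text_hash(r["text"]): r["label"] for r in community_labels}
    pairs = [
        (rec["label"], labels_by_hash[text_hash(rec["text"])])
        for rec in inference_log
        if text_hash(rec["text"]) in labels_by_hash
    ]
    accuracy = sum(pred == gold for pred, gold in pairs) / len(pairs)
    return pairs, accuracy
```

Only hash-matched pairs are scored, which is why the `n (pairs)` row differs between runs over the same window.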
## Confidence Calibration

The heuristic outputs a fixed confidence of `0.45` for all predictions. This is intentional — unlike the Haiku baseline (which is anti-calibrated: higher confidence → higher error rate), the heuristic makes no claim about certainty. Use it as a deterministic rule engine, not a probabilistic model.

## Known Failure Modes

1. **Ambiguous generic terms**: Words like "cut" appear in both bullish (supply cuts → oil bullish) and neutral contexts (budget cuts, interest rate cuts). Without the inversion catalog, these will be mis-labeled.

2. **Multi-entity headlines**: "Oil falls as dollar rises" — the heuristic detects "falls" (bearish) but may assign it to the wrong security if entity filtering is weak.

3. **Negation blindness**: "Oil did NOT surge" → misclassified as bullish. There is no negation handling.

4. **Language and spelling**: English only. Abbreviations and misspellings are not handled.

5. **Context window**: The heuristic has no memory of prior sentences. Each text is classified in isolation.

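The negation failure is easy to reproduce with the keyword pass alone. The sketch below is self-contained: `keyword_label` is an illustrative stand-in for the keyword pass of `SwikHeuristicV1.predict`, using the published lists.

```python
BULLISH = ["cut", "surge", "rally", "record high", "growth", "beat", "upgrade",
           "rise", "gain", "boost", "strong", "exceed", "recovery", "rebound"]
BEARISH = ["crash", "plunge", "drop", "fall", "miss", "downgrade", "warning",
           "decline", "loss", "weak", "below", "cut guidance", "layoff"]

def keyword_label(text: str) -> str:
    # Substring matching only: no negation handling, bullish checked first.
    t = text.lower()
    if any(kw in t for kw in BULLISH):
        return "bullish"
    if any(kw in t for kw in BEARISH):
        return "bearish"
    return "neutral"

print(keyword_label("Oil did NOT surge"))  # bullish -- "surge" matches despite the negation
```

If your corpus contains negated headlines, route them to Layer 2 or filter them upstream.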
## Model Weights

**This model has no neural network weights.** It is a deterministic rule-based system (keyword lists + inversion catalog).

- No fine-tuning. No LoRA adapter. No PyTorch/TensorFlow required.
- Labels in the companion dataset were generated by Claude Haiku (claude-haiku-4-5 via API) — not by a local model.
- A LoRA fine-tuned adapter is planned once the community label corpus reaches sufficient size and multi-labeler consensus.

## Architecture Context

This model is Layer 1 in swik's inference pipeline:

```
Text Input
    ↓
[heuristic-v1]  ← this model
    ↓ layer1_score
if security ∈ [OIL, NATGAS, LNG, GOLD, EURUSD]: use heuristic output
else if relevance < threshold: use heuristic output
else:
    ↓
[claude-haiku-4-5 + inversion catalog]  ← Layer 2
    ↓
Final prediction
```

For OIL, NATGAS, LNG, GOLD, and EURUSD, the heuristic is the final model (accuracy ~99% on these securities). For other securities, the heuristic pre-screens and Haiku runs only if relevance passes the threshold.

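The routing logic above can be sketched in Python. This is a sketch under stated assumptions: `call_haiku` and the `0.6` relevance threshold are hypothetical stand-ins; the real threshold and the Layer 2 client live in the swik pipeline, not in this repository.

```python
HEURISTIC_FINAL = {"OIL", "NATGAS", "LNG", "GOLD", "EURUSD"}
RELEVANCE_THRESHOLD = 0.6  # assumed value, for illustration only

def route(text, security, heuristic, call_haiku=None):
    # Layer 1 always runs; it is free and deterministic.
    layer1 = heuristic.predict(text, security=security)
    if security in HEURISTIC_FINAL:
        return layer1  # heuristic output is final for these securities
    if layer1["relevance"] < RELEVANCE_THRESHOLD:
        return layer1  # pre-screen: not relevant enough to spend an LLM call
    return call_haiku(text, security)  # Layer 2: claude-haiku-4-5 + inversions
```

The design point is cost control: the LLM is only invoked when the heuristic judges the text relevant and the security is outside the heuristic-final set.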
## Training Data

Not trained. This is a deterministic rule-based system; the keyword lists were derived from:
- Manual curation of financial news vocabulary
- Error analysis on the swik inference corpus
- Cross-validation against community labels

## Dataset

Labels used for benchmarking: [polibert/swik-sentiment-labels](https://huggingface.co/datasets/polibert/swik-sentiment-labels)

## License

CC BY 4.0

## Citation

```bibtex
@misc{swik_heuristic_v1_2026,
  title={swik-heuristic-v1: Domain-Specific Financial Sentiment Classifier},
  author={swik Community},
  year={2026},
  url={https://huggingface.co/polibert/swik-heuristic-v1},
  license={CC BY 4.0}
}
```

## Links

- Platform: [swik.io](https://swik.io)
- Dataset: [polibert/swik-sentiment-labels](https://huggingface.co/datasets/polibert/swik-sentiment-labels)
- Contribute labels: [swik.io/contribute/label](https://swik.io/contribute/label)

inference.py (ADDED):
#!/usr/bin/env python3
"""
swik-heuristic-v1 — deterministic keyword-based financial sentiment classifier.

A fast, interpretable baseline for domain-specific financial news sentiment.
No GPU required. No API calls. Runs in microseconds.

Usage:
    from inference import SwikHeuristicV1
    model = SwikHeuristicV1()
    result = model.predict("OPEC agrees to production cuts", security="OIL")
    # {"label": "bullish", "magnitude": 0.72, "confidence": 0.45, "method": "keyword"}

For inversion-aware inference, pass known_inversions (a list of dicts with
phrase/direction keys).
"""

BULLISH_KEYWORDS = [
    "cut", "surge", "rally", "record high", "growth", "beat", "upgrade",
    "rise", "gain", "boost", "strong", "exceed", "recovery", "rebound",
]
BEARISH_KEYWORDS = [
    "crash", "plunge", "drop", "fall", "miss", "downgrade", "warning",
    "decline", "loss", "weak", "below", "cut guidance", "layoff",
]

LABEL_MAP = {"bullish": 0, "bearish": 1, "neutral": 2, "irrelevant": 3}
LABEL_NAMES = ["bullish", "bearish", "neutral", "irrelevant"]


class SwikHeuristicV1:
    """
    Two-pass keyword classifier:
      Pass 1: check known inversions (asset-specific phrase overrides)
      Pass 2: check generic bullish/bearish keyword lists
      Default: neutral

    Accuracy: 98.88% on the matched inference corpus vs AI labels (n=13,966).
    Note: measured against AI-generated labels, not human ground truth.
    """

    def __init__(self, known_inversions=None):
        """
        known_inversions: list of dicts with keys:
            phrase (str), direction (str: BULLISH|BEARISH|NEUTRAL),
            variants (list[str], optional), confidence (float, optional)
        """
        self.known_inversions = known_inversions or []

    def predict(self, text: str, security: str = None, key_entities: list = None) -> dict:
        # Note: `security` is accepted for pipeline API compatibility but is
        # currently unused by the heuristic itself.
        text_lower = text.lower()
        direction = "neutral"
        magnitude = 0.4
        relevance = 0.5
        inversion_applied = None

        # Pass 1: known inversions (highest priority)
        for inv in self.known_inversions:
            phrase = inv["phrase"].lower()
            variants = [v.lower() for v in inv.get("variants", [])]
            if phrase in text_lower or any(v in text_lower for v in variants):
                direction = inv["direction"].lower()
                magnitude = float(inv.get("confidence", 0.7))
                relevance = 0.85
                inversion_applied = inv["phrase"]
                break

        # Pass 2: generic keywords
        if not inversion_applied:
            if any(kw in text_lower for kw in BULLISH_KEYWORDS):
                direction = "bullish"
                magnitude = 0.72
                relevance = 0.75
            elif any(kw in text_lower for kw in BEARISH_KEYWORDS):
                direction = "bearish"
                magnitude = 0.68
                relevance = 0.75

        # Boost relevance if any key entity is mentioned
        if key_entities:
            for entity in key_entities:
                if entity.lower() in text_lower:
                    relevance = min(1.0, relevance + 0.15)
                    break

        return {
            "label": direction,
            "label_id": LABEL_MAP.get(direction, 2),
            "magnitude": round(magnitude, 2),
            "relevance": round(relevance, 2),
            "confidence": 0.45,  # heuristic confidence is always 0.45
            "inversion_applied": inversion_applied,
            "method": "inversion" if inversion_applied else ("keyword" if direction != "neutral" else "default"),
        }

    def predict_batch(self, texts: list, security: str = None, key_entities: list = None) -> list:
        return [self.predict(t, security, key_entities) for t in texts]


if __name__ == "__main__":
    import sys

    model = SwikHeuristicV1()
    text = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else "OPEC agrees to production cuts, oil surges"
    result = model.predict(text)
    print(f"Text: {text}")
    print(f"Label: {result['label']} (id={result['label_id']})")
    print(f"Magnitude: {result['magnitude']} | Relevance: {result['relevance']} | Confidence: {result['confidence']}")
    print(f"Method: {result['method']}")