Publish binomial-shannon v2

Files changed (7)

README.md +130 -0
config.json +108 -0
configuration_shannon2.py +72 -0
model.safetensors +3 -0
modeling_shannon2.py +192 -0
tokenizer.json +0 -0
tokenizer_config.json +17 -0

README.md ADDED

	@@ -0,0 +1,130 @@

+---
+license: apache-2.0
+language: en
+library_name: transformers
+tags:
+  - finance
+  - news
+  - macro
+  - financial-news
+  - text-classification
+pipeline_tag: text-classification
+---
+# binomial-shannon-2
+A **financial news characterizer with two modes**: it reads *ticker-tagged* company news the way [binomial-shannon-1](https://huggingface.co/BinomialTechnologies/binomial-shannon-1) does (19 structured features), and it reads *macro* news (central banks, inflation, rates, FX, commodities, geopolitics) with a dedicated 35-output macro head bank. A built-in router scores each article as ticker or macro (`mode_prob`); `predict` returns both head banks, so the caller uses that score to pick the matching head set. ~15-30 ms on CPU.
+## Quick start
+```python
+from transformers import AutoTokenizer, AutoModel
+tok   = AutoTokenizer.from_pretrained("BinomialTechnologies/binomial-shannon-2")
+model = AutoModel.from_pretrained("BinomialTechnologies/binomial-shannon-2",
+                                   trust_remote_code=True)
+inputs = tok("[FEED: reuters] [SITE: reuters.com] [DATE: 2026-03-18]\n\n"
+             "TITLE: Fed holds rates, signals two cuts later this year\n\nBODY: ...",
+             return_tensors="pt", truncation=True, max_length=1024)
+out = model.predict(**inputs)
+out["mode_prob"]            # [P(ticker), P(macro)]
+out["topic_prob"]          # 18-way macro topic distribution
+out["directional_read"]    # signed macro read in [-1, +1]
+out["hawkish_dovish_prob"] # 5-way, meaningful on monetary-policy / rates articles
+```
+## What it outputs
+**Ticker mode (19 features, identical to shannon-1)** — event type (10 binary), tone, implied_direction, novelty, claim_type (4), specificity, materiality_if_true.
+**Macro mode (35 features):**
+| Head | Type | Meaning |
+|---|---|---|
+| `topic` | softmax (18) | monetary_policy / fiscal_policy / inflation / growth / labor / rates_fixed_income / equities_markets / fx_currency / energy / commodities / credit_banking / crypto / mergers_acquisitions / trade_policy / geopolitics / single_company / technicals / other |
+| `directional_read` | [-1, +1] | net read for risk assets implied by the article |
+| `severity` | softmax (5) | noise / minor / notable / major / crisis |
+| `novelty` | softmax (3) | rehash / commentary / breaking |
+| `claim_type` | softmax (4) | fact / opinion / rumor / forecast |
+| `hawkish_dovish` | softmax (5) | dovish → hawkish; meaningful on monetary-policy / rates articles |
+Every macro head is either a softmax or a signed scalar. For a softmax head, take the argmax for a label, the probability-weighted score for a continuous summary, or the entropy as an uncertainty estimate.
+## Eval
+Held-out forward-temporal test set (Oct 2025 – May 2026, never seen during training). Numbers come from a reproducible harness run over all 15,805 macro test articles plus a seeded sample of 10,000 ticker articles.
+### Ticker heads (parity with shannon-1)
+| Event-flag macro F1 | implied_direction | tone | claim acc |
+|---|---|---|---|
+| 0.79 | 0.854 | 0.834 | 89.5% |
+The ticker bank matches the standalone shannon-1 model — shannon-2 is a strict superset, adding macro without regressing ticker quality.
+### Macro heads (n=15,805)
+| Head | Metric | Value |
+|---|---|---|
+| topic (18-way) | accuracy | **0.814** |
+| directional_read | Pearson vs panel | **+0.783** |
+| severity (5-way) | accuracy | 0.708 |
+| novelty (3-way) | accuracy | 0.648 |
+| claim_type (4-way) | accuracy | 0.785 |
+| hawkish_dovish (5-way) | accuracy | 0.616 (n=1,650) |
+Per-topic F1 (selected):
+| Topic | F1 | Support |
+|---|---|---|
+| commodities | 0.94 | 2,061 |
+| equities_markets | 0.88 | 3,613 |
+| fx_currency | 0.88 | 3,861 |
+| monetary_policy | 0.79 | 1,662 |
+| inflation | 0.70 | 526 |
+| geopolitics | 0.48 | 346 |
+| technicals | 0.25 | 517 |
+Strongest on high-volume market topics (commodities, FX, equities, monetary policy); weakest on technicals and geopolitics, which are lower-support and more heterogeneous.
+**Routing.** Ticker and macro articles arrive on structurally distinct feeds (per-company news vs. macro wires), so the router separates the two modes essentially perfectly — it is a convenience for serving mixed streams, not a hard classification result.
+## Architecture
+A specialized ~150M-parameter encoder is shared by a 2-way router and two head banks (ticker + macro). The router and every head are separate 3-layer MLPs over the concatenation of the CLS vector and the masked mean of the token states.
+- ~150M encoder params + lightweight head banks
+- 4096-token context (1024 default at inference)
+- bf16 GPU / fp32 CPU
+- ~15-30 ms CPU
+## How it was trained
+- **Corpus**: ticker-tagged company news + press releases and a macro news corpus (2018-2026)
+- **Labels**: distilled from a frontier reasoning model against per-mode rubrics (separate ticker and macro labeling specs)
+- **Split**: forward temporal — train on ≤2025-09-30, test on 2025-10 → 2026-05
+- **Compute**: trained from the base encoder on a single B200
+## Caveats
+- **Trained against frontier-LLM labels.** Eval correlations are partly imitation; treat the outputs as structured features, not ground truth.
+- **Macro corpus is English-language wire news**, weighted toward 2024-2026.
+- **`hawkish_dovish` only fires meaningfully on monetary-policy / rates articles** (it is loss-masked elsewhere during training).
+- **Tier 2** — research preview. Don't use the outputs as standalone trading signals; combine with your own pipelines.
+## License
+Apache 2.0, like the rest of the Binomial AI Research model zoo.
+## Citation
+```bibtex
+@misc{binomial-shannon-2-2026,
+  title  = {binomial-shannon-2: A dual-mode financial news characterizer (ticker + macro)},
+  author = {Binomial AI Research},
+  year   = {2026},
+  url    = {https://huggingface.co/BinomialTechnologies/binomial-shannon-2}
+}
+```
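
The README's remark that a softmax head can be read as an argmax label, a weighted score, or an entropy can be sketched as follows. This is a minimal illustration in plain Python, not code from the repository: the `summarize` helper and the evenly spaced bucket scores in [-1, +1] are assumptions, and only the bucket names come from `configuration_shannon2.py`.

```python
import math

# Bucket names as listed in configuration_shannon2.py.
HAWKISH_DOVISH_BUCKETS = (
    "dovish", "mildly_dovish", "neutral", "mildly_hawkish", "hawkish",
)


def summarize(probs, labels, scores=None):
    """Return (argmax label, weighted score, entropy in nats) for one softmax row."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if scores is None:
        # Assumption: ordered buckets sit at evenly spaced points in [-1, +1].
        n = len(labels)
        scores = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    best = max(range(len(probs)), key=lambda i: probs[i])
    weighted = sum(p * s for p, s in zip(probs, scores))
    # Skip zero-probability buckets so log(0) is never evaluated.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return labels[best], weighted, entropy


# Hypothetical probabilities for a single article.
label, score, entropy = summarize(
    [0.05, 0.10, 0.20, 0.40, 0.25], HAWKISH_DOVISH_BUCKETS
)
print(label, round(score, 3), round(entropy, 3))
```

With a tensor returned by `predict`, something like `out["hawkish_dovish_prob"][0].tolist()` could be passed as `probs`. The weighted score only makes sense for ordered buckets such as `hawkish_dovish` or `severity`, not for the unordered `topic` head.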

config.json ADDED

	@@ -0,0 +1,108 @@

+{
+  "model_type": "shannon2",
+  "architectures": [
+    "Shannon2MultiHead"
+  ],
+  "auto_map": {
+    "AutoConfig": "configuration_shannon2.Shannon2Config",
+    "AutoModel": "modeling_shannon2.Shannon2MultiHead"
+  },
+  "encoder_name_or_path": "answerdotai/ModernBERT-base",
+  "encoder_config": {
+    "transformers_version": "5.6.2",
+    "architectures": [
+      "ModernBertForMaskedLM"
+    ],
+    "output_hidden_states": false,
+    "return_dict": true,
+    "dtype": "float32",
+    "chunk_size_feed_forward": 0,
+    "is_encoder_decoder": false,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "problem_type": null,
+    "vocab_size": 50368,
+    "hidden_size": 768,
+    "intermediate_size": 1152,
+    "num_hidden_layers": 22,
+    "num_attention_heads": 12,
+    "hidden_activation": "gelu",
+    "max_position_embeddings": 8192,
+    "initializer_range": 0.02,
+    "initializer_cutoff_factor": 2.0,
+    "norm_eps": 1e-05,
+    "norm_bias": false,
+    "pad_token_id": 50283,
+    "eos_token_id": 50282,
+    "bos_token_id": 50281,
+    "cls_token_id": 50281,
+    "sep_token_id": 50282,
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "layer_types": [
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention"
+    ],
+    "rope_parameters": {
+      "sliding_attention": {
+        "rope_type": "default",
+        "rope_theta": 10000.0
+      },
+      "full_attention": {
+        "rope_type": "default",
+        "rope_theta": 160000.0
+      }
+    },
+    "local_attention": 128,
+    "embedding_dropout": 0.0,
+    "mlp_bias": false,
+    "mlp_dropout": 0.0,
+    "decoder_bias": true,
+    "classifier_pooling": "mean",
+    "classifier_dropout": 0.0,
+    "classifier_bias": false,
+    "classifier_activation": "gelu",
+    "deterministic_flash_attn": false,
+    "sparse_prediction": false,
+    "sparse_pred_ignore_index": -100,
+    "tie_word_embeddings": true,
+    "_name_or_path": "answerdotai/ModernBERT-base",
+    "global_attn_every_n_layers": 3,
+    "gradient_checkpointing": false,
+    "layer_norm_eps": 1e-05,
+    "model_type": "modernbert",
+    "position_embedding_type": "absolute",
+    "output_attentions": false
+  },
+  "max_position_embeddings": 4096,
+  "head_h1": 512,
+  "head_h2": 256,
+  "dropout": 0.1,
+  "torch_dtype": "float32"
+}
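
The `layer_types` list in `encoder_config` is consistent with `global_attn_every_n_layers: 3` applied over 22 layers: every third layer, starting at layer 0, uses full attention and the others use sliding-window attention. A small sketch that reproduces the pattern is below; the helper name `build_layer_types` is hypothetical and is not part of transformers.

```python
from __future__ import annotations


def build_layer_types(num_hidden_layers: int, global_attn_every_n_layers: int) -> list[str]:
    """Mark every n-th layer (counting from 0) as full attention, the rest as sliding."""
    return [
        "full_attention" if i % global_attn_every_n_layers == 0 else "sliding_attention"
        for i in range(num_hidden_layers)
    ]


# Values taken from the encoder_config above.
layer_types = build_layer_types(num_hidden_layers=22, global_attn_every_n_layers=3)
print(layer_types.count("full_attention"), layer_types.count("sliding_attention"))  # 8 14
```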

configuration_shannon2.py ADDED

	@@ -0,0 +1,72 @@

+"""Configuration class for binomial-shannon-2 (router + ticker + macro).
+Ships with the model on HuggingFace Hub so
+`AutoConfig.from_pretrained(repo, trust_remote_code=True)` works.
+"""
+from __future__ import annotations
+from transformers.configuration_utils import PretrainedConfig
+# Ticker mode (Shannon-1 schema)
+EVENTS = (
+    "earnings", "guidance", "m_and_a", "regulatory_legal", "product",
+    "exec_change", "dividend_buyback", "analyst_rating", "macro_sector", "other",
+)
+CLAIM_TYPES = ("fact", "opinion", "rumor", "forecast")
+# Macro mode
+TOPICS = (
+    "monetary_policy", "fiscal_policy", "inflation", "growth", "labor",
+    "rates_fixed_income", "equities_markets", "fx_currency", "energy",
+    "commodities", "credit_banking", "crypto", "mergers_acquisitions",
+    "trade_policy", "geopolitics", "single_company", "technicals", "other",
+)
+SEVERITY_BUCKETS = ("noise", "minor", "notable", "major", "crisis")
+NOVELTY_BUCKETS_MACRO = ("rehash", "commentary", "breaking")
+CLAIM_TYPES_MACRO = ("fact", "opinion", "rumor", "forecast")
+HAWKISH_DOVISH_BUCKETS = (
+    "dovish", "mildly_dovish", "neutral", "mildly_hawkish", "hawkish",
+)
+class Shannon2Config(PretrainedConfig):
+    """Config for Shannon2MultiHead.
+    Shared encoder + 2-way router + ticker head bank (19 outputs, inherited
+    from shannon-1) + macro head bank (35 outputs). Mirrors shannon-1's hub
+    configuration approach.
+    """
+    model_type = "shannon2"
+    def __init__(
+        self,
+        encoder_name_or_path: str = "answerdotai/ModernBERT-base",
+        max_position_embeddings: int = 4096,
+        head_h1: int = 512,
+        head_h2: int = 256,
+        dropout: float = 0.1,
+        events: tuple[str, ...] = EVENTS,
+        claim_types: tuple[str, ...] = CLAIM_TYPES,
+        topics: tuple[str, ...] = TOPICS,
+        severity_buckets: tuple[str, ...] = SEVERITY_BUCKETS,
+        novelty_buckets_macro: tuple[str, ...] = NOVELTY_BUCKETS_MACRO,
+        claim_types_macro: tuple[str, ...] = CLAIM_TYPES_MACRO,
+        hawkish_dovish_buckets: tuple[str, ...] = HAWKISH_DOVISH_BUCKETS,
+        **kwargs,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.encoder_name_or_path = encoder_name_or_path
+        self.max_position_embeddings = max_position_embeddings
+        self.head_h1 = head_h1
+        self.head_h2 = head_h2
+        self.dropout = dropout
+        self.events = list(events)
+        self.claim_types = list(claim_types)
+        self.topics = list(topics)
+        self.severity_buckets = list(severity_buckets)
+        self.novelty_buckets_macro = list(novelty_buckets_macro)
+        self.claim_types_macro = list(claim_types_macro)
+        self.hawkish_dovish_buckets = list(hawkish_dovish_buckets)
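
The tuples above determine the width of each categorical head, so the output counts quoted in the docstrings can be checked with a few lines. The sketch below hard-codes the tuple lengths and takes the scalar heads from `modeling_shannon2.py`; the dictionary names are illustrative only. Under this reading the ticker bank has 19 values, and the macro bank has 35 categorical classes, or 36 values if the scalar `directional_read` is counted as well.

```python
# Output width per head. Categorical widths are the tuple lengths in
# configuration_shannon2.py; scalar heads contribute a single value each.
TICKER_WIDTHS = {
    "event": 10, "tone": 1, "implied_direction": 1, "novelty": 1,
    "claim": 4, "specificity": 1, "materiality_if_true": 1,
}
MACRO_WIDTHS = {
    "topic": 18, "directional_read": 1, "severity": 5,
    "novelty_macro": 3, "claim_macro": 4, "hawkish_dovish": 5,
}

ticker_total = sum(TICKER_WIDTHS.values())
macro_total = sum(MACRO_WIDTHS.values())
# Classes only, leaving out the one scalar macro head.
macro_categorical = macro_total - MACRO_WIDTHS["directional_read"]
print(ticker_total, macro_categorical, macro_total)  # 19 35 36
```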

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fcebafc4b9629d3316ed0d08fd17fdc0fe4fa4055f28ab16d5cd8b25b0931534
+size 647560940

modeling_shannon2.py ADDED

	@@ -0,0 +1,192 @@

+"""Self-contained model class for binomial-shannon-2.
+Distributed alongside the weights on HuggingFace Hub so anyone can do:
+    from transformers import AutoTokenizer, AutoModel
+    tok   = AutoTokenizer.from_pretrained("BinomialTechnologies/binomial-shannon-2")
+    model = AutoModel.from_pretrained("BinomialTechnologies/binomial-shannon-2",
+                                       trust_remote_code=True)
+Imports only from `transformers` + `torch` — no project-internal dependencies.
+Module names match the training checkpoint so weights load verbatim.
+Architecture:
+    shared encoder
+        ↓ (CLS + masked mean pool concatenated)
+        ↓ 2-way router (ticker | macro)
+        ↓ ticker head bank (19 outputs) + macro head bank (35 outputs)
+    Router            mode_prob over {ticker, macro}
+    Ticker bank       event(10) + tone + implied_direction + novelty
+                      + claim(4) + specificity + materiality   (19, = shannon-1)
+    Macro bank        topic(18) + directional_read + severity(5)
+                      + novelty_macro(3) + claim_macro(4) + hawkish_dovish(5)  (35)
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+from typing import Optional
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from transformers import AutoConfig
+from transformers.modeling_utils import PreTrainedModel
+from transformers.modeling_outputs import ModelOutput
+from .configuration_shannon2 import (
+    Shannon2Config, EVENTS, CLAIM_TYPES, TOPICS, SEVERITY_BUCKETS,
+    NOVELTY_BUCKETS_MACRO, CLAIM_TYPES_MACRO, HAWKISH_DOVISH_BUCKETS,
+)
+N_EVENTS = len(EVENTS)
+N_CLAIMS_TICKER = len(CLAIM_TYPES)
+N_TOPICS = len(TOPICS)
+N_SEVERITY = len(SEVERITY_BUCKETS)
+N_NOVELTY_MACRO = len(NOVELTY_BUCKETS_MACRO)
+N_CLAIMS_MACRO = len(CLAIM_TYPES_MACRO)
+N_HD = len(HAWKISH_DOVISH_BUCKETS)
+MODE_TICKER = 0
+MODE_MACRO = 1
+@dataclass
+class Shannon2Output(ModelOutput):
+    mode_logits: Optional[torch.Tensor] = None
+    # ticker bank
+    event_logits: Optional[torch.Tensor] = None
+    tone: Optional[torch.Tensor] = None
+    implied_direction: Optional[torch.Tensor] = None
+    novelty: Optional[torch.Tensor] = None
+    claim_logits: Optional[torch.Tensor] = None
+    specificity: Optional[torch.Tensor] = None
+    materiality_if_true: Optional[torch.Tensor] = None
+    # macro bank
+    topic_logits: Optional[torch.Tensor] = None
+    directional_read: Optional[torch.Tensor] = None
+    severity_logits: Optional[torch.Tensor] = None
+    novelty_macro_logits: Optional[torch.Tensor] = None
+    claim_macro_logits: Optional[torch.Tensor] = None
+    hawkish_dovish_logits: Optional[torch.Tensor] = None
+class Shannon2MultiHead(PreTrainedModel):
+    config_class = Shannon2Config
+    base_model_prefix = "shannon2"
+    _tied_weights_keys: list = []
+    all_tied_weights_keys: dict = {}
+    def __init__(self, config: Shannon2Config) -> None:
+        super().__init__(config)
+        self.config = config
+        # Rebuild the encoder from the bundled config so loading works offline.
+        if hasattr(config, "encoder_config") and config.encoder_config:
+            from transformers.models.auto.configuration_auto import CONFIG_MAPPING
+            mtype = config.encoder_config.get("model_type")
+            if mtype and mtype in CONFIG_MAPPING:
+                enc_cfg = CONFIG_MAPPING[mtype].from_dict(config.encoder_config)
+            else:
+                enc_cfg = AutoConfig.from_pretrained(config.encoder_name_or_path)
+        else:
+            enc_cfg = AutoConfig.from_pretrained(config.encoder_name_or_path)
+        if config.max_position_embeddings > getattr(enc_cfg, "max_position_embeddings", 8192):
+            enc_cfg.max_position_embeddings = config.max_position_embeddings
+        from transformers import AutoModel as _AutoModel
+        self.encoder = _AutoModel.from_config(enc_cfg, attn_implementation="sdpa")
+        H = enc_cfg.hidden_size
+        head_in = 2 * H
+        h1, h2 = config.head_h1, config.head_h2
+        d = config.dropout
+        self.dropout = nn.Dropout(d)
+        def _mlp(out_dim: int) -> nn.Sequential:
+            return nn.Sequential(
+                nn.Linear(head_in, h1), nn.GELU(), nn.Dropout(d),
+                nn.Linear(h1, h2),      nn.GELU(), nn.Dropout(d),
+                nn.Linear(h2, out_dim),
+            )
+        # Router
+        self.head_router = _mlp(2)
+        # Ticker bank
+        self.head_event = _mlp(N_EVENTS)
+        self.head_tone = _mlp(1)
+        self.head_implied_direction = _mlp(1)
+        self.head_novelty = _mlp(1)
+        self.head_claim = _mlp(N_CLAIMS_TICKER)
+        self.head_specificity = _mlp(1)
+        self.head_materiality = _mlp(1)
+        # Macro bank
+        self.head_topic = _mlp(N_TOPICS)
+        self.head_directional_read = _mlp(1)
+        self.head_severity = _mlp(N_SEVERITY)
+        self.head_novelty_macro = _mlp(N_NOVELTY_MACRO)
+        self.head_claim_macro = _mlp(N_CLAIMS_MACRO)
+        self.head_hawkish_dovish = _mlp(N_HD)
+    def _pool(self, last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+        """Concatenate the CLS vector with the attention-masked mean of all token states."""
+        cls = last_hidden[:, 0]
+        m = attention_mask.unsqueeze(-1).to(last_hidden.dtype)
+        mean_pool = (last_hidden * m).sum(1) / m.sum(1).clamp(min=1.0)
+        return self.dropout(torch.cat([cls, mean_pool], dim=-1))
+    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor,
+                **kwargs) -> Shannon2Output:
+        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
+        pooled = self._pool(enc.last_hidden_state, attention_mask)
+        return Shannon2Output(
+            mode_logits=self.head_router(pooled),
+            event_logits=self.head_event(pooled),
+            tone=self.head_tone(pooled).squeeze(-1),
+            implied_direction=self.head_implied_direction(pooled).squeeze(-1),
+            novelty=self.head_novelty(pooled).squeeze(-1),
+            claim_logits=self.head_claim(pooled),
+            specificity=self.head_specificity(pooled).squeeze(-1),
+            materiality_if_true=self.head_materiality(pooled).squeeze(-1),
+            topic_logits=self.head_topic(pooled),
+            directional_read=self.head_directional_read(pooled).squeeze(-1),
+            severity_logits=self.head_severity(pooled),
+            novelty_macro_logits=self.head_novelty_macro(pooled),
+            claim_macro_logits=self.head_claim_macro(pooled),
+            hawkish_dovish_logits=self.head_hawkish_dovish(pooled),
+        )
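+    @torch.no_grad()
+    def predict(self, input_ids: torch.Tensor, attention_mask: torch.Tensor,
+                mention_threshold: float = 0.5) -> dict:
+        """Run a forward pass and return probabilities and clamped scalars.
+        Both head banks are always returned; use `mode_prob` to decide which
+        bank applies. Dropout stays active unless the model is in eval mode
+        (the `from_pretrained` default), so call `model.eval()` after training.
+        """
+        out = self.forward(input_ids=input_ids, attention_mask=attention_mask)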
+        ev_prob = torch.sigmoid(out.event_logits)
+        return {
+            "mode_prob": F.softmax(out.mode_logits, dim=-1),
+            # ticker
+            "event_prob": ev_prob,
+            "event_mentioned": (ev_prob >= mention_threshold).float(),
+            "tone": out.tone.clamp(-1.0, 1.0),
+            "implied_direction": out.implied_direction.clamp(-1.0, 1.0),
+            "novelty": out.novelty.clamp(0.0, 1.0),
+            "claim_prob": F.softmax(out.claim_logits, dim=-1),
+            "specificity": out.specificity.clamp(0.0, 1.0),
+            "materiality_if_true": out.materiality_if_true.clamp(0.0, 1.0),
+            # macro
+            "topic_prob": F.softmax(out.topic_logits, dim=-1),
+            "directional_read": out.directional_read.clamp(-1.0, 1.0),
+            "severity_prob": F.softmax(out.severity_logits, dim=-1),
+            "novelty_macro_prob": F.softmax(out.novelty_macro_logits, dim=-1),
+            "claim_macro_prob": F.softmax(out.claim_macro_logits, dim=-1),
+            "hawkish_dovish_prob": F.softmax(out.hawkish_dovish_logits, dim=-1),
+        }
+    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
+        if hasattr(self.encoder, "gradient_checkpointing_enable"):
+            self.encoder.gradient_checkpointing_enable(
+                gradient_checkpointing_kwargs=gradient_checkpointing_kwargs)
+    def gradient_checkpointing_disable(self):
+        if hasattr(self.encoder, "gradient_checkpointing_disable"):
+            self.encoder.gradient_checkpointing_disable()
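
`predict` returns the router probabilities and both head banks in one dictionary, so applying the route is left to the caller. One possible way to do that is sketched below. The `select_bank` helper and its 0.5 threshold are assumptions rather than part of the repository; the key names and `MODE_MACRO = 1` are taken from the code above. Rows are indexed generically, so the helper should work on the tensors `predict` returns as well as on the nested lists used in the toy example.

```python
from __future__ import annotations

MODE_MACRO = 1  # index of the macro class in mode_prob, as in modeling_shannon2.py

# Keys of the dict returned by Shannon2MultiHead.predict, grouped by bank.
TICKER_KEYS = (
    "event_prob", "event_mentioned", "tone", "implied_direction",
    "novelty", "claim_prob", "specificity", "materiality_if_true",
)
MACRO_KEYS = (
    "topic_prob", "directional_read", "severity_prob",
    "novelty_macro_prob", "claim_macro_prob", "hawkish_dovish_prob",
)


def select_bank(pred: dict, row: int = 0, threshold: float = 0.5) -> tuple[str, dict]:
    """Pick the head bank for one batch row based on the router's P(macro)."""
    p_macro = float(pred["mode_prob"][row][MODE_MACRO])
    mode = "macro" if p_macro >= threshold else "ticker"
    keys = MACRO_KEYS if mode == "macro" else TICKER_KEYS
    return mode, {k: pred[k][row] for k in keys}


# Toy batch of one article with placeholder feature values (not model output).
example = {"mode_prob": [[0.1, 0.9]]}
example.update({k: [0.0] for k in TICKER_KEYS + MACRO_KEYS})
mode, features = select_bank(example)
print(mode, sorted(features))
```

Keeping both banks in the raw output and routing afterwards leaves room for a different threshold, or for inspecting the unused bank when the router is uncertain.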

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,17 @@

+{
+  "backend": "tokenizers",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "is_local": false,
+  "local_files_only": false,
+  "mask_token": "[MASK]",
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 8192,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": "[UNK]"
+}