Text Classification
Transformers
Safetensors
English
shannon2
image-feature-extraction
finance
news
macro
financial-news
custom_code
Publish binomial-shannon v2
- README.md +130 -0
- config.json +108 -0
- configuration_shannon2.py +72 -0
- model.safetensors +3 -0
- modeling_shannon2.py +192 -0
- tokenizer.json +0 -0
- tokenizer_config.json +17 -0
README.md
ADDED
@@ -0,0 +1,130 @@
+---
+license: apache-2.0
+language: en
+library_name: transformers
+tags:
+- finance
+- news
+- macro
+- financial-news
+- text-classification
+pipeline_tag: text-classification
+---
+
+# binomial-shannon-2
+
+A **financial news characterizer with two modes**: it reads *ticker-tagged* company news the way [binomial-shannon-1](https://huggingface.co/BinomialTechnologies/binomial-shannon-1) does (19 structured features), and reads *macro* news (central banks, inflation, rates, FX, commodities, geopolitics) with a dedicated 36-output macro head bank. A built-in router selects the right head set per article. Inference takes ~15-30 ms on CPU.
+
+## Quick start
+
+```python
+from transformers import AutoTokenizer, AutoModel
+
+tok = AutoTokenizer.from_pretrained("BinomialTechnologies/binomial-shannon-2")
+model = AutoModel.from_pretrained("BinomialTechnologies/binomial-shannon-2",
+                                  trust_remote_code=True)
+
+inputs = tok("[FEED: reuters] [SITE: reuters.com] [DATE: 2026-03-18]\n\n"
+             "TITLE: Fed holds rates, signals two cuts later this year\n\nBODY: ...",
+             return_tensors="pt", truncation=True, max_length=1024)
+out = model.predict(**inputs)
+
+out["mode_prob"]            # [P(ticker), P(macro)]
+out["topic_prob"]           # 18-way macro topic distribution
+out["directional_read"]     # signed macro read in [-1, +1]
+out["hawkish_dovish_prob"]  # 5-way, meaningful on monetary-policy / rates articles
+```
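The quick start passes one string that carries feed metadata in bracketed tags, followed by `TITLE:` and `BODY:` sections. A minimal sketch of a helper that assembles such a string is given here. The name `format_article` and its parameters are hypothetical, and the layout is inferred only from the single quick-start example, so it should be checked against the actual training format before use.

```python
def format_article(feed: str, site: str, date: str, title: str, body: str) -> str:
    """Assemble the tagged input string used in the quick-start example.

    Hypothetical helper: the tag layout is copied from the one example in the
    README, not from a documented specification.
    """
    header = f"[FEED: {feed}] [SITE: {site}] [DATE: {date}]"
    return f"{header}\n\nTITLE: {title}\n\nBODY: {body}"


text = format_article(
    feed="reuters",
    site="reuters.com",
    date="2026-03-18",
    title="Fed holds rates, signals two cuts later this year",
    body="...",
)
print(text)
```

The resulting string can be passed to the tokenizer in place of the literal used in the quick start.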
+
+## What it outputs
+
+**Ticker mode (19 features, identical to shannon-1):** event type (10 binary flags), tone, implied_direction, novelty, claim_type (4), specificity, materiality_if_true.
+
+**Macro mode (36 features):**
+
+| Head | Type | Meaning |
+|---|---|---|
+| `topic` | softmax (18) | monetary_policy / fiscal_policy / inflation / growth / labor / rates_fixed_income / equities_markets / fx_currency / energy / commodities / credit_banking / crypto / mergers_acquisitions / trade_policy / geopolitics / single_company / technicals / other |
+| `directional_read` | scalar in [-1, +1] | net read for risk assets implied by the article |
+| `severity` | softmax (5) | noise / minor / notable / major / crisis |
+| `novelty` | softmax (3) | rehash / commentary / breaking |
+| `claim_type` | softmax (4) | fact / opinion / rumor / forecast |
+| `hawkish_dovish` | softmax (5) | dovish → hawkish; meaningful on monetary-policy / rates articles |
+
+Every macro head is either a softmax or a signed scalar. For a softmax head, use the argmax for a label, the probability-weighted score for a continuous summary, or the entropy as an uncertainty measure.
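The three summaries named for a softmax head can be sketched in plain Python. This is an illustration under assumptions: the "weighted score" is taken to be the probability-weighted bucket index (only meaningful for ordered heads such as `severity` or `hawkish_dovish`), the function name `summarize_softmax` is hypothetical, and the probabilities are made up rather than real model output.

```python
import math

SEVERITY_BUCKETS = ("noise", "minor", "notable", "major", "crisis")


def summarize_softmax(probs, labels):
    """Return (argmax label, index-weighted score, entropy in nats)."""
    top = max(range(len(probs)), key=lambda i: probs[i])
    # Expected bucket index: an assumed reading of "weighted score".
    score = sum(i * p for i, p in enumerate(probs))
    # Shannon entropy; zero-probability buckets contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return labels[top], score, entropy


# Made-up probabilities for illustration only.
severity_prob = [0.05, 0.15, 0.50, 0.25, 0.05]
label, score, entropy = summarize_softmax(severity_prob, SEVERITY_BUCKETS)
print(label, round(score, 2), round(entropy, 3))
```

With real outputs, `severity_prob` would be one row of `out["severity_prob"]` converted with `.tolist()`.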
+
+## Eval
+
+Evaluated on a held-out forward-temporal test set (Oct 2025 – May 2026, never seen during training). Numbers come from a reproducible harness run over all 15,805 macro test articles plus a seeded sample of 10,000 ticker articles.
+
+### Ticker heads (parity with shannon-1)
+
+| Event-flag macro F1 | implied_direction | tone | claim_type accuracy |
+|---|---|---|---|
+| 0.79 | 0.854 | 0.834 | 89.5% |
+
+The ticker bank matches the standalone shannon-1 model: shannon-2 is a strict superset that adds macro coverage without regressing ticker quality.
+
+### Macro heads (n=15,805)
+
+| Head | Metric | Value |
+|---|---|---|
+| topic (18-way) | accuracy | **0.814** |
+| directional_read | Pearson vs panel | **+0.783** |
+| severity (5-way) | accuracy | 0.708 |
+| novelty (3-way) | accuracy | 0.648 |
+| claim_type (4-way) | accuracy | 0.785 |
+| hawkish_dovish (5-way) | accuracy | 0.616 (n=1,650) |
+
+Per-topic F1 (selected):
+
+| Topic | F1 | Support |
+|---|---|---|
+| commodities | 0.94 | 2,061 |
+| equities_markets | 0.88 | 3,613 |
+| fx_currency | 0.88 | 3,861 |
+| monetary_policy | 0.79 | 1,662 |
+| inflation | 0.70 | 526 |
+| geopolitics | 0.48 | 346 |
+| technicals | 0.25 | 517 |
+
+Strongest on high-volume market topics (commodities, FX, equities, monetary policy); weakest on technicals and geopolitics, which are lower-support and more heterogeneous.
+
+**Routing.** Ticker and macro articles arrive on structurally distinct feeds (per-company news vs. macro wires), so the router separates the two modes essentially perfectly. Treat it as a convenience for serving mixed streams, not as a hard classification result.
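When serving a mixed stream, the router output can be used to keep only the relevant head bank. The sketch below is one way to do that, not part of the released code. The key names follow the dictionary returned by `predict` in `modeling_shannon2.py`; the helper `select_bank` is hypothetical, and it assumes the prediction for a single article has already been converted to plain Python values.

```python
MODE_NAMES = ("ticker", "macro")  # index order follows MODE_TICKER = 0, MODE_MACRO = 1

TICKER_KEYS = ("event_prob", "event_mentioned", "tone", "implied_direction",
               "novelty", "claim_prob", "specificity", "materiality_if_true")
MACRO_KEYS = ("topic_prob", "directional_read", "severity_prob",
              "novelty_macro_prob", "claim_macro_prob", "hawkish_dovish_prob")


def select_bank(pred: dict) -> dict:
    """Keep only the head bank that the router prefers for one article."""
    p_ticker, p_macro = pred["mode_prob"]
    mode = MODE_NAMES[0] if p_ticker >= p_macro else MODE_NAMES[1]
    keys = TICKER_KEYS if mode == "ticker" else MACRO_KEYS
    routed = {"mode": mode}
    routed.update({k: pred[k] for k in keys if k in pred})
    return routed


# Made-up values for one article, not real model output.
pred = {
    "mode_prob": [0.02, 0.98],
    "tone": 0.10,
    "directional_read": -0.40,
    "severity_prob": [0.1, 0.2, 0.4, 0.2, 0.1],
}
routed = select_bank(pred)
print(routed)
```

Both banks are always computed in the forward pass, so this selection only filters the output and does not save compute.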
+
+## Architecture
+
+A specialized ~150M-parameter encoder is shared by a 2-way router and two head banks (ticker + macro). The router and every head are 3-layer MLPs over the concatenation of the CLS vector and the masked-mean-pooled token representation.
+
+- ~150M encoder params + lightweight head banks
+- 4096-token context (1024 default at inference)
+- bf16 GPU / fp32 CPU
+- ~15-30 ms CPU
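How lightweight the head banks are can be worked out from the sizes in `config.json` (`hidden_size` 768, `head_h1` 512, `head_h2` 256) and the head list in `modeling_shannon2.py`. The sketch below is a back-of-the-envelope count, assuming every `nn.Linear` keeps its default bias; the helper `mlp_params` is a hypothetical name.

```python
HIDDEN_SIZE = 768              # encoder_config.hidden_size in config.json
HEAD_H1, HEAD_H2 = 512, 256    # head_h1 / head_h2 in config.json

# Output width of the router and of every ticker and macro head.
HEAD_OUTPUTS = {
    "router": 2,
    "event": 10, "tone": 1, "implied_direction": 1, "novelty": 1,
    "claim": 4, "specificity": 1, "materiality": 1,
    "topic": 18, "directional_read": 1, "severity": 5,
    "novelty_macro": 3, "claim_macro": 4, "hawkish_dovish": 5,
}


def mlp_params(in_dim: int, h1: int, h2: int, out_dim: int) -> int:
    """Weights plus biases of Linear(in, h1) -> Linear(h1, h2) -> Linear(h2, out)."""
    return (in_dim * h1 + h1) + (h1 * h2 + h2) + (h2 * out_dim + out_dim)


head_in = 2 * HIDDEN_SIZE  # CLS vector concatenated with the masked mean
total = sum(mlp_params(head_in, HEAD_H1, HEAD_H2, n) for n in HEAD_OUTPUTS.values())
print(f"{total:,} parameters across {len(HEAD_OUTPUTS)} MLPs")
```

Under these assumptions the 14 MLPs hold about 12.9M parameters, under a tenth of the ~150M encoder.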
+
+## How it was trained
+
+- **Corpus**: ticker-tagged company news + press releases and a macro news corpus (2018-2026)
+- **Labels**: distilled from a frontier reasoning model against per-mode rubrics (separate ticker and macro labeling specs)
+- **Split**: forward temporal, train on ≤2025-09-30, test on 2025-10 → 2026-05
+- **Compute**: trained from the base encoder on a single B200
+
+## Caveats
+
+- **Trained against frontier-LLM labels.** Eval correlations are partly imitation; treat the outputs as structured features, not ground truth.
+- **Macro corpus is English-language wire news**, weighted toward 2024-2026.
+- **`hawkish_dovish` only fires meaningfully on monetary-policy / rates articles** (it is loss-masked elsewhere during training).
+- **Tier 2**: research preview. Don't use the outputs as standalone trading signals; combine with your own pipelines.
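Because `hawkish_dovish` is loss-masked outside monetary-policy and rates articles, a consumer may want to ignore it elsewhere. The sketch below gates the head on the predicted topic. Mapping "monetary-policy / rates" to the `monetary_policy` and `rates_fixed_income` topics is an assumption, and `hawkish_dovish_label` is a hypothetical helper.

```python
TOPICS = (
    "monetary_policy", "fiscal_policy", "inflation", "growth", "labor",
    "rates_fixed_income", "equities_markets", "fx_currency", "energy",
    "commodities", "credit_banking", "crypto", "mergers_acquisitions",
    "trade_policy", "geopolitics", "single_company", "technicals", "other",
)
HAWKISH_DOVISH_BUCKETS = (
    "dovish", "mildly_dovish", "neutral", "mildly_hawkish", "hawkish",
)
# Assumption: these two topics are the ones the caveat refers to.
RATE_TOPICS = frozenset({"monetary_policy", "rates_fixed_income"})


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def hawkish_dovish_label(topic_prob, hd_prob):
    """Return a hawkish/dovish label only when the top topic is rate-related."""
    topic = TOPICS[argmax(topic_prob)]
    if topic not in RATE_TOPICS:
        return None
    return HAWKISH_DOVISH_BUCKETS[argmax(hd_prob)]


def one_hot(label):
    return [1.0 if t == label else 0.0 for t in TOPICS]


# Made-up probabilities for illustration only.
hd_prob = [0.05, 0.10, 0.15, 0.30, 0.40]
print(hawkish_dovish_label(one_hot("monetary_policy"), hd_prob))
print(hawkish_dovish_label(one_hot("crypto"), hd_prob))
```

A softer variant could weight the head by the combined probability of the rate-related topics instead of using a hard gate.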
+
+## License
+
+Apache 2.0, like the rest of the Binomial AI Research model zoo.
+
+## Citation
+
+```bibtex
+@misc{binomial-shannon-2-2026,
+  title = {binomial-shannon-2: A dual-mode financial news characterizer (ticker + macro)},
+  author = {Binomial AI Research},
+  year = {2026},
+  url = {https://huggingface.co/BinomialTechnologies/binomial-shannon-2}
+}
+```
config.json
ADDED
@@ -0,0 +1,108 @@
+{
+  "model_type": "shannon2",
+  "architectures": [
+    "Shannon2MultiHead"
+  ],
+  "auto_map": {
+    "AutoConfig": "configuration_shannon2.Shannon2Config",
+    "AutoModel": "modeling_shannon2.Shannon2MultiHead"
+  },
+  "encoder_name_or_path": "answerdotai/ModernBERT-base",
+  "encoder_config": {
+    "transformers_version": "5.6.2",
+    "architectures": [
+      "ModernBertForMaskedLM"
+    ],
+    "output_hidden_states": false,
+    "return_dict": true,
+    "dtype": "float32",
+    "chunk_size_feed_forward": 0,
+    "is_encoder_decoder": false,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "problem_type": null,
+    "vocab_size": 50368,
+    "hidden_size": 768,
+    "intermediate_size": 1152,
+    "num_hidden_layers": 22,
+    "num_attention_heads": 12,
+    "hidden_activation": "gelu",
+    "max_position_embeddings": 8192,
+    "initializer_range": 0.02,
+    "initializer_cutoff_factor": 2.0,
+    "norm_eps": 1e-05,
+    "norm_bias": false,
+    "pad_token_id": 50283,
+    "eos_token_id": 50282,
+    "bos_token_id": 50281,
+    "cls_token_id": 50281,
+    "sep_token_id": 50282,
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "layer_types": [
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention",
+      "sliding_attention",
+      "sliding_attention",
+      "full_attention"
+    ],
+    "rope_parameters": {
+      "sliding_attention": {
+        "rope_type": "default",
+        "rope_theta": 10000.0
+      },
+      "full_attention": {
+        "rope_type": "default",
+        "rope_theta": 160000.0
+      }
+    },
+    "local_attention": 128,
+    "embedding_dropout": 0.0,
+    "mlp_bias": false,
+    "mlp_dropout": 0.0,
+    "decoder_bias": true,
+    "classifier_pooling": "mean",
+    "classifier_dropout": 0.0,
+    "classifier_bias": false,
+    "classifier_activation": "gelu",
+    "deterministic_flash_attn": false,
+    "sparse_prediction": false,
+    "sparse_pred_ignore_index": -100,
+    "tie_word_embeddings": true,
+    "_name_or_path": "answerdotai/ModernBERT-base",
+    "global_attn_every_n_layers": 3,
+    "gradient_checkpointing": false,
+    "layer_norm_eps": 1e-05,
+    "model_type": "modernbert",
+    "position_embedding_type": "absolute",
+    "output_attentions": false
+  },
+  "max_position_embeddings": 4096,
+  "head_h1": 512,
+  "head_h2": 256,
+  "dropout": 0.1,
+  "torch_dtype": "float32"
+}
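The `layer_types` list in `encoder_config` follows a regular pattern that lines up with `global_attn_every_n_layers: 3` and `num_hidden_layers: 22`. The sketch below regenerates the list from those two values. The rule (full attention when the layer index is a multiple of 3) is inferred from the values in this file and is not taken from the library source.

```python
NUM_HIDDEN_LAYERS = 22            # encoder_config.num_hidden_layers
GLOBAL_ATTN_EVERY_N_LAYERS = 3    # encoder_config.global_attn_every_n_layers

# Inferred rule: layer i uses full attention when i is a multiple of 3,
# and sliding-window (local) attention otherwise.
layer_types = [
    "full_attention" if i % GLOBAL_ATTN_EVERY_N_LAYERS == 0 else "sliding_attention"
    for i in range(NUM_HIDDEN_LAYERS)
]

n_full = layer_types.count("full_attention")
print(n_full, "full-attention layers,", NUM_HIDDEN_LAYERS - n_full, "sliding-window layers")
```

The sliding-window layers use the 128-token `local_attention` window and the smaller `rope_theta` listed under `rope_parameters`.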
configuration_shannon2.py
ADDED
@@ -0,0 +1,72 @@
+"""Configuration class for binomial-shannon-2 (router + ticker + macro).
+
+Ships with the model on HuggingFace Hub so
+`AutoConfig.from_pretrained(repo, trust_remote_code=True)` works.
+"""
+
+from __future__ import annotations
+
+from transformers.configuration_utils import PretrainedConfig
+
+
+# Ticker mode (Shannon-1 schema)
+EVENTS = (
+    "earnings", "guidance", "m_and_a", "regulatory_legal", "product",
+    "exec_change", "dividend_buyback", "analyst_rating", "macro_sector", "other",
+)
+CLAIM_TYPES = ("fact", "opinion", "rumor", "forecast")
+
+# Macro mode
+TOPICS = (
+    "monetary_policy", "fiscal_policy", "inflation", "growth", "labor",
+    "rates_fixed_income", "equities_markets", "fx_currency", "energy",
+    "commodities", "credit_banking", "crypto", "mergers_acquisitions",
+    "trade_policy", "geopolitics", "single_company", "technicals", "other",
+)
+SEVERITY_BUCKETS = ("noise", "minor", "notable", "major", "crisis")
+NOVELTY_BUCKETS_MACRO = ("rehash", "commentary", "breaking")
+CLAIM_TYPES_MACRO = ("fact", "opinion", "rumor", "forecast")
+HAWKISH_DOVISH_BUCKETS = (
+    "dovish", "mildly_dovish", "neutral", "mildly_hawkish", "hawkish",
+)
+
+
+class Shannon2Config(PretrainedConfig):
+    """Config for Shannon2MultiHead.
+
+    Shared encoder + 2-way router + ticker head bank (19 outputs, inherited
+    from shannon-1) + macro head bank (36 outputs). Mirrors shannon-1's hub
+    configuration approach.
+    """
+
+    model_type = "shannon2"
+
+    def __init__(
+        self,
+        encoder_name_or_path: str = "answerdotai/ModernBERT-base",
+        max_position_embeddings: int = 4096,
+        head_h1: int = 512,
+        head_h2: int = 256,
+        dropout: float = 0.1,
+        events: tuple[str, ...] = EVENTS,
+        claim_types: tuple[str, ...] = CLAIM_TYPES,
+        topics: tuple[str, ...] = TOPICS,
+        severity_buckets: tuple[str, ...] = SEVERITY_BUCKETS,
+        novelty_buckets_macro: tuple[str, ...] = NOVELTY_BUCKETS_MACRO,
+        claim_types_macro: tuple[str, ...] = CLAIM_TYPES_MACRO,
+        hawkish_dovish_buckets: tuple[str, ...] = HAWKISH_DOVISH_BUCKETS,
+        **kwargs,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.encoder_name_or_path = encoder_name_or_path
+        self.max_position_embeddings = max_position_embeddings
+        self.head_h1 = head_h1
+        self.head_h2 = head_h2
+        self.dropout = dropout
+        self.events = list(events)
+        self.claim_types = list(claim_types)
+        self.topics = list(topics)
+        self.severity_buckets = list(severity_buckets)
+        self.novelty_buckets_macro = list(novelty_buckets_macro)
+        self.claim_types_macro = list(claim_types_macro)
+        self.hawkish_dovish_buckets = list(hawkish_dovish_buckets)
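The label tuples in this file give the index order of each softmax head, so they can be used to turn a probability vector back into names. The sketch below ranks the most likely macro topics. `top_k_topics` is a hypothetical helper and the probabilities are made up for illustration.

```python
TOPICS = (
    "monetary_policy", "fiscal_policy", "inflation", "growth", "labor",
    "rates_fixed_income", "equities_markets", "fx_currency", "energy",
    "commodities", "credit_banking", "crypto", "mergers_acquisitions",
    "trade_policy", "geopolitics", "single_company", "technicals", "other",
)


def top_k_topics(topic_prob, k=3):
    """Return the k most likely (topic, probability) pairs, highest first."""
    if len(topic_prob) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} probabilities, got {len(topic_prob)}")
    ranked = sorted(zip(TOPICS, topic_prob), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up distribution for one article, not real model output.
topic_prob = [0.0] * len(TOPICS)
topic_prob[TOPICS.index("inflation")] = 0.6
topic_prob[TOPICS.index("monetary_policy")] = 0.3
topic_prob[TOPICS.index("growth")] = 0.1

print(top_k_topics(topic_prob, k=2))
```

The same pattern applies to the other tuples, for example `SEVERITY_BUCKETS` with `severity_prob`.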
model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fcebafc4b9629d3316ed0d08fd17fdc0fe4fa4055f28ab16d5cd8b25b0931534
+size 647560940
modeling_shannon2.py
ADDED
@@ -0,0 +1,192 @@
+"""Self-contained model class for binomial-shannon-2.
+
+Distributed alongside the weights on HuggingFace Hub so anyone can do:
+
+    from transformers import AutoTokenizer, AutoModel
+    tok = AutoTokenizer.from_pretrained("BinomialTechnologies/binomial-shannon-2")
+    model = AutoModel.from_pretrained("BinomialTechnologies/binomial-shannon-2",
+                                      trust_remote_code=True)
+
+Imports only from `transformers` + `torch`; no project-internal dependencies.
+Module names match the training checkpoint so weights load verbatim.
+
+Architecture:
+    shared encoder
+        ↓ (CLS + masked mean pool concatenated)
+        ↓ 2-way router (ticker | macro)
+        ↓ ticker head bank (19 outputs) + macro head bank (36 outputs)
+
+    Router        mode_prob over {ticker, macro}
+    Ticker bank   event(10) + tone + implied_direction + novelty
+                  + claim(4) + specificity + materiality              (19, = shannon-1)
+    Macro bank    topic(18) + directional_read + severity(5)
+                  + novelty_macro(3) + claim_macro(4) + hawkish_dovish(5)  (36)
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Optional
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from transformers import AutoConfig
+from transformers.modeling_utils import PreTrainedModel
+from transformers.modeling_outputs import ModelOutput
+
+from .configuration_shannon2 import (
+    Shannon2Config, EVENTS, CLAIM_TYPES, TOPICS, SEVERITY_BUCKETS,
+    NOVELTY_BUCKETS_MACRO, CLAIM_TYPES_MACRO, HAWKISH_DOVISH_BUCKETS,
+)
+
+N_EVENTS = len(EVENTS)
+N_CLAIMS_TICKER = len(CLAIM_TYPES)
+N_TOPICS = len(TOPICS)
+N_SEVERITY = len(SEVERITY_BUCKETS)
+N_NOVELTY_MACRO = len(NOVELTY_BUCKETS_MACRO)
+N_CLAIMS_MACRO = len(CLAIM_TYPES_MACRO)
+N_HD = len(HAWKISH_DOVISH_BUCKETS)
+
+MODE_TICKER = 0
+MODE_MACRO = 1
+
+
+@dataclass
+class Shannon2Output(ModelOutput):
+    mode_logits: Optional[torch.Tensor] = None
+    # ticker bank
+    event_logits: Optional[torch.Tensor] = None
+    tone: Optional[torch.Tensor] = None
+    implied_direction: Optional[torch.Tensor] = None
+    novelty: Optional[torch.Tensor] = None
+    claim_logits: Optional[torch.Tensor] = None
+    specificity: Optional[torch.Tensor] = None
+    materiality_if_true: Optional[torch.Tensor] = None
+    # macro bank
+    topic_logits: Optional[torch.Tensor] = None
+    directional_read: Optional[torch.Tensor] = None
+    severity_logits: Optional[torch.Tensor] = None
+    novelty_macro_logits: Optional[torch.Tensor] = None
+    claim_macro_logits: Optional[torch.Tensor] = None
+    hawkish_dovish_logits: Optional[torch.Tensor] = None
+
+
+class Shannon2MultiHead(PreTrainedModel):
+    config_class = Shannon2Config
+    base_model_prefix = "shannon2"
+    _tied_weights_keys: list = []
+    all_tied_weights_keys: dict = {}
+
+    def __init__(self, config: Shannon2Config) -> None:
+        super().__init__(config)
+        self.config = config
+
+        # Rebuild the encoder from the bundled config so loading works offline.
+        if hasattr(config, "encoder_config") and config.encoder_config:
+            from transformers.models.auto.configuration_auto import CONFIG_MAPPING
+            mtype = config.encoder_config.get("model_type")
+            if mtype and mtype in CONFIG_MAPPING:
+                enc_cfg = CONFIG_MAPPING[mtype].from_dict(config.encoder_config)
+            else:
+                enc_cfg = AutoConfig.from_pretrained(config.encoder_name_or_path)
+        else:
+            enc_cfg = AutoConfig.from_pretrained(config.encoder_name_or_path)
+
+        if config.max_position_embeddings > getattr(enc_cfg, "max_position_embeddings", 8192):
+            enc_cfg.max_position_embeddings = config.max_position_embeddings
+
+        from transformers import AutoModel as _AutoModel
+        self.encoder = _AutoModel.from_config(enc_cfg, attn_implementation="sdpa")
+
+        H = enc_cfg.hidden_size
+        head_in = 2 * H
+        h1, h2 = config.head_h1, config.head_h2
+        d = config.dropout
+        self.dropout = nn.Dropout(d)
+
+        def _mlp(out_dim: int) -> nn.Sequential:
+            return nn.Sequential(
+                nn.Linear(head_in, h1), nn.GELU(), nn.Dropout(d),
+                nn.Linear(h1, h2), nn.GELU(), nn.Dropout(d),
+                nn.Linear(h2, out_dim),
+            )
+
+        # Router
+        self.head_router = _mlp(2)
+        # Ticker bank
+        self.head_event = _mlp(N_EVENTS)
+        self.head_tone = _mlp(1)
+        self.head_implied_direction = _mlp(1)
+        self.head_novelty = _mlp(1)
+        self.head_claim = _mlp(N_CLAIMS_TICKER)
+        self.head_specificity = _mlp(1)
+        self.head_materiality = _mlp(1)
+        # Macro bank
+        self.head_topic = _mlp(N_TOPICS)
+        self.head_directional_read = _mlp(1)
+        self.head_severity = _mlp(N_SEVERITY)
+        self.head_novelty_macro = _mlp(N_NOVELTY_MACRO)
+        self.head_claim_macro = _mlp(N_CLAIMS_MACRO)
+        self.head_hawkish_dovish = _mlp(N_HD)
+
+    def _pool(self, last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+        cls = last_hidden[:, 0]
+        m = attention_mask.unsqueeze(-1).to(last_hidden.dtype)
+        mean_pool = (last_hidden * m).sum(1) / m.sum(1).clamp(min=1.0)
+        return self.dropout(torch.cat([cls, mean_pool], dim=-1))
+
+    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor,
+                **kwargs) -> Shannon2Output:
+        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
+        pooled = self._pool(enc.last_hidden_state, attention_mask)
+        return Shannon2Output(
+            mode_logits=self.head_router(pooled),
+            event_logits=self.head_event(pooled),
+            tone=self.head_tone(pooled).squeeze(-1),
+            implied_direction=self.head_implied_direction(pooled).squeeze(-1),
+            novelty=self.head_novelty(pooled).squeeze(-1),
+            claim_logits=self.head_claim(pooled),
+            specificity=self.head_specificity(pooled).squeeze(-1),
+            materiality_if_true=self.head_materiality(pooled).squeeze(-1),
+            topic_logits=self.head_topic(pooled),
+            directional_read=self.head_directional_read(pooled).squeeze(-1),
+            severity_logits=self.head_severity(pooled),
+            novelty_macro_logits=self.head_novelty_macro(pooled),
+            claim_macro_logits=self.head_claim_macro(pooled),
+            hawkish_dovish_logits=self.head_hawkish_dovish(pooled),
+        )
+
+    @torch.no_grad()
+    def predict(self, input_ids: torch.Tensor, attention_mask: torch.Tensor,
+                mention_threshold: float = 0.5) -> dict:
+        out = self.forward(input_ids=input_ids, attention_mask=attention_mask)
+        ev_prob = torch.sigmoid(out.event_logits)
+        return {
+            "mode_prob": F.softmax(out.mode_logits, dim=-1),
+            # ticker
+            "event_prob": ev_prob,
+            "event_mentioned": (ev_prob >= mention_threshold).float(),
+            "tone": out.tone.clamp(-1.0, 1.0),
+            "implied_direction": out.implied_direction.clamp(-1.0, 1.0),
+            "novelty": out.novelty.clamp(0.0, 1.0),
+            "claim_prob": F.softmax(out.claim_logits, dim=-1),
+            "specificity": out.specificity.clamp(0.0, 1.0),
+            "materiality_if_true": out.materiality_if_true.clamp(0.0, 1.0),
+            # macro
+            "topic_prob": F.softmax(out.topic_logits, dim=-1),
+            "directional_read": out.directional_read.clamp(-1.0, 1.0),
+            "severity_prob": F.softmax(out.severity_logits, dim=-1),
+            "novelty_macro_prob": F.softmax(out.novelty_macro_logits, dim=-1),
+            "claim_macro_prob": F.softmax(out.claim_macro_logits, dim=-1),
+            "hawkish_dovish_prob": F.softmax(out.hawkish_dovish_logits, dim=-1),
+        }
+
+    def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
+        if hasattr(self.encoder, "gradient_checkpointing_enable"):
+            self.encoder.gradient_checkpointing_enable(
+                gradient_checkpointing_kwargs=gradient_checkpointing_kwargs)
+
+    def gradient_checkpointing_disable(self):
+        if hasattr(self.encoder, "gradient_checkpointing_disable"):
+            self.encoder.gradient_checkpointing_disable()
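The pooling step in `_pool` concatenates the first-token (CLS) vector with a mean over the non-padded tokens. The sketch below reproduces that arithmetic for a single sequence using plain Python lists instead of tensors, so the effect of the attention mask is easy to follow. The function name `pool` and the toy values are illustrative, and dropout is left out because it is the identity in eval mode.

```python
def pool(last_hidden, attention_mask):
    """CLS vector concatenated with the masked mean of all token vectors.

    last_hidden: list of per-token vectors for one sequence.
    attention_mask: list of 0/1 flags, 1 for real tokens and 0 for padding.
    """
    cls = last_hidden[0]
    dim = len(cls)
    # Mirrors m.sum(1).clamp(min=1.0): avoid dividing by zero on an empty mask.
    kept = max(sum(attention_mask), 1)
    mean = [
        sum(vec[j] * m for vec, m in zip(last_hidden, attention_mask)) / kept
        for j in range(dim)
    ]
    return cls + mean  # list concatenation, so the output length is 2 * dim


# Toy sequence: two real tokens and one padding token that must be ignored.
last_hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]
pooled = pool(last_hidden, attention_mask)
print(pooled)
```

The CLS position is itself a real token, so it contributes to the mean as well as appearing on its own in the first half of the pooled vector.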
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,17 @@
+{
+  "backend": "tokenizers",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "is_local": false,
+  "local_files_only": false,
+  "mask_token": "[MASK]",
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 8192,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": "[UNK]"
+}