ilayibrahimzadeh committed
Commit c24924a · verified · 1 Parent(s): 9d8d0ac

Publish binomial-shannon v2

README.md ADDED
@@ -0,0 +1,130 @@
+ ---
+ license: apache-2.0
+ language: en
+ library_name: transformers
+ tags:
+ - finance
+ - news
+ - macro
+ - financial-news
+ - text-classification
+ pipeline_tag: text-classification
+ ---
+
+ # binomial-shannon-2
+
+ A **financial news characterizer with two modes**: it reads *ticker-tagged* company news the way [binomial-shannon-1](https://huggingface.co/BinomialTechnologies/binomial-shannon-1) does (19 structured features), and reads *macro* news (central banks, inflation, rates, FX, commodities, geopolitics) with a dedicated 35-output macro head bank. A built-in router selects the right head set per article. ~15-30 ms on CPU.
+
+ ## Quick start
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+
+ tok = AutoTokenizer.from_pretrained("BinomialTechnologies/binomial-shannon-2")
+ model = AutoModel.from_pretrained("BinomialTechnologies/binomial-shannon-2",
+                                   trust_remote_code=True)
+
+ inputs = tok("[FEED: reuters] [SITE: reuters.com] [DATE: 2026-03-18]\n\n"
+              "TITLE: Fed holds rates, signals two cuts later this year\n\nBODY: ...",
+              return_tensors="pt", truncation=True, max_length=1024)
+ out = model.predict(**inputs)
+
+ out["mode_prob"]            # [P(ticker), P(macro)]
+ out["topic_prob"]           # 18-way macro topic distribution
+ out["directional_read"]     # signed macro read in [-1, +1]
+ out["hawkish_dovish_prob"]  # 5-way, meaningful on monetary-policy / rates articles
+ ```
+
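The quick-start input is a bracketed metadata header followed by `TITLE:` and `BODY:` sections. A minimal sketch of a helper that assembles that string, assuming the three header fields and the separators shown in the example are the whole template; the name `format_article` and its parameters are hypothetical and not part of the published model code:

```python
from datetime import date


def format_article(feed: str, site: str, published: date, title: str, body: str) -> str:
    """Assemble the header + TITLE/BODY layout used in the quick-start example.

    Hypothetical helper: field order and separators are copied from the README
    example; whether the model expects other fields is not documented there.
    """
    header = f"[FEED: {feed}] [SITE: {site}] [DATE: {published.isoformat()}]"
    return f"{header}\n\nTITLE: {title}\n\nBODY: {body}"


text = format_article(
    feed="reuters",
    site="reuters.com",
    published=date(2026, 3, 18),
    title="Fed holds rates, signals two cuts later this year",
    body="...",
)
```

The resulting `text` can be passed to the tokenizer in place of the hand-written literal.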
+ ## What it outputs
+
+ **Ticker mode (19 features, identical to shannon-1)** — event type (10 binary), tone, implied_direction, novelty, claim_type (4), specificity, materiality_if_true.
+
+ **Macro mode (35 features):**
+
+ | Head | Type | Meaning |
+ |---|---|---|
+ | `topic` | softmax (18) | monetary_policy / fiscal_policy / inflation / growth / labor / rates_fixed_income / equities_markets / fx_currency / energy / commodities / credit_banking / crypto / mergers_acquisitions / trade_policy / geopolitics / single_company / technicals / other |
+ | `directional_read` | [-1, +1] | net read for risk assets implied by the article |
+ | `severity` | softmax (5) | noise / minor / notable / major / crisis |
+ | `novelty` | softmax (3) | rehash / commentary / breaking |
+ | `claim_type` | softmax (4) | fact / opinion / rumor / forecast |
+ | `hawkish_dovish` | softmax (5) | dovish → hawkish; meaningful on monetary-policy / rates articles |
+
+ Every macro head is a softmax or a signed scalar — argmax for a label, the weighted score for a continuous summary, or the entropy for uncertainty.
+
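The three summaries named above (argmax label, weighted score, entropy) can be sketched in plain Python for one softmax head. The README does not say which numeric values the buckets map to for the weighted score, so the evenly spaced scores below are an assumption, and `summarize` is a hypothetical helper:

```python
import math

# Bucket order for the 5-way hawkish_dovish head, as listed in configuration_shannon2.py.
HAWKISH_DOVISH_BUCKETS = ("dovish", "mildly_dovish", "neutral", "mildly_hawkish", "hawkish")


def summarize(probs, labels, scores):
    """Return (argmax label, probability-weighted score, entropy in nats)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    weighted = sum(p * s for p, s in zip(probs, scores))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return labels[best], weighted, entropy


# Assumption: buckets map to evenly spaced scores from -1 (dovish) to +1 (hawkish).
scores = [-1.0, -0.5, 0.0, 0.5, 1.0]
probs = [0.05, 0.10, 0.20, 0.40, 0.25]  # made-up distribution for illustration
label, score, entropy = summarize(probs, HAWKISH_DOVISH_BUCKETS, scores)
```

A high entropy relative to `math.log(len(probs))` signals that the head is close to uniform and the argmax label should be treated with caution.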
+ ## Eval
+
+ Held-out forward-temporal test set (Oct 2025 – May 2026, never seen during training). Numbers from a reproducible harness over all 15,805 macro test articles + a seeded 10,000 ticker sample.
+
+ ### Ticker heads (parity with shannon-1)
+
+ | Event-flag macro F1 | implied_direction | tone | claim acc |
+ |---|---|---|---|
+ | 0.79 | 0.854 | 0.834 | 89.5% |
+
+ The ticker bank matches the standalone shannon-1 model — shannon-2 is a strict superset, adding macro without regressing ticker quality.
+
+ ### Macro heads (n=15,805)
+
+ | Head | Metric | Value |
+ |---|---|---|
+ | topic (18-way) | accuracy | **0.814** |
+ | directional_read | Pearson vs panel | **+0.783** |
+ | severity (5-way) | accuracy | 0.708 |
+ | novelty (3-way) | accuracy | 0.648 |
+ | claim_type (4-way) | accuracy | 0.785 |
+ | hawkish_dovish (5-way) | accuracy | 0.616 (n=1,650) |
+
+ Per-topic F1 (selected):
+
+ | Topic | F1 | Support |
+ |---|---|---|
+ | commodities | 0.94 | 2,061 |
+ | equities_markets | 0.88 | 3,613 |
+ | fx_currency | 0.88 | 3,861 |
+ | monetary_policy | 0.79 | 1,662 |
+ | inflation | 0.70 | 526 |
+ | geopolitics | 0.48 | 346 |
+ | technicals | 0.25 | 517 |
+
+ Strongest on high-volume market topics (commodities, FX, equities, monetary policy); weakest on technicals and geopolitics, which are lower-support and more heterogeneous.
+
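For readers who want to reproduce a per-topic table like the one above on their own labels, here is a minimal sketch of per-class F1 with support, plus the unweighted macro average. It assumes the standard single-label definitions; the README does not describe the evaluation harness, so this is not necessarily how the reported numbers were produced, and the labels below are toy data:

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        result[label] = (f1, tp[label] + fn[label])  # support = number of true items
    return result


# Toy labels for illustration only; not taken from the evaluation set.
y_true = ["commodities", "commodities", "geopolitics", "technicals"]
y_pred = ["commodities", "commodities", "commodities", "technicals"]
scores = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1 for f1, _ in scores.values()) / len(scores)
```

Macro F1 weights every class equally, which is why low-support topics such as geopolitics can pull it down even when overall accuracy is high.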
+ **Routing.** Ticker and macro articles arrive on structurally distinct feeds (per-company news vs. macro wires), so the router separates the two modes essentially perfectly — it is a convenience for serving mixed streams, not a hard classification result.
+
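In the shipped `modeling_shannon2.py`, `predict` returns both head banks together with `mode_prob`, so choosing a bank is left to the caller. A minimal sketch of that selection for a single article, assuming the outputs have already been converted to plain Python values (for example with `.tolist()` on one row); `select_bank` is a hypothetical helper, while the key names are the ones `predict` returns:

```python
# Index order follows modeling_shannon2.py: MODE_TICKER = 0, MODE_MACRO = 1.
MODE_NAMES = ("ticker", "macro")

TICKER_KEYS = ("event_prob", "event_mentioned", "tone", "implied_direction", "novelty",
               "claim_prob", "specificity", "materiality_if_true")
MACRO_KEYS = ("topic_prob", "directional_read", "severity_prob", "novelty_macro_prob",
              "claim_macro_prob", "hawkish_dovish_prob")


def select_bank(pred: dict, mode_prob: list) -> tuple:
    """Keep only the head bank with the higher router probability."""
    mode = MODE_NAMES[0] if mode_prob[0] >= mode_prob[1] else MODE_NAMES[1]
    keys = TICKER_KEYS if mode == "ticker" else MACRO_KEYS
    return mode, {k: pred[k] for k in keys if k in pred}


# Made-up values for one article.
pred = {"tone": 0.2, "topic_prob": [0.9, 0.1], "directional_read": -0.4}
mode, bank = select_bank(pred, [0.03, 0.97])
```

When the two router probabilities are close, keeping both banks and flagging the article for review may be safer than a hard argmax.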
+ ## Architecture
+
+ A specialized ~150M-parameter encoder shared across a 2-way router and two head banks (ticker + macro), each a 3-layer MLP over a CLS+masked-mean pooled representation.
+
+ - ~150M encoder params + lightweight head banks
+ - 4096-token context (1024 default at inference)
+ - bf16 GPU / fp32 CPU
+ - ~15-30 ms CPU
+
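To put "lightweight head banks" in numbers, the sketch below counts the parameters of the 14 MLP heads. It assumes the sizes in this commit's `config.json` (`hidden_size` 768, `head_h1` 512, `head_h2` 256) and the head widths defined in `modeling_shannon2.py`; the constant and function names are illustrative only:

```python
HIDDEN_SIZE = 768           # encoder_config.hidden_size in config.json
HEAD_IN = 2 * HIDDEN_SIZE   # CLS vector concatenated with the masked mean
H1, H2 = 512, 256           # head_h1, head_h2 in config.json

# Output width of every head: router, ticker bank, macro bank.
HEAD_WIDTHS = {
    "router": 2,
    "event": 10, "tone": 1, "implied_direction": 1, "novelty": 1,
    "claim": 4, "specificity": 1, "materiality": 1,
    "topic": 18, "directional_read": 1, "severity": 5,
    "novelty_macro": 3, "claim_macro": 4, "hawkish_dovish": 5,
}


def mlp_params(out_dim: int) -> int:
    """Weights plus biases of Linear(HEAD_IN, H1) -> Linear(H1, H2) -> Linear(H2, out_dim)."""
    return (HEAD_IN * H1 + H1) + (H1 * H2 + H2) + (H2 * out_dim + out_dim)


total_head_params = sum(mlp_params(w) for w in HEAD_WIDTHS.values())
```

Under these assumptions the heads come to roughly 12.9M parameters in total, which is small next to the ~150M encoder.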
+ ## How it was trained
+
+ - **Corpus**: ticker-tagged company news + press releases and a macro news corpus (2018-2026)
+ - **Labels**: distilled from a frontier reasoning model against per-mode rubrics (separate ticker and macro labeling specs)
+ - **Split**: forward temporal — train on ≤2025-09-30, test on 2025-10 → 2026-05
+ - **Compute**: trained from the base encoder on a single B200
+
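The forward temporal split described in the list above can be sketched as a simple date cutoff. This is an illustration under the stated cutoff only; the actual data pipeline is not part of this commit, and `temporal_split` and the placeholder articles are hypothetical:

```python
from datetime import date

TRAIN_CUTOFF = date(2025, 9, 30)  # last training day stated in the README


def temporal_split(articles):
    """Split (published_date, text) pairs into train/test by the cutoff date."""
    train = [a for a in articles if a[0] <= TRAIN_CUTOFF]
    test = [a for a in articles if a[0] > TRAIN_CUTOFF]
    return train, test


articles = [
    (date(2024, 6, 12), "placeholder article A"),
    (date(2025, 9, 30), "placeholder article B"),
    (date(2025, 10, 1), "placeholder article C"),
]
train, test = temporal_split(articles)
```

Splitting by publication date rather than at random keeps every test article later than every training article, so the evaluation cannot benefit from near-duplicate coverage of the same event.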
+ ## Caveats
+
+ - **Trained against frontier-LLM labels.** Eval correlations are partly imitation; treat the outputs as structured features, not ground truth.
+ - **Macro corpus is English-language wire news**, weighted toward 2024-2026.
+ - **`hawkish_dovish` only fires meaningfully on monetary-policy / rates articles** (it is loss-masked elsewhere during training).
+ - **Tier 2** — research preview. Don't use the outputs as standalone trading signals; combine with your own pipelines.
+
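The loss masking mentioned for `hawkish_dovish` can be illustrated with a masked cross-entropy: rows outside the relevant topics contribute nothing to the loss, so the head receives no training signal there. The training code is not included in this commit, so this is only a sketch of the general idea; which topics count as "monetary-policy / rates" is an assumption, and `masked_cross_entropy` is a hypothetical name:

```python
import math

# Assumption: these two topics are what the README means by "monetary-policy / rates".
HD_TOPICS = {"monetary_policy", "rates_fixed_income"}


def masked_cross_entropy(probs, targets, topics):
    """Mean negative log-likelihood over rows whose topic is in HD_TOPICS.

    probs:   per-article probability vectors over the 5 hawkish_dovish buckets
    targets: per-article target bucket index
    topics:  per-article topic label, used only to build the mask
    """
    losses = [-math.log(p[t]) for p, t, topic in zip(probs, targets, topics)
              if topic in HD_TOPICS]
    return sum(losses) / len(losses) if losses else 0.0


# Made-up batch: only the first article passes the mask.
probs = [[0.1, 0.1, 0.1, 0.2, 0.5], [0.2, 0.2, 0.2, 0.2, 0.2]]
targets = [4, 0]
topics = ["monetary_policy", "crypto"]
loss = masked_cross_entropy(probs, targets, topics)
```

Because masked rows never produce a gradient, the head's output on other topics is unconstrained, which is why the README advises reading it only on policy and rates articles.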
+ ## License
+
+ Apache 2.0, like the rest of the Binomial AI Research model zoo.
+
+ ## Citation
+
+ ```bibtex
+ @misc{binomial-shannon-2-2026,
+   title = {binomial-shannon-2: A dual-mode financial news characterizer (ticker + macro)},
+   author = {Binomial AI Research},
+   year = {2026},
+   url = {https://huggingface.co/BinomialTechnologies/binomial-shannon-2}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,108 @@
+ {
+   "model_type": "shannon2",
+   "architectures": [
+     "Shannon2MultiHead"
+   ],
+   "auto_map": {
+     "AutoConfig": "configuration_shannon2.Shannon2Config",
+     "AutoModel": "modeling_shannon2.Shannon2MultiHead"
+   },
+   "encoder_name_or_path": "answerdotai/ModernBERT-base",
+   "encoder_config": {
+     "transformers_version": "5.6.2",
+     "architectures": [
+       "ModernBertForMaskedLM"
+     ],
+     "output_hidden_states": false,
+     "return_dict": true,
+     "dtype": "float32",
+     "chunk_size_feed_forward": 0,
+     "is_encoder_decoder": false,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "problem_type": null,
+     "vocab_size": 50368,
+     "hidden_size": 768,
+     "intermediate_size": 1152,
+     "num_hidden_layers": 22,
+     "num_attention_heads": 12,
+     "hidden_activation": "gelu",
+     "max_position_embeddings": 8192,
+     "initializer_range": 0.02,
+     "initializer_cutoff_factor": 2.0,
+     "norm_eps": 1e-05,
+     "norm_bias": false,
+     "pad_token_id": 50283,
+     "eos_token_id": 50282,
+     "bos_token_id": 50281,
+     "cls_token_id": 50281,
+     "sep_token_id": 50282,
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "layer_types": [
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention",
+       "sliding_attention",
+       "sliding_attention",
+       "full_attention"
+     ],
+     "rope_parameters": {
+       "sliding_attention": {
+         "rope_type": "default",
+         "rope_theta": 10000.0
+       },
+       "full_attention": {
+         "rope_type": "default",
+         "rope_theta": 160000.0
+       }
+     },
+     "local_attention": 128,
+     "embedding_dropout": 0.0,
+     "mlp_bias": false,
+     "mlp_dropout": 0.0,
+     "decoder_bias": true,
+     "classifier_pooling": "mean",
+     "classifier_dropout": 0.0,
+     "classifier_bias": false,
+     "classifier_activation": "gelu",
+     "deterministic_flash_attn": false,
+     "sparse_prediction": false,
+     "sparse_pred_ignore_index": -100,
+     "tie_word_embeddings": true,
+     "_name_or_path": "answerdotai/ModernBERT-base",
+     "global_attn_every_n_layers": 3,
+     "gradient_checkpointing": false,
+     "layer_norm_eps": 1e-05,
+     "model_type": "modernbert",
+     "position_embedding_type": "absolute",
+     "output_attentions": false
+   },
+   "max_position_embeddings": 4096,
+   "head_h1": 512,
+   "head_h2": 256,
+   "dropout": 0.1,
+   "torch_dtype": "float32"
+ }
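The 22-entry `layer_types` list in `encoder_config` follows a regular pattern that is consistent with `global_attn_every_n_layers: 3`: every third layer, starting from layer 0, uses full attention and the rest use sliding-window attention. The sketch below regenerates the list from those two numbers. It is an observation about this config, not a statement of how the library itself builds the list:

```python
NUM_LAYERS = 22    # encoder_config.num_hidden_layers
GLOBAL_EVERY = 3   # encoder_config.global_attn_every_n_layers

layer_types = [
    "full_attention" if i % GLOBAL_EVERY == 0 else "sliding_attention"
    for i in range(NUM_LAYERS)
]
full_count = layer_types.count("full_attention")
```

Per the same config, the sliding layers use `local_attention: 128` and a RoPE theta of 10000, while the full-attention layers use a theta of 160000.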
configuration_shannon2.py ADDED
@@ -0,0 +1,72 @@
+ """Configuration class for binomial-shannon-2 (router + ticker + macro).
+
+ Ships with the model on HuggingFace Hub so
+ `AutoConfig.from_pretrained(repo, trust_remote_code=True)` works.
+ """
+
+ from __future__ import annotations
+
+ from transformers.configuration_utils import PretrainedConfig
+
+
+ # Ticker mode (Shannon-1 schema)
+ EVENTS = (
+     "earnings", "guidance", "m_and_a", "regulatory_legal", "product",
+     "exec_change", "dividend_buyback", "analyst_rating", "macro_sector", "other",
+ )
+ CLAIM_TYPES = ("fact", "opinion", "rumor", "forecast")
+
+ # Macro mode
+ TOPICS = (
+     "monetary_policy", "fiscal_policy", "inflation", "growth", "labor",
+     "rates_fixed_income", "equities_markets", "fx_currency", "energy",
+     "commodities", "credit_banking", "crypto", "mergers_acquisitions",
+     "trade_policy", "geopolitics", "single_company", "technicals", "other",
+ )
+ SEVERITY_BUCKETS = ("noise", "minor", "notable", "major", "crisis")
+ NOVELTY_BUCKETS_MACRO = ("rehash", "commentary", "breaking")
+ CLAIM_TYPES_MACRO = ("fact", "opinion", "rumor", "forecast")
+ HAWKISH_DOVISH_BUCKETS = (
+     "dovish", "mildly_dovish", "neutral", "mildly_hawkish", "hawkish",
+ )
+
+
+ class Shannon2Config(PretrainedConfig):
+     """Config for Shannon2MultiHead.
+
+     Shared encoder + 2-way router + ticker head bank (19 outputs, inherited
+     from shannon-1) + macro head bank (35 outputs). Mirrors shannon-1's hub
+     configuration approach.
+     """
+
+     model_type = "shannon2"
+
+     def __init__(
+         self,
+         encoder_name_or_path: str = "answerdotai/ModernBERT-base",
+         max_position_embeddings: int = 4096,
+         head_h1: int = 512,
+         head_h2: int = 256,
+         dropout: float = 0.1,
+         events: tuple[str, ...] = EVENTS,
+         claim_types: tuple[str, ...] = CLAIM_TYPES,
+         topics: tuple[str, ...] = TOPICS,
+         severity_buckets: tuple[str, ...] = SEVERITY_BUCKETS,
+         novelty_buckets_macro: tuple[str, ...] = NOVELTY_BUCKETS_MACRO,
+         claim_types_macro: tuple[str, ...] = CLAIM_TYPES_MACRO,
+         hawkish_dovish_buckets: tuple[str, ...] = HAWKISH_DOVISH_BUCKETS,
+         **kwargs,
+     ) -> None:
+         super().__init__(**kwargs)
+         self.encoder_name_or_path = encoder_name_or_path
+         self.max_position_embeddings = max_position_embeddings
+         self.head_h1 = head_h1
+         self.head_h2 = head_h2
+         self.dropout = dropout
+         self.events = list(events)
+         self.claim_types = list(claim_types)
+         self.topics = list(topics)
+         self.severity_buckets = list(severity_buckets)
+         self.novelty_buckets_macro = list(novelty_buckets_macro)
+         self.claim_types_macro = list(claim_types_macro)
+         self.hawkish_dovish_buckets = list(hawkish_dovish_buckets)
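As a quick check of the output counts quoted in the docstrings, the sketch below sums the head widths implied by the label tuples above plus the scalar heads defined in `modeling_shannon2.py`. The tuples are copied from this file; the two `*_SCALARS` tuples are illustrative names introduced here. With these definitions the ticker bank comes to 19 outputs as stated, while the macro bank sums to 36, one more than the "35 outputs" in the docstring; which figure is intended is not clear from the commit.

```python
EVENTS = (
    "earnings", "guidance", "m_and_a", "regulatory_legal", "product",
    "exec_change", "dividend_buyback", "analyst_rating", "macro_sector", "other",
)
CLAIM_TYPES = ("fact", "opinion", "rumor", "forecast")
TOPICS = (
    "monetary_policy", "fiscal_policy", "inflation", "growth", "labor",
    "rates_fixed_income", "equities_markets", "fx_currency", "energy",
    "commodities", "credit_banking", "crypto", "mergers_acquisitions",
    "trade_policy", "geopolitics", "single_company", "technicals", "other",
)
SEVERITY_BUCKETS = ("noise", "minor", "notable", "major", "crisis")
NOVELTY_BUCKETS_MACRO = ("rehash", "commentary", "breaking")
CLAIM_TYPES_MACRO = ("fact", "opinion", "rumor", "forecast")
HAWKISH_DOVISH_BUCKETS = ("dovish", "mildly_dovish", "neutral", "mildly_hawkish", "hawkish")

# Scalar (width-1) heads per bank, as defined in modeling_shannon2.py.
TICKER_SCALARS = ("tone", "implied_direction", "novelty", "specificity", "materiality_if_true")
MACRO_SCALARS = ("directional_read",)

ticker_width = len(EVENTS) + len(CLAIM_TYPES) + len(TICKER_SCALARS)
macro_width = (len(TOPICS) + len(SEVERITY_BUCKETS) + len(NOVELTY_BUCKETS_MACRO)
               + len(CLAIM_TYPES_MACRO) + len(HAWKISH_DOVISH_BUCKETS) + len(MACRO_SCALARS))
```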
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fcebafc4b9629d3316ed0d08fd17fdc0fe4fa4055f28ab16d5cd8b25b0931534
+ size 647560940
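The `size` field of the LFS pointer gives a rough parameter count. A back-of-the-envelope sketch, assuming every tensor is stored as float32 (consistent with `"torch_dtype": "float32"` in `config.json`) and ignoring the small safetensors header, so the result is a slight overestimate:

```python
SIZE_BYTES = 647_560_940  # "size" field of the LFS pointer
BYTES_PER_PARAM = 4       # assumption: all tensors stored as float32

approx_params = SIZE_BYTES // BYTES_PER_PARAM
approx_millions = round(approx_params / 1e6, 1)
```

That is about 162M parameters, in line with a ~150M encoder plus the MLP head banks.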
modeling_shannon2.py ADDED
@@ -0,0 +1,192 @@
+ """Self-contained model class for binomial-shannon-2.
+
+ Distributed alongside the weights on HuggingFace Hub so anyone can do:
+
+     from transformers import AutoTokenizer, AutoModel
+     tok = AutoTokenizer.from_pretrained("BinomialTechnologies/binomial-shannon-2")
+     model = AutoModel.from_pretrained("BinomialTechnologies/binomial-shannon-2",
+                                       trust_remote_code=True)
+
+ Imports only from `transformers` + `torch` — no project-internal dependencies.
+ Module names match the training checkpoint so weights load verbatim.
+
+ Architecture:
+     shared encoder
+       ↓ (CLS + masked mean pool concatenated)
+       ↓ 2-way router (ticker | macro)
+       ↓ ticker head bank (19 outputs) + macro head bank (35 outputs)
+
+ Router       mode_prob over {ticker, macro}
+ Ticker bank  event(10) + tone + implied_direction + novelty
+              + claim(4) + specificity + materiality  (19, = shannon-1)
+ Macro bank   topic(18) + directional_read + severity(5)
+              + novelty_macro(3) + claim_macro(4) + hawkish_dovish(5)  (35)
+ """
+
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from typing import Optional
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from transformers import AutoConfig
+ from transformers.modeling_utils import PreTrainedModel
+ from transformers.modeling_outputs import ModelOutput
+
+ from .configuration_shannon2 import (
+     Shannon2Config, EVENTS, CLAIM_TYPES, TOPICS, SEVERITY_BUCKETS,
+     NOVELTY_BUCKETS_MACRO, CLAIM_TYPES_MACRO, HAWKISH_DOVISH_BUCKETS,
+ )
+
+ N_EVENTS = len(EVENTS)
+ N_CLAIMS_TICKER = len(CLAIM_TYPES)
+ N_TOPICS = len(TOPICS)
+ N_SEVERITY = len(SEVERITY_BUCKETS)
+ N_NOVELTY_MACRO = len(NOVELTY_BUCKETS_MACRO)
+ N_CLAIMS_MACRO = len(CLAIM_TYPES_MACRO)
+ N_HD = len(HAWKISH_DOVISH_BUCKETS)
+
+ MODE_TICKER = 0
+ MODE_MACRO = 1
+
+
+ @dataclass
+ class Shannon2Output(ModelOutput):
+     mode_logits: Optional[torch.Tensor] = None
+     # ticker bank
+     event_logits: Optional[torch.Tensor] = None
+     tone: Optional[torch.Tensor] = None
+     implied_direction: Optional[torch.Tensor] = None
+     novelty: Optional[torch.Tensor] = None
+     claim_logits: Optional[torch.Tensor] = None
+     specificity: Optional[torch.Tensor] = None
+     materiality_if_true: Optional[torch.Tensor] = None
+     # macro bank
+     topic_logits: Optional[torch.Tensor] = None
+     directional_read: Optional[torch.Tensor] = None
+     severity_logits: Optional[torch.Tensor] = None
+     novelty_macro_logits: Optional[torch.Tensor] = None
+     claim_macro_logits: Optional[torch.Tensor] = None
+     hawkish_dovish_logits: Optional[torch.Tensor] = None
+
+
+ class Shannon2MultiHead(PreTrainedModel):
+     config_class = Shannon2Config
+     base_model_prefix = "shannon2"
+     _tied_weights_keys: list = []
+     all_tied_weights_keys: dict = {}
+
+     def __init__(self, config: Shannon2Config) -> None:
+         super().__init__(config)
+         self.config = config
+
+         # Rebuild the encoder from the bundled config so loading works offline.
+         if hasattr(config, "encoder_config") and config.encoder_config:
+             from transformers.models.auto.configuration_auto import CONFIG_MAPPING
+             mtype = config.encoder_config.get("model_type")
+             if mtype and mtype in CONFIG_MAPPING:
+                 enc_cfg = CONFIG_MAPPING[mtype].from_dict(config.encoder_config)
+             else:
+                 enc_cfg = AutoConfig.from_pretrained(config.encoder_name_or_path)
+         else:
+             enc_cfg = AutoConfig.from_pretrained(config.encoder_name_or_path)
+
+         if config.max_position_embeddings > getattr(enc_cfg, "max_position_embeddings", 8192):
+             enc_cfg.max_position_embeddings = config.max_position_embeddings
+
+         from transformers import AutoModel as _AutoModel
+         self.encoder = _AutoModel.from_config(enc_cfg, attn_implementation="sdpa")
+
+         H = enc_cfg.hidden_size
+         head_in = 2 * H
+         h1, h2 = config.head_h1, config.head_h2
+         d = config.dropout
+         self.dropout = nn.Dropout(d)
+
+         def _mlp(out_dim: int) -> nn.Sequential:
+             return nn.Sequential(
+                 nn.Linear(head_in, h1), nn.GELU(), nn.Dropout(d),
+                 nn.Linear(h1, h2), nn.GELU(), nn.Dropout(d),
+                 nn.Linear(h2, out_dim),
+             )
+
+         # Router
+         self.head_router = _mlp(2)
+         # Ticker bank
+         self.head_event = _mlp(N_EVENTS)
+         self.head_tone = _mlp(1)
+         self.head_implied_direction = _mlp(1)
+         self.head_novelty = _mlp(1)
+         self.head_claim = _mlp(N_CLAIMS_TICKER)
+         self.head_specificity = _mlp(1)
+         self.head_materiality = _mlp(1)
+         # Macro bank
+         self.head_topic = _mlp(N_TOPICS)
+         self.head_directional_read = _mlp(1)
+         self.head_severity = _mlp(N_SEVERITY)
+         self.head_novelty_macro = _mlp(N_NOVELTY_MACRO)
+         self.head_claim_macro = _mlp(N_CLAIMS_MACRO)
+         self.head_hawkish_dovish = _mlp(N_HD)
+
+     def _pool(self, last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+         cls = last_hidden[:, 0]
+         m = attention_mask.unsqueeze(-1).to(last_hidden.dtype)
+         mean_pool = (last_hidden * m).sum(1) / m.sum(1).clamp(min=1.0)
+         return self.dropout(torch.cat([cls, mean_pool], dim=-1))
+
+     def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor,
+                 **kwargs) -> Shannon2Output:
+         enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
+         pooled = self._pool(enc.last_hidden_state, attention_mask)
+         return Shannon2Output(
+             mode_logits=self.head_router(pooled),
+             event_logits=self.head_event(pooled),
+             tone=self.head_tone(pooled).squeeze(-1),
+             implied_direction=self.head_implied_direction(pooled).squeeze(-1),
+             novelty=self.head_novelty(pooled).squeeze(-1),
+             claim_logits=self.head_claim(pooled),
+             specificity=self.head_specificity(pooled).squeeze(-1),
+             materiality_if_true=self.head_materiality(pooled).squeeze(-1),
+             topic_logits=self.head_topic(pooled),
+             directional_read=self.head_directional_read(pooled).squeeze(-1),
+             severity_logits=self.head_severity(pooled),
+             novelty_macro_logits=self.head_novelty_macro(pooled),
+             claim_macro_logits=self.head_claim_macro(pooled),
+             hawkish_dovish_logits=self.head_hawkish_dovish(pooled),
+         )
+
+     @torch.no_grad()
+     def predict(self, input_ids: torch.Tensor, attention_mask: torch.Tensor,
+                 mention_threshold: float = 0.5) -> dict:
+         out = self.forward(input_ids=input_ids, attention_mask=attention_mask)
+         ev_prob = torch.sigmoid(out.event_logits)
+         return {
+             "mode_prob": F.softmax(out.mode_logits, dim=-1),
+             # ticker
+             "event_prob": ev_prob,
+             "event_mentioned": (ev_prob >= mention_threshold).float(),
+             "tone": out.tone.clamp(-1.0, 1.0),
+             "implied_direction": out.implied_direction.clamp(-1.0, 1.0),
+             "novelty": out.novelty.clamp(0.0, 1.0),
+             "claim_prob": F.softmax(out.claim_logits, dim=-1),
+             "specificity": out.specificity.clamp(0.0, 1.0),
+             "materiality_if_true": out.materiality_if_true.clamp(0.0, 1.0),
+             # macro
+             "topic_prob": F.softmax(out.topic_logits, dim=-1),
+             "directional_read": out.directional_read.clamp(-1.0, 1.0),
+             "severity_prob": F.softmax(out.severity_logits, dim=-1),
+             "novelty_macro_prob": F.softmax(out.novelty_macro_logits, dim=-1),
+             "claim_macro_prob": F.softmax(out.claim_macro_logits, dim=-1),
+             "hawkish_dovish_prob": F.softmax(out.hawkish_dovish_logits, dim=-1),
+         }
+
+     def gradient_checkpointing_enable(self, gradient_checkpointing_kwargs=None):
+         if hasattr(self.encoder, "gradient_checkpointing_enable"):
+             self.encoder.gradient_checkpointing_enable(
+                 gradient_checkpointing_kwargs=gradient_checkpointing_kwargs)
+
+     def gradient_checkpointing_disable(self):
+         if hasattr(self.encoder, "gradient_checkpointing_disable"):
+             self.encoder.gradient_checkpointing_disable()
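The pooling step in `_pool` concatenates the first-token (CLS) vector with the mean over non-padding tokens. A dependency-free restatement for a single sequence may make the arithmetic easier to follow. It uses plain Python lists instead of tensors and omits dropout, so it is an illustration of the idea rather than a drop-in replacement; `pool` and the toy inputs are hypothetical:

```python
def pool(last_hidden, attention_mask):
    """CLS vector concatenated with the masked mean, for one sequence.

    last_hidden:    list of token vectors (each a list of floats)
    attention_mask: list of 0/1 flags, one per token
    """
    cls = last_hidden[0]
    dim = len(cls)
    count = max(sum(attention_mask), 1)  # mirrors clamp(min=1.0) in the tensor version
    mean = [
        sum(vec[j] * m for vec, m in zip(last_hidden, attention_mask)) / count
        for j in range(dim)
    ]
    return cls + mean  # list concatenation: length is 2 * dim


hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # third token is padding
mask = [1, 1, 0]
pooled = pool(hidden, mask)
```

The padding token is excluded from the mean, and the pooled vector is twice the hidden size, which is why each head's first linear layer takes `2 * H` inputs.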
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "is_local": false,
+   "local_files_only": false,
+   "mask_token": "[MASK]",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 8192,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "TokenizersBackend",
+   "unk_token": "[UNK]"
+ }