yezdata committed on
Commit a10898b · verified · 1 Parent(s): 56c0146

UPDATE EmCoder TO V2
.gitattributes CHANGED
@@ -4,3 +4,6 @@ outputs/epistemic_unc_scatter.png filter=lfs diff=lfs merge=lfs -text
  outputs/aleatoric_unc_scatter.png filter=lfs diff=lfs merge=lfs -text
  outputs/ridge_aleatoric.png filter=lfs diff=lfs merge=lfs -text
  outputs/ridge_epistemic.png filter=lfs diff=lfs merge=lfs -text
+ outputs/admiration_scatters.png filter=lfs diff=lfs merge=lfs -text
+ outputs/fear_scatters.png filter=lfs diff=lfs merge=lfs -text
+ outputs/neutral_scatters.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,237 @@
+ ---
+ language:
+ - en
+ license: cc-by-4.0
+ library_name: transformers
+ pipeline_tag: text-classification
+ tags:
+ - emotion-recognition
+ - bayesian-deep-learning
+ - mc-dropout
+ - uncertainty-quantification
+ - multi-label-classification
+ datasets:
+ - Skylion007/openwebtext
+ - google-research-datasets/go_emotions
+ metrics:
+ - precision
+ - recall
+ - f1
+ model-index:
+ - name: EmCoder
+   results:
+   - task:
+       type: text-classification
+       name: Multi-label Emotion Classification
+     dataset:
+       name: GoEmotions
+       type: go_emotions
+       split: test
+     metrics:
+     - name: Macro F1
+       type: f1
+       value: 0.488
+     - name: Macro Precision
+       type: precision
+       value: 0.503
+     - name: Macro Recall
+       type: recall
+       value: 0.503
+ ---
+
+ # EmCoder
+ <blockquote>
+ <b>Probabilistic Emotion Recognition & Uncertainty Quantification</b><br>
+ <b>28-emotion multi-label Transformer classifier</b>
+ </blockquote>
+
+
+ Unlike standard classifiers, EmCoder estimates what it doesn't know using Monte Carlo (MC) Dropout, which makes it well suited to high-stakes AI pipelines that need to flag uncertain predictions.<br>
+ EmCoder is optimized for **MC Dropout inference**.
+
+
+
+ ## Benchmark
+ ### Evaluation on the GoEmotions test split (macro avg metrics)
+ <!-- TODO: UPDATE % SIZE-->
+ EmCoder achieves a higher Macro F1-score than the original Google BERT baseline and RoBERTa-base despite its compact size (~35% fewer parameters than RoBERTa-base and ~45% fewer than ModernBERT-base), and it additionally provides per-class epistemic uncertainty quantification. ModernBERT-base remains ahead on all three macro metrics.
+ <!-- TODO: UPDATE PARAM COUNT -->
+ | Model | Precision | Recall | F1-Score | Params |
+ | :--- | :--- | :--- | :--- | :--- |
+ | **EmCoder** | **0.503** | **0.503** | **0.488** | **82.1M** |
+ | Google BERT (Original) | 0.400 | 0.630 | 0.460 | 110M |
+ | RoBERTa-base | 0.575 | 0.396 | 0.450 | 125M |
+ | ModernBERT-base | 0.583 | 0.535 | 0.550 | 149M |
+
+
+ ## How to use
+ ### 1. Setup & Tokenization
+ EmCoder uses the `ModernBERT` tokenizer, so token IDs map correctly onto the model's embedding table.
+ Because EmCoder is a custom architecture, pass `trust_remote_code=True` when loading it.
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ repo_id = "yezdata/EmCoder"
+
+ # Load the same tokenizer used during training
+ tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
+
+ # Load the model with the custom EmCoder code and config from the repository
+ model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
+ ```
+ ### 2. Bayesian inference
+ To obtain probabilistic outputs and uncertainty metrics, use the `mc_forward` method:
+ ```python
+ # Perform 50 stochastic passes
+ N_SAMPLES = 50
+ MAX_BATCH_SIZE = 10  # optional: run the N_SAMPLES passes in sub-batches of this size
+
+ inputs = tokenizer("I am so happy you are here!", return_tensors="pt")
+
+ model.eval()
+ with torch.no_grad():
+     # mc_forward keeps Dropout active even after model.eval()
+     mc_logits = model.mc_forward(
+         **inputs,
+         n_samples=N_SAMPLES,
+         max_batch_size=MAX_BATCH_SIZE,
+         return_dict=True,
+     ).logits  # (n_samples, B, 28)
+
+ # Bayesian post-processing
+ all_probs = torch.sigmoid(mc_logits)  # (n_samples, B, 28)
+
+ mean_probs = all_probs.mean(dim=0)  # Mean predicted probability
+ # Standard deviation across MC samples as a simple epistemic-uncertainty estimate
+ uncertainty = all_probs.std(dim=0)
+
+
+ # Formatted output
+ m_probs = mean_probs.squeeze(0)
+ u_vals = uncertainty.squeeze(0)
+
+ print(f"{'Emotion':<15} | {'Prob':<10} | {'Uncertainty':<10}")
+ print("-" * 40)
+
+ sorted_indices = torch.argsort(m_probs, descending=True)
+
+ for idx in sorted_indices:
+     prob, unc = m_probs[idx].item(), u_vals[idx].item()
+     label = model.config.id2label[idx.item()]
+
+     if prob > 0.05:  # Print only emotions with prob > 5%
+         print(f"{label:<15} | {prob:>8.2%} | ±{unc:>8.4f}")
+ ```
+
+
+ ## Model Architecture
+ ![EmCoder Architecture](outputs/architecture.png)
+
+
+ ### Optimization
+ The model is trained with a **weighted binary cross-entropy (BCE) loss**,
+ where the per-class weights **w** are computed on a logarithmic class-balancing scale to handle extreme label imbalance:
+
+ $$
+ w_{c} = \max\left( 0.1, \min\left( 20, 1 + \ln \left( \frac{N_{neg,c} + \epsilon}{N_{pos,c} + \epsilon} \right) \right) \right)
+ $$
+
+ Here $N_{pos,c}$ and $N_{neg,c}$ are the numbers of positive and negative examples for class $c$, and $\epsilon$ is a small constant that prevents division by zero.
+
+
+ ## Performance on test set
+ **Predictions are binarized with the per-class probability thresholds in `thresholds.json`, optimized on the validation set.**
+
+ | | precision | recall | f1-score | support |
+ |:---------------|----------:|---------:|---------:|----------:|
+ | micro avg | 0.524 | 0.635 | 0.574 | 6329 |
+ | **macro avg** | **0.503** | **0.503** | **0.488** | 6329 |
+ | weighted avg | 0.537 | 0.635 | 0.573 | 6329 |
+ | samples avg | 0.562 | 0.661 | 0.584 | 6329 |
+ | admiration | 0.642 | 0.681 | 0.661 | 504 |
+ | amusement | 0.731 | 0.898 | 0.806 | 264 |
+ | anger | 0.491 | 0.434 | 0.461 | 198 |
+ | annoyance | 0.352 | 0.316 | 0.333 | 320 |
+ | approval | 0.273 | 0.501 | 0.354 | 351 |
+ | caring | 0.271 | 0.415 | 0.327 | 135 |
+ | confusion | 0.377 | 0.392 | 0.385 | 153 |
+ | curiosity | 0.496 | 0.648 | 0.562 | 284 |
+ | desire | 0.525 | 0.373 | 0.437 | 83 |
+ | disappointment | 0.272 | 0.305 | 0.288 | 151 |
+ | disapproval | 0.333 | 0.461 | 0.387 | 267 |
+ | disgust | 0.422 | 0.528 | 0.469 | 123 |
+ | embarrassment | 0.545 | 0.324 | 0.407 | 37 |
+ | excitement | 0.467 | 0.340 | 0.393 | 103 |
+ | fear | 0.565 | 0.667 | 0.612 | 78 |
+ | gratitude | 0.946 | 0.889 | 0.917 | 352 |
+ | grief | 0.667 | 0.333 | 0.444 | 6 |
+ | joy | 0.603 | 0.584 | 0.593 | 161 |
+ | love | 0.809 | 0.782 | 0.795 | 238 |
+ | nervousness | 0.500 | 0.174 | 0.258 | 23 |
+ | optimism | 0.614 | 0.478 | 0.538 | 186 |
+ | pride | 0.583 | 0.438 | 0.500 | 16 |
+ | realization | 0.270 | 0.214 | 0.238 | 145 |
+ | relief | 0.118 | 0.364 | 0.178 | 11 |
+ | remorse | 0.551 | 0.768 | 0.642 | 56 |
+ | sadness | 0.576 | 0.462 | 0.512 | 156 |
+ | surprise | 0.511 | 0.482 | 0.496 | 141 |
+ | neutral | 0.564 | 0.838 | 0.674 | 1787 |
+
+
+
+ ### Entropy-based Uncertainty Decomposition
+ EmCoder decomposes predictive uncertainty into epistemic and aleatoric components using information-theoretic (entropy-based) metrics computed over $N$ stochastic forward passes.
+
+ **Demonstration of model uncertainty utilization**
+ To validate the uncertainty estimates, the top **X%** most uncertain (epistemic) classifications are rejected. On the retained predictions, Macro F1 rises from 0.488 to above 0.70, indicating that the model's self-reported uncertainty is strongly correlated with its actual error rate.
+ ![F1 Rejection curve](outputs/f1_rejection_epistemic.png)
+
+
+ **Uncertainty quantification on the GoEmotions test set for selected emotions**
+ - `admiration`: medium-frequency class (504 test samples)
+ - `fear`: minority class (78 test samples)
+ - `neutral`: most frequent class (1787 test samples)
+
+ | Admiration | Fear |
+ | :---: | :---: |
+ | ![Admiration Scatter](outputs/admiration_scatters.png) | ![Fear Scatter](outputs/fear_scatters.png) |
+
+ **Neutral**
+ ![Neutral Scatter](outputs/neutral_scatters.png)
+
+
+
+
+ **Emotion uncertainty distribution**
+
+ | Epistemic | Aleatoric |
+ | :---: | :---: |
+ | ![Epistemic Ridge](outputs/ridge_epistemic.png) | ![Aleatoric Ridge](outputs/ridge_aleatoric.png) |
+
+ **Co-occurrence Confusion Matrix (normalized to Recall %)**
+ ![Confusion Matrix](outputs/confusion_matrix.png)
+
+
+ ## Workflow
+ ![EmCoder Workflow](outputs/workflow.png)
+
+
+ ## Concrete Dropout Experiment
+ An experimental branch of EmCoder integrated Concrete Dropout (Gal et al., 2017) to learn dropout probabilities during training. The learned rate (p=0.15) marginally sharpened the isolation of extreme edge cases, yielding a slightly steeper initial segment of the F1-rejection curve. However, the heavier regularization constrained the capacity of the compact EmCoder and slightly degraded the standard macro metrics. The production EmCoder model therefore uses a fixed **p=0.1**, which preserves the standard macro metrics.
+
+
+ ## Note
+ This model was trained on the GoEmotions dataset (Reddit comments) and may not generalize well to other domains.
+
+
+ ## Citation
+ If you use this model, please cite it as follows:
+
+ ```bibtex
+ @misc{jez2026emcoder,
+   author = {Václav Jež},
+   title = {EmCoder},
+   year = {2026},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/yezdata/EmCoder}},
+   version = {1.0.0}
+ }
+ ```
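The README describes an entropy-based split into epistemic and aleatoric uncertainty, but the formulas are not included in the files of this commit. A common way to compute such a split from MC Dropout samples is the mutual-information decomposition, sketched below under the assumption that each of the 28 labels is treated as an independent Bernoulli variable. It is an illustration, not necessarily the exact computation behind the plots above, and the helper names `binary_entropy` and `decompose_uncertainty` are hypothetical. Total uncertainty is the binary entropy of the mean probability, aleatoric uncertainty is the mean per-pass entropy, and epistemic uncertainty is their difference.

```python
import math
from typing import List, Tuple


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose_uncertainty(mc_probs: List[List[float]]) -> List[Tuple[float, float, float]]:
    """Per-label (total, aleatoric, epistemic) uncertainty from MC Dropout samples.

    mc_probs[s][c] is the sigmoid probability of label c in stochastic pass s,
    i.e. one input's slice of the (n_samples, B, num_labels) probability tensor.
    """
    n_samples = len(mc_probs)
    n_labels = len(mc_probs[0])
    results = []
    for c in range(n_labels):
        probs = [mc_probs[s][c] for s in range(n_samples)]
        mean_p = sum(probs) / n_samples
        total = binary_entropy(mean_p)                                 # H[E[p]]
        aleatoric = sum(binary_entropy(p) for p in probs) / n_samples  # E[H[p]]
        epistemic = total - aleatoric                                  # mutual information
        results.append((total, aleatoric, epistemic))
    return results


# Two labels, three passes: label 0 is equally unsure in every pass,
# label 1 flips between confident extremes from pass to pass.
samples = [[0.5, 0.95], [0.5, 0.05], [0.5, 0.95]]
for c, (tot, ale, epi) in enumerate(decompose_uncertainty(samples)):
    print(f"label {c}: total={tot:.3f} aleatoric={ale:.3f} epistemic={epi:.3f}")
```

Label 0 ends up with purely aleatoric uncertainty (the passes agree that the outcome is a coin flip), while label 1 carries large epistemic uncertainty because the passes disagree with each other.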
configuration_emcoder.py ADDED
@@ -0,0 +1,32 @@
+ from transformers import PretrainedConfig
+
+
+ class EmCoderConfig(PretrainedConfig):
+     model_type = "emcoder"
+
+     def __init__(
+         self,
+         vocab_size=50368,
+         d_model=768,
+         n_head=12,
+         n_layers=6,
+         d_ffn=2048,
+         dropout=0.1,
+         num_labels=28,
+         base_encoder_path="",
+         id2label=None,
+         label2id=None,
+         **kwargs,
+     ):
+         if id2label is not None:
+             id2label = {int(k): v for k, v in id2label.items()}
+
+         super().__init__(id2label=id2label, label2id=label2id, **kwargs)
+         self.vocab_size = vocab_size
+         self.d_model = d_model
+         self.n_head = n_head
+         self.n_layers = n_layers
+         self.d_ffn = d_ffn
+         self.dropout = dropout
+         self.num_labels = num_labels
+         self.base_encoder_path = base_encoder_path
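JSON object keys are always strings, so an `id2label` map written to `config.json` comes back with keys such as `"0"` rather than `0`. The dict comprehension at the top of `EmCoderConfig.__init__` casts them back to integers, which is what lookups like `model.config.id2label[idx.item()]` in the README example expect. The standalone illustration below uses only the standard `json` module; the three labels are simply the first GoEmotions classes. Recent Transformers versions may already perform a similar conversion inside `PretrainedConfig`, so the explicit cast is best read as a safeguard.

```python
import json

# Integer-keyed label map, like the one EmCoderConfig stores
id2label = {0: "admiration", 1: "amusement", 2: "anger"}

# Save/load round trip: JSON object keys always come back as strings
reloaded = json.loads(json.dumps(id2label))
print(list(reloaded.keys()))  # ['0', '1', '2']

# Same normalization as the dict comprehension in EmCoderConfig.__init__
normalized = {int(k): v for k, v in reloaded.items()}
print(normalized[1])  # amusement
```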
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5013a3b32923fa719eea0597d593d64f0e824d611531d1259d8bf81ae13aa5be
+ size 327097416
modeling_emcoder.py ADDED
@@ -0,0 +1,301 @@
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from .rope_embeddings import RotaryEmbedding
+ from transformers import PreTrainedModel, AutoConfig, AutoModel
+ from transformers.modeling_outputs import SequenceClassifierOutput
+
+ from .configuration_emcoder import EmCoderConfig
+
+
+ class RMSNorm(nn.Module):
+     def __init__(self, dim: int, eps: float = 1e-6):
+         super().__init__()
+         self.eps = eps
+         self.weight = nn.Parameter(torch.ones(dim))
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         variance = x.pow(2).mean(-1, keepdim=True)
+         return x * torch.rsqrt(variance + self.eps) * self.weight
+
+
+ class SwiGLU(nn.Module):
+     def __init__(self, d_model: int, d_ffn: int):
+         super().__init__()
+         self.wi = nn.Linear(d_model, 2 * d_ffn, bias=False)
+         self.wo = nn.Linear(d_ffn, d_model, bias=False)
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         x1, x2 = self.wi(x).chunk(2, dim=-1)
+         return self.wo(x1 * F.silu(x2))
+
+
+
+
+ class EmCoderEncoderLayer(nn.Module):
+     """Custom Pre-LN Transformer encoder layer with RoPE and PyTorch scaled_dot_product_attention."""
+
+     def __init__(self, config: EmCoderConfig, rope: RotaryEmbedding):
+         super().__init__()
+         self.n_head = config.n_head
+         self.d_head = config.d_model // config.n_head
+         self.rope = rope
+
+         # Attention projections
+         self.q_proj = nn.Linear(config.d_model, config.d_model, bias=False)
+         self.k_proj = nn.Linear(config.d_model, config.d_model, bias=False)
+         self.v_proj = nn.Linear(config.d_model, config.d_model, bias=False)
+         self.out_proj = nn.Linear(config.d_model, config.d_model, bias=False)
+
+         self.ln1 = RMSNorm(config.d_model)
+         self.ln2 = RMSNorm(config.d_model)
+
+         self.ffn = SwiGLU(config.d_model, config.d_ffn)
+
+         self.dropout = nn.Dropout(config.dropout)
+
+         # mark for initialization
+         self.out_proj._is_residual = True
+         self.ffn.wo._is_residual = True
+
+     def forward(self, x: torch.Tensor, attn_mask: torch.Tensor) -> torch.Tensor:
+         # MULTI-HEAD ATTENTION
+         residual = x
+         nx = self.ln1(x)
+         B, S, _ = nx.shape
+
+         # Projections -> (B, H, S, D_head)
+         q = self.q_proj(nx).view(B, S, self.n_head, self.d_head).transpose(1, 2)
+         k = self.k_proj(nx).view(B, S, self.n_head, self.d_head).transpose(1, 2)
+         v = self.v_proj(nx).view(B, S, self.n_head, self.d_head).transpose(1, 2)
+
+         q = self.rope.rotate_queries_or_keys(q)
+         k = self.rope.rotate_queries_or_keys(k)
+
+         attn_out = F.scaled_dot_product_attention(
+             q,
+             k,
+             v,
+             attn_mask=attn_mask,
+             dropout_p=self.dropout.p if self.dropout.training else 0.0,
+         )
+
+         # Join heads -> (B, S, D_model)
+         attn_out = attn_out.transpose(1, 2).contiguous().view(B, S, -1)
+         x = residual + self.dropout(self.out_proj(attn_out))
+
+         x = x + self.dropout(self.ffn(self.ln2(x)))
+         return x
+
+
+ class EmCoderEncoder(nn.Module):
+     """The core encoder architecture of EmCoder Transformer."""
+
+     def __init__(self, config: EmCoderConfig):
+         super().__init__()
+         self.token_embedding = nn.Embedding(config.vocab_size, config.d_model)
+         self.embed_norm = RMSNorm(config.d_model)
+         self.dropout = nn.Dropout(config.dropout)
+
+         self.rope = RotaryEmbedding(dim=config.d_model // config.n_head)
+
+         self.layers = nn.ModuleList(
+             [EmCoderEncoderLayer(config, self.rope) for _ in range(config.n_layers)]
+         )
+
+         self.final_norm = RMSNorm(config.d_model)
+
+     def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
+         """Standard forward pass through the encoder."""
+         x = self.token_embedding(x)
+         x = self.embed_norm(x)
+         x = self.dropout(x)
+
+         B, S = mask.shape
+         attn_mask = mask.view(B, 1, 1, S).to(dtype=torch.bool)
+
+         for layer in self.layers:
+             x = layer(x, attn_mask)
+
+         return self.final_norm(x)
+
+
+
+ class EmCoder(PreTrainedModel):
+     """The full EmCoder model, including the backbone encoder and the classification head."""
+
+     config_class = EmCoderConfig
+
+     def __init__(self, config: EmCoderConfig):
+         super().__init__(config)
+
+         self.encoder = EmCoderEncoder(config)
+
+         self.classifier = nn.Sequential(
+             nn.Linear(config.d_model, config.d_model),
+             nn.GELU(),
+             nn.Dropout(config.dropout),
+             nn.Linear(config.d_model, config.num_labels),
+         )
+
+         self.post_init()
+
+
+     def _init_weights(self, module: nn.Module) -> None:
+         if isinstance(module, nn.Linear):
+             # scale down the init for residual connections
+             if getattr(module, "_is_residual", False):
+                 std = 0.02 / ((2 * self.config.n_layers) ** 0.5)
+             else:
+                 std = 0.02
+
+             nn.init.trunc_normal_(module.weight, std=std)
+             if module.bias is not None:
+                 nn.init.zeros_(module.bias)
+
+         elif isinstance(module, nn.Embedding):
+             nn.init.trunc_normal_(module.weight, std=0.02)
+
+         elif isinstance(module, RMSNorm):
+             nn.init.ones_(module.weight)
+
+
+
+     def _set_mc_dropout(self, active: bool = True):
+         for m in self.modules():
+             if isinstance(m, nn.Dropout):
+                 m.train(active)
+
+
+     @staticmethod
+     def _masked_mean_pooling(
+         features: torch.Tensor, mask: torch.Tensor
+     ) -> torch.Tensor:
+         mask = mask.unsqueeze(-1)  # (B, S, 1)
+         masked_features = features * mask  # (B, S, D)
+         sum_masked_features = masked_features.sum(dim=1)  # (B, D)
+         count_tokens = torch.clamp(mask.sum(dim=1), min=1e-9)  # (B, 1)
+         return sum_masked_features / count_tokens  # (B, D)
+
+
+     def mc_forward(
+         self,
+         input_ids: torch.Tensor | None = None,
+         attention_mask: torch.Tensor | None = None,
+         labels: torch.Tensor | None = None,
+         n_samples: int = 10,
+         max_batch_size: int | None = None,
+         return_dict: bool | None = None,
+         **kwargs,
+     ) -> tuple[torch.Tensor, ...] | SequenceClassifierOutput:
+         """
+         Performs Monte Carlo Dropout inference to quantify uncertainty.
+
+         Args:
+             input_ids: Input token IDs of shape (B, S).
+             attention_mask: Attention mask of shape (B, S).
+             labels: Optional multi-hot targets of shape (B, num_labels); if given,
+                 the BCE loss is computed on the MC-averaged logits.
+             n_samples: Total number of Monte Carlo samples.
+             max_batch_size: Maximum number of MC samples processed in one forward pass.
+             return_dict: Whether to return a SequenceClassifierOutput instead of a tuple.
+
+         Returns:
+             SequenceClassifierOutput (or a tuple if return_dict is False) whose
+             logits have shape (n_samples, B, num_labels).
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         x = input_ids if input_ids is not None else kwargs.get("x")
+         mask = attention_mask if attention_mask is not None else kwargs.get("mask")
+
+         if x is None or mask is None:
+             raise ValueError("input_ids (x) and attention_mask (mask) must be provided")
+
+         if max_batch_size is None:
+             max_batch_size = n_samples
+
+
+         B, S = x.shape
+         num_labels = self.classifier[-1].out_features
+
+         all_logits = torch.empty((n_samples, B, num_labels), device=x.device)
+
+         is_training = self.training
+         self._set_mc_dropout(active=True)
+         try:
+             with torch.no_grad():
+                 for i in range(0, n_samples, max_batch_size):
+                     batch_samples = min(max_batch_size, n_samples - i)
+
+                     x_stacked = x.repeat(batch_samples, 1)  # (batch_samples * B, S)
+                     mask_stacked = mask.repeat(batch_samples, 1)  # (batch_samples * B, S)
+
+                     features = self.encoder(
+                         x_stacked, mask_stacked
+                     )  # (batch_samples * B, S, D)
+
+                     pooled = self._masked_mean_pooling(features, mask_stacked)
+                     logits = self.classifier(pooled)  # (batch_samples * B, num_labels)
+
+                     all_logits[i : i + batch_samples] = logits.view(batch_samples, B, -1)
+         finally:
+             self._set_mc_dropout(active=is_training)
+
+         loss = None
+         if labels is not None:
+             loss_fct = nn.BCEWithLogitsLoss()
+             loss = loss_fct(all_logits.mean(dim=0), labels.to(all_logits.dtype))
+
+         if not return_dict:
+             output = (all_logits,)
+             return ((loss,) + output) if loss is not None else output
+
+         return SequenceClassifierOutput(
+             loss=loss,
+             logits=all_logits,
+             hidden_states=None,
+             attentions=None,
+         )
+
+
+     def forward(
+         self,
+         input_ids: torch.Tensor | None = None,
+         attention_mask: torch.Tensor | None = None,
+         labels: torch.Tensor | None = None,
+         return_dict: bool | None = None,
+         **kwargs,
+     ) -> tuple[torch.Tensor, ...] | SequenceClassifierOutput:
+         """Standard forward pass without MC Dropout."""
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         x = input_ids if input_ids is not None else kwargs.get("x")
+         mask = attention_mask if attention_mask is not None else kwargs.get("mask")
+
+         if x is None or mask is None:
+             raise ValueError("input_ids (x) and attention_mask (mask) must be provided")
+
+         features = self.encoder(x, mask)
+
+         pooled = self._masked_mean_pooling(features, mask)
+
+         logits = self.classifier(pooled)
+
+         loss = None
+         if labels is not None:
+             loss_fct = nn.BCEWithLogitsLoss()
+             loss = loss_fct(logits, labels.to(logits.dtype))
+
+         if not return_dict:
+             output = (logits,)
+             return ((loss,) + output) if loss is not None else output
+
+         return SequenceClassifierOutput(
+             loss=loss,
+             logits=logits,
+             hidden_states=None,
+             attentions=None,
+         )
+
+ try:
+     AutoConfig.register("emcoder", EmCoderConfig)
+     AutoModel.register(EmCoderConfig, EmCoder)
+ except ValueError:
+     pass
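`mc_forward` splits the `n_samples` stochastic passes into chunks of at most `max_batch_size`, tiles the batch with `x.repeat(batch_samples, 1)`, and regroups the logits with `logits.view(batch_samples, B, -1)`. The regrouping is correct because repeating along dim 0 stacks whole copies of the batch one after another (sample-major order), so row `s * B + b` holds pass `s` of input `b`. The plain-Python sketch below mimics both steps with lists instead of tensors; `chunk_sizes` and `tile_and_regroup` are hypothetical helpers, not part of the repository.

```python
from typing import List


def chunk_sizes(n_samples: int, max_batch_size: int) -> List[int]:
    """Sub-batch sizes produced by the loop in mc_forward."""
    return [min(max_batch_size, n_samples - i) for i in range(0, n_samples, max_batch_size)]


def tile_and_regroup(batch: List[str], batch_samples: int) -> List[List[str]]:
    """Mimic x.repeat(batch_samples, 1) followed by .view(batch_samples, B, -1)."""
    B = len(batch)
    stacked = batch * batch_samples  # whole copies back to back, like repeat on dim 0
    return [stacked[s * B:(s + 1) * B] for s in range(batch_samples)]


print(chunk_sizes(50, 10))  # [10, 10, 10, 10, 10]  (README settings)
print(chunk_sizes(23, 10))  # [10, 10, 3]  (last chunk is smaller)
print(tile_and_regroup(["text_a", "text_b"], 3))
```

Every pass is covered exactly once (the chunk sizes sum to `n_samples`), and each regrouped row is an exact copy of the original batch, so the slice `all_logits[i : i + batch_samples]` lines up with the right inputs.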
outputs/admiration_scatters.png ADDED

Git LFS Details

  • SHA256: 5cab43562862ea40bd700109f8dacf96ca5bd47598c5d13fda358659ec0304c9
  • Pointer size: 131 Bytes
  • Size of remote file: 249 kB
outputs/confusion_matrix.png ADDED
outputs/f1_rejection_epistemic.png ADDED
outputs/fear_scatters.png ADDED

Git LFS Details

  • SHA256: 1eb5e47f1b3366d93daf60ada82070330ddb520d68441b7978c982d1fcfba06e
  • Pointer size: 131 Bytes
  • Size of remote file: 130 kB
outputs/neutral_scatters.png ADDED

Git LFS Details

  • SHA256: bdfeab47c87920893ebf91eba8eca21ac5c0939e106b929cb478ca71322fb7b1
  • Pointer size: 131 Bytes
  • Size of remote file: 309 kB
outputs/ridge_aleatoric.png ADDED

Git LFS Details

  • SHA256: 3fa0a3aeb52fae0fbd585eda43262db6586c5fd0a84b11dd9b2d9077bb2c6ce8
  • Pointer size: 131 Bytes
  • Size of remote file: 168 kB
outputs/ridge_epistemic.png ADDED

Git LFS Details

  • SHA256: d208742be0a9e25c026230d7657ef3f1c9e7d8668c3b3a798168cac134457575
  • Pointer size: 131 Bytes
  • Size of remote file: 111 kB
rope_embeddings.py ADDED
@@ -0,0 +1,270 @@
+ from __future__ import annotations
+ from math import pi, log
+ import torch
+ from torch.amp import autocast
+ from torch.nn import Module
+ from torch import nn, broadcast_tensors, is_tensor, tensor, Tensor
+ from typing import Literal
+
+
+ def exists(val):
+     return val is not None
+
+ def default(val, d):
+     return val if exists(val) else d
+
+ def broadcat(tensors, dim=-1):
+     broadcasted_tensors = broadcast_tensors(*tensors)
+     return torch.cat(broadcasted_tensors, dim=dim)
+
+ def slice_at_dim(t, dim_slice: slice, *, dim):
+     dim += (t.ndim if dim < 0 else 0)
+     colons = [slice(None)] * t.ndim
+     colons[dim] = dim_slice
+     return t[tuple(colons)]
+
+ def rotate_half(x):
+     orig_shape = x.shape
+     d_head = orig_shape[-1]
+     x = x.view(*orig_shape[:-1], d_head // 2, 2)
+
+     x1 = x[..., 0]
+     x2 = x[..., 1]
+
+     res = torch.stack((-x2, x1), dim=-1)
+     return res.view(*orig_shape)
+
+
+ @autocast('cuda', enabled=False)
+ def apply_rotary_emb(
+     freqs,
+     t,
+     start_index=0,
+     scale=1.,
+     seq_dim=-2,
+     freqs_seq_dim=None
+ ):
+     dtype = t.dtype
+
+     if not exists(freqs_seq_dim):
+         if freqs.ndim == 2 or t.ndim == 3:
+             freqs_seq_dim = 0
+
+     if t.ndim == 3 or exists(freqs_seq_dim):
+         seq_len = t.shape[seq_dim]
+         freqs = slice_at_dim(freqs, slice(-seq_len, None), dim=freqs_seq_dim)
+
+     rot_dim = freqs.shape[-1]
+     end_index = start_index + rot_dim
+
+     assert rot_dim <= t.shape[-1], f'feature dimension {t.shape[-1]} is not of sufficient size to rotate in all the positions {rot_dim}'
+
+     t_left = t[..., :start_index]
+     t_middle = t[..., start_index:end_index]
+     t_right = t[..., end_index:]
+
+     t_transformed = (t_middle * freqs.cos() * scale) + (rotate_half(t_middle) * freqs.sin() * scale)
+
+     out = torch.cat((t_left, t_transformed, t_right), dim=-1)
+     return out.type(dtype)
+
+
+ def apply_learned_rotations(rotations, t, start_index=0, freq_ranges=None):
+     if exists(freq_ranges):
+         rotations = torch.einsum('..., f -> ... f', rotations, freq_ranges)
+         rotations = rotations.reshape(*rotations.shape[:-2], -1)
+
+     rotations = rotations.repeat_interleave(2, dim=-1)
+     return apply_rotary_emb(rotations, t, start_index=start_index)
+
+
+ class RotaryEmbedding(Module):
+     def __init__(
+         self,
+         dim,
+         custom_freqs: Tensor | None = None,
+         freqs_for: Literal['lang', 'pixel', 'constant'] = 'lang',
+         theta = 10000,
+         max_freq = 10,
+         num_freqs = 1,
+         learned_freq = False,
+         use_xpos = False,
+         xpos_scale_base = 512,
+         interpolate_factor = 1.,
+         theta_rescale_factor = 1.,
+         seq_before_head_dim = False,
+         cache_if_possible = True,
+         cache_max_seq_len = 8192
+     ):
+         super().__init__()
+
+         theta *= theta_rescale_factor ** (dim / (dim - 2))
+         self.freqs_for = freqs_for
+
+         if exists(custom_freqs):
+             freqs = custom_freqs
+         elif freqs_for == 'lang':
+             freqs = 1. / (theta ** (torch.arange(0, dim, 2)[:(dim // 2)].float() / dim))
+         elif freqs_for == 'pixel':
+             freqs = torch.linspace(1., max_freq / 2, dim // 2) * pi
+         elif freqs_for == 'constant':
+             freqs = torch.ones(num_freqs).float()
+
+         self.cache_if_possible = cache_if_possible
+         self.cache_max_seq_len = cache_max_seq_len
+
+         self.register_buffer('cached_freqs', torch.zeros(cache_max_seq_len, dim), persistent=False)
+         self.cached_freqs_seq_len = 0
+
+         self.freqs = nn.Parameter(freqs, requires_grad=learned_freq)
+         self.learned_freq = learned_freq
+
+         self.register_buffer('dummy', torch.tensor(0), persistent=False)
+
+         self.seq_before_head_dim = seq_before_head_dim
+         self.default_seq_dim = -3 if seq_before_head_dim else -2
+
+         assert interpolate_factor >= 1.
+         self.interpolate_factor = interpolate_factor
+
+         self.use_xpos = use_xpos
+
+         if not use_xpos:
+             return
+
+         scale = (torch.arange(0, dim, 2) + 0.4 * dim) / (1.4 * dim)
+         self.scale_base = xpos_scale_base
+
+         self.register_buffer('scale', scale, persistent=False)
+         self.register_buffer('cached_scales', torch.zeros(cache_max_seq_len, dim), persistent=False)
+         self.cached_scales_seq_len = 0
+
+         self.apply_rotary_emb = staticmethod(apply_rotary_emb)
+
144
+ @property
145
+ def device(self):
146
+ return self.dummy.device
147
+
148
+ def get_seq_pos(self, seq_len, device=None, dtype=None, offset=0):
149
+ device = default(device, self.device)
150
+ dtype = default(dtype, self.cached_freqs.dtype)
151
+ return (torch.arange(seq_len, device=device, dtype=dtype) + offset) / self.interpolate_factor
152
+
153
+ def rotate_queries_or_keys(self, t, seq_dim=None, offset=0, scale=None):
154
+ seq_dim = default(seq_dim, self.default_seq_dim)
155
+ assert not self.use_xpos or exists(scale), 'you must use `.rotate_queries_and_keys` method instead'
156
+
157
+ device, dtype, seq_len = t.device, t.dtype, t.shape[seq_dim]
158
+ seq = self.get_seq_pos(seq_len, device=device, dtype=dtype, offset=offset)
159
+ freqs = self.forward(seq, seq_len=seq_len, offset=offset)
160
+
161
+ if seq_dim == -3:
162
+ freqs = freqs.unsqueeze(1)
163
+
164
+ return apply_rotary_emb(freqs, t, scale=default(scale, 1.), seq_dim=seq_dim)
165
+
166
+ def rotate_queries_with_cached_keys(self, q, k, seq_dim=None, offset=0):
167
+ dtype, device, seq_dim = q.dtype, q.device, default(seq_dim, self.default_seq_dim)
168
+
169
+ q_len, k_len = q.shape[seq_dim], k.shape[seq_dim]
170
+ assert q_len <= k_len
171
+
172
+ q_scale = k_scale = 1.
173
+
174
+ if self.use_xpos:
175
+ seq = self.get_seq_pos(k_len, dtype=dtype, device=device)
176
+ q_scale = self.get_scale(seq[-q_len:]).type(dtype)
177
+ k_scale = self.get_scale(seq).type(dtype)
178
+
179
+ rotated_q = self.rotate_queries_or_keys(q, seq_dim=seq_dim, scale=q_scale, offset=k_len - q_len + offset)
180
+ rotated_k = self.rotate_queries_or_keys(k, seq_dim=seq_dim, scale=k_scale ** -1)
181
+
182
+ return rotated_q.type(q.dtype), rotated_k.type(k.dtype)
183
+
184
+ def rotate_queries_and_keys(self, q, k, seq_dim=None):
185
+ seq_dim = default(seq_dim, self.default_seq_dim)
186
+ assert self.use_xpos
187
+ device, dtype, seq_len = q.device, q.dtype, q.shape[seq_dim]
188
+
189
+         seq = self.get_seq_pos(seq_len, dtype=dtype, device=device)
+         freqs = self.forward(seq, seq_len=seq_len)
+         scale = self.get_scale(seq, seq_len=seq_len).to(dtype)
+
+         if seq_dim == -3:
+             freqs = freqs.unsqueeze(1)
+             scale = scale.unsqueeze(1)
+
+         rotated_q = apply_rotary_emb(freqs, q, scale=scale, seq_dim=seq_dim)
+         rotated_k = apply_rotary_emb(freqs, k, scale=scale ** -1, seq_dim=seq_dim)
+
+         return rotated_q.type(q.dtype), rotated_k.type(k.dtype)
+
+     def get_scale(self, t: Tensor, seq_len: int | None = None, offset=0):
+         assert self.use_xpos
+         should_cache = self.cache_if_possible and exists(seq_len) and (offset + seq_len) <= self.cache_max_seq_len
+
+         if should_cache and (seq_len + offset) <= self.cached_scales_seq_len:
+             return self.cached_scales[offset:(offset + seq_len)]
+
+         scale = 1.
+         if self.use_xpos:
+             power = (t - len(t) // 2) / self.scale_base
+             scale = self.scale ** power.unsqueeze(-1)
+             scale = scale.repeat_interleave(2, dim=-1)
+
+         if should_cache and offset == 0:
+             self.cached_scales[:seq_len] = scale.detach()
+             self.cached_scales_seq_len = seq_len
+
+         return scale
+
+     def get_axial_freqs(self, *dims, offsets: tuple[int | float, ...] | Tensor | None = None):
+         Colon = slice(None)
+         all_freqs = []
+
+         if exists(offsets):
+             if not is_tensor(offsets):
+                 offsets = tensor(offsets)
+             assert len(offsets) == len(dims)
+
+         for ind, dim in enumerate(dims):
+             offset = 0
+             if exists(offsets):
+                 offset = offsets[ind]
+
+             if self.freqs_for == 'pixel':
+                 pos = torch.linspace(-1, 1, steps=dim, device=self.device)
+             else:
+                 pos = torch.arange(dim, device=self.device)
+
+             pos = pos + offset
+             freqs = self.forward(pos, seq_len=dim)
+
+             all_axis = [None] * len(dims)
+             all_axis[ind] = Colon
+             new_axis_slice = (Ellipsis, *all_axis, Colon)
+             all_freqs.append(freqs[new_axis_slice])
+
+         all_freqs = broadcast_tensors(*all_freqs)
+         return torch.cat(all_freqs, dim=-1)
+
+     @autocast('cuda', enabled=False)
+     def forward(self, t: Tensor, seq_len: int | None = None, offset=0):
+         should_cache = (
+             self.cache_if_possible and not self.learned_freq and
+             exists(seq_len) and self.freqs_for != 'pixel' and
+             (offset + seq_len) <= self.cache_max_seq_len
+         )
+
+         if should_cache and (offset + seq_len) <= self.cached_freqs_seq_len:
+             return self.cached_freqs[offset:(offset + seq_len)].detach()
+
+         freqs = self.freqs
+         freqs = torch.einsum('..., f -> ... f', t.type(freqs.dtype), freqs)
+         freqs = freqs.repeat_interleave(2, dim=-1)
+
+         if should_cache and offset == 0:
+             self.cached_freqs[:seq_len] = freqs.detach()
+             self.cached_freqs_seq_len = seq_len
+
+         return freqs
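The rotary code above rotates queries and keys with the same per-position frequencies. It also multiplies queries by `scale` and keys by `scale ** -1`, where `scale = self.scale ** power` and `power = (t - len(t) // 2) / self.scale_base`. The pure-Python sketch below illustrates why that pairing works: the attention logit depends only on the *relative* offset between query and key, and the centering term cancels. Some parts are assumptions rather than values taken from this file: the frequency base (10000), the per-pair decay rates (`zetas`, using the xPos-style gamma = 0.4), `scale_base = 512`, and the helper names. `self.freqs` and `self.scale` are defined outside this excerpt.

```python
import math

def rotate_interleaved(x, pos, inv_freqs):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by the angle pos * inv_freqs[i].

    This matches the interleaved layout implied by repeat_interleave(2, dim=-1):
    every frequency is shared by one adjacent pair of dimensions.
    """
    out = []
    for i, theta in enumerate(inv_freqs):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out.extend([a * c - b * s, a * s + b * c])
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def xpos_score(q, k, q_pos, k_pos, inv_freqs, zetas, scale_base=512.0, center=0):
    """Attention logit between a rotated+scaled query and key (xPos style).

    Query pair i is multiplied by zetas[i] ** power and key pair i by
    zetas[i] ** -power, with power = (pos - center) / scale_base, mirroring
    scale vs. scale ** -1 in the method above.
    """
    rq = rotate_interleaved(q, q_pos, inv_freqs)
    rk = rotate_interleaved(k, k_pos, inv_freqs)
    pq = (q_pos - center) / scale_base
    pk = (k_pos - center) / scale_base
    # each zeta scales both dimensions of its pair, like repeat_interleave(2)
    sq = [z ** pq for z in zetas for _ in range(2)]
    sk = [z ** -pk for z in zetas for _ in range(2)]
    return dot([a * w for a, w in zip(rq, sq)], [b * w for b, w in zip(rk, sk)])

d = 8
# Assumed standard RoPE frequencies (base 10000) and xPos decay rates (gamma = 0.4).
inv_freqs = [1.0 / 10000 ** (2 * i / d) for i in range(d // 2)]
zetas = [(2 * i + 0.4 * d) / (1.4 * d) for i in range(d // 2)]
q = [0.3, -1.2, 0.5, 0.8, -0.1, 0.4, 1.0, -0.7]
k = [0.9, 0.2, -0.4, 0.6, 0.3, -0.5, 0.1, 0.2]

# Same relative offset (7) at different absolute positions and centers.
s1 = xpos_score(q, k, 10, 3, inv_freqs, zetas, center=4)
s2 = xpos_score(q, k, 107, 100, inv_freqs, zetas, center=50)
print(abs(s1 - s2) < 1e-9)  # True
```

For pair i, the product of the two scales is `zetas[i] ** ((q_pos - k_pos) / scale_base)`, so any shared shift (here `center`, and `len(t) // 2` in the real code) drops out. With `zetas[i] < 1`, the scale decays when the query lies after the key.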
thresholds.json ADDED
@@ -0,0 +1,114 @@
+ {
+   "admiration": {
+     "p": 0.6142857142857143,
+     "f1": 0.7186574531095755
+   },
+   "amusement": {
+     "p": 0.5,
+     "f1": 0.7870778267254038
+   },
+   "anger": {
+     "p": 0.6714285714285715,
+     "f1": 0.42744063324538256
+   },
+   "annoyance": {
+     "p": 0.5571428571428572,
+     "f1": 0.3525423728813559
+   },
+   "approval": {
+     "p": 0.3857142857142858,
+     "f1": 0.36084452975047987
+   },
+   "caring": {
+     "p": 0.44285714285714284,
+     "f1": 0.4715909090909091
+   },
+   "confusion": {
+     "p": 0.6142857142857143,
+     "f1": 0.4217252396166134
+   },
+   "curiosity": {
+     "p": 0.6714285714285715,
+     "f1": 0.5331125827814569
+   },
+   "desire": {
+     "p": 0.6142857142857143,
+     "f1": 0.5324675324675324
+   },
+   "disappointment": {
+     "p": 0.5,
+     "f1": 0.36416184971098264
+   },
+   "disapproval": {
+     "p": 0.5,
+     "f1": 0.41025641025641024
+   },
+   "disgust": {
+     "p": 0.5,
+     "f1": 0.425531914893617
+   },
+   "embarrassment": {
+     "p": 0.5,
+     "f1": 0.5294117647058824
+   },
+   "excitement": {
+     "p": 0.7857142857142857,
+     "f1": 0.33986928104575165
+   },
+   "fear": {
+     "p": 0.6142857142857143,
+     "f1": 0.632183908045977
+   },
+   "gratitude": {
+     "p": 0.7857142857142857,
+     "f1": 0.9131075110456554
+   },
+   "grief": {
+     "p": 0.6714285714285715,
+     "f1": 0.45454545454545453
+   },
+   "joy": {
+     "p": 0.6142857142857143,
+     "f1": 0.5688622754491018
+   },
+   "love": {
+     "p": 0.7285714285714286,
+     "f1": 0.8052930056710775
+   },
+   "nervousness": {
+     "p": 0.7857142857142857,
+     "f1": 0.375
+   },
+   "optimism": {
+     "p": 0.6714285714285715,
+     "f1": 0.6054054054054054
+   },
+   "pride": {
+     "p": 0.5,
+     "f1": 0.56
+   },
+   "realization": {
+     "p": 0.5,
+     "f1": 0.24892703862660945
+   },
+   "relief": {
+     "p": 0.3285714285714286,
+     "f1": 0.1935483870967742
+   },
+   "remorse": {
+     "p": 0.7285714285714286,
+     "f1": 0.7916666666666666
+   },
+   "sadness": {
+     "p": 0.6714285714285715,
+     "f1": 0.5255474452554745
+   },
+   "surprise": {
+     "p": 0.5,
+     "f1": 0.5128205128205128
+   },
+   "neutral": {
+     "p": 0.3857142857142858,
+     "f1": 0.6646788990825688
+   }
+ }
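Each entry in `thresholds.json` appears to pair a per-label decision threshold (`"p"`) with the F1 score reached at that threshold. The file does not state which split that F1 was measured on. Every `"p"` value lies on a grid of 15 evenly spaced points from 0.1 to 0.9 (step 0.8 / 14), which suggests an independent grid search per label. The sketch below shows one way to tune and apply such thresholds. Several details are assumptions: the helper names `tune_threshold` and `predict_labels`, the `>=` comparison, the choice to keep the lowest threshold on ties, and the idea that thresholds are applied to mean (for example MC-dropout-averaged) sigmoid probabilities. None of these is confirmed as EmCoder's own procedure.

```python
import json

def tune_threshold(probs, labels, grid):
    """Grid-search the threshold that maximises F1 for a single label."""
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1

def predict_labels(mean_probs, thresholds):
    """Return every label whose mean probability reaches its own threshold."""
    return [name for name, p in mean_probs.items() if p >= thresholds[name]["p"]]

# 15 evenly spaced candidates from 0.1 to 0.9 (step 0.8 / 14).
grid = [0.1 + i * 0.8 / 14 for i in range(15)]

best_t, best_f1 = tune_threshold([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0], grid)
print(round(best_t, 4), best_f1)  # 0.3286 1.0

thresholds = json.loads(
    '{"fear": {"p": 0.6142857142857143, "f1": 0.632183908045977},'
    ' "neutral": {"p": 0.3857142857142858, "f1": 0.6646788990825688}}'
)
print(predict_labels({"fear": 0.70, "neutral": 0.30}, thresholds))  # ['fear']
```

Because each label gets its own cut-off, rare or hard labels such as `relief` can use a low threshold while others such as `gratitude` use a high one. A single global 0.5 cut-off cannot do that.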
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "is_local": false,
+   "local_files_only": false,
+   "mask_token": "[MASK]",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 8192,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "TokenizersBackend",
+   "unk_token": "[UNK]"
+ }
train_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "n_samples": 50,
+   "tokenized_ds_dir": "data/goemotions_v2_no_trunc",
+   "encoder_lr": 0.00001,
+   "head_lr": 0.0002,
+   "lr_warmup": 0.02,
+   "weight_decay": 0.01,
+   "batch_size": 64,
+   "gradient_accumulation_steps": 1,
+   "num_epochs": 10
+ }
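`train_config.json` sets separate learning rates for the encoder (`1e-5`) and the classification head (`2e-4`), plus `"lr_warmup": 0.02`. The sketch below reads the warmup value as a fraction of total optimizer steps. Several points are assumptions, not taken from the config: that the two learning rates map to two optimizer parameter groups, that warmup is linear, and that the post-warmup decay is linear to zero. The decay shape follows the multiplier used by `transformers`' `get_linear_schedule_with_warmup`, but the actual schedule shape is not stated anywhere. The dataset size is purely hypothetical.

```python
import math

def linear_warmup_decay(step, total_steps, warmup_frac):
    """LR multiplier: linear warmup from 0 to 1, then linear decay back to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

cfg = {
    "encoder_lr": 0.00001,
    "head_lr": 0.0002,
    "lr_warmup": 0.02,
    "batch_size": 64,
    "gradient_accumulation_steps": 1,
    "num_epochs": 10,
}

num_train_examples = 40_000  # hypothetical; the config does not state the dataset size
steps_per_epoch = math.ceil(
    num_train_examples / (cfg["batch_size"] * cfg["gradient_accumulation_steps"])
)
total_steps = steps_per_epoch * cfg["num_epochs"]

def lrs_at(step):
    """Per-group learning rates at a given optimizer step (assumed two groups)."""
    m = linear_warmup_decay(step, total_steps, cfg["lr_warmup"])
    return {"encoder": cfg["encoder_lr"] * m, "head": cfg["head_lr"] * m}

print(total_steps, int(total_steps * cfg["lr_warmup"]))  # 6250 125
print(lrs_at(125))  # {'encoder': 1e-05, 'head': 0.0002}
```

In PyTorch, a multiplier like this is typically passed to `torch.optim.lr_scheduler.LambdaLR`. That scheduler scales each parameter group's base learning rate, so the encoder and head keep their 1:20 ratio throughout training.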
train_state.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "train_loss": 0.16548924763660894,
+   "eval_loss": 0.21261409854187685
+ }
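The `"n_samples": 50` entry in `train_config.json` is presumably the number of stochastic forward passes, with dropout left on, used for MC-dropout inference. The same commit also adds aleatoric and epistemic uncertainty plots. One common way to split uncertainty for a multi-label sigmoid model works per label with binary entropy. The entropy of the mean probability (total) is split into the mean per-pass entropy (aleatoric) and the remainder, the mutual information (epistemic). The sketch below computes that decomposition. The function names are hypothetical, and it is not confirmed that EmCoder uses this exact estimator rather than, for example, variance-based measures.

```python
import math

def binary_entropy(p, eps=1e-12):
    """Entropy (nats) of a Bernoulli(p) label; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def mc_dropout_uncertainty(sample_probs):
    """Per-label uncertainty from MC-dropout samples.

    sample_probs: list of n_samples rows, each holding one sigmoid
    probability per label from a forward pass with dropout active.
    """
    n = len(sample_probs)
    results = []
    for j in range(len(sample_probs[0])):
        ps = [row[j] for row in sample_probs]
        mean_p = sum(ps) / n
        total = binary_entropy(mean_p)                      # predictive entropy
        aleatoric = sum(binary_entropy(p) for p in ps) / n  # expected per-pass entropy
        results.append({
            "mean": mean_p,
            "total": total,
            "aleatoric": aleatoric,
            "epistemic": total - aleatoric,                 # mutual information
        })
    return results

n_samples = 50
# Label 0: every pass says 0.5 (consistently unsure); label 1: passes disagree.
samples = [[0.5, 0.01 if i % 2 == 0 else 0.99] for i in range(n_samples)]
for name, r in zip(["steady", "disagreeing"], mc_dropout_uncertainty(samples)):
    print(name, round(r["aleatoric"], 3), round(r["epistemic"], 3))
```

Both labels end up with a mean probability of about 0.5, but they differ in kind. The first is uncertain in every pass, which is high aleatoric uncertainty. In the second, the passes confidently disagree, which is high epistemic uncertainty. The mean probability is the quantity a per-label threshold would then be applied to.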