EmCoder / README.md

Update README.md

bbcc716 verified 15 days ago

10.2 kB

	---
	language:
	- en
	license: cc-by-nc-nd-4.0
	library_name: generic
	tags:
	- emotion-recognition
	- bayesian-deep-learning
	- mc-dropout
	- uncertainty-quantification
	- multi-label-classification
	datasets:
	- go_emotions
	metrics:
	- precision
	- recall
	- f1
	model-index:
	- name: EmCoder (v1)
	results:
	- task:
	type: text-classification
	name: Multi-label Emotion Classification
	dataset:
	name: GoEmotions
	type: go_emotions
	split: test
	metrics:
	- name: Macro F1
	type: f1
	value: 0.440
	- name: Macro Precision
	type: precision
	value: 0.408
	- name: Macro Recall
	type: recall
	value: 0.495
	---

	# EmCoder
	> Probabilistic Emotion Recognition & Uncertainty Quantification<br>28 Emotion multi-label classifier trained with MC Dropout methodology



	Unlike standard classifiers, EmCoder quantifies what it doesn't know using Monte Carlo Dropout, making it suitable for high-stakes AI pipelines.<br>
	EmCoder is optimized for MC Dropout inference.



	## SOTA benchmark
	### Evaluation on the GoEmotions test split (macro avg metrics)
	EmCoder achieves competitive F1-scores while being ~35% smaller than RoBERTa-base and ~45% smaller than ModernBERT, offering a superior efficiency-to-uncertainty ratio.
	\| Model \| Precision \| Recall \| F1-Score \| Params \|
	\| :--- \| :--- \| :--- \| :--- \| :--- \|
	\| EmCoder (v1) \| 0.408 \| 0.495 \| 0.440 \| 82.1M \|
	\| Google BERT (Original) \| 0.400 \| 0.630 \| 0.460 \| 110M \|
	\| RoBERTa-base \| 0.575 \| 0.396 \| 0.450 \| 125M \|
	\| ModernBERT-base \| 0.652 \| 0.443 \| 0.500 \| 149M \|


	## How to use
	> Since `.safetensors` files only store model weights and not the class logic, you need to use the provided `emcoder.py` to enable MC Dropout inference.<br>EmCoder v1.0 requires the `roberta-base` tokenizer for correct token-to-embedding mapping.
	### 1. Setup & Tokenization
	```python
	from transformers import AutoTokenizer
	from emcoder import EmCoder # Ensure emcoder.py is in your directory

	# Load the same tokenizer used during training
	tokenizer = AutoTokenizer.from_pretrained("roberta-base")

	EMCODER_PATH = "path/to/emcoder"

	# Initialize with same config as training
	model = EmCoder.from_pretrained(EMCODER_PATH)
	```
	### 2. Bayesian inference
	To obtain probabilistic outputs and uncertainty metrics, use the mc_forward method:
	```python
	import torch

	# Perform 50 stochastic passes
	N_SAMPLES = 50
	model.eval()

	inputs = tokenizer("I am so happy you are here!", return_tensors="pt")
	logits_mc = model.mc_forward(inputs['input_ids'], inputs['attention_mask'], n_samples=N_SAMPLES) # Automatically keeps Dropout active, even when in model.eval

	# Bayesian Post-processing
	# logits_mc shape: (n_samples, batch_size, 28)
	probs_all = torch.sigmoid(logits_mc)

	mean_probs = probs_all.mean(dim=0) # Mean Predicted Probability
	uncertainty = probs_all.std(dim=0) # Epistemic Uncertainty (Standard Deviation)
	```



	## Model Architecture
	```mermaid
	flowchart LR

	subgraph InputGroup["Input Operations"]
	direction TB
	MCD_Loop(["MC-Inference Loop: N_samples"]):::LoopNode
	ids["Batch IDs"]
	mask["Batch Mask"]
	end

	subgraph EmCoderCore["EmCoder Core"]
	direction LR
	tok_emb["Token Embedding"]
	ln_in["Input LayerNorm"]
	Transformer["Transformer Encoder"]
	final_norm["Final LayerNorm"]
	Dropout1[("MC-Dropout")]
	Dropout2[("MC-Dropout")]
	end

	subgraph Row1[" "]
	direction LR
	InputGroup
	EmCoderCore
	end

	subgraph MLP["Classifier MLP"]
	L_lin["Linear 1"]
	Dropout3[("MC-Dropout")]
	GELU["GELU"]
	F_lin["Final Linear"]
	end

	subgraph ClassifierHead[" "]
	direction TB
	pool["Masked Mean Pooling"]
	MLP
	end

	subgraph Row2[" "]
	direction LR
	ClassifierHead
	Out(["Class LogitsMC
	(n_samples, B, 28)"])

	Avg["Bayesian Post-processing"]
	end

	tok_emb ==> ln_in
	ln_in -.-> Dropout1
	Dropout1 ==> Transformer
	Transformer -.-> Dropout2
	Dropout2 ==> final_norm
	MCD_Loop -.-> ids
	ids ==> tok_emb
	final_norm ==> pool
	mask ==> pool
	pool ==> L_lin
	L_lin -.-> Dropout3
	Dropout3 ==> GELU
	GELU ==> F_lin
	F_lin ==> Out
	Out ==> Avg
	mask ==> Transformer

	classDef MCD fill:#424242,stroke:#fbc02d,stroke-width:2px,stroke-dasharray: 5 5,color:#fff
	classDef OutNode fill:#0d47a1,stroke:#1976d2,stroke-width:3px,color:#fff,font-weight:bold
	classDef BayesNode fill:#3e2723,stroke:#8d6e63,stroke-width:2px,stroke-dasharray: 3 3,color:#fff
	classDef LoopNode fill:#263238,stroke:#78909c,stroke-width:2px,color:#fff,font-style:italic
	classDef LightNode fill:#212121,stroke:#90a4ae,color:#fff

	class MCD_Loop LoopNode
	class ids,mask,tok_emb,ln_in,Transformer,final_norm,L_lin,GELU,F_lin,pool LightNode
	class Dropout1,Dropout2,Dropout3 MCD
	class Out OutNode
	class Avg BayesNode

	style InputGroup fill:#1a1a1a,stroke:#444,color:#fff
	style EmCoderCore fill:#2d1a2d,stroke:#6a1b9a,color:#fff
	style MLP fill:#212121,stroke:#455a64,color:#fff
	style ClassifierHead fill:#012a4a,stroke:#01497c,color:#fff
	style Row1 fill:none,stroke:none
	style Row2 fill:none,stroke:none

	linkStyle 2 stroke:#fbc02d,stroke-width:2px,fill:none
	linkStyle 5 stroke:#fbc02d,stroke-width:2px,fill:none
	linkStyle 11 stroke:#fbc02d,stroke-width:2px,fill:none
	```


	### Optimization
	The model is trained using a Weighted Bayesian Binary Cross Entropy loss:

	$$
	\mathcal{L}_{Bayesian} = \frac{1}{T} \sum_{t=1}^{T} \text{BCEWithLogits}(z^{(t)}, y; w)
	$$

	Where weights $w$ are calculated using a logarithmic class-balancing scale to handle extreme label imbalance:

	$$
	w_{c} = \max\left( 0.1, \min\left( 20, 1 + \ln \left( \frac{N_{neg,c} + \epsilon}{N_{pos,c} + \epsilon} \right) \right) \right)
	$$



	## Performance
	Using threshold of 0.5 for binarizing predictions
	\| \| precision \| recall \| f1-score \| support \|
	\|:---------------\|------------:\|---------:\|-----------:\|----------:\|
	\| micro avg \| 0.494 \| 0.596 \| 0.54 \| 6329 \|
	\| macro avg \| 0.408 \| 0.495 \| 0.44 \| 6329 \|
	\| weighted avg \| 0.492 \| 0.596 \| 0.535 \| 6329 \|
	\| samples avg \| 0.525 \| 0.616 \| 0.544 \| 6329 \|
	\|----------------\|-------------\|----------\|------------\|-----------\|
	\| admiration \| 0.541 \| 0.673 \| 0.599 \| 504 \|
	\| amusement \| 0.688 \| 0.909 \| 0.783 \| 264 \|
	\| anger \| 0.419 \| 0.47 \| 0.443 \| 198 \|
	\| annoyance \| 0.31 \| 0.25 \| 0.277 \| 320 \|
	\| approval \| 0.304 \| 0.271 \| 0.287 \| 351 \|
	\| caring \| 0.229 \| 0.281 \| 0.252 \| 135 \|
	\| confusion \| 0.26 \| 0.497 \| 0.342 \| 153 \|
	\| curiosity \| 0.432 \| 0.764 \| 0.552 \| 284 \|
	\| desire \| 0.453 \| 0.518 \| 0.483 \| 83 \|
	\| disappointment \| 0.176 \| 0.152 \| 0.163 \| 151 \|
	\| disapproval \| 0.279 \| 0.404 \| 0.33 \| 267 \|
	\| disgust \| 0.447 \| 0.545 \| 0.491 \| 123 \|
	\| embarrassment \| 0.325 \| 0.351 \| 0.338 \| 37 \|
	\| excitement \| 0.288 \| 0.427 \| 0.344 \| 103 \|
	\| fear \| 0.47 \| 0.692 \| 0.56 \| 78 \|
	\| gratitude \| 0.834 \| 0.943 \| 0.885 \| 352 \|
	\| grief \| 0 \| 0 \| 0 \| 6 \|
	\| joy \| 0.445 \| 0.652 \| 0.529 \| 161 \|
	\| love \| 0.724 \| 0.895 \| 0.801 \| 238 \|
	\| nervousness \| 0.24 \| 0.261 \| 0.25 \| 23 \|
	\| optimism \| 0.483 \| 0.543 \| 0.511 \| 186 \|
	\| pride \| 0.667 \| 0.375 \| 0.48 \| 16 \|
	\| realization \| 0.226 \| 0.166 \| 0.191 \| 145 \|
	\| relief \| 0.222 \| 0.182 \| 0.2 \| 11 \|
	\| remorse \| 0.516 \| 0.857 \| 0.644 \| 56 \|
	\| sadness \| 0.405 \| 0.545 \| 0.464 \| 156 \|
	\| surprise \| 0.429 \| 0.539 \| 0.478 \| 141 \|
	\| neutral \| 0.602 \| 0.695 \| 0.645 \| 1787 \|



	Model uncertainty estimation
	![epistemic_unc](outputs/epistemic_unc_scatter.png)

	Confusion matrix
	![multi_label_confusion_matrix](outputs/confusion_matrix.png)



	## Workflow
	```mermaid
	flowchart LR
	classDef StageNode fill:#121212,stroke:#546e7a,color:#fff;
	classDef HighlightNode fill:#4e342e,stroke:#ff7043,stroke-width:2px,color:#fff,font-weight:bold;

	subgraph PT ["Phase 1: Pre-training"]
	direction TB
	OWT[(OpenWebText)]:::StageNode --> MLM[Masked Language Modeling]:::StageNode
	MLM --> Core[Save EmCoderCore]:::StageNode
	end

	subgraph FT ["Phase 2: Fine-tuning"]
	direction TB
	Core --> Init[Init ClassificationHead]:::StageNode
	GE[(GoEmotions)]:::StageNode --> WBT[Bayesian Fine-tuning]:::HighlightNode
	WBT --> LogW[Log-weighted BCE Loss]:::StageNode
	LogW --> Freeze[Step 0-500: Encoder Frozen]:::StageNode
	end

	subgraph EV ["Phase 3: Testing & Inference"]
	direction TB
	Freeze --> MCD[MC Dropout Inference]:::HighlightNode
	MCD --> Unc[Uncertainty Estimation]:::HighlightNode

	subgraph Metrics ["Analysis"]
	Unc --> EPI[Epistemic: Model Confidence]:::StageNode
	Unc --> ALE[Aleatoric: Data Ambiguity]:::StageNode
	Unc --> CM[Test set metrics]:::StageNode
	end
	end

	style PT fill:#0d1b2a,stroke:#1b263b,color:#fff
	style FT fill:#2e1500,stroke:#5d2a00,color:#fff
	style EV fill:#1b2e1b,stroke:#2d4a2d,color:#fff
	style Metrics fill:#000,stroke:#333,color:#fff

	linkStyle default stroke:#aaa,stroke-width:2px;
	```


	### Note
	Note that this model was trained on GoEmotions dataset (social networks domain) and it may not generalize well to other domains.


	## Citation
	If you use this model, please cite it as follows:

	```bibtex
	@software{jez2026emcoder,
	author = {Václav Jež},
	title = {EmCoder: Probabilistic Emotion Recognition & Uncertainty Quantification},
	year = {2026},
	publisher = {GitHub},
	journal = {GitHub repository},
	howpublished = {\url{https://github.com/yezdata/emcoder}},
	version = {1.0.0}
	}
	```