Update README.md

EmCoder achieves competitive F1-scores while being ~35% smaller than RoBERTa-base.

## How to use

EmCoder v1.0 uses the `roberta-base` tokenizer for correct token-to-embedding mapping.

### 1. Setup & Tokenization

Install dependencies:

```bash
pip install -r requirements.txt
```

Set up EmCoder:

```python
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "yezdata/EmCoder"

# Load the same tokenizer used during training
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Initialize with the same config as training
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```
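
As a quick sanity check after loading, you can tokenize a sample sentence; `input_ids` and `attention_mask` are the standard Hugging Face tokenizer outputs (the printed shape below is illustrative):

```python
# Encode a sample sentence and inspect the tensors the model will consume
sample = tokenizer("I am so happy you are here!", return_tensors="pt")
print(sample["input_ids"].shape)       # e.g. torch.Size([1, 10])
print(sample["attention_mask"].shape)  # same shape as input_ids
```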

### 2. Bayesian inference

To obtain probabilistic outputs and uncertainty metrics, use the `mc_forward` method:

```python
N_SAMPLES = 50
model.eval()

inputs = tokenizer("I am so happy you are here!", return_tensors="pt")
with torch.no_grad():
    # mc_forward keeps Dropout active even in model.eval() mode
    logits_mc = model.mc_forward(inputs['input_ids'], inputs['attention_mask'], n_samples=N_SAMPLES)

# Bayesian post-processing
probs_all = torch.sigmoid(logits_mc)  # (n_samples, B, 28)
```
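
A point estimate and an uncertainty score per label can then be derived from `probs_all`. The snippet below is a minimal sketch, assuming mean/std aggregation over the MC samples and a 0.5 multi-label threshold; neither is prescribed by the repo:

```python
# Aggregate the MC samples (assumed post-processing, not an official recipe)
probs_mean = probs_all.mean(dim=0)  # (B, 28) point estimate per label
probs_std = probs_all.std(dim=0)    # (B, 28) per-label uncertainty

preds = (probs_mean > 0.5).int()    # assumed threshold for multi-label output
print(probs_mean[0].topk(3))        # top-3 emotion scores for the first input
```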
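
For context, `mc_forward` is an MC Dropout estimator: dropout stays stochastic at inference time, so repeated forward passes draw samples from an approximate predictive distribution. A conceptual sketch of the mechanism, assuming the model maps `(input_ids, attention_mask)` to logits (the helper name is hypothetical; EmCoder's actual implementation may differ):

```python
import torch

def mc_dropout_logits(model, input_ids, attention_mask, n_samples=50):
    # Hypothetical illustration: re-enable only the Dropout modules
    # while the rest of the model stays in eval mode
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        draws = [model(input_ids, attention_mask) for _ in range(n_samples)]
    return torch.stack(draws)  # (n_samples, B, num_labels)
```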