Update README.md

EmCoder achieves competitive F1-scores while being ~35% smaller than RoBERTa-base.

## How to use

EmCoder v1.0 uses the `roberta-base` tokenizer for correct token-to-embedding mapping.

### 1. Setup & Tokenization

Install dependencies:

```bash
pip install -r requirements.txt
```

Set up EmCoder:

```python
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "yezdata/EmCoder"

# Load the same tokenizer used during training
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Initialize with the same config as training
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```
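
As a quick sanity check after loading, you can tokenize a sample sentence; `input_ids` and `attention_mask` are the standard Hugging Face tokenizer outputs (the printed shape below is illustrative):

```python
# Encode a sample sentence and inspect the tensors the model will consume
sample = tokenizer("I am so happy you are here!", return_tensors="pt")
print(sample["input_ids"].shape)       # e.g. torch.Size([1, 10])
print(sample["attention_mask"].shape)  # same shape as input_ids
```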

### 2. Bayesian inference

To obtain probabilistic outputs and uncertainty metrics, use the `mc_forward` method:

```python
N_SAMPLES = 50
model.eval()

inputs = tokenizer("I am so happy you are here!", return_tensors="pt")
with torch.no_grad():
    # mc_forward keeps Dropout active even in model.eval() mode
    logits_mc = model.mc_forward(inputs['input_ids'], inputs['attention_mask'], n_samples=N_SAMPLES)

# Bayesian post-processing
probs_all = torch.sigmoid(logits_mc)  # (n_samples, B, 28)
```
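
A point estimate and an uncertainty score per label can then be derived from `probs_all`. The snippet below is a minimal sketch, assuming mean/std aggregation over the MC samples and a 0.5 multi-label threshold; neither is prescribed by the repo:

```python
# Aggregate the MC samples (assumed post-processing, not an official recipe)
probs_mean = probs_all.mean(dim=0)  # (B, 28) point estimate per label
probs_std = probs_all.std(dim=0)    # (B, 28) per-label uncertainty

preds = (probs_mean > 0.5).int()    # assumed threshold for multi-label output
print(probs_mean[0].topk(3))        # top-3 emotion scores for the first input
```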
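
For context, `mc_forward` is an MC Dropout estimator: dropout stays stochastic at inference time, so repeated forward passes draw samples from an approximate predictive distribution. A conceptual sketch of the mechanism, assuming the model maps `(input_ids, attention_mask)` to logits (the helper name is hypothetical; EmCoder's actual implementation may differ):

```python
import torch

def mc_dropout_logits(model, input_ids, attention_mask, n_samples=50):
    # Hypothetical illustration: re-enable only the Dropout modules
    # while the rest of the model stays in eval mode
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        draws = [model(input_ids, attention_mask) for _ in range(n_samples)]
    return torch.stack(draws)  # (n_samples, B, num_labels)
```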