yezdata committed
Commit 6063aca · verified · 1 Parent(s): 3567175

Update README.md

Files changed (1): README.md +7 -9
README.md CHANGED
@@ -70,7 +70,7 @@ EmCoder achieves competitive F1-scores while being ~35% smaller than RoBERTa-bas
 
 
 ## How to use
-Since `.safetensors` files only store model weights and not the class logic, you need to use the provided `emcoder.py` to enable **MC Dropout inference**.<br>EmCoder v1.0 requires the `roberta-base` tokenizer for correct token-to-embedding mapping.
+EmCoder v1.0 uses the `roberta-base` tokenizer for correct token-to-embedding mapping.
 ### 1. Setup & Tokenization
 Install dependencies
 ```bash
@@ -78,19 +78,16 @@ pip install -r requirements.txt
 ```
 Setup EmCoder
 ```python
-from transformers import AutoTokenizer
-from huggingface_hub import snapshot_download
-from emcoder import EmCoder  # Ensure emcoder.py is in your directory
+import torch
+from transformers import AutoModel, AutoTokenizer
 
 repo_id = "yezdata/EmCoder"
-model_dir = snapshot_download(repo_id=repo_id)
-print(model_dir)
 
 # Load the same tokenizer used during training
-tokenizer = AutoTokenizer.from_pretrained(model_dir)
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
 
 # Initialize with same config as training
-model = EmCoder.from_pretrained(model_dir)
+model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
 ```
 ### 2. Bayesian inference
 To obtain probabilistic outputs and uncertainty metrics, use the `mc_forward` method:
@@ -102,7 +99,8 @@ N_SAMPLES = 50
 model.eval()
 
 inputs = tokenizer("I am so happy you are here!", return_tensors="pt")
-logits_mc = model.mc_forward(inputs['input_ids'], inputs['attention_mask'], n_samples=N_SAMPLES)  # Automatically keeps Dropout active, even when in model.eval
+with torch.no_grad():
+    logits_mc = model.mc_forward(inputs['input_ids'], inputs['attention_mask'], n_samples=N_SAMPLES)  # Automatically keeps Dropout active, even when in model.eval
 
 # Bayesian Post-processing
 probs_all = torch.sigmoid(logits_mc)  # (n_samples, B, 28)
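The updated README stops at `probs_all`, although the section promises "uncertainty metrics". Below is a minimal sketch of one common MC Dropout reduction: average the per-sample probabilities to get a point estimate, and use the per-label standard deviation across samples as a spread measure. The helper name `summarize_mc` and the `0.5` decision threshold are illustrative assumptions, not part of the EmCoder repo; only the `(n_samples, B, 28)` shape comes from the diff above.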
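```python
import torch

def summarize_mc(logits_mc: torch.Tensor, threshold: float = 0.5):
    """Reduce MC Dropout logits of shape (n_samples, B, 28) to point estimates.

    Hypothetical helper: the threshold and the std-based spread are
    illustrative choices, not something the EmCoder model card specifies.
    """
    probs_all = torch.sigmoid(logits_mc)    # (n_samples, B, 28) per-sample probabilities
    probs_mean = probs_all.mean(dim=0)      # (B, 28) posterior-mean probability per label
    probs_std = probs_all.std(dim=0)        # (B, 28) spread across MC samples
    preds = (probs_mean > threshold).int()  # multi-label decision from the mean
    return probs_mean, probs_std, preds

# Smoke test with fake logits of the documented shape (50 samples, batch 1, 28 labels)
probs_mean, probs_std, preds = summarize_mc(torch.randn(50, 1, 28))
print(probs_mean.shape, probs_std.shape, int(preds.sum()))
```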
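With the `logits_mc` returned by the README's `mc_forward` call, you would pass that tensor instead of the random stand-in; in this sketch, a label with high `probs_std` is one whose prediction varies strongly across dropout samples and so deserves a second look.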