---
license: mit
tags:
- biology
- transformers
- Feature Extraction
---

## Usage

### Load tokenizer and model

```python
from transformers import AutoTokenizer, AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
```

The default attention implementation is `"sdpa"` (PyTorch's scaled dot-product attention, which can dispatch to FlashAttention). If you want to use basic attention instead, replace it with `"eager"`; see [here](https://huggingface.co/CompBioDSA/MutBERT/blob/main/modeling_mutbert.py#L438).
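
Depending on your `transformers` version, you may also be able to pick the implementation at load time rather than editing the modeling file. The `attn_implementation` argument below is standard `from_pretrained` behavior, but whether the custom MutBERT code honors it is an assumption:

```python
from transformers import AutoModel

# Assumption: the remote MutBERT code respects attn_implementation;
# if it does not, edit modeling_mutbert.py as linked above.
model = AutoModel.from_pretrained(
    "CompBioDSA/pig-mutbert-ref",
    trust_remote_code=True,
    attn_implementation="eager",  # or "sdpa" (the default)
)
```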

### Get embeddings

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

dna = "ATCGGGGCCCATTA"
inputs = tokenizer(dna, return_tensors='pt')["input_ids"]

# MutBERT takes one-hot (probability) inputs; len(tokenizer) is the vocab size
mut_inputs = F.one_hot(inputs, num_classes=len(tokenizer)).float().to("cpu")
last_hidden_state = model(mut_inputs).last_hidden_state  # [1, sequence_length, 768]
# or: last_hidden_state = model(mut_inputs)[0]           # [1, sequence_length, 768]

# embedding with mean pooling
embedding_mean = torch.mean(last_hidden_state[0], dim=0)
print(embedding_mean.shape)  # torch.Size([768])

# embedding with max pooling
embedding_max = torch.max(last_hidden_state[0], dim=0)[0]
print(embedding_max.shape)  # torch.Size([768])
```
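
Mean-pooled embeddings can be compared directly, e.g. with cosine similarity. This is a minimal sketch that reuses the `tokenizer` and `model` from the block above; the two sequences are made up purely for illustration:

```python
def embed(seq: str) -> torch.Tensor:
    """Mean-pooled MutBERT embedding for a single DNA sequence."""
    ids = tokenizer(seq, return_tensors='pt')["input_ids"]
    one_hot = F.one_hot(ids, num_classes=len(tokenizer)).float()
    hidden = model(one_hot).last_hidden_state  # [1, seq_len, 768]
    return hidden[0].mean(dim=0)               # [768]

# Hypothetical sequences, chosen only to illustrate the comparison
sim = F.cosine_similarity(embed("ATCGGGGCCCATTA"), embed("ATCGGGGCCCATAA"), dim=0)
print(sim.item())  # scalar in [-1, 1]
```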

### Using as a Classifier

```python
from transformers import AutoModelForSequenceClassification

model_name = "CompBioDSA/pig-mutbert-ref"
model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True, num_labels=2)
```
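
The classification head loaded this way is freshly initialized and intended for fine-tuning. As a sketch of a forward pass, assuming the sequence-classification wrapper accepts the same one-hot inputs as the base model:

```python
import torch.nn.functional as F
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)
ids = tokenizer("ATCGGGGCCCATTA", return_tensors='pt')["input_ids"]
one_hot = F.one_hot(ids, num_classes=len(tokenizer)).float()

# Assumption: the remote classification model takes one-hot inputs like the base model
logits = model(one_hot).logits   # [1, num_labels]
print(logits.softmax(dim=-1))    # class probabilities (untrained head, so arbitrary)
```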

### With RoPE scaling

The allowed types for RoPE scaling are `linear` and `dynamic`. To extend the model's context window, pass the `rope_scaling` parameter when loading the model.

For example, to scale the context window by 2x:

```python
from transformers import AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    rope_scaling={'type': 'dynamic', 'factor': 2.0},  # 2.0 for 2x scaling, 4.0 for 4x, etc.
)
```
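
As a general note on the two types (standard RoPE-scaling behavior, not specific to this model): `linear` scaling divides position indices by the factor at every length and typically benefits from fine-tuning at the longer context, while `dynamic` (NTK-style) scaling only takes effect once inputs exceed the original window, so it tends to degrade short-sequence performance less.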