---
license: mit
tags:
- biology
- transformers
- Feature Extraction
---

## Usage

### Load tokenizer and model

```python
from transformers import AutoTokenizer, AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
```

The default attention implementation is `"sdpa"` (flash attention). If you want to use basic attention, you can replace it with `"eager"`. Please refer to [here](https://huggingface.co/CompBioDSA/MutBERT/blob/main/modeling_mutbert.py#L438).
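
If the remote code follows the standard `transformers` interface, the implementation can also be selected at load time. A minimal sketch, assuming the custom model honors the usual `attn_implementation` argument (otherwise edit the setting in `modeling_mutbert.py` as linked above):

```python
from transformers import AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation="eager",  # assumption: the remote code accepts this; the default is "sdpa"
)
```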

### Get embeddings

```python
import torch
import torch.nn.functional as F

from transformers import AutoTokenizer, AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

dna = "ATCGGGGCCCATTA"
inputs = tokenizer(dna, return_tensors='pt')["input_ids"]

# convert token ids to one-hot float vectors; len(tokenizer) is the vocab size
mut_inputs = F.one_hot(inputs, num_classes=len(tokenizer)).float().to("cpu")
last_hidden_state = model(mut_inputs).last_hidden_state  # [1, sequence_length, 768]
# or: last_hidden_state = model(mut_inputs)[0]           # [1, sequence_length, 768]

# embedding with mean pooling
embedding_mean = torch.mean(last_hidden_state[0], dim=0)
print(embedding_mean.shape)  # expected: torch.Size([768])

# embedding with max pooling
embedding_max = torch.max(last_hidden_state[0], dim=0)[0]
print(embedding_max.shape)  # expected: torch.Size([768])
```
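
When embedding several sequences at once, padding positions should be excluded from the pooling. A minimal sketch of mask-aware mean pooling, assuming the tokenizer provides a padding token and the model's forward pass accepts an `attention_mask` like a standard BERT (the batching details here are illustrative, not part of the original example):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

sequences = ["ATCGGGGCCCATTA", "GGGTTTAAACC"]  # hypothetical example sequences
batch = tokenizer(sequences, return_tensors="pt", padding=True)
input_ids = batch["input_ids"]            # [batch, seq_len]
attention_mask = batch["attention_mask"]  # [batch, seq_len]

# one-hot inputs, as in the example above
mut_inputs = F.one_hot(input_ids, num_classes=len(tokenizer)).float()
# assumption: the forward pass accepts attention_mask alongside the one-hot inputs
last_hidden_state = model(mut_inputs, attention_mask=attention_mask).last_hidden_state

# mean pooling that ignores padding positions
mask = attention_mask.unsqueeze(-1).float()  # [batch, seq_len, 1]
embeddings = (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # [batch, 768]
```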

### Using as a Classifier

```python
from transformers import AutoModelForSequenceClassification

model_name = "CompBioDSA/pig-mutbert-ref"
model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True, num_labels=2)
```
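
The classification head can then be fine-tuned on labeled sequences. A minimal sketch of a single forward pass, assuming the classifier takes the same one-hot inputs as the base model and follows the standard `transformers` output format (the label here is purely illustrative):

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "CompBioDSA/pig-mutbert-ref"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, trust_remote_code=True, num_labels=2
)

dna = "ATCGGGGCCCATTA"
input_ids = tokenizer(dna, return_tensors="pt")["input_ids"]
# assumption: the classifier expects the same one-hot inputs as the base model
mut_inputs = F.one_hot(input_ids, num_classes=len(tokenizer)).float()

labels = torch.tensor([1])  # hypothetical binary label
outputs = model(mut_inputs, labels=labels)
print(outputs.logits.shape)  # [1, 2]
print(outputs.loss)          # cross-entropy loss used for fine-tuning
```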

### With RoPE scaling

Allowed types for RoPE scaling are `linear` and `dynamic`. To extend the model's context window, you need to pass the `rope_scaling` parameter.

If you want to scale your model's context by 2x:

```python
from transformers import AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
model = AutoModel.from_pretrained(model_name,
                                  trust_remote_code=True,
                                  rope_scaling={'type': 'dynamic', 'factor': 2.0}
                                  )  # 2.0 for 2x scaling, 4.0 for 4x, etc.
```
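
`linear` scaling is configured the same way; only the `type` field changes. A minimal sketch mirroring the example above:

```python
from transformers import AutoModel

model_name = "CompBioDSA/pig-mutbert-ref"
model = AutoModel.from_pretrained(model_name,
                                  trust_remote_code=True,
                                  rope_scaling={'type': 'linear', 'factor': 2.0}
                                  )
```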