duttaprat committed on
Commit ddf07ae · verified · 1 Parent(s): 8358e80

Create README.md

Files changed (1): README.md +52 -0
README.md ADDED
@@ -0,0 +1,52 @@
+ ---
+ license: apache-2.0
+ tags:
+ - genomics
+ - dnabert
+ - virology
+ - foundation-model
+ - hvilm
+ ---
+
+ # HViLM-base: A Foundation Model for Viral Genomics
+
+ This is the base pre-trained model for **HViLM**, as described in the paper:
+ **"HViLM: A Foundation Model for Viral Genomics Enables Multi-Task Prediction of Pathogenicity, Transmissibility, and Host Tropism"**
+
+ - **Paper:** [Link to your arXiv paper will go here]
+ - **Fine-tuned Models:**
+   - `duttaprat/HViLM-finetuned-pathogenicity` (coming soon)
+   - `duttaprat/HViLM-finetuned-host-tropism` (coming soon)
+   - `duttaprat/HViLM-finetuned-transmissibility-R0` (coming soon)
+
+ ## Model Description
+
+ (Paste your abstract here)
+
+ ## How to Use
+
+ This model requires trusting remote code because it uses custom architecture files (`bert_layers.py`, etc.).
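
Because `trust_remote_code=True` runs Python files from the repository on your machine, it can be worth pinning the download to a revision you have reviewed. The following is a minimal sketch, assuming the standard `revision` argument of `from_pretrained` (a branch name, tag, or commit id). The helper names `remote_code_kwargs` and `load_pinned` are hypothetical and not part of HViLM.

```python
def remote_code_kwargs(revision: str = "main") -> dict:
    """Build from_pretrained keyword arguments that pin the Hub revision.

    Hypothetical helper for illustration; "main" follows the latest commit,
    while a full commit hash fixes the exact files that get executed.
    """
    return {"trust_remote_code": True, "revision": revision}


def load_pinned(repo_id: str, revision: str = "main"):
    """Load the tokenizer and model from one fixed revision of the repo."""
    # Imported lazily so remote_code_kwargs works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    kwargs = remote_code_kwargs(revision)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, **kwargs)
    model = AutoModel.from_pretrained(repo_id, **kwargs)
    return tokenizer, model
```

Calling `load_pinned("duttaprat/HViLM-base", revision=...)` with the full hash of a commit you have inspected keeps later pushes to the repository from changing the code that runs locally.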
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ import torch
+
+ repo_id = "duttaprat/HViLM-base"
+
+ # Download the tokenizer and model files from the Hub
+ tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
+ model = AutoModel.from_pretrained(
+     repo_id,
+     trust_remote_code=True,  # required: the repo ships custom architecture code
+ )
+
+ print("Model and tokenizer loaded successfully!")
+
+ # Example: Get embeddings for a sequence
+ sequence = "ATGCGTACGT..."  # placeholder; substitute a full DNA sequence
+ inputs = tokenizer(sequence, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+ # Index 0 is the last hidden state whether the model returns a tuple or a ModelOutput
+ embeddings = outputs[0]
+
+ print(embeddings.shape)
+ ```