---
language:
- en
base_model:
- microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
---

# LitGene: An Interpretable Transformer Model for Gene Representation Learning

LitGene is a transformer-based model that learns rich gene representations by integrating textual information from the scientific literature with structured knowledge from the Gene Ontology (GO). Using contrastive learning, the model refines gene embeddings that capture both sequence and functional annotations, enabling improved prediction of protein properties, gene-disease associations, and functional annotations such as GO terms and KEGG pathways.

This repository provides model weights for the pre-trained LitGene model. It is intended to serve as a base representation model that can be further adapted or fine-tuned for specific biomedical tasks.

## Intended Usage

This model is intended for tasks that require rich gene representations. LitGene can be used for any of the following:
- Inference: providing predictions for gene functions, gene-disease and gene-protein associations, and specific biological pathway information. Prompt LitGene [here](http://64.106.39.56:5000/).
- Gene embeddings: producing embeddings that capture the literature-based biological properties of gene function; see the [LitGene repository](https://github.com/vinash85/LitGene/tree/master).
- Fine-tuning: the base representation model can be fine-tuned for a multitude of biomedical tasks (e.g., protein solubility prediction, drug dosage sensitivity). Example tasks can be found in this [repo](https://github.com/vinash85/LitGene/tree/master).

## Usage (PyTorch)
Below is example PyTorch code to load the LitGene weights:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer
model_name = "tumorailab/LitGene_ContrastiveLearning"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Move the model to GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```

Below is example code to get embeddings for an example sentence:

```python
# Prepare your sentence
sentence = "Your text goes here"

# Tokenize the sentence
inputs = tokenizer(
    sentence,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt"
)

# Move inputs to the same device as the model
inputs = {k: v.to(device) for k, v in inputs.items()}

# Get embeddings
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

# Get the CLS token embedding (first token)
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding)
```
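Once CLS embeddings are extracted, genes can be compared by cosine similarity to gauge how related their textual descriptions are. The sketch below shows only the comparison step in plain Python; the short vectors are stand-ins for real 768-dimensional LitGene embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; real LitGene CLS embeddings are 768-dimensional
emb_gene_a = [0.1, 0.3, -0.2, 0.7]
emb_gene_b = [0.2, 0.25, -0.1, 0.6]
print(f"similarity: {cosine_similarity(emb_gene_a, emb_gene_b):.3f}")
```

In practice the same computation can be done directly on the embedding tensors with `torch.nn.functional.cosine_similarity`.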

## Training Details

##### Hyperparameters
| Hyperparameter | Value |
| --- | --- |
| Embedding Dimension | 768 |
| Batch Size | 64 |
| Optimizer | AdamW |
| Learning Rate | 2e-5 (with linear decay) |
| Weight Decay | 0.01 |
| Contrastive Learning Loss Function | Margin-based ranking loss |
| Contrastive Loss Margin (δ) | 0.5 |
| Number of Training Steps | 100k |
| Dropout Rate | 0.1 |
| Gradient Clipping | 1.0 |
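For intuition, a margin-based ranking loss like the one in the table is a hinge penalty: it is zero when a positive (functionally related) pair is at least the margin δ = 0.5 more similar than a negative pair, and grows linearly otherwise. The snippet below is an illustrative sketch of this idea, not the exact training objective used for LitGene:

```python
def margin_ranking_loss(sim_pos, sim_neg, margin=0.5):
    """Hinge-style ranking loss: zero when the positive pair is at least
    `margin` more similar than the negative pair; linear penalty otherwise."""
    return max(0.0, margin - (sim_pos - sim_neg))

# Positive pair well separated from the negative -> no loss
print(margin_ranking_loss(0.9, 0.1))
# Separation (0.1) below the margin (0.5) -> penalized
print(margin_ranking_loss(0.5, 0.4))
```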