GustavoHCruz committed · Commit d773463 · verified · 1 Parent(s): d44e01f

Update README.md

Files changed (1): README.md +6 -2
README.md CHANGED
````diff
@@ -27,7 +27,11 @@ GPT-2 finetuned model for **classifying DNA sequences** into **introns** and **e
 
 ## Usage
 
+You can use it though it's own pipeline:
+
 ```python
+example here...
+
 from transformers import GPT2Tokenizer, GPT2LMHeadModel
 
 tokenizer = GPT2Tokenizer.from_pretrained("GustavoHCruz/ExInGPT")
@@ -47,8 +51,8 @@ The model expects the following input format:
 <|TARGET|>
 ```
 
-- `<|SEQUENCE|>`: Full DNA sequence.
-- `<|FLANK_BEFORE|>` and `<|FLANK_AFTER|>`: Optional upstream/downstream context sequences.
+- `<|SEQUENCE|>`: Full DNA sequence. Maximum of 512 nucleotides.
+- `<|FLANK_BEFORE|>` and `<|FLANK_AFTER|>`: Optional upstream/downstream context sequences. Maximum of 25 nucleotides.
 - `<|ORGANISM|>`: Optional organism name (truncated to a maximum of 10 characters in training).
 - `<|GENE|>`: Optional gene name (truncated to a maximum of 10 characters in training).
 - `<|TARGET|>`: Separation token for label prediction.
````
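The length limits this commit adds to the README could be enforced before a prompt is built. The following is a minimal sketch, not the author's code: `truncate_fields` and the constant names are hypothetical, and keeping the flank bases closest to the sequence is an assumption, since the README does not say which end is kept. It only clips the fields; assembling them into the model's input format is left to the README's own template.

```python
# Limits taken from the README text in this commit.
MAX_SEQUENCE = 512   # nucleotides in <|SEQUENCE|>
MAX_FLANK = 25       # nucleotides in each of <|FLANK_BEFORE|> / <|FLANK_AFTER|>
MAX_NAME = 10        # characters in <|ORGANISM|> and <|GENE|>


def truncate_fields(sequence, flank_before="", flank_after="", organism="", gene=""):
    """Hypothetical helper: clip each input field to its documented maximum."""
    return {
        "sequence": sequence[:MAX_SEQUENCE],
        # Assumption: keep the upstream bases nearest the sequence (the tail).
        "flank_before": flank_before[-MAX_FLANK:],
        # Assumption: keep the downstream bases nearest the sequence (the head).
        "flank_after": flank_after[:MAX_FLANK],
        "organism": organism[:MAX_NAME],
        "gene": gene[:MAX_NAME],
    }


fields = truncate_fields(
    "A" * 600,
    flank_before="C" * 30 + "G",
    flank_after="T" * 40,
    organism="Homo sapiens",
    gene="BRCA1",
)
print({name: len(value) for name, value in fields.items()})
```

Clipping silently is a design choice for this sketch; raising an error on over-long input may be preferable when truncation would hide a data problem.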