Update README.md
README.md
CHANGED
@@ -27,7 +27,11 @@ GPT-2 finetuned model for **classifying DNA sequences** into **introns** and **e
## Usage
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained("GustavoHCruz/ExInGPT")
@@ -47,8 +51,8 @@ The model expects the following input format:
<|TARGET|>
```
-
- `<|SEQUENCE|>`: Full DNA sequence.
-
- `<|FLANK_BEFORE|>` and `<|FLANK_AFTER|>`: Optional upstream/downstream context sequences.
- `<|ORGANISM|>`: Optional organism name (truncated to a maximum of 10 characters in training).
- `<|GENE|>`: Optional gene name (truncated to a maximum of 10 characters in training).
- `<|TARGET|>`: Separation token for label prediction.
## Usage
+
You can use it through its own pipeline:
+
```python
+
example here...
+
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained("GustavoHCruz/ExInGPT")
<|TARGET|>
```
+
- `<|SEQUENCE|>`: Full DNA sequence. Maximum of 512 nucleotides.
+
- `<|FLANK_BEFORE|>` and `<|FLANK_AFTER|>`: Optional upstream/downstream context sequences. Maximum of 25 nucleotides.
- `<|ORGANISM|>`: Optional organism name (truncated to a maximum of 10 characters in training).
- `<|GENE|>`: Optional gene name (truncated to a maximum of 10 characters in training).
- `<|TARGET|>`: Separation token for label prediction.
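
The length limits listed above can be enforced before tokenization. The following is a minimal sketch, not the model author's code: `build_prompt` is a hypothetical helper, and the field order, the newline separator, and the choice of which end of each flank to keep are all assumptions. Check them against the input format block in the README before relying on it.

```python
# Limits taken from the README bullets.
MAX_SEQUENCE = 512  # nucleotides in <|SEQUENCE|>
MAX_FLANK = 25      # nucleotides in each flank
MAX_NAME = 10       # characters for organism and gene names


def build_prompt(sequence, flank_before="", flank_after="", organism="", gene=""):
    """Assemble one input string (hypothetical helper; layout is assumed)."""
    if len(sequence) > MAX_SEQUENCE:
        raise ValueError(
            f"sequence has {len(sequence)} nucleotides, maximum is {MAX_SEQUENCE}"
        )

    parts = [f"<|SEQUENCE|>{sequence}"]
    if flank_before:
        # Assumption: keep the bases closest to the sequence (the tail).
        parts.append(f"<|FLANK_BEFORE|>{flank_before[-MAX_FLANK:]}")
    if flank_after:
        # Assumption: keep the bases closest to the sequence (the head).
        parts.append(f"<|FLANK_AFTER|>{flank_after[:MAX_FLANK]}")
    if organism:
        parts.append(f"<|ORGANISM|>{organism[:MAX_NAME]}")
    if gene:
        parts.append(f"<|GENE|>{gene[:MAX_NAME]}")

    # <|TARGET|> closes the prompt; the model predicts the label after it.
    parts.append("<|TARGET|>")
    return "\n".join(parts)


prompt = build_prompt("ACGTACGT", flank_before="TTGA", organism="Homo sapiens")
print(prompt)
```

The sketch raises an error for an over-long sequence instead of silently truncating it, since dropping nucleotides could change the label; the optional fields are truncated because the README states that names were truncated during training.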