GustavoHCruz/ExInGPT

sequence-classification

GustavoHCruz committed on Nov 17, 2025

Commit d773463 · verified · 1 Parent(s): d44e01f

Update README.md

Files changed (1)

README.md +6 -2

@@ -27,7 +27,11 @@ GPT-2 finetuned model for **classifying DNA sequences** into **introns** and **e
 ## Usage
 ```python
 from transformers import GPT2Tokenizer, GPT2LMHeadModel
 tokenizer = GPT2Tokenizer.from_pretrained("GustavoHCruz/ExInGPT")
@@ -47,8 +51,8 @@ The model expects the following input format:
 <|TARGET|>
 ```
-- `<|SEQUENCE|>`: Full DNA sequence.
-- `<|FLANK_BEFORE|>` and `<|FLANK_AFTER|>`: Optional upstream/downstream context sequences.
 - `<|ORGANISM|>`: Optional organism name (truncated to a maximum of 10 characters in training).
 - `<|GENE|>`: Optional gene name (truncated to a maximum of 10 characters in training).
 - `<|TARGET|>`: Separation token for label prediction.

 ## Usage
+You can use it through its own pipeline:
 ```python
+example here...
 from transformers import GPT2Tokenizer, GPT2LMHeadModel
 tokenizer = GPT2Tokenizer.from_pretrained("GustavoHCruz/ExInGPT")
 <|TARGET|>
 ```
+- `<|SEQUENCE|>`: Full DNA sequence. Maximum of 512 nucleotides.
+- `<|FLANK_BEFORE|>` and `<|FLANK_AFTER|>`: Optional upstream/downstream context sequences. Maximum of 25 nucleotides.
 - `<|ORGANISM|>`: Optional organism name (truncated to a maximum of 10 characters in training).
 - `<|GENE|>`: Optional gene name (truncated to a maximum of 10 characters in training).
 - `<|TARGET|>`: Separation token for label prediction.
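
The length limits listed above can be applied before tokenization. The following is a minimal sketch, not the author's pipeline: `build_prompt` and the `MAX_*` constants are hypothetical names, and the layout (each tag immediately followed by its value, in the order of the bullet list, with empty optional fields skipped) is an assumption, since the full template is not visible in this diff. The README also does not say which end of an over-long field to keep; this sketch keeps the leading characters.

```python
# Limits taken from the README bullets; the names are illustrative only.
MAX_SEQUENCE = 512  # nucleotides in the main sequence
MAX_FLANK = 25      # nucleotides per flank
MAX_NAME = 10       # characters for organism and gene names


def build_prompt(sequence, flank_before="", flank_after="", organism="", gene=""):
    """Truncate each field to its documented limit and join the tagged fields.

    ASSUMPTION: tag + value concatenation in bullet-list order; verify the
    exact template against the model card before relying on this.
    """
    fields = [
        ("<|SEQUENCE|>", sequence[:MAX_SEQUENCE]),
        ("<|FLANK_BEFORE|>", flank_before[:MAX_FLANK]),
        ("<|FLANK_AFTER|>", flank_after[:MAX_FLANK]),
        ("<|ORGANISM|>", organism[:MAX_NAME]),
        ("<|GENE|>", gene[:MAX_NAME]),
    ]
    # The sequence is always emitted; optional fields only when non-empty.
    kept = [fields[0]] + [(tag, value) for tag, value in fields[1:] if value]
    return "".join(tag + value for tag, value in kept) + "<|TARGET|>"


if __name__ == "__main__":
    print(build_prompt("ACGTACGT", flank_before="TTAGG", organism="Homo sapiens"))
```

The resulting string would then be passed to the tokenizer loaded with `GPT2Tokenizer.from_pretrained("GustavoHCruz/ExInGPT")`, as in the usage snippet above.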