Finetune without using run_clm.py

#16

by TianlaiChen - opened Feb 5, 2023

Discussion

TianlaiChen

Feb 5, 2023

This comment has been hidden

nferruz

Owner Feb 5, 2023

Hi Leo,

I haven’t tried to fine-tune without the helper script, but it is definitely possible. Once you have defined the tokenizer and model (which you do correctly), I would follow the tutorial. In this case you will have to create a txt file with your sequences, replacing the FASTA headers with the endoftext tag. I’m not sure how the tutorial does it, but you’ll then have to tokenize that dataset and define the splits (90/10?).
Hope this helps
Noelia
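
For readers following along, here is a rough sketch of the preprocessing described above, not taken from the tutorial. It assumes the "endoftext tag" is spelled `<|endoftext|>` (the GPT-2 convention), and the function names `fasta_to_records` and `split_records` are hypothetical helpers made up for illustration. It only covers the header replacement and the 90/10 split; tokenization is left to the tokenizer you already defined.

```python
import random

# Assumption: the "endoftext tag" is the GPT-2 style token below.
EOT = "<|endoftext|>"


def fasta_to_records(fasta_text):
    """Return one string per sequence, with each FASTA header replaced by EOT."""
    records = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header starts a new record; store the previous one first.
            if current is not None:
                records.append("\n".join(current))
            current = [EOT]
        elif current is not None:
            current.append(line)  # sequence line, kept as-is
    if current is not None:
        records.append("\n".join(current))
    return records


def split_records(records, val_fraction=0.1, seed=0):
    """Shuffle whole sequences and split them into (train, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    example = ">seq1\nMKVLAAGIV\nGLLA\n>seq2\nMGSSHHHHHH\n"
    train, val = split_records(fasta_to_records(example), val_fraction=0.5)
    print("\n".join(train))
    print("\n".join(val))
```

Splitting at the level of whole sequences, before tokenizing, keeps any single sequence from ending up in both sets. Each list can then be joined with newlines, written to its own txt file, and tokenized; the 90/10 ratio is only the figure suggested above.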

TianlaiChen changed discussion status to closed Feb 23, 2023
