The GPT-2 tokenizer download was blocked for me, so I downloaded the files manually for use with fine_tune_jit_with_validation_gpt2.py:
mkdir -p tokenizer
wget -O tokenizer/tokenizer.json https://huggingface.co/gpt2/resolve/main/tokenizer.json
wget -O tokenizer/vocab.json https://huggingface.co/gpt2/resolve/main/vocab.json
wget -O tokenizer/merges.txt https://huggingface.co/gpt2/resolve/main/merges.txt
wget -O tokenizer/tokenizer_config.json https://huggingface.co/gpt2/resolve/main/tokenizer_config.json
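For completeness, a minimal sketch of loading the tokenizer from that local directory instead of the Hub. This assumes the script uses Hugging Face transformers; the "tokenizer" path and the local_files_only flag are my assumptions about how the script resolves the tokenizer, not taken from it:

# Sketch: load the manually downloaded GPT-2 tokenizer from disk.
# "tokenizer/" is the directory populated by the wget commands above.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained(
    "tokenizer",            # local dir with tokenizer.json, vocab.json, merges.txt
    local_files_only=True,  # never try to reach the (blocked) Hub
)
print(tokenizer.encode("hello world"))  # quick sanity check that the local files load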
(rocm_py310) root@jirack1:/home/kgrabko/jirackkit/src/main/python# python fine_tune_jit_with_validation.py
Device: cuda
Using ready dataset → datasets/dialogues_text_clean.txt
Starting with base model: JiRack_H16_L32_V50257_D768_MSL8192_FF768x4.script.pt
Tokenizing datasets/dialogues_text_clean.txt (train)...
Token indices sequence length is longer than the specified maximum sequence length for this model (5313076 > 1024). Running this sequence through the model will result in indexing errors
TRAIN: 19,717 sequences
Tokenizing datasets/dialogues_text_clean.txt (val)...
Token indices sequence length is longer than the specified maximum sequence length for this model (5313076 > 1024). Running this sequence through the model will result in indexing errors
VAL: 1,037 sequences
/home/kgrabko/jirackkit/src/main/python/fine_tune_jit_with_validation.py:133: FutureWarning: torch.cuda.amp.GradScaler(args...) is deprecated. Please use torch.amp.GradScaler('cuda', args...) instead.
scaler = GradScaler() # AMP — 1.5–2× acceleration
STARTING TRAINING — 50 epochs, ~82,150 steps
EPOCH 1/50
Train: 0%| | 0/1643 [00:00<?, ?it/s]/home/kgrabko/jirackkit/src/main/python/fine_tune_jit_with_validation.py:145: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
with autocast():
Train: 11%|█████████████████▎ | 176/1643 [06:04<55:08, 2.26s/it]client_loop: send disconnect: Connection reset
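The two "Token indices sequence length is longer than the specified maximum sequence length" warnings are expected here: the tokenizer is only noting that the raw 5,313,076-token stream exceeds GPT-2's 1024-token context, and the script evidently slices the stream into fixed-length training sequences afterwards (5,313,076 tokens over 19,717 + 1,037 sequences works out to blocks of roughly 256 tokens). A minimal sketch of that slicing; the block size of 256 is inferred from those counts, not read from the script:

# Sketch: split one long token stream into fixed-length training sequences.
# BLOCK = 256 is an inference from the sequence counts above, not a value
# taken from fine_tune_jit_with_validation.py.
BLOCK = 256

def chunk(token_ids, block=BLOCK):
    # Drop the trailing remainder so every sequence has exactly `block` tokens.
    return [token_ids[i:i + block] for i in range(0, len(token_ids) - block + 1, block)]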
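The two FutureWarnings come from the old torch.cuda.amp entry points; on recent PyTorch (including ROCm builds, which report the device as "cuda") each one is a one-line change. A sketch of the updated calls, with the surrounding structure made up for illustration:

import torch

# Replaces the deprecated torch.cuda.amp.GradScaler()
scaler = torch.amp.GradScaler('cuda')

# Replaces the deprecated torch.cuda.amp.autocast()
with torch.amp.autocast('cuda'):
    ...  # forward pass and loss computation go here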
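Separately, note that the run itself was killed at 11% of epoch 1 by the SSH session dropping ("client_loop: send disconnect: Connection reset"), not by a training error; launching the script under nohup, tmux, or screen would let it keep running through a disconnect.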