Is this model supported for fine-tuning with Flash Attention?
#4 opened 7 months ago by thaodd11
MMLU Performance After Token Training
👍 2
#3 opened over 1 year ago by adol01