#9 Batch vs individual inference output mismatch (opened 18 days ago by E1eMental)
#8 torch.OutOfMemoryError: CUDA out of memory (opened 19 days ago by shadowT)
#7 Inference seems to be very slow on A100 even when flash_attn is enabled (opened 28 days ago by boydcheung)
#6 Are these variables implicitly read by transformers library or do I need to incorporate into generate function? (opened 28 days ago by boydcheung)
#5 Why are the outputs different? (opened about 1 month ago by AAsuka)
#4 How different are its hardware requirements from those of the Qwen2-VL-2B? (opened about 2 months ago by likewendy)
#3 Finetune It's Brain On Text (opened 3 months ago by VINAYU7)
#2 GGUFs are here. Tutorials to run locally. (opened 3 months ago by alanzhuly)
#1 Local Installation Video and Testing - Step by Step (opened 3 months ago by fahdmirzac)