llama.cpp inference - 20 times (!) slower than OSS 20 on a RTX 5090
6
#12 opened about 4 hours ago by cmp-nct
We are so back!
❤️
2
#10 opened about 5 hours ago by Carnyzzle
The adaptation for SGLang is being processed.
#9 opened about 6 hours ago by ZHANGYUXUAN-zR
Is a dedicated Tech Report planned for GLM-4.7-Flash?
1
#8 opened about 6 hours ago by NodeLinker
FP8
3
#7 opened about 6 hours ago by Daemontatox
Recommended sampling parameters
1
#6 opened about 6 hours ago by sszymczyk
DeepseekV3ForCausalLM
🔥
2
1
#5 opened about 6 hours ago by davidboring
Thank you!
🔥
10
#4 opened about 7 hours ago by mav23
Enormous KV-cache size?
2
4
#3 opened about 7 hours ago by nephepritou
Base model
🔥
5
1
#2 opened about 7 hours ago by tcpmux
Performance Discussion
2
#1 opened about 7 hours ago by IndenScale