Steve Chen
stev236
AI & ML interests
Running local models for different projects.
Organizations
None yet
Has anybody got MTP working on vLLM? ('GPUModelRunner' object has no attribute 'drafter')
#36 opened 2 months ago by stev236
Generates nonsense when run with the latest vLLM with FlashInfer 0.4
#35 opened 2 months ago by stev236
Generates nonsense if running the latest vLLM with FlashInfer 0.4
#7 opened 2 months ago by stev236
Not working with the latest vLLM / FlashInfer
#10 opened 2 months ago by stev236
No file named configuration_lfm2_moe.py
#3 opened 3 months ago by Mohaddz
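A minimal sketch (not from the thread) of how to check for that file, using the real huggingface_hub list_repo_files API; the repo id below is a placeholder, since the thread does not name it here.

    from huggingface_hub import list_repo_files

    # Placeholder repo id: this error usually means config.json's auto_map
    # references a Python file that was never uploaded to the repository.
    files = list_repo_files("org/lfm2-moe-model")
    print("configuration_lfm2_moe.py" in files)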
Quantization results in the model not supporting tensor parallel mode.
#3 opened 3 months ago by stev236
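A minimal sketch of requesting tensor parallelism in vLLM, assuming two GPUs and a placeholder quantized checkpoint; tensor_parallel_size is vLLM's real argument, but whether a given quantized model shards correctly depends on its quantization format, which is what this thread reports failing.

    from vllm import LLM, SamplingParams

    # Placeholder model id. tensor_parallel_size=2 shards the weights
    # across two GPUs; some quantization formats do not shard cleanly.
    llm = LLM(model="org/model-awq-4bit", tensor_parallel_size=2)
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)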
Quantization issues
#17 opened 3 months ago by stev236
Treating _ as * (tokenizer error?)
#10 opened 5 months ago by EmilPi
This model is unbelievably ignorant.
#14 opened 5 months ago by phil111
How to make deterministic output?
#23 opened 5 months ago by junma
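For the determinism question above, a minimal sketch using vLLM's real SamplingParams: temperature=0.0 selects greedy decoding and seed pins any remaining sampling randomness. GPU kernels and dynamic batching can still introduce small run-to-run differences. The model id is a placeholder.

    from vllm import LLM, SamplingParams

    llm = LLM(model="org/some-model")  # placeholder model id

    # Greedy decoding plus a fixed seed makes output as repeatable as the
    # backend allows; kernel nondeterminism may still leak through.
    params = SamplingParams(temperature=0.0, seed=42, max_tokens=64)
    print(llm.generate(["2 + 2 ="], params)[0].outputs[0].text)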
Not able to use this model (tencent/Hunyuan-7B-Instruct) via vLLM
#5 opened 5 months ago by aditya-shinde
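A minimal offline-inference sketch for the thread above; trust_remote_code=True is a real vLLM option required for repos that ship custom model code, though on its own it may not fix the incompatibility reported here.

    from vllm import LLM, SamplingParams

    # trust_remote_code lets vLLM load the custom model code in the repo.
    llm = LLM(model="tencent/Hunyuan-7B-Instruct", trust_remote_code=True)
    outputs = llm.generate(["Hello, who are you?"], SamplingParams(max_tokens=64))
    print(outputs[0].outputs[0].text)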
README says 256K context, but config.json has "max_position_embeddings": 32768
#4 opened 5 months ago by stev236
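A quick way to see what the checkpoint itself declares, using the real transformers AutoConfig API (the model id is illustrative): inference engines enforce the config values by default, regardless of what the README advertises.

    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("org/some-model", trust_remote_code=True)
    print(cfg.max_position_embeddings)          # what engines enforce by default
    print(getattr(cfg, "rope_scaling", None))   # YaRN/RoPE scaling block, if any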
Local Installation Video and Testing - Step by Step
#2 opened 5 months ago by fahdmirzac
Long context: YaRN max_position_embeddings 32K or 40K?
#10 opened 8 months ago by stev236
New 8B model much slower than old 7B model when running on vLLM.
#6 opened 8 months ago by stev236
Why are the new 4B and 8B models slower than the previous 7B-1M model?
#6 opened 8 months ago by stev236
Hardware requirements?
#7 opened 11 months ago by ccocks-deca
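For the hardware question above, a back-of-the-envelope sketch: weight memory is roughly parameter count times bytes per parameter, before KV cache, activations, and runtime overhead, which add several more GB in practice. The figures are illustrative.

    # Rough VRAM needed for the weights alone (excludes KV cache, CUDA
    # context, and activations).
    def weight_gb(params_billion: float, bytes_per_param: float) -> float:
        return params_billion * 1e9 * bytes_per_param / 1024**3

    print(round(weight_gb(7, 2.0), 1))   # ~13.0 GB for a 7B model in FP16/BF16
    print(round(weight_gb(7, 0.5), 1))   # ~3.3 GB with 4-bit quantization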
Does vLLM 0.7.3 support this model?
#10 opened 10 months ago by traphix
Model no longer working with vLLM > v0.8.5
#13 opened 5 months ago by stev236
Did anybody notice that this model no longer works with vLLM > v0.8.5?
#7 opened 5 months ago by stev236