Steve Chen
stev236
AI & ML interests
Running local models for different projects.
Organizations
None yet
Has anybody got MTP working on vLLM? ('GPUModelRunner' object has no attribute 'drafter')
#36 opened 2 months ago by stev236
Generates nonsense when run with the latest vLLM with FlashInfer 0.4
#35 opened 2 months ago by stev236
Generates nonsense if running the latest vLLM with FlashInfer 0.4
#7 opened 2 months ago by stev236
Not working with the latest vLLM / FlashInfer
#10 opened 2 months ago by stev236
No file named configuration_lfm2_moe.py
#3 opened 3 months ago by Mohaddz
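A minimal sketch (not from the thread) of how to check for that file, using the real huggingface_hub list_repo_files API; the repo id below is a placeholder, since the thread does not name it here.

    from huggingface_hub import list_repo_files

    # Placeholder repo id: this error usually means config.json's auto_map
    # references a Python file that was never uploaded to the repository.
    files = list_repo_files("org/lfm2-moe-model")
    print("configuration_lfm2_moe.py" in files)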
Quantization results in the model not supporting tensor parallel mode.
#3 opened 3 months ago by stev236
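A minimal sketch of requesting tensor parallelism in vLLM, assuming two GPUs and a placeholder quantized checkpoint; tensor_parallel_size is vLLM's real argument, but whether a given quantized model shards correctly depends on its quantization format, which is what this thread reports failing.

    from vllm import LLM, SamplingParams

    # Placeholder model id. tensor_parallel_size=2 shards the weights
    # across two GPUs; some quantization formats do not shard cleanly.
    llm = LLM(model="org/model-awq-4bit", tensor_parallel_size=2)
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)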
Quantization issues
#17 opened 3 months ago by stev236
Treating _ as * (tokenizer error?)
#10 opened 5 months ago by EmilPi
This model is unbelievably ignorant.
#14 opened 5 months ago by phil111
How to make deterministic output?
#23 opened 5 months ago by junma
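For the determinism question above, a minimal sketch using vLLM's real SamplingParams: temperature=0.0 selects greedy decoding and seed pins any remaining sampling randomness. GPU kernels and dynamic batching can still introduce small run-to-run differences. The model id is a placeholder.

    from vllm import LLM, SamplingParams

    llm = LLM(model="org/some-model")  # placeholder model id

    # Greedy decoding plus a fixed seed makes output as repeatable as the
    # backend allows; kernel nondeterminism may still leak through.
    params = SamplingParams(temperature=0.0, seed=42, max_tokens=64)
    print(llm.generate(["2 + 2 ="], params)[0].outputs[0].text)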
Not able to use this model (tencent/Hunyuan-7B-Instruct) via vLLM
#5 opened 5 months ago by aditya-shinde
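A minimal offline-inference sketch for the thread above; trust_remote_code=True is a real vLLM option required for repos that ship custom model code, though on its own it may not fix the incompatibility reported here.

    from vllm import LLM, SamplingParams

    # trust_remote_code lets vLLM load the custom model code in the repo.
    llm = LLM(model="tencent/Hunyuan-7B-Instruct", trust_remote_code=True)
    outputs = llm.generate(["Hello, who are you?"], SamplingParams(max_tokens=64))
    print(outputs[0].outputs[0].text)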
README says 256K context, but config.json has "max_position_embeddings": 32768
#4 opened 5 months ago by stev236
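A quick way to see what the checkpoint itself declares, using the real transformers AutoConfig API (the model id is illustrative): inference engines enforce the config values by default, regardless of what the README advertises.

    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("org/some-model", trust_remote_code=True)
    print(cfg.max_position_embeddings)          # what engines enforce by default
    print(getattr(cfg, "rope_scaling", None))   # YaRN/RoPE scaling block, if any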
Local Installation Video and Testing - Step by Step
#2 opened 5 months ago by fahdmirzac
Long context: YaRN max_position_embeddings 32K or 40K?
#10 opened 8 months ago by stev236
New 8B model much slower than old 7B model when running on vLLM.
#6 opened 8 months ago by stev236
Why are the new 4B and 8B models slower than the previous 7B-1M model?
#6 opened 8 months ago by stev236
Hardware requirements?
#7 opened 11 months ago by ccocks-deca
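For the hardware question above, a back-of-the-envelope sketch: weight memory is roughly parameter count times bytes per parameter, before KV cache, activations, and runtime overhead, which add several more GB in practice. The figures are illustrative.

    # Rough VRAM needed for the weights alone (excludes KV cache, CUDA
    # context, and activations).
    def weight_gb(params_billion: float, bytes_per_param: float) -> float:
        return params_billion * 1e9 * bytes_per_param / 1024**3

    print(round(weight_gb(7, 2.0), 1))   # ~13.0 GB for a 7B model in FP16/BF16
    print(round(weight_gb(7, 0.5), 1))   # ~3.3 GB with 4-bit quantization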
Does vLLM 0.7.3 support this model?
#10 opened 10 months ago by traphix
Model no longer working with vLLM > v0.8.5
#13 opened 5 months ago by stev236
Did anybody notice that this model no longer works with vLLM > v0.8.5?
#7 opened 5 months ago by stev236