# ZeroGPU-LLM-Inference / requirements.txt
# Dependencies for vLLM with LLM Compressor and performance optimizations.
# Build tooling
wheel

# UI frameworks
gradio>=5.0.0
streamlit

# Hugging Face Spaces (ZeroGPU) integration
spaces

# Core model stack
torch>=2.8.0
transformers>=4.53.3
accelerate
sentencepiece
timm

# High-throughput inference engine
vllm>=0.6.0
# flash-attn compiles against torch; if installation fails, install torch
# first and retry with: pip install flash-attn --no-build-isolation
flash-attn>=2.5.0

# Quantization / compression
llmcompressor>=0.1.0
autoawq
compressed-tensors
bitsandbytes

# DuckDuckGo web search
ddgs
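
# Minimal sketch of the vLLM inference path these pins support, kept as
# Python-in-comments so this file stays pip-parsable. The model id is a
# hypothetical placeholder, not taken from this repo:
#
#   from vllm import LLM, SamplingParams
#
#   llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
#   params = SamplingParams(temperature=0.7, max_tokens=64)
#   outputs = llm.generate(["Explain ZeroGPU in one sentence."], params)
#   print(outputs[0].outputs[0].text)
#
# Install everything with: pip install -r requirements.txt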