# ZeroGPU-LLM-Inference / requirements.txt
# Commit 06b4cf5: Migrate to AWQ quantization with FlashAttention-2
wheel                     # build backend needed when flash-attn compiles from source
streamlit                 # web UI toolkit
ddgs                      # DuckDuckGo search client (successor to duckduckgo_search)
gradio>=5.0.0             # web UI for the inference demo
torch>=2.8.0              # PyTorch runtime
transformers>=4.53.3      # model loading and text generation
spaces                    # Hugging Face Spaces ZeroGPU decorator
sentencepiece             # tokenizer backend for SentencePiece-based models
accelerate                # device placement (device_map="auto")
autoawq                   # AWQ quantization kernels and checkpoint support
flash-attn>=2.5.0         # FlashAttention-2 attention implementation
timm                      # vision backbones for multimodal checkpoints
compressed-tensors        # loading compressed/quantized checkpoint formats
bitsandbytes              # 8-/4-bit quantization support
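Given the commit message, the dependency set above (autoawq + flash-attn + accelerate) points at loading an AWQ-quantized checkpoint with FlashAttention-2 through `transformers`. A minimal sketch of the `from_pretrained` keyword arguments that setup typically uses follows; `awq_load_kwargs` is a hypothetical helper, not part of this Space's code, and the exact model id is an assumption.

```python
def awq_load_kwargs(use_flash_attn: bool = True) -> dict:
    """Keyword arguments for AutoModelForCausalLM.from_pretrained
    when loading an AWQ-quantized checkpoint.

    AWQ checkpoints produced with autoawq embed their quantization
    config, so transformers detects the quantization automatically;
    no extra quantization_config argument is needed here.
    """
    kwargs = {
        "torch_dtype": "auto",   # AWQ weights are int4; compute runs in fp16
        "device_map": "auto",    # let accelerate place layers on the GPU
    }
    if use_flash_attn:
        # Needs the flash-attn wheel installed and an Ampere-or-newer GPU;
        # drop this key to fall back to the default attention implementation.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


# Usage sketch (model id is illustrative, not from this repo):
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(
#       "some-org/some-model-AWQ", **awq_load_kwargs()
#   )
```

On hardware without FlashAttention-2 support, calling `awq_load_kwargs(use_flash_attn=False)` keeps the same loading path while omitting the `attn_implementation` override.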