Switch to transformers version - vLLM uses too much memory on T4 GPU fc3b3a2 Jn-Huang committed on 21 days ago
Switch to vLLM for faster inference with lazy loading and multi-turn fix 89babab Jn-Huang committed on 21 days ago