Switch to transformers version - vLLM uses too much memory on T4 GPU fc3b3a2 Jn-Huang committed on Dec 1, 2025
Switch to vLLM for faster inference with lazy loading and multi-turn fix 89babab Jn-Huang committed on Dec 1, 2025