Make code compatible with Gradio 6.0 structured content format 38dedc7 Jn-Huang committed on Dec 1, 2025
Fix Gradio ChatInterface: remove lambda wrapper, add lazy loading, make public 4cc1531 Jn-Huang committed on Dec 1, 2025
Switch to transformers version - vLLM uses too much memory on T4 GPU fc3b3a2 Jn-Huang committed on Dec 1, 2025
Reduce vLLM GPU memory utilization to 0.7 to avoid OOM on T4 GPU 0600d50 Jn-Huang committed on Dec 1, 2025
Switch to vLLM for faster inference with lazy loading and multi-turn fix 89babab Jn-Huang committed on Dec 1, 2025
Fix bugs: use token param, apply Llama 3.1 chat template, decode only new tokens 1a77428 Jn-Huang committed on Dec 1, 2025