Fix: use 4-bit NF4 quantization to reduce VRAM/CPU-RAM usage e81a3cc verified shantipriya committed on Mar 14
Fix: use 8-bit quantization to fit 7B model in 14GB T4 VRAM 8e37327 verified shantipriya committed on Mar 14
Fix: use 8-bit quantization to fit 7B model in 14GB T4 VRAM ee7ba19 verified shantipriya committed on Mar 14
Fix: add demo.queue() to handle long inference in browser b094e68 verified shantipriya committed on Mar 14
Fix: remove @spaces.GPU decorator (not needed on T4 hardware) b82f9c0 verified shantipriya committed on Mar 13
Fix: remove @spaces.GPU decorator (not needed on T4 hardware) f771006 verified shantipriya committed on Mar 13
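
The progression from 8-bit to 4-bit quantization in the commits above can be motivated with rough weight-memory arithmetic. The sketch below is illustrative only and is not taken from the repository: the helper name `weight_memory_gb` and the nominal count of 7e9 parameters are assumptions, and the estimate covers weights alone, ignoring activations, the KV cache, and quantization metadata.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumed nominal parameter count for a "7B" model (hypothetical round figure).
PARAMS_7B = 7e9

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit NF4", 4)]:
    print(f"{label:>10}: {weight_memory_gb(PARAMS_7B, bits):.1f} GB")
```

Under these assumptions, half-precision weights alone need about 14 GB, 8-bit weights about 7 GB, and 4-bit weights about 3.5 GB, which is consistent with the commits moving to progressively lower precision.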