Commit History

Fix: use 4-bit NF4 quantization to reduce VRAM/CPU-RAM usage
e81a3cc
verified

shantipriya committed on

Fix: use 8-bit quantization to fit 7B model in 14GB T4 VRAM
8e37327
verified

shantipriya committed on

Fix: use 8-bit quantization to fit 7B model in 14GB T4 VRAM
ee7ba19
verified

shantipriya committed on

Fix: add demo.queue() to handle long inference in browser
b094e68
verified

shantipriya committed on

Fix: remove @spaces.GPU decorator (not needed on T4 hardware)
b82f9c0
verified

shantipriya committed on

Fix: remove @spaces.GPU decorator (not needed on T4 hardware)
f771006
verified

shantipriya committed on

Fix: load model at startup for T4 GPU (not lazy)
47f9edd
verified

shantipriya committed on

Fix: remove image.submit() not available in Gradio 5
caf44cc
verified

shantipriya committed on

Add README.md
93f88a6
verified

shantipriya committed on

Add requirements.txt
b826182
verified

shantipriya committed on

Add app.py
9a56b59
verified

shantipriya committed on

initial commit
bb0b084
verified

shantipriya committed on