future-html / gemmademo

Commit History

Stream LLM responses
d24a753

aadya1762 committed on

port to gradio
8cc5c82

aadya1762 committed on

use 4 bit quantized models for faster inference
5ca1c38

aadya1762 committed on

Add model config sliders
c1e7456

aadya1762 committed on

Update _model.py
bc32324
unverified

Aadya Chinubhai committed on

use llama.cpp
bdca525

aadya1762 committed on

increase cache limit -> fewer recompilations by pytorch
5160420

aadya1762 committed on

remove torch.compile
9038c58

aadya1762 committed on

initial commit
b4ecb60

aadya1762 committed on