Commit History

perf: drop bnb 4-bit and torch.compile for faster ZeroGPU inference
23c94ae

vongole83 and Claude Sonnet 4.6 committed on

fix: move demo.load inside Blocks context
21c8ff7

vongole83 committed on

add page-load warmup to pre-load models and burn torch.compile on first visit
ef1de57

vongole83 committed on

add torch.compile with reduce-overhead mode
02bac93

vongole83 committed on

add streaming + sdpa attention for faster generation UX
df41ead

vongole83 committed on

add bitsandbytes to runtime install so 4-bit quantization actually applies
392778d

vongole83 committed on

fix inference: use return_dict=True and unpack inputs for generate
7cbc66f

vongole83 committed on

bypass adapter_config.json by downloading weights-only snapshot
60717df

vongole83 committed on

load fine-tune directly as merged model, drop peft dependency
a854c2a

vongole83 committed on

add 4-bit quantization to bring model size back to ~3GB each
7a27688

vongole83 committed on

fix dtype deprecation warning
4131f41

vongole83 committed on

programmatic install as workaround for requirements.txt not being picked up
f2fee3b

vongole83 committed on

fix requirements: remove spaces (pre-installed), pin transformers>=4.51 for Gemma 4
6a17476

vongole83 committed on

force rebuild
eb89851

vongole83 committed on

First Commit
bcd34ff

vongole83 committed on

initial commit
41cbcc0
verified

vongole83 committed on