perf: drop bnb 4-bit and torch.compile for faster ZeroGPU inference 23c94ae vongole83 Claude Sonnet 4.6 committed 24 days ago
add page-load warmup to pre-load models and burn torch.compile on first visit ef1de57 vongole83 committed on Apr 30
add bitsandbytes to runtime install so 4-bit quantization actually applies 392778d vongole83 committed on Apr 30
fix inference: use return_dict=True and unpack inputs for generate 7cbc66f vongole83 committed on Apr 30
programmatic install as workaround for requirements.txt not being picked up f2fee3b vongole83 committed on Apr 30
fix requirements: remove spaces (pre-installed), pin transformers>=4.51 for Gemma 4 6a17476 vongole83 committed on Apr 30