Auto-correct EVERY broken GDScript block in place (capped at MAX_FIX_PASSES) 635e6fb Running vivekchakraverty Claude Opus 4.8 commited on 1 day ago
Restore max_new_tokens to 512 (4-bit gen is fast: ~25 tok/s on GPU) 6246295 vivekchakraverty commited on 2 days ago
Load Qwen2.5-Coder-7B in 4-bit (nf4) inside the GPU worker 2709f63 vivekchakraverty Claude Opus 4.8 commited on 2 days ago
ZeroGPU: load model on GPU inside @spaces.GPU (canonical), not at import cccb7d5 vivekchakraverty Claude Opus 4.8 commited on 2 days ago
ZeroGPU: raise GPU budget 120->180s, cap max_new_tokens 512->256 5fa56c1 vivekchakraverty commited on 2 days ago
ZeroGPU: force model.to(cuda) in fn (ignore stale is_available); no cuda at import 743e3d3 vivekchakraverty commited on 2 days ago
diag: log cuda availability + model device + gen timing; force model.to(cuda) in fn 5ff14e5 vivekchakraverty commited on 2 days ago
ZeroGPU: keep model GPU-resident (canonical pattern) 8df32ec vivekchakraverty Claude Opus 4.8 commited on 2 days ago
Load the LLM once at startup instead of per ZeroGPU call 043484b vivekchakraverty Claude Opus 4.8 commited on 2 days ago
Hardcode chat memory to 4 turns (lock history_turns slider) 69036da vivekchakraverty Claude Opus 4.8 commited on 2 days ago
Add bounded multi-turn chat memory + turns slider (app.py) 217a06b verified vivekchakraverty commited on 2 days ago
Add bounded multi-turn chat memory (prompt.py) 0298f08 verified vivekchakraverty commited on 2 days ago
Fix ZeroGPU retrieval: pin jina query embedder to CPU e48654b verified vivekchakraverty commited on 2 days ago
Fix Colab OOM: cap seq length + smaller batch c314e63 verified vivekchakraverty commited on 2 days ago
GDScript RAG assistant: app + corpus (index added later via Colab) 777ea0e verified vivekchakraverty commited on 2 days ago