gdscript-assistant / generate.py

Commit History

Restore max_new_tokens to 512 (4-bit gen is fast: ~25 tok/s on GPU)
6246295

vivekchakraverty committed on

Load Qwen2.5-Coder-7B in 4-bit (nf4) inside the GPU worker
2709f63

vivekchakraverty Claude Opus 4.8 committed on

ZeroGPU: load model on GPU inside @spaces.GPU (canonical), not at import
cccb7d5

vivekchakraverty Claude Opus 4.8 committed on

ZeroGPU: raise GPU budget 120->180s, cap max_new_tokens 512->256
5fa56c1

vivekchakraverty committed on

ZeroGPU: force model.to(cuda) in fn (ignore stale is_available); no cuda at import
743e3d3

vivekchakraverty committed on

diag: log cuda availability + model device + gen timing; force model.to(cuda) in fn
5ff14e5

vivekchakraverty committed on

ZeroGPU: keep model GPU-resident (canonical pattern)
8df32ec

vivekchakraverty Claude Opus 4.8 committed on

Load the LLM once at startup instead of per ZeroGPU call
043484b

vivekchakraverty Claude Opus 4.8 committed on

GDScript RAG assistant: app + corpus (index added later via Colab)
777ea0e

vivekchakraverty committed on