
gdscript-assistant / generate.py

Commit History

Restore max_new_tokens to 512 (4-bit gen is fast: ~25 tok/s on GPU)

6246295

vivekchakraverty committed on 1 day ago

Load Qwen2.5-Coder-7B in 4-bit (nf4) inside the GPU worker

2709f63

vivekchakraverty Claude Opus 4.8 committed on 1 day ago

ZeroGPU: load model on GPU inside @spaces.GPU (canonical), not at import

cccb7d5

vivekchakraverty Claude Opus 4.8 committed on 1 day ago

ZeroGPU: raise GPU budget 120->180s, cap max_new_tokens 512->256

5fa56c1

vivekchakraverty committed on 1 day ago

ZeroGPU: force model.to(cuda) in fn (ignore stale is_available); no cuda at import

743e3d3

vivekchakraverty committed on 2 days ago

diag: log cuda availability + model device + gen timing; force model.to(cuda) in fn

5ff14e5

vivekchakraverty committed on 2 days ago

ZeroGPU: keep model GPU-resident (canonical pattern)

8df32ec

vivekchakraverty Claude Opus 4.8 committed on 2 days ago

Load the LLM once at startup instead of per ZeroGPU call

043484b

vivekchakraverty Claude Opus 4.8 committed on 2 days ago

GDScript RAG assistant: app + corpus (index added later via Colab)

777ea0e
verified

vivekchakraverty committed on 2 days ago
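
The reasoning behind restoring max_new_tokens to 512 (commit 6246295) can be checked with a little arithmetic. The sketch below is not taken from generate.py. It assumes the ~25 tok/s rate quoted in that commit message, assumes the 180 s GPU budget from commit 5fa56c1 was not changed by later commits, and ignores model load and prompt processing time. The function name is hypothetical.

```python
def estimated_generation_seconds(max_new_tokens: int, tokens_per_second: float) -> float:
    """Worst-case decode time if generation runs all the way to the token cap."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return max_new_tokens / tokens_per_second


# Approximate rate quoted in commit 6246295 (4-bit generation on GPU).
TOKENS_PER_SECOND = 25.0
# Budget set in commit 5fa56c1; assumed here to be unchanged since then.
GPU_BUDGET_SECONDS = 180.0

for cap in (256, 512):
    seconds = estimated_generation_seconds(cap, TOKENS_PER_SECOND)
    print(f"max_new_tokens={cap}: ~{seconds:.1f} s of a {GPU_BUDGET_SECONDS:.0f} s budget")
```

Under these assumptions, even the 512-token cap uses only a small fraction of the budget for decoding, which is consistent with the cap being raised back once 4-bit loading was in place.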