---
title: GDScript Coding Assistant
emoji: 🤖
colorFrom: purple
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
license: mit
short_description: RAG GDScript assistant with gdtoolkit validation
---
# 🤖 GDScript Coding Assistant
A Godot 4 / GDScript coding assistant that answers questions with
retrieval-augmented generation (**RAG**) over a curated **91,720-chunk** corpus
crawled from the official docs, demo repos, tutorial sites, and YouTube
descriptions. Generated GDScript is **syntax-validated with `gdtoolkit`** before
it is shown.
## How it works
```
question ─▶ jina query-embed (CPU) ─▶ FAISS top-k GDScript snippets
─▶ Qwen2.5-Coder-7B-Instruct (ZeroGPU) ─▶ answer
─▶ gdtoolkit parse + lint (CPU) ─▶ ✅/❌ + optional 1× self-fix
```
- **Retriever:** `jinaai/jina-embeddings-v2-base-code` (768-dim, code-tuned),
prebuilt FAISS cosine index bundled via Git LFS (`data/embeddings.faiss`,
`data/chunks.jsonl`).
- **Generator:** `Qwen/Qwen2.5-Coder-7B-Instruct` on **ZeroGPU** (only the
generation call uses the GPU).
- **Validation:** `gdtoolkit` (`gdparse` syntax + `gdlint` style). Note: this
checks *syntax and style*, not runtime/scene semantics.
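
The flow above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the code in this repo: `top_k_cosine` is a brute-force stand-in for the FAISS lookup (cosine similarity as a dot product of unit-length vectors), and `answer_with_validation`, `generate`, and `validate` are hypothetical names. It also assumes the model output is treated as plain GDScript and that the validator returns a list of error strings.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k_cosine(query: Vector, corpus: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Brute-force stand-in for the index lookup: rank chunks by cosine similarity."""
    q = normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(vec))))
        for i, vec in enumerate(corpus)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def answer_with_validation(
    question: str,
    snippets: Sequence[str],
    generate: Callable[[str], str],        # hypothetical: prompt -> GDScript
    validate: Callable[[str], List[str]],  # hypothetical: code -> error messages
    max_fixes: int = 1,
) -> Tuple[str, bool]:
    """Generate code, validate it, and retry at most `max_fixes` times."""
    prompt = "\n\n".join(snippets) + "\n\nQuestion: " + question
    code = generate(prompt)
    errors = validate(code)
    fixes = 0
    while errors and fixes < max_fixes:
        # Feed the validator's complaints back to the model for one more attempt.
        fix_prompt = (
            prompt
            + "\n\nPrevious attempt:\n" + code
            + "\n\nErrors:\n" + "\n".join(errors)
        )
        code = generate(fix_prompt)
        errors = validate(code)
        fixes += 1
    return code, not errors
```

Passing the generator and validator in as callables keeps the loop testable without the 7B model or `gdtoolkit` installed; in a real run they would wrap the model call and the parse + lint step respectively.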
## Setup (hardware)
In **Space → Settings → Hardware**, select **ZeroGPU**. The `spaces` package +
`@spaces.GPU` decorator in `generate.py` do the rest.
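
For orientation, the decorator pattern looks roughly like the sketch below. It is an illustration under stated assumptions, not the contents of `generate.py`: the function name `generate_answer`, the no-op fallback when `spaces` is not installed, and the exact way `GDRAG_STUB_LLM` is checked are all hypothetical.

```python
import os

try:
    import spaces  # available on Hugging Face Spaces
    gpu = spaces.GPU
except ImportError:
    # Assumption for this sketch: fall back to a no-op decorator for local runs.
    def gpu(fn):
        return fn


@gpu
def generate_answer(prompt: str) -> str:
    """Hypothetical generation entry point; only this call would request the GPU."""
    if os.environ.get("GDRAG_STUB_LLM") == "1":
        # Stub mode: skip the model entirely and return a placeholder.
        return "# stub answer for: " + prompt
    raise NotImplementedError("load the model and run generation here")
```

Keeping retrieval and validation outside the decorated function means a GPU is requested only for the generation call itself.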
## Local dev
```bash
pip install -r requirements.txt
# fast UI/flow test without downloading the 7B model:
GDRAG_STUB_LLM=1 python app.py
# real retrieval needs data/embeddings.faiss + data/chunks.jsonl present
python rag.py "how do I use @export and signals"
python validate.py
```
## Data provenance & licensing
Snippets come from public Godot resources with **varying licenses** (docs CC-BY,
repos MIT/Apache/GPL/…). Each retrieved snippet shows its source; respect the
original licenses when reusing generated code.