---
title: GDScript Coding Assistant
emoji: 🤖
colorFrom: purple
colorTo: green
sdk: gradio
app_file: app.py
pinned: false
license: mit
short_description: RAG GDScript assistant with gdtoolkit validation
---

# 🤖 GDScript Coding Assistant

A Godot 4 / GDScript coding assistant that answers questions with **RAG**
(retrieval-augmented generation) over a curated **91,720-chunk** corpus crawled
from the official docs, demo repos, tutorial sites and YouTube descriptions.
Generated GDScript is **syntax-validated with `gdtoolkit`** before it is shown.

## How it works

```
question ─▶ jina query-embed (CPU) ─▶ FAISS top-k GDScript snippets
         ─▶ Qwen2.5-Coder-7B-Instruct (ZeroGPU) ─▶ answer
         ─▶ gdtoolkit parse + lint (CPU) ─▶ ✅/❌ + optional 1× self-fix
```
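
The flow in the diagram, including the optional single self-fix, can be read as
a small control loop. The following is a minimal sketch, not the code in this
repo: `answer_with_self_fix`, the `Answer` dataclass and the three callables are
hypothetical names, and the stubs stand in for the real retriever, model and
`gdtoolkit` checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Answer:
    code: str
    valid: bool
    attempts: int


def answer_with_self_fix(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str], str], str],
    validate: Callable[[str], Tuple[bool, str]],
    max_fixes: int = 1,
) -> Answer:
    """Retrieve, generate, validate, and regenerate at most `max_fixes` times."""
    snippets = retrieve(question)
    code = generate(question, snippets, "")  # first pass: no validator feedback
    ok, report = validate(code)
    attempts = 1
    while not ok and attempts <= max_fixes:
        # Feed the validator report back so the generator can repair its output.
        code = generate(question, snippets, report)
        ok, report = validate(code)
        attempts += 1
    return Answer(code=code, valid=ok, attempts=attempts)


# Hypothetical stubs so the sketch runs without a model or an index.
def fake_retrieve(question: str) -> List[str]:
    return ["func _ready():\n\tpass"]


def fake_generate(question: str, snippets: List[str], feedback: str) -> str:
    # Emits broken code first, then a fixed version once feedback is supplied.
    return "func _ready():\n\tpass" if feedback else "func _ready(:"


def fake_validate(code: str) -> Tuple[bool, str]:
    # Toy check only; the real app parses and lints with gdtoolkit.
    if code.count("(") == code.count(")"):
        return True, ""
    return False, "unbalanced parentheses"


result = answer_with_self_fix(
    "how do I use _ready?", fake_retrieve, fake_generate, fake_validate
)
print(result.valid, result.attempts)
```

Passing the three stages in as callables keeps the retry logic testable without
loading the model or the index.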

- **Retriever:** `jinaai/jina-embeddings-v2-base-code` (768-dim, code-tuned),
  prebuilt FAISS cosine index bundled via Git LFS (`data/embeddings.faiss`,
  `data/chunks.jsonl`).
- **Generator:** `Qwen/Qwen2.5-Coder-7B-Instruct` on **ZeroGPU** (only the
  generation call uses the GPU).
- **Validation:** `gdtoolkit` (`gdparse` syntax + `gdlint` style). Note: this
  checks *syntax and style*, not runtime/scene semantics.
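
The cosine lookup that the retriever bullet describes amounts to an inner
product over L2-normalised vectors. The following is a pure-Python sketch of
that idea on toy 2-dimensional vectors. It illustrates the scoring only and
does not use FAISS or the Jina model, and `l2_normalize` and `top_k_cosine` are
hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k_cosine(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, cosine score) for the k vectors most similar to the query."""
    q = l2_normalize(query)
    scored = []
    for i, v in enumerate(vectors):
        unit = l2_normalize(v)
        # Inner product of unit vectors equals cosine similarity.
        scored.append((i, sum(a * b for a, b in zip(q, unit))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "chunk embeddings"; the real index holds 768-dim vectors.
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k_cosine([2.0, 0.1], chunk_vectors, k=2)
print([index for index, _ in hits])
```

In the app, the indices returned by the search would map back to rows of
`data/chunks.jsonl`, which is where the snippet text and its source live.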

## Setup (hardware)

In **Space → Settings → Hardware**, select **ZeroGPU**. The `spaces` package and
the `@spaces.GPU` decorator in `generate.py` handle the rest.
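
As a rough illustration of the decorator pattern, the sketch below wraps only
the generation function and falls back to a no-op decorator when the `spaces`
package is not installed (for example in local dev). It is an assumption about
how such a module could be laid out, not the contents of `generate.py`;
`generate_answer` and its stub body are hypothetical.

```python
try:
    import spaces  # available on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:
    # Local fallback: a decorator that leaves the function unchanged.
    def gpu(fn):
        return fn


@gpu
def generate_answer(prompt: str) -> str:
    # Hypothetical stand-in for the model call; only this function would
    # request the GPU, so retrieval and validation stay on the CPU.
    return f"# (stub) answer for: {prompt}"


print(generate_answer("connect a signal"))
```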

## Local dev

```bash
pip install -r requirements.txt
# fast UI/flow test without downloading the 7B model:
GDRAG_STUB_LLM=1 python app.py
# real retrieval needs data/embeddings.faiss + data/chunks.jsonl present
python rag.py "how do I use @export and signals"
python validate.py
```

## Data provenance & licensing

Snippets come from public Godot resources with **varying licenses** (docs CC-BY,
repos MIT/Apache/GPL/…). Each retrieved snippet shows its source; respect the
original licenses when reusing generated code.