File size: 11,890 Bytes
9ac80bb
 
2d635f4
43c8912
a036b27
9ac80bb
 
1433b16
9ac80bb
1433b16
9ac80bb
1433b16
 
6a55f00
626c044
 
 
 
 
e4a872b
 
1433b16
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
cf2d7d2
49b5a13
2a4077f
cf2d7d2
3dffa87
1433b16
9ac80bb
 
1433b16
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2a4077f
1433b16
 
 
 
 
 
 
5054925
1433b16
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5de5a84
1433b16
 
 
 
 
 
 
 
 
 
 
 
 
42bbd85
1433b16
 
 
 
 
 
42bbd85
1433b16
42bbd85
1433b16
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
---
title: CodeFlow
emoji: πŸ“Š
colorFrom: indigo
colorTo: blue
sdk: gradio
python_version: '3.13'
sdk_version: 6.16.0
app_file: app.py
pinned: true
license: mit
short_description: Turn code into a readable Mermaid.js flowchart πŸ“Š!
tags:
- track:backyard
- achievement:offgrid
- achievement:sharing
- achievement:offbrand
- achievement:llama
- achievement:fieldnotes
- build-small-hackathon
- backyard-ai
- llama-cpp
- field-notes
- sharing-is-caring
- off-brand
- off-the-grid
- code
- mermaid.js
- flowchart
- small-models
- seq2seq
- gradio
- agentic
---

# πŸ“Š CodeFlow

**Paste code β†’ read its logic as a flowchart.** A 30B coder model runs entirely on **CPU via llama.cpp** to translate source code into a clean, animated [Mermaid.js](https://mermaid.js.org/) control-flow diagram β€” with each node wired back to the exact lines it came from.

### πŸ”— Links

[πŸš€ **Live Space**][space] Β· [▢️ **Demo Video**][video] Β· [🐦 **Social Post**][social] Β· [πŸ““ **Field Notes (blog)**][blog] Β· [πŸ” **Agent Traces**][traces]

<!-- ╔═══════════════════════════════════════════════════════════════╗
     β•‘  FILL THESE IN β€” replace each REPLACE_ME with your real URL.   β•‘
     β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β• -->
[space]:  https://huggingface.co/spaces/build-small-hackathon/CodeFlow  "Hugging Face Space"
[video]:  https://youtu.be/R5GbpN9FVxo  "Demo video"
[social]: https://www.linkedin.com/feed/update/urn:li:share:7471327684539785217/  "Social post"
[blog]:   https://huggingface.co/blog/build-small-hackathon/codeflow-field-notes  "Field notes / blog post"
[traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces  "Agent traces dataset"

---

## ❓ The Problem

Reading unfamiliar code means simulating its control flow in your head β€” chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code β†’ diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building).

**CodeFlow** turns any snippet into a scannable flowchart you can audit at a glance β€” generated by a real language model that runs **100% locally**, so nothing is sent to an external API.

## βš™οΈ How It Works

```
 Paste code ──▢ Generate ──▢ POST /generate_flowchart        (Gradio API)
                                    β”‚
                    number the source lines + structured system prompt
                                    β”‚
                     Qwen3-Coder-30B-A3B   (llama.cpp Β· CPU)
                                    β”‚
                 <thinking> …reasoning… </thinking>
                 graph TD … nodes & edges …
                 <linemap> A:1  B:2  C:3-4 </linemap>
                                    β”‚
        strip reasoning Β· parse + validate the line-map Β· sanitize labels
                                    β”‚
                  { mermaid, linemap }  ──▢  append agent_traces.jsonl
                                    β”‚
   Mermaid render + "trace-the-path" reveal + node ↔ code linking
```

1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**.
2. The backend numbers the source lines and sends them with a strict system prompt to **Qwen3-Coder** running on **llama.cpp**.
3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s).
4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`.
5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time.
6. **Node ↔ code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node.
7. Every generation is captured as a structured **agent trace** (`/traces`).

## 🧰 Tech Stack

| Layer | What it is | Used for |
|---|---|---|
| **Model** | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code β†’ Mermaid + line-map generation |
| **Quantization** | [Unsloth](https://huggingface.co/unsloth) Dynamic **UD-Q3_K_XL** GGUF (~3-bit) | Shrinks the 30B model to run on CPU |
| **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) |
| **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run |
| **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` |
| **Frontend** | A single self-contained `frontend.html` (vanilla JS + CSS custom properties) | Editor, diagram, animation, theming |
| **Editor** | [CodeMirror 6](https://codemirror.net/) β€” **vendored** bundle (`static/cm.bundle.js`) | Syntax-highlighted code input |
| **Diagrams** | [Mermaid.js 10](https://mermaid.js.org/) β€” **vendored** UMD (`static/mermaid.min.js`) | Flowchart rendering |
| **Animation** | Web Animations API | Trace-the-path reveal + theme crossfade |
| **Type** | Fraunces Β· Hanken Grotesk Β· JetBrains Mono β€” **vendored** woff2 (`static/fonts/`) | Custom, non-default look |
| **Assets** | All JS/CSS/fonts bundled into `static/` (no CDN at runtime) | True offline operation |
| **Observability** | Hand-rolled JSONL agent traces | One trace per generation, served at `/traces` |
| **Tests** | `smoke-test.sh` (headless Chrome) | 13 build/render checks |
| **Deploy** | Hugging Face Spaces | Hosting |

## πŸ”’ Total Parameters

CodeFlow is driven by **Qwen3-Coder-30B-A3B-Instruct** β€” a **Mixture-of-Experts** model with:

- **β‰ˆ 30.5 billion total parameters**
- **β‰ˆ 3.3 billion active parameters per token** (128 experts, 8 activated)

It's served as an **Unsloth Dynamic ~3-bit (UD-Q3_K_XL) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) β€” letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API.

## πŸ… Badges (5 / 6)

These map to the Space tags above.

| Badge | How CodeFlow earns it |
|---|---|
| πŸ”Œ **Off the Grid** | **No external API or CDN at runtime β€” period.** The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and *every* frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into `static/`. The Gradio share tunnel is off (`share=False`). The **only** network call in the whole project is the one-time model download at startup. The UI even runs fully offline from `file://`. |
| 🎨 **Off-Brand** | **Zero default-Gradio look.** A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation β€” deliberately designed *not* to look templated. |
| πŸ““ **Field Notes** | See the [blog post][blog]. |
| 🀝 **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. |
| πŸ€– **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. |

## πŸŽ₯ Demo

▢️ **[Watch the demo video][video]** β€” a full walkthrough of CodeFlow in action.

## πŸ’» Run It Locally

> First launch downloads the **~13 GB GGUF** from Hugging Face. CPU inference is slow (cold generations can take minutes) β€” the built-in **examples render instantly** because their diagrams are pre-computed.

```bash
# 1. Clone
git clone https://huggingface.co/spaces/build-small-hackathon/CodeFlow CodeFlow
cd CodeFlow

# 2. Create a virtual env
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python)
pip install -r requirements.txt

# 4. Run β€” opens a local Gradio URL
python app.py
```

Then open the printed URL. **Preview the UI without the model** by opening `frontend.html` directly in a browser (`file://`) β€” fully offline, since all assets are vendored in `static/`; the example presets render their diagrams instantly.

> **Rebuilding the vendored bundles** (optional): the CodeMirror + Gradio-client bundles in `static/` are produced by `build/build.sh` (needs Node). Mermaid and the fonts are downloaded into `static/` as well. You never need this to *run* the app β€” only to regenerate the bundles.

**Endpoints:** `/` (UI) Β· `/generate_flowchart` (API) Β· `/traces` (download all agent traces as JSONL).

## πŸ—‚οΈ Repository Structure

```
CodeFlow/
β”œβ”€β”€ app.py             # Gradio + FastAPI server: loads the model and exposes
β”‚                      #   /generate_flowchart (API), / (UI), /static, /traces
β”œβ”€β”€ frontend.html      # Self-contained UI β€” CodeMirror editor, Mermaid render,
β”‚                      #   trace-the-path animation, node↔code linking, theming
β”œβ”€β”€ static/            # Vendored frontend assets β€” NO CDN at runtime
β”‚   β”œβ”€β”€ mermaid.min.js #   Mermaid (UMD, ~3.2 MB)
β”‚   β”œβ”€β”€ cm.bundle.js   #   CodeMirror 6 (single IIFE bundle)
β”‚   β”œβ”€β”€ gradio-client.js #  @gradio/client (IIFE bundle)
β”‚   β”œβ”€β”€ fonts.css      #   @font-face β†’ local woff2
β”‚   └── fonts/         #   Fraunces Β· Hanken Grotesk Β· JetBrains Mono (woff2)
β”œβ”€β”€ build/             # Reproducible bundle build (Node) β€” build.sh + entry files
β”œβ”€β”€ requirements.txt   # Python deps (CPU llama-cpp-python wheel, gradio, hub)
β”œβ”€β”€ smoke-test.sh      # Headless-Chrome smoke test (13 checks)
β”œβ”€β”€ notes-for-blog.md  # Field Notes β€” the full build log
β”œβ”€β”€ README.md          # You are here
└── LICENSE            # MIT
```

## ⚠️ Limitations

- **CPU inference is slow.** A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback.
- **3-bit quantization** trades some fidelity for the ability to run a 30B model at all β€” occasional imperfect diagrams.
- **4096-token context** β€” very large files won't fit; works best on functions/snippets.
- **Line-map depends on the model.** The `<linemap>` is LLM-generated; the server validates and drops bad entries, so node↔code links can be partial on tricky code.
- **Paraphrased labels.** Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim.
- **Mermaid parse failures** on unusual syntax are possible (the raw output is shown so nothing is lost).
- **Ephemeral traces on Spaces.** `agent_traces.jsonl` lives on the runtime filesystem and resets on restart/rebuild β€” download it before then.

## πŸ™ Credits

- **Model:** [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba) β€” GGUF quant by [Unsloth](https://huggingface.co/unsloth).
- **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen).
- **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face).
- **Diagrams:** [Mermaid.js](https://mermaid.js.org/) Β· **Editor:** [CodeMirror](https://codemirror.net/).
- **Type:** Fraunces, Hanken Grotesk, JetBrains Mono ([Google Fonts](https://fonts.google.com/), SIL OFL).
- **Built for** the Build Small Hackathon.

## πŸ“„ License

Released under the **MIT License** β€” see [`LICENSE`](LICENSE). Β© 2026 Rishi Jain.