| --- |
| title: CodeFlow |
| emoji: π |
| colorFrom: indigo |
| colorTo: blue |
| sdk: gradio |
| python_version: '3.13' |
| sdk_version: 6.16.0 |
| app_file: app.py |
| pinned: true |
| license: mit |
| short_description: Turn code into a readable Mermaid.js flowchart π! |
| tags: |
| - track:backyard |
| - achievement:offgrid |
| - achievement:sharing |
| - achievement:offbrand |
| - achievement:llama |
| - achievement:fieldnotes |
| - achievement:welltuned |
| - build-small-hackathon |
| - backyard-ai |
| - llama-cpp |
| - field-notes |
| - sharing-is-caring |
| - off-brand |
| - off-the-grid |
| - code |
| - mermaid.js |
| - flowchart |
| - small-models |
| - seq2seq |
| - gradio |
| - agentic |
| --- |
| |
| # π CodeFlow |
|
|
| **Paste code β read its logic as a flowchart.** A 30B coder model runs entirely on **CPU via llama.cpp** to translate source code into a clean, animated [Mermaid.js](https://mermaid.js.org/) control-flow diagram β with each node wired back to the exact lines it came from. |
|
|
| ### π Links |
|
|
| [π **Live Space**][space] Β· [βΆοΈ **Demo Video**][video] Β· [π¦ **Social Post**][social] Β· [π **Field Notes (blog)**][blog] Β· [π **Agent Traces**][traces] Β· [ποΈ **Fine-Tuned Model**][model] |
|
|
| [space]: https://huggingface.co/spaces/build-small-hackathon/CodeFlow "Hugging Face Space" |
| [video]: https://youtu.be/R5GbpN9FVxo "Demo video" |
| [social]: https://www.linkedin.com/feed/update/urn:li:share:7471327684539785217/ "Social post" |
| [blog]: https://huggingface.co/blog/build-small-hackathon/codeflow-field-notes "Field notes / blog post" |
| [traces]: https://huggingface.co/datasets/build-small-hackathon/codeflow-agent-traces "Agent traces dataset" |
| [model]: https://huggingface.co/build-small-hackathon/codeflow-qwen-3-finetuning "Fine-tuned model" |
|
|
| --- |
|
|
| ## β The Problem |
|
|
| Reading unfamiliar code means simulating its control flow in your head β chasing branches, loops, and early returns line by line. That's slow, error-prone, and gets worse the deeper the nesting. Existing "code β diagram" tools are usually rigid AST parsers (brittle, language-locked) or cloud LLM APIs (your code leaves the building). |
|
|
| **CodeFlow** turns any snippet into a scannable flowchart you can audit at a glance β generated by a real language model that runs **100% locally**, so nothing is sent to an external API. |
|
|
| ## βοΈ How It Works |
|
|
| ``` |
| Paste code βββΆ Generate βββΆ POST /generate_flowchart (Gradio API) |
| β |
| number the source lines + structured system prompt |
| β |
| CodeFlow fine-tune of Qwen3-Coder-30B-A3B (llama.cpp Β· CPU) |
| β |
| <thinking> β¦reasoningβ¦ </thinking> |
| graph TD β¦ nodes & edges β¦ |
| <linemap> A:1 B:2 C:3-4 </linemap> |
| β |
| strip reasoning Β· parse + validate the line-map Β· sanitize labels |
| β |
| { mermaid, linemap } βββΆ append agent_traces.jsonl |
| β |
| Mermaid render + "trace-the-path" reveal + node β code linking |
| ``` |
|
|
| 1. You paste code (or pick a pre-rendered example) into the **CodeMirror** editor and hit **Generate**. |
| 2. The backend numbers the source lines and sends them with a strict system prompt to the **CodeFlow fine-tune of Qwen3-Coder** running on **llama.cpp**. |
| 3. The model returns hidden `<thinking>`, the Mermaid `graph`, and a `<linemap>` mapping every node to its source line(s). |
| 4. The server strips the reasoning, **validates** the line-map against the source, sanitizes labels for Mermaid, and returns `{ mermaid, linemap }`. |
| 5. The frontend renders the diagram with a **trace-the-path reveal** that flows out of a persistent Start node while the canvas scrolls along in real time. |
| 6. **Node β code linking:** hover a node to highlight its source lines, click a node to jump-and-edit them, or move your cursor over a line to light up the matching node. |
| 7. Every generation is captured as a structured **agent trace** (`/traces`). |
|
|
| ## ποΈ Fine-Tuning |
|
|
| CodeFlow runs a [**LoRA fine-tune**][model] of **Qwen3-Coder-30B-A3B-Instruct** (β30.5B params), specialized for the code β Mermaid + `<linemap>` task rather than relying on the base model's general coding ability. |
|
|
| - **Data:** **2,400 synthetic examples** (2,208 train / 192 val β 8% holdout), built from **22 control-flow templates** across **Python, JavaScript, C++, and C**. |
| - **Method:** LoRA `r=16, Ξ±=32` on the attention + MLP projections, **bf16**, cosine schedule β then merged and exported to a **Q3_K_L GGUF** for CPU inference. |
| - **Validation:** the holdout is **hard-validated** β generated outputs are syntax-checked / compiled, not just eyeballed. |
|
|
| See the [model card][model] for the full data engine, `finetune.py` options, and dataset preview. |
|
|
| ## π§° Tech Stack |
|
|
| | Layer | What it is | Used for | |
| |---|---|---| |
| | **Model** | [**CodeFlow fine-tune**][model] of [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen) (Mixture-of-Experts) | Code β Mermaid + line-map generation | |
| | **Fine-tuning** | LoRA SFT (`r=16, Ξ±=32`) on attention + MLP projections, merged to GGUF | Specializes the base model for the code β Mermaid + line-map task | |
| | **Quantization** | **Q3_K_L** GGUF (~3-bit) | Shrinks the 30B model to run on CPU | |
| | **Inference** | [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (llama.cpp) | Local CPU inference (`n_ctx=4096`) | |
| | **Model fetch** | `huggingface_hub` | Downloads the GGUF on first run | |
| | **Server** | [Gradio](https://www.gradio.app/) `gr.Server` + FastAPI | `/generate_flowchart` API, `/` UI, `/traces` | |
| | **Frontend** | A single self-contained `frontend.html` (vanilla JS + CSS custom properties) | Editor, diagram, animation, theming | |
| | **Editor** | [CodeMirror 6](https://codemirror.net/) β **vendored** bundle (`static/cm.bundle.js`) | Syntax-highlighted code input | |
| | **Diagrams** | [Mermaid.js 10](https://mermaid.js.org/) β **vendored** UMD (`static/mermaid.min.js`) | Flowchart rendering | |
| | **Animation** | Web Animations API | Trace-the-path reveal + theme crossfade | |
| | **Type** | Fraunces Β· Hanken Grotesk Β· JetBrains Mono β **vendored** woff2 (`static/fonts/`) | Custom, non-default look | |
| | **Assets** | All JS/CSS/fonts bundled into `static/` (no CDN at runtime) | True offline operation | |
| | **Observability** | Hand-rolled JSONL agent traces | One trace per generation, served at `/traces` | |
| | **Tests** | `smoke-test.sh` (headless Chrome) | 13 build/render checks | |
| | **Deploy** | Hugging Face Spaces | Hosting | |
|
|
| ## π’ Total Parameters |
|
|
| CodeFlow is driven by a [**LoRA fine-tune**][model] of **Qwen3-Coder-30B-A3B-Instruct** β a **Mixture-of-Experts** model with: |
|
|
| - **β 30.5 billion total parameters** (well under the 32B cap) |
| - **β 3.3 billion active parameters per token** (128 experts, 8 activated) |
|
|
| It's served as a **~3-bit (Q3_K_L) GGUF**, which compresses those 30B weights to a CPU-runnable footprint (~13 GB on disk) β letting a 30B-class model generate diagrams **off the grid**, with no GPU and no external API. |
|
|
| ## π
Badges (6 / 6) |
|
|
| These map to the Space tags above. |
|
|
| | Badge | How CodeFlow earns it | |
| |---|---| |
| | π **Off the Grid** | **No external API or CDN at runtime β period.** The model runs fully locally (Qwen3-Coder GGUF on CPU via llama.cpp), and *every* frontend asset (Mermaid, CodeMirror, the Gradio client, all fonts) is vendored into `static/`. The Gradio share tunnel is off (`share=False`). The **only** network call in the whole project is the one-time model download at startup. The UI even runs fully offline from `file://`. | |
| | π¨ **Off-Brand** | **Zero default-Gradio look.** A bespoke single-file UI: custom "Pine & Sage" palette (one-word rust fallback), Fraunces + Hanken Grotesk type, a hand-drawn decision-node logo, restyled Mermaid nodes, and a trace-the-path reveal animation β deliberately designed *not* to look templated. | |
| | π **Field Notes** | See the [blog post][blog]. | |
| | π€ **Sharing is Caring** | Open-source under **MIT**, a public Space, plus a [social post][social] sharing the process and learnings. | |
| | π€ **Agentic** | Every model generation is captured as a structured agent trace (input code, the model's reasoning, output, token usage, latency), downloadable at [`/traces`][traces]. | |
| | ποΈ **Well-Tuned** | A [**LoRA fine-tune**][model] of Qwen3-Coder-30B-A3B-Instruct (**β30.5B params β under the 32B cap**), specialized for the code β Mermaid + `<linemap>` task and shipped as the GGUF the Space actually runs. | |
|
|
| ## π₯ Demo |
|
|
| βΆοΈ **[Watch the demo video][video]** β a full walkthrough of CodeFlow in action. |
|
|
| ## π» Run It Locally |
|
|
| > First launch downloads the **~13 GB GGUF** from Hugging Face. CPU inference is slow (cold generations can take minutes) β the built-in **examples render instantly** because their diagrams are pre-computed. |
|
|
| ```bash |
| # 1. Clone |
| git clone https://huggingface.co/spaces/build-small-hackathon/CodeFlow CodeFlow |
| cd CodeFlow |
| |
| # 2. Create a virtual env |
| python -m venv .venv |
| source .venv/bin/activate # Windows: .venv\Scripts\activate |
| |
| # 3. Install deps (uses a prebuilt CPU wheel for llama-cpp-python) |
| pip install -r requirements.txt |
| |
| # 4. Run β opens a local Gradio URL |
| python app.py |
| ``` |
|
|
| Then open the printed URL. **Preview the UI without the model** by opening `frontend.html` directly in a browser (`file://`) β fully offline, since all assets are vendored in `static/`; the example presets render their diagrams instantly. |
|
|
| > **Rebuilding the vendored bundles** (optional): the CodeMirror + Gradio-client bundles in `static/` are produced by `build/build.sh` (needs Node). Mermaid and the fonts are downloaded into `static/` as well. You never need this to *run* the app β only to regenerate the bundles. |
|
|
| **Endpoints:** `/` (UI) Β· `/generate_flowchart` (API) Β· `/traces` (download all agent traces as JSONL). |
|
|
| ## ποΈ Repository Structure |
|
|
| ``` |
| CodeFlow/ |
| βββ app.py # Gradio + FastAPI server: loads the model and exposes |
| β # /generate_flowchart (API), / (UI), /static, /traces |
| βββ frontend.html # Self-contained UI β CodeMirror editor, Mermaid render, |
| β # trace-the-path animation, nodeβcode linking, theming |
| βββ static/ # Vendored frontend assets β NO CDN at runtime |
| β βββ mermaid.min.js # Mermaid (UMD, ~3.2 MB) |
| β βββ cm.bundle.js # CodeMirror 6 (single IIFE bundle) |
| β βββ gradio-client.js # @gradio/client (IIFE bundle) |
| β βββ fonts.css # @font-face β local woff2 |
| β βββ fonts/ # Fraunces Β· Hanken Grotesk Β· JetBrains Mono (woff2) |
| βββ build/ # Reproducible bundle build (Node) β build.sh + entry files |
| βββ requirements.txt # Python deps (CPU llama-cpp-python wheel, gradio, hub) |
| βββ smoke-test.sh # Headless-Chrome smoke test (13 checks) |
| βββ notes-for-blog.md # Field Notes β the full build log |
| βββ README.md # You are here |
| βββ LICENSE # MIT |
| ``` |
|
|
| ## β οΈ Limitations |
|
|
| - **CPU inference is slow.** A 30B model on CPU means cold generations can take minutes; the demo leans on pre-rendered examples for instant feedback. |
| - **3-bit quantization** trades some fidelity for the ability to run a 30B model at all β occasional imperfect diagrams. |
| - **4096-token context** β very large files won't fit; works best on functions/snippets. |
| - **Line-map depends on the model.** The `<linemap>` is LLM-generated; the server validates and drops bad entries, so nodeβcode links can be partial on tricky code. |
| - **Paraphrased labels.** Nodes describe logic in plain words (no raw code), so they read cleanly but aren't verbatim. |
| - **Mermaid parse failures** on unusual syntax are possible (the raw output is shown so nothing is lost). |
| - **Ephemeral traces on Spaces.** `agent_traces.jsonl` lives on the runtime filesystem and resets on restart/rebuild β download it before then. |
|
|
| ## π Credits |
|
|
| - **Model:** [CodeFlow fine-tune][model] of [Qwen3-Coder](https://huggingface.co/Qwen) (Qwen Team, Alibaba), built with [Unsloth](https://huggingface.co/unsloth). |
| - **Inference:** [llama.cpp](https://github.com/ggml-org/llama.cpp) via [`llama-cpp-python`](https://github.com/abetlen/llama-cpp-python) (Andrei Betlen). |
| - **App framework:** [Gradio](https://www.gradio.app/) (Hugging Face). |
| - **Diagrams:** [Mermaid.js](https://mermaid.js.org/) Β· **Editor:** [CodeMirror](https://codemirror.net/). |
| - **Type:** Fraunces, Hanken Grotesk, JetBrains Mono ([Google Fonts](https://fonts.google.com/), SIL OFL). |
| - **Built for** the Build Small Hackathon. |
|
|
| ## π License |
|
|
| Released under the **MIT License** β see [`LICENSE`](LICENSE). Β© 2026 Rishi Jain. |
|
|