---
license: apache-2.0
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
language: [en, es]
tags: [code, code-review, security, governance, gguf]
pipeline_tag: text-generation
---
<!-- Drop the Degú logo here: docs/logo.png (brand emerald #0D9E81) -->
# Degú Simple Code
> **Review code you can trust. Generate code worth trusting.**
Degú Simple Code is an open-source **code reviewer that also writes code**. It reviews
code — yours or an AI's — against one standard: **elegant simplicity + security**, and
it **proves** every verdict with a deterministic layer that runs every time and a
readable audit trail. When it writes code, it writes code that already passes that bar.
It is horizontal, not tied to one domain: web, data, APIs, CLIs, automation. It responds
in **your language**, comments and explanations included.
---
## Why a reviewer
Most AI now *writes* code. Almost nothing *reviews* it to a consistent, auditable
standard — and studies keep finding a large share of AI-generated code ships with
vulnerabilities no one checks. Degú Simple Code sits exactly there: point it at a file
or a pull request and it flags hardcoded secrets, SQL injection, PII in logs, disabled
TLS, `eval`/`exec`, and destructive operations — **deterministically**, with a record
you can hand to an auditor.
## Two layers (never confuse them)
- **Layer 1 — the fine-tuned model.** Writes and reviews simple, commented,
security-conscious code by default. It *tends* to behave well, but is **not** the
safety guarantee — no language model is. Treat its judgment as best-effort.
- **Layer 2 — deterministic validation + audit trail.** Hard rules that always run and
cannot be talked out of (no hardcoded secrets, parameterized queries, no PII in logs,
TLS not disabled, no `eval`/`exec`, destructive actions require human confirmation),
plus static analysis (Semgrep). **This is where trust becomes auditable, not just
promised** — and it works on any Python file, whoever or whatever wrote it.
> We tested this honestly: even with an explicit "refuse" instruction, the model would
> still write a destructive script *with warnings* instead of refusing outright. Layer 2
> caught it every time and required human confirmation. That gap is the whole point —
> **safety lives in Layer 2, by design, not in hoping the model behaves.**
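As a rough illustration of what a deterministic hard rule can look like, the sketch below walks a Python syntax tree and flags three of the rules listed above: `eval`/`exec` calls, `verify=False`, and string literals assigned to secret-looking names. It is a minimal sketch, not the code in `validador.py`; the function name `check_source`, the rule ids, and the name pattern are assumptions made for the example.

```python
import ast
import re

# Hypothetical pattern for "secret-looking" variable names (an assumption).
SECRET_NAME = re.compile(r"(password|passwd|secret|api_key|token)", re.IGNORECASE)


def check_source(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, rule_id) findings for a few illustrative hard rules."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # Rule: no direct eval()/exec() calls.
            if isinstance(node.func, ast.Name) and node.func.id in {"eval", "exec"}:
                findings.append((node.lineno, "no-eval-exec"))
            # Rule: TLS verification must not be disabled via verify=False.
            for kw in node.keywords:
                if (
                    kw.arg == "verify"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is False
                ):
                    findings.append((node.lineno, "tls-disabled"))
        # Rule: no non-empty string literal assigned to a secret-looking name.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if isinstance(node.value.value, str) and node.value.value:
                for target in node.targets:
                    if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                        findings.append((node.lineno, "hardcoded-secret"))
    return sorted(findings)
```

Because the checks read the syntax tree instead of asking a model, the same input always yields the same findings, and the code under review is parsed but never executed.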
## Honest positioning
The techniques here are public (distillation, QLoRA, static analysis, audit trails).
A 30B fine-tune will **not** out-code a frontier model on raw capability, and we don't
claim it does. The value is a **sustained discipline** — elegant simplicity + governance
baked in — made **auditable** by Layer 2. That's what a regulated team can trust.
## Where it shines (and where it doesn't)
**Best fit:** reviewing and writing code that touches data, auth, secrets, SQL, files,
or destructive operations — exactly where a generic agent quietly introduces a
vulnerability and no one reviews it. Regulated contexts (fintech, health, customer data).
**Not the best tool for:** frontier-capability tasks (huge features, novel algorithms,
massive refactors). Use a frontier model for those — then have Degú review the result.
## How it behaves — real evaluation
Fine-tuned model vs. its base, same prompts:
| Dimension | Base | Degú Simple Code |
|---|---|---|
| Capability (tests passed) | 4/4 | 4/4 |
| Simplicity: avg lines | 9.25 | **6.75** |
| Simplicity: max complexity (avg) | 2.75 | **2.5** |
| Safety: insecure requests refused | 4/20 | **19/20** |
Same capability, simpler code, and a strong tendency to **refuse** insecure requests
(hardcoded backdoors, SQL injection, shell-exec endpoints, logging card data...) while
proposing the safe version. *Honest caveats: small capability benchmark (4 tasks) and a
20-prompt safety sample — a strong signal, not an exhaustive proof. And that 19/20 is a
**tendency**, not a guarantee: in live use the model is sometimes softer than the held-out
number suggests. The guarantee is Layer 2, which is deterministic.*
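The two simplicity rows can be approximated with the standard library. The sketch below counts non-blank lines and uses a per-function branch count as a stand-in for complexity. How the evaluation actually measured these is not stated here, so `line_count`, `max_complexity`, and the choice of branch nodes are assumptions.

```python
import ast

# Node types counted as decision points (an assumed, simplified definition).
BRANCHES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def line_count(source: str) -> int:
    """Count non-blank lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def max_complexity(source: str) -> int:
    """Return the highest (1 + branch count) over all functions, or 1 if none."""
    scores = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(isinstance(child, BRANCHES) for child in ast.walk(node))
            scores.append(1 + branches)
    return max(scores, default=1)
```

Averaging these two numbers over a set of generated solutions gives figures comparable in kind, though not necessarily in value, to the table above.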
## Quickstart — review a file
Layer 2 is a standalone reviewer. No GPU, no model needed:
```bash
pip install semgrep # optional: adds Semgrep static analysis; the hard rules run without it
python validador.py path/to/your_code.py
```
It prints the findings and the verdict (DELIVERED / REQUIRES CONFIRMATION / BLOCKED) and
appends a line to `audit_log.jsonl`.
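One audit entry per review fits naturally in JSON Lines. The sketch below builds a record and appends it as a single line. The field names (`timestamp`, `file`, `sha256`, `verdict`, `findings`) and both helper functions are hypothetical; the schema `validador.py` actually writes may differ.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_record(file_path, source, verdict, findings, now=None):
    """Build one audit entry; all field names are illustrative assumptions."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "file": str(file_path),
        # Hash of the reviewed content, so the entry is tied to an exact version.
        "sha256": hashlib.sha256(source.encode("utf-8")).hexdigest(),
        "verdict": verdict,
        "findings": list(findings),
    }


def append_record(log_path, record):
    """Append the record as one JSON line; existing lines are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Opening the log in append mode keeps earlier entries untouched, which is the property an auditor usually cares about.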
## Quickstart — run the model with Ollama
```bash
# 1. Get the GGUF weights from Hugging Face (see model card)
# 2. Create the model (Modelfile carries the ChatML template + system prompt)
ollama create degu-simple-code -f Modelfile
# 3. Ask it something
ollama run degu-simple-code "Write a login endpoint"
```
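The same model can also be called from Python through Ollama's local HTTP API. This is a minimal sketch, assuming Ollama is serving on its default address `http://localhost:11434` and the model was created under the name used above; `build_request` and `generate` are helper names invented here.

```python
import json
import urllib.request

# Ollama's default local generate endpoint (assumes a default install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="degu-simple-code", url=OLLAMA_URL):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Write a login endpoint")` returns the reply as a string. This path talks to Layer 1 only, so the output has not been through Layer 2.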
Run the full agent (Layer 1 + self-refinement + Layer 2 + audit):
```bash
python agente.py --ollama
```
## The agent flow
```
request -> Layer 1 generates -> self-refinement -> Layer 2 validates & audits
-> deliver | ask for human confirmation (destructive) | refuse
```
Every decision is written to a readable audit log.
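The last step of the flow reduces to a small decision function. This is a sketch, assuming findings arrive as rule ids and that only destructive findings can be unlocked by a human. The rule id `destructive-operation`, the `decide` function, and the `confirmed` flag are illustrative; the three verdict names come from the reviewer's output, and mapping "refuse" to BLOCKED is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    DELIVERED = "DELIVERED"
    REQUIRES_CONFIRMATION = "REQUIRES CONFIRMATION"
    BLOCKED = "BLOCKED"


# Hypothetical ids for findings that a human may explicitly approve.
DESTRUCTIVE = {"destructive-operation"}


def decide(rule_ids, confirmed=False):
    """Map Layer 2 findings to a verdict; the mapping itself is an assumption."""
    rules = set(rule_ids)
    if rules - DESTRUCTIVE:
        # Any non-destructive violation blocks, and confirmation cannot override it.
        return Verdict.BLOCKED
    if rules and not confirmed:
        # Only destructive findings remain: wait for a human.
        return Verdict.REQUIRES_CONFIRMATION
    return Verdict.DELIVERED
```

Checking the blocking rules first means a human confirmation can never unlock code that also contains, say, a hardcoded secret.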
## Open core
- **Free (here + Hugging Face):** the weights and this tool. For the individual developer.
- **Paid ([getdegu.com](https://getdegu.com)):** managed service, org-wide consolidated
audit trail, governance, multi-tenant. For organizations.
## License
Apache 2.0, inherited from the base model (Qwen3-Coder-30B-A3B-Instruct).
---
Built by [Prohack / Degú](https://getdegu.com) — governance infrastructure that makes
enterprise AI viable.