---
license: apache-2.0
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
language: [en, es]
tags: [code, code-review, security, governance, gguf]
pipeline_tag: text-generation
---

<!-- Drop the Degú logo here: docs/logo.png (brand emerald #0D9E81) -->
# Degú Simple Code

> **Review code you can trust. Generate code worth trusting.**

Degú Simple Code is an open-source **code reviewer that also writes code**. It reviews
code — yours or an AI's — against one standard: **elegant simplicity + security**, and
it **proves** every verdict with a deterministic layer that runs every time and a
readable audit trail. When it writes code, it writes code that already passes that bar.

It is general-purpose across domains: web, data, APIs, CLIs, automation. It responds in
**your language** (comments and explanations included).

---

## Why a reviewer

Most AI now *writes* code. Almost nothing *reviews* it to a consistent, auditable
standard, and studies keep finding that a large share of AI-generated code ships with
vulnerabilities no one checks. Degú Simple Code sits exactly there: point it at a file
or a pull request and it flags hardcoded secrets, SQL injection, PII in logs, disabled
TLS, `eval`/`exec`, and destructive operations — **deterministically**, with a record
you can hand to an auditor.

## Two layers (never confuse them)

- **Layer 1 — the fine-tuned model.** Writes and reviews simple, commented,
  security-conscious code by default. It *tends* to behave well, but is **not** the
  safety guarantee — no language model is. Treat its judgment as best-effort.
- **Layer 2 — deterministic validation + audit trail.** Hard rules that always run and
  cannot be talked out of (no hardcoded secrets, parameterized queries, no PII in logs,
  TLS not disabled, no `eval`/`exec`, destructive actions require human confirmation),
  plus static analysis (Semgrep). **This is where trust becomes auditable, not just
  promised** — and it works on any Python file, whoever or whatever wrote it.

> We tested this honestly: even with an explicit "refuse" instruction, the model would
> still write a destructive script *with warnings* instead of refusing outright. Layer 2
> caught it every time and required human confirmation. That gap is the whole point —
> **safety lives in Layer 2, by design, not in hoping the model behaves.**
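
To make "deterministic" concrete, here is a minimal sketch of how two of the hard rules listed above (no `eval`/`exec`, no hardcoded secrets) could be checked with Python's standard `ast` module. It is an illustration only, not the code in `validador.py`: the function name `find_violations`, the `SECRET_NAME` pattern, and the message format are all assumptions made for this example.

```python
import ast
import re

# Hypothetical heuristic: variable names that suggest a secret.
SECRET_NAME = re.compile(r"(password|passwd|secret|token|api_key)", re.IGNORECASE)


def find_violations(source: str) -> list[str]:
    """Return findings for two illustrative hard rules, ordered by line."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Rule: no direct calls to eval() or exec().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in {"eval", "exec"}:
                findings.append((node.lineno, f"call to {node.func.id}()"))
        # Rule: no non-empty string literal assigned to a secret-like name.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if isinstance(node.value.value, str) and node.value.value:
                for target in node.targets:
                    if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                        findings.append((node.lineno, f"hardcoded secret in {target.id}"))
    return [f"line {lineno}: {message}" for lineno, message in sorted(findings)]
```

The same input always yields the same findings, which is the property that separates this layer from the model's best-effort judgment. A real rule set would need to cover far more cases (attribute calls, dictionaries, f-strings, environment lookups) than this sketch does.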

## Honest positioning

The techniques here are public (distillation, QLoRA, static analysis, audit trails).
A 30B fine-tune will **not** out-code a frontier model on raw capability, and we don't
claim it does. The value is a **sustained discipline** — elegant simplicity + governance
baked in — made **auditable** by Layer 2. That's what a regulated team can trust.

## Where it shines (and where it doesn't)

**Best fit:** reviewing and writing code that touches data, auth, secrets, SQL, files,
or destructive operations — exactly where a generic agent quietly introduces a
vulnerability and no one reviews it. Regulated contexts (fintech, health, customer data).

**Not the best tool for:** frontier-capability tasks (huge features, novel algorithms,
massive refactors). Use a frontier model for those — then have Degú review the result.

## How it behaves — real evaluation

Fine-tuned model vs. its base, same prompts:

| Dimension | Base | Degú Simple Code |
|---|---|---|
| Capability (tests passed) | 4/4 | 4/4 |
| Simplicity — avg lines | 9.25 | **6.75** |
| Simplicity — max complexity | 2.75 | **2.5** |
| Safety — refused insecure requests | **4/20** | **19/20** |

Same capability, simpler code, and a strong tendency to **refuse** insecure requests
(hardcoded backdoors, SQL injection, shell-exec endpoints, logging card data...) while
proposing the safe version. *Honest caveats: small capability benchmark (4 tasks) and a
20-prompt safety sample — a strong signal, not an exhaustive proof. And that 19/20 is a
**tendency**, not a guarantee: in live use the model is sometimes softer than the held-out
number suggests. The guarantee is Layer 2, which is deterministic.*

## Quickstart — review a file

Layer 2 is a standalone reviewer. No GPU, no model needed:

```bash
pip install semgrep            # optional: adds Semgrep static analysis; the hard rules run without it
python validador.py path/to/your_code.py
```

It prints the findings and the verdict (DELIVERED / REQUIRES CONFIRMATION / BLOCKED) and
appends a line to `audit_log.jsonl`.
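
The audit trail is a JSON Lines file, so each review adds one self-contained JSON object per line. The sketch below shows how such a record could be appended and read back. The field names (`timestamp`, `file`, `verdict`, `findings`) and both helper functions are assumptions for illustration; the actual schema written by `validador.py` is not documented here and may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit_record(log_path, file_name, verdict, findings):
    """Append one review outcome as a single JSON line (hypothetical schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "file": file_name,
        "verdict": verdict,
        "findings": findings,
    }
    # Append mode keeps earlier records untouched.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_audit_log(log_path):
    """Load every record from a JSON Lines audit log, skipping blank lines."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each line stands alone, the log can be filtered with ordinary text tools or loaded line by line without parsing the whole file.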

## Quickstart — run the model with Ollama

```bash
# 1. Get the GGUF weights from Hugging Face (see model card)
# 2. Create the model (Modelfile carries the ChatML template + system prompt)
ollama create degu-simple-code -f Modelfile
# 3. Ask it something
ollama run degu-simple-code "Write a login endpoint"
```
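
Once the model is created, it can also be called from a script through Ollama's local HTTP API (`POST /api/generate`). The following is a minimal sketch using only the standard library. It assumes Ollama is listening on its default address, `http://localhost:11434`, and the helper names `build_generate_request` and `generate` are made up for this example.

```python
import json
import urllib.request


def build_generate_request(prompt, model="degu-simple-code", host="http://localhost:11434"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt and return the generated text (needs a running Ollama server)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Calling `generate("Write a login endpoint")` returns Layer 1 output only. Nothing in this path runs the deterministic checks, so treat the result as unreviewed until Layer 2 has seen it.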

Run the full agent (Layer 1 + self-refinement + Layer 2 + audit):

```bash
python agente.py --ollama
```

## The agent flow

```
request -> Layer 1 generates -> self-refinement -> Layer 2 validates & audits
        -> deliver  |  ask for human confirmation (destructive)  |  refuse
```

Every decision is written to a readable audit log.
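
The last step of the flow, turning Layer 2 findings into one of the three outcomes, can be sketched as a small pure function. This is an assumption-laden illustration, not the logic in `agente.py`: the rule identifiers are invented, and the policy that any non-destructive violation outranks a destructive one is a guess at a reasonable ordering.

```python
from enum import Enum


class Verdict(Enum):
    DELIVERED = "DELIVERED"
    REQUIRES_CONFIRMATION = "REQUIRES CONFIRMATION"
    BLOCKED = "BLOCKED"


# Hypothetical rule identifier for destructive actions, invented for this sketch.
DESTRUCTIVE_RULES = {"destructive-operation"}


def decide(violated_rules):
    """Map the violated rule ids to a verdict (assumed policy, see lead-in)."""
    violated = set(violated_rules)
    if not violated:
        # Nothing flagged: the code can be delivered.
        return Verdict.DELIVERED
    if violated <= DESTRUCTIVE_RULES:
        # Only destructive actions were flagged: a human must confirm.
        return Verdict.REQUIRES_CONFIRMATION
    # Any other hard-rule violation blocks delivery.
    return Verdict.BLOCKED
```

For example, `decide([])` yields `Verdict.DELIVERED`, while `decide(["destructive-operation", "hardcoded-secret"])` yields `Verdict.BLOCKED` under the assumed ordering.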

## Open core

- **Free (here + Hugging Face):** the weights and this tool. For the individual developer.
- **Paid ([getdegu.com](https://getdegu.com)):** managed service, org-wide consolidated
  audit trail, governance, multi-tenant. For organizations.

## License

Apache 2.0, inherited from the base model (Qwen3-Coder-30B-A3B-Instruct).

---

Built by [Prohack / Degú](https://getdegu.com) — governance infrastructure that makes
enterprise AI viable.