Remove state_of_the_repo_audit.md - cleanup for OS launch
state_of_the_repo_audit.md  +0 −98  (DELETED)

@@ -1,98 +0,0 @@
summary: |
  The **BitTransformerLM** repository is well-structured and aligns closely with the README’s feature set.
  All core functionality (bit-level modeling, telemetry metrics, progressive scaling, compression, context extension, diffusion mode, dashboard, etc.) is present and largely consistent with the documentation.
  The code is generally clean and well-tested (no TODOs or obvious dead code), with an effective CI in place.
  Static analysis identified a few issues: a critical **security flaw** where the dashboard’s `/exec` endpoint executes arbitrary code, a missing import that breaks the compression toggle, and a rare edge case in the bit-sequence decompression logic.
  No function exceeds 300 lines, though the `BitTransformerLM.forward` method is complex, with deeply nested logic (~6 levels) and duplicated code blocks for the halting mechanism.
  Naming conventions are consistent (snake_case for functions, CamelCase for classes), and dependency versions are up to date.
  Documentation and code behavior are in sync – for example, the MCP server’s `/health` endpoint described in the docs is implemented.
  Overall, the project appears **nearly production-ready**, with these fixes and refinements needed before a 1.0 release.

findings:
  - severity: P0
    effort: S
    category: security
    location: bit_transformer/dashboard_app.py:533
    description: "Unrestricted `/exec` HTTP endpoint allows arbitrary code execution."
    recommendation: "Disable or restrict the `/exec` route (e.g. remove it or require an admin token) to prevent remote code execution."
    status: completed ✅
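As a hedged illustration of the recommended restriction, the route handler could consult a small guard like the following before executing anything. The `authorize_exec` helper and its status-code policy are illustrative assumptions, not the dashboard’s actual code:

```python
import hmac

def authorize_exec(request_token, admin_token):
    """Decide whether an /exec request may proceed.

    Returns an HTTP status code: 404 when no admin token is configured
    (the feature is effectively disabled), 403 on a missing or wrong
    token, and 200 only on an exact match.
    """
    if not admin_token:
        # No token configured: behave as if the endpoint does not exist.
        return 404
    if not request_token or not hmac.compare_digest(request_token, admin_token):
        # Constant-time comparison avoids leaking the token via timing.
        return 403
    return 200
```

Wiring this in front of the existing route body (and sourcing `admin_token` from configuration rather than the codebase) would close the remote-code-execution hole while keeping the feature available to operators who opt in.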
  - severity: P1
    effort: S
    category: static
    location: bit_transformer/dashboard_app.py:195
    description: "NameError risk – `compress_bits` is used without being imported."
    recommendation: "Import the `compress_bits` function in `dashboard_app.py` (e.g. `from .compression import compress_bits`) so compression toggles don’t crash."
    status: completed ✅
  - severity: P2
    effort: M
    category: static
    location: bit_transformer/model.py:320
    description: "Edge-case bug – `_maybe_decompress` skips decompression if all values ≤ 1, which can misinterpret run-length-encoded output consisting entirely of 1s."
    recommendation: "Adjust the decompression condition (e.g. track whether the input was compressed) to ensure even uniformly alternating bit sequences get properly decompressed."
    status: completed ✅
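A toy run-length coder (not the library’s actual `compress_bits` format) makes the edge case concrete: for an alternating bit sequence every run has length 1, so the compressed stream is indistinguishable by value alone from raw bits, and a “values ≤ 1 means uncompressed” heuristic misfires:

```python
import itertools

def rle_encode(bits):
    # Store the first bit plus the run lengths; bit values alternate
    # between consecutive runs, so lengths alone suffice after the first.
    runs = [len(list(group)) for _, group in itertools.groupby(bits)]
    return bits[0], runs

def rle_decode(first_bit, runs):
    out, bit = [], first_bit
    for count in runs:
        out.extend([bit] * count)
        bit ^= 1  # runs alternate between 0 and 1
    return out

bits = [0, 1] * 8
first, runs = rle_encode(bits)
# runs == [1] * 16 -- every value is <= 1, yet this IS compressed data.
# A heuristic that treats "all values <= 1" as uncompressed would skip
# decoding here; an explicit compressed flag avoids the ambiguity.
```

This is why the recommendation favors tracking compression state explicitly over inferring it from the values.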
  - severity: P3
    effort: M
    category: static
    location: bit_transformer/model.py:415
    description: "Duplicate code – nearly identical halting logic is implemented in both the reversible and standard forward loops."
    recommendation: "Refactor the halting (ACT) mechanism into a helper function to avoid repetition and reduce maintenance effort."
    status: completed ✅
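One possible shape for such a helper, sketched in plain Python with an illustrative signature (the real ACT state is tensor-valued; this is not the model’s actual API):

```python
def apply_act_halting(halt_prob, cum_halt, threshold=0.99):
    """Shared ACT bookkeeping for one layer step.

    Given per-token halting probabilities and the cumulative halting
    mass so far, return (effective_prob, new_cum, halted) so both the
    reversible and standard forward loops can call the same logic.
    """
    new_cum = [c + p for c, p in zip(cum_halt, halt_prob)]
    halted = [c >= threshold for c in new_cum]
    # On the step a token halts, spend its remaining probability mass
    # so the per-token layer weights sum to 1.
    effective = [1.0 - c if h else p
                 for p, c, h in zip(halt_prob, cum_halt, halted)]
    return effective, new_cum, halted
```

Both forward loops would then differ only in how they run the layer, not in how they account for halting.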
  - severity: P3
    effort: M
    category: static
    location: bit_transformer/model.py:368
    description: "Complex logic – `BitTransformerLM.forward` contains deeply nested control flow (up to 5–6 levels) for reversible layers, ACT, etc."
    recommendation: "Consider simplifying or breaking up the forward pass (e.g. separate functions for the reversible and standard paths) to improve readability and maintainability."
    status: completed ✅
  - severity: P3
    effort: S
    category: static
    location: bit_transformer/dashboard_app.py:125
    description: "Config parsing quirk – booleans in `ModelManager.init_model` are cast to int (True→1) instead of being preserved as bool."
    recommendation: "Handle boolean fields explicitly (e.g. do not cast values for keys like `reversible` or `use_act` to int) to avoid confusion and potential type issues."
    status: completed ✅
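A sketch of the recommended explicit handling; the `coerce_config` helper is hypothetical, and only `reversible` and `use_act` are flag names taken from the finding:

```python
BOOL_KEYS = {"reversible", "use_act"}  # flag names from the finding

def coerce_config(raw):
    """Coerce dashboard form/config values, keeping boolean flags as bools."""
    out = {}
    for key, value in raw.items():
        if key in BOOL_KEYS:
            # Accept real bools plus common string spellings from HTML forms,
            # instead of blanket-casting everything to int.
            out[key] = value if isinstance(value, bool) \
                else str(value).lower() in ("1", "true", "on", "yes")
        else:
            out[key] = int(value)  # numeric hyperparameters
    return out
```

Keeping the flags as `bool` avoids `True→1` surprises downstream, e.g. when the values are echoed back into the UI or serialized to a config file.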

codex_tasks:
  - codex_prompt: "Remove or secure the dangerous `/exec` endpoint in the dashboard to prevent arbitrary code execution."
    acceptance_test: |
      import requests
      # Attempt to call the /exec endpoint with a harmless command
      try:
          resp = requests.post("http://localhost:5000/exec", json={"code": "print('OK')"}, timeout=5)
      except Exception as e:
          resp = e.response if hasattr(e, 'response') else None
      # The endpoint should be removed or secured, so it should either 404 or refuse access
      assert resp is None or resp.status_code in (403, 404), "Exec endpoint still accessible!"
    status: completed ✅
  - codex_prompt: "Import the `compress_bits` function in `dashboard_app.py` so that enabling compression no longer raises a NameError."
    acceptance_test: |
      import torch
      from bit_transformer.dashboard_app import ModelManager

      mgr = ModelManager()
      mgr.set_compression(True)
      bits = torch.randint(0, 2, (1, 8), dtype=torch.long)
      try:
          loss, ratio = mgr.train_step(bits)
      except NameError as e:
          raise AssertionError(f"NameError not resolved: {e}")
      assert isinstance(loss, float) and 0 <= ratio <= 1.0, "Compression training failed"
    status: completed ✅
  - codex_prompt: "Fix `_maybe_decompress` in `model.py` to always decompress run-length-encoded sequences (even if all run lengths are 1) before computing metrics."
    acceptance_test: |
      import torch
      from bit_transformer import BitTransformerLM, compress_bits, decompress_bits

      # Create an alternating bit sequence where compress_bits yields only count=1 values
      bits = torch.tensor([0, 1] * 8, dtype=torch.uint8)
      comp = compress_bits(bits)
      model = BitTransformerLM(d_model=16, nhead=2, num_layers=1, dim_feedforward=32, max_seq_len=len(bits))
      # Compute negentropy on the compressed vs. original input and compare
      neg_comp = model.negentropy_kpi(comp.unsqueeze(0))
      neg_raw = model.negentropy_kpi(bits.unsqueeze(0))
      assert torch.allclose(neg_comp, neg_raw, atol=1e-6), "Negentropy differs for compressed input – decompression fix failed"
    status: completed ✅

metrics:
  loc_total: 3770
  todo_count: 0
  duplicate_block_count: 3
  oversized_functions: 0