ilbert committed on
Commit 2ff397c · verified · 1 Parent(s): 5228ab1

Upload README.md with huggingface_hub

  1. README.md +103 -0
README.md ADDED
@@ -0,0 +1,103 @@
+ ---
+ license: mit
+ base_model: Qwen/Qwen2.5-Coder-7B-Instruct
+ tags:
+ - riscv
+ - cross-attention
+ - flamingo
+ - grounded
+ - jepa
+ - rv32i
+ - code-generation
+ library_name: transformers
+ pipeline_tag: text-generation
+ ---
+
+ # Reflex-Coder7B-JEPA-RISCV
+
+ **A frozen `Qwen2.5-Coder-7B-Instruct` wired to a RISC-V CPU through Flamingo-style cross-attention. Emits one 32-bit RV32I instruction per cycle, conditioned on live machine state. The output head is a JEPA-style embedding predictor over a learned 691-row instruction codebook; nearest-neighbour decode gives free error-correction on individual predictions.**
+
+ This repo contains the **adapter weights only** (~4.4 GB). The frozen backbone is pulled from `Qwen/Qwen2.5-Coder-7B-Instruct` at runtime. Total inference footprint: ~14 GB bf16 backbone + 4.4 GB adapters + activations.
+
+ ## What it does
+
+ Given a natural-language prompt (`"multiply 7 and 8"`, `"compute 5 factorial"`, `"say hi"`), Reflex drives a Unicorn-backed RV32I emulator instruction by instruction. Each cycle:
+
+ 1. Read live CPU state (32 registers, PC, memory windows around PC and SP).
+ 2. Encode as 65 K/V tokens.
+ 3. Run the frozen backbone forward over the prompt; cross-attn adapters fuse state K/V into hidden states every 4 layers.
+ 4. Last-token pool → MLP → 256-d embedding.
+ 5. Cosine nearest-neighbour against a 691-row instruction codebook → a real 32-bit RV32I word.
+ 6. Write the word at PC in Unicorn, step one cycle, loop.
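Steps 4 and 5 can be illustrated with a minimal sketch of cosine nearest-neighbour decoding. The three-row codebook, the 4-d embeddings, and the names `cosine` and `decode` are hypothetical stand-ins (the description above gives 691 rows and 256 dimensions); this is an illustration of the idea, not the repository's code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def decode(embedding, codebook, words):
    """Return the instruction word whose codebook row is closest in cosine similarity."""
    best = max(range(len(codebook)), key=lambda i: cosine(embedding, codebook[i]))
    return words[best]

# Toy codebook (assumption): 3 rows of 4-d embeddings instead of 691 x 256.
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
# Valid RV32I words paired with the rows: nop, ebreak, addi a0, zero, 10.
words = [0x00000013, 0x00100073, 0x00A00513]

# A noisy prediction that points mostly at row 1 still snaps to a valid word.
noisy = [0.1, 0.9, -0.05, 0.2]
word = decode(noisy, codebook, words)
print(f"0x{word:08x}")
```

Because the output is always a codebook row, a slightly wrong embedding still decodes to a well-formed instruction, which is the error-correction property described at the top of this README.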
+
+ ## Base model
+
+ [`Qwen/Qwen2.5-Coder-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) — frozen, bf16, untouched.
+
+ ## Training
+
+ - **Corpus**: 80,396 `(prompt, program)` pairs across 56 RV32I program families (arithmetic, loops, comparisons, memory ops, display writes). Every program verified by running it end-to-end through Unicorn before training (zero rejects).
+ - **Flattened cycle pool**: ~1.06 M `(state, next_instruction)` pairs, subsampled to ~173 k balanced across families.
+ - **Objective**: InfoNCE (temperature τ = 0.07) over the full 691-row instruction codebook. The codebook rows train jointly with the controller.
+ - **Optimizer**: AdamW (weight decay 0.01), cosine LR `1e-4 → 1e-6` over 15 000 steps, batch 16.
+ - **Hardware**: single A100 80 GB (~4 h) or L40S 48 GB (batch 32, ~5 h).
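The InfoNCE objective listed above can be sketched for a single training example as follows. This assumes cosine similarity as the score (the README states cosine only for decoding, so its use in the loss is an assumption), and `info_nce`, `_unit`, and the two-row codebook are hypothetical names and data.

```python
import math

TAU = 0.07  # temperature quoted in the training description

def _unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce(pred, codebook, target, tau=TAU):
    """Cross-entropy over cosine-similarity logits against every codebook row."""
    p = _unit(pred)
    logits = [sum(a * b for a, b in zip(p, _unit(row))) / tau for row in codebook]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

toy_codebook = [[1.0, 0.0], [0.0, 1.0]]      # assumption: 2 rows instead of 691
loss_hit = info_nce([1.0, 0.0], toy_codebook, target=0)
loss_miss = info_nce([1.0, 0.0], toy_codebook, target=1)
print(loss_hit, loss_miss)
```

With a small temperature such as 0.07, a prediction aligned with the wrong row is penalised heavily, while one aligned with the target row has a loss near zero.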
+
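For reference, a cosine decay from `1e-4` to `1e-6` over 15 000 steps can be written in closed form. The sketch below assumes no warmup and a single half-cosine cycle, neither of which the training notes specify; `cosine_lr` is a hypothetical helper.

```python
import math

LR_MAX = 1e-4
LR_MIN = 1e-6
TOTAL_STEPS = 15_000

def cosine_lr(step, lr_max=LR_MAX, lr_min=LR_MIN, total=TOTAL_STEPS):
    """Half-cosine decay from lr_max at step 0 to lr_min at step `total`."""
    progress = min(max(step, 0), total) / total
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

for step in (0, 7_500, 15_000):
    print(step, cosine_lr(step))
```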
+ ## Results (41-task eval)
+
+ | section | pass |
+ |---|---|
+ | in-distribution (8) | **7 / 8** |
+ | out-of-distribution (10) | **9 / 10** |
+ | display strings (4) | 1 / 4 |
+ | novel zero-shot (9) | **7 / 9** |
+ | consistency: factorial 5, 10 runs | **10 / 10** |
+ | **total** | **34 / 41 (83 %)** |
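The total row can be re-derived from the section rows. The dictionary below simply transcribes the table; the variable names are illustrative.

```python
# (passed, attempted) per section, copied from the table above.
sections = {
    "in-distribution": (7, 8),
    "out-of-distribution": (9, 10),
    "display strings": (1, 4),
    "novel zero-shot": (7, 9),
    "consistency (factorial 5, 10 runs)": (10, 10),
}

passed = sum(p for p, _ in sections.values())
total = sum(n for _, n in sections.values())
rate = round(100 * passed / total)
print(f"{passed} / {total} ({rate} %)")  # 34 / 41 (83 %)
```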
+
+ Highlights:
+ - **`popcount(255) = 8` in 199 consecutive correct RISC-V instructions** — emergent bit-counting loop the model was never trained on.
+ - **Factorial 5, 10 runs, all returning 120: deterministic.** Every run emits exactly 91 ops and lands on the right answer.
+ - Zero-shot `multiply 7×8`, `power 2^5`, `min(7,3,9)`, `abs(-5)`, `count up 1..5` all pass.
+
+ Per-step top-1 instruction accuracy on 500 random held-out cycles: **96.0 %**. All `BRANCH`, `R-type`, `LOAD`, `STORE`, `JAL`, `JALR` predictions are 100 %. **Every top-1 miss is same-opcode** — never an opcode flip.
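A per-step metric of this kind, together with the same-opcode check, could be computed as sketched below. In RV32I the major opcode is the low 7 bits of the instruction word. The function names and the four-instruction sample are assumptions for illustration, not the evaluation script used for the numbers above.

```python
def opcode(word):
    """Major opcode: the low 7 bits of an RV32I instruction word."""
    return word & 0x7F

def score(predicted, expected):
    """Return (top-1 accuracy, whether every miss keeps the expected opcode)."""
    misses = [(p, e) for p, e in zip(predicted, expected) if p != e]
    accuracy = 1 - len(misses) / len(expected)
    same_opcode = all(opcode(p) == opcode(e) for p, e in misses)
    return accuracy, same_opcode

# Toy sample (assumption): the third prediction has immediate 51 instead of 52.
expected = [0x00A00513, 0x00100073, 0x03400513, 0x00000013]
predicted = [0x00A00513, 0x00100073, 0x03300513, 0x00000013]

accuracy, same_opcode = score(predicted, expected)
print(accuracy, same_opcode)  # 0.75 True
```

At 96.0 % on 500 cycles, the reported figure corresponds to 480 correct predictions and 20 misses.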
+
+ ## Usage
+
+ ```python
+ from reflex.demo import load, run_grounded
+
+ model, tok, cfg = load("reflex.pt", device="cuda")
+ cpu, emitted, halted, err = run_grounded(
+ model, tok, "multiply 7 and 8", device="cuda", max_cycles=200,
+ )
+ print(f"halted={halted} mem[0x5000]={cpu.mem_word(0x5000)}")
+ # halted=True mem[0x5000]=56
+ ```
+
+ Or, interactively:
+
+ ```bash
+ uv run demo --checkpoint reflex.pt
+ ```
+
+ ## Installation
+
+ ```bash
+ git clone https://github.com/ilbertt/reflex
+ cd reflex
+ uv sync
+ huggingface-cli download ilbertt/reflex-coder7b-jepa-riscv reflex.pt --local-dir .
+ ```
+
+ On first run, the backbone `Qwen/Qwen2.5-Coder-7B-Instruct` (~15 GB) is downloaded automatically from the Hugging Face Hub.
+
+ ## Limitations
+
+ - **Display byte-constants are unreliable.** The model picks ASCII neighbours: `show 42` writes `'·0'` instead of `'42'`; `print hello` writes `'hell·'`. These are same-opcode ±1-immediate misses, not opcode flips.
+ - **Uncommon-literal arithmetic drifts.** `add 100+200` sometimes halts with 120; `double 100` → 0 in some seeds. Failures concentrate on `ADDI`/`LUI` with rare immediate values.
+ - **Closed action space.** The codebook has exactly 691 rows — instructions never seen in training have no row and cannot be emitted. Ample for the 56 program families trained on; bounds generalisation to genuinely unseen opcodes.
+ - **No domain-knowledge transfer.** Prompts like `"x5 is fever, display SICK"` fail. The adapters pass the backbone's prior through only for program-shaped prompts of the kind seen in training.
+ - **RV32I base ISA only.** No M, Zbb, F extensions.
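To make the "same-opcode ±1-immediate" failure mode from the first limitation concrete, here is a sketch of the standard RV32I I-type encoding applied to `ADDI`. The helper names and the choice of register `a0` (x10) are assumptions for illustration; the bit layout follows the RISC-V base ISA.

```python
def encode_i_type(opcode, funct3, rd, rs1, imm):
    """Pack an RV32I I-type word: imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-type immediate must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def addi(rd, rs1, imm):
    """ADDI uses major opcode 0x13 (OP-IMM) with funct3 = 000."""
    return encode_i_type(0x13, 0b000, rd, rs1, imm)

want = addi(10, 0, ord("4"))      # addi a0, zero, 52 loads ASCII '4'
near = addi(10, 0, ord("4") - 1)  # same instruction with the immediate off by one

print(f"0x{want:08x} 0x{near:08x}")  # 0x03400513 0x03300513
```

The two words share their low 20 bits (opcode, `rd`, `funct3`, `rs1`) and differ only in the immediate field, so they are distinct codebook entries that encode nearly identical instructions.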
+
+ ## Files
+
+ - `reflex.pt` — adapter weights, state encoder, cross-attn adapters, embedding head, 691-row instruction codebook, instruction-word buffer, and config dict (`backbone_id`, `hidden`, `inject_every`, `adapter_mlp_ratio`, `max_instr_tokens`, `embed_dim`, `num_instrs`, `chat_template`, `context_prefix`).