---
license: mit
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- riscv
- cross-attention
- flamingo
- grounded
- jepa
- rv32i
- code-generation
library_name: transformers
pipeline_tag: text-generation
---

# Reflex-Coder7B-JEPA-RISCV

**A frozen `Qwen2.5-Coder-7B-Instruct` wired to a RISC-V CPU through Flamingo-style cross-attention. It emits one 32-bit RV32I instruction per cycle, conditioned on live machine state. The output head is a JEPA-style embedding predictor over a learned 691-row instruction codebook; nearest-neighbour decoding acts as built-in error correction on individual predictions, since an embedding that lands slightly off target still snaps to the nearest valid instruction.**

This repo contains the **adapter weights only** (~4.4 GB). The frozen backbone is pulled from `Qwen/Qwen2.5-Coder-7B-Instruct` at runtime. Total inference footprint: ~14 GB for the bf16 backbone, plus 4.4 GB of adapters, plus activations.

## What it does

Given a natural-language prompt (`"multiply 7 and 8"`, `"compute 5 factorial"`, `"say hi"`), Reflex drives a Unicorn-backed RV32I emulator one instruction at a time. Each cycle:

1. Read the live CPU state (32 registers, PC, and memory windows around PC and SP).
2. Encode the state as 65 key/value (K/V) tokens.
3. Run the frozen backbone forward over the prompt; cross-attention adapters, inserted every 4 layers, fuse the state K/V into the hidden states.
4. Pool the last token's hidden state and project it through an MLP to a 256-d embedding.
5. Find the cosine nearest neighbour in the 691-row instruction codebook, which yields a real 32-bit RV32I word.
6. Write the word at PC in Unicorn, step one cycle, and loop.

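Step 5 is what keeps the output space closed: whatever the head predicts, the decoded result is always one of the codebook's instruction words. The sketch below shows cosine nearest-neighbour decoding in plain Python. The `decode` helper, the 3-row toy codebook and the 4-d vectors are illustrative assumptions (the real codebook is 691 × 256 and ships inside `reflex.pt`), and the repo's implementation presumably works on batched tensors rather than lists.

```python
import math
from typing import Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def decode(pred: Sequence[float],
           codebook: Sequence[Sequence[float]],
           words: Sequence[int]) -> int:
    """Return the instruction word whose codebook row is most cosine-similar to pred."""
    best_row = max(range(len(codebook)), key=lambda i: cosine(pred, codebook[i]))
    return words[best_row]

# Hypothetical toy codebook: 3 rows x 4 dims instead of the real 691 x 256.
CODEBOOK = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
WORDS = [
    0x00000013,  # addi x0, x0, 0  (canonical NOP)
    0x003100B3,  # add  x1, x2, x3
    0x00000073,  # ecall
]

# A noisy prediction still lands on a valid word: its nearest row is row 1.
noisy = [0.1, 0.9, 0.2, 0.05]
print(hex(decode(noisy, CODEBOOK, WORDS)))  # 0x3100b3
```

Because the argmax ranges only over existing rows, a prediction can be wrong but never malformed: it always decodes to some valid 32-bit word.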
## Base model

[`Qwen/Qwen2.5-Coder-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct): frozen, bf16, weights untouched.

## Training

- **Corpus**: 80,396 `(prompt, program)` pairs across 56 RV32I program families (arithmetic, loops, comparisons, memory ops, display writes). Every program was verified by running it end-to-end through Unicorn before training (zero rejects).
- **Flattened cycle pool**: ~1.06 M `(state, next_instruction)` pairs, subsampled to ~173 k pairs balanced across families.
- **Objective**: InfoNCE (temperature τ = 0.07) over the full 691-row instruction codebook. The codebook rows are trained jointly with the controller.
- **Optimizer**: AdamW (weight decay 0.01), cosine LR `1e-4 → 1e-6` over 15,000 steps, batch 16.
- **Hardware**: a single A100 80 GB (~4 h) or L40S 48 GB (batch 32, ~5 h).

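The objective treats the codebook rows as classes: the predicted embedding is scored against every row, and cross-entropy pushes probability mass onto the correct instruction's row. Below is a minimal sketch of that loss and of the stated LR schedule. It assumes cosine similarity as the score (matching the cosine decode used at inference) and no warmup; neither detail is confirmed by this card. It only computes values, whereas in training the gradient would flow into both the prediction and the jointly trained codebook.

```python
import math
from typing import Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(pred: Sequence[float], codebook: Sequence[Sequence[float]],
             target: int, tau: float = 0.07) -> float:
    """-log softmax(cos(pred, row) / tau)[target], taken over every codebook row."""
    logits = [cosine(pred, row) / tau for row in codebook]
    m = max(logits)  # subtract the max so exp() cannot overflow (stable log-sum-exp)
    log_partition = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_partition - logits[target]

def cosine_lr(step: int, total_steps: int = 15_000,
              lr_max: float = 1e-4, lr_min: float = 1e-6) -> float:
    """Cosine decay from lr_max at step 0 to lr_min at total_steps (no warmup assumed)."""
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

# Hypothetical toy codebook (the real one is 691 x 256 and trainable).
CODEBOOK = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pred = [0.9, 0.1, 0.0]  # points mostly at row 0

loss_right = info_nce(pred, CODEBOOK, target=0)
loss_wrong = info_nce(pred, CODEBOOK, target=2)
print(loss_right < loss_wrong)  # True
print(cosine_lr(0), cosine_lr(7_500), cosine_lr(15_000))
```

The small τ sharpens the softmax, so a prediction only a little closer to the right row than to its neighbours already gets most of the probability mass. In PyTorch, a schedule of the same shape is `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=15_000, eta_min=1e-6)` with the optimizer's base LR set to `1e-4`.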
## Results (41-task eval)

| section | pass |
|---|---|
| in-distribution (8) | **7 / 8** |
| out-of-distribution (10) | **9 / 10** |
| display strings (4) | 1 / 4 |
| novel zero-shot (9) | **7 / 9** |
| consistency: factorial 5, 10 runs | **10 / 10** |
| **total** | **34 / 41 (83 %)** |

Highlights:

- **`popcount(255) = 8` in 199 consecutive correct RISC-V instructions**: an emergent bit-counting loop the model was never trained on.
- **Factorial 5 is deterministic across 10 runs**: every run emits exactly 91 ops and lands on the right answer, 120.
- Zero-shot `multiply 7×8`, `power 2^5`, `min(7,3,9)`, `abs(-5)`, and `count up 1..5` all pass.

Per-step top-1 instruction accuracy on 500 random held-out cycles: **96.0 %**. All `BRANCH`, `R-type`, `LOAD`, `STORE`, `JAL`, and `JALR` predictions are correct (100 %). **Every top-1 miss is same-opcode**: the model never flips the opcode.

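In RV32I the major opcode occupies the low 7 bits of every instruction word, so the distinction between a "same-opcode miss" and an "opcode flip" can be checked mechanically. The sketch below shows that bookkeeping, assuming predictions and targets are available as plain lists of 32-bit words; the `score` helper and the toy words are hypothetical and are not the repo's eval harness.

```python
from typing import Dict, Sequence

def opcode(word: int) -> int:
    """RV32I major opcode: the low 7 bits of a 32-bit instruction word."""
    return word & 0x7F

def score(preds: Sequence[int], targets: Sequence[int]) -> Dict[str, float]:
    """Top-1 accuracy, with misses split into same-opcode misses and opcode flips."""
    misses = [(p, t) for p, t in zip(preds, targets) if p != t]
    same_opcode = sum(1 for p, t in misses if opcode(p) == opcode(t))
    return {
        "top1": 1.0 - len(misses) / len(targets),
        "same_opcode_misses": same_opcode,
        "opcode_flips": len(misses) - same_opcode,
    }

# Toy example: the ADDI immediate is off by one (6 -> 5), but the opcode (0x13) matches.
targets = [0x003100B3, 0x00600093, 0x00000073]  # add x1,x2,x3 ; addi x1,x0,6 ; ecall
preds = [0x003100B3, 0x00500093, 0x00000073]    # add x1,x2,x3 ; addi x1,x0,5 ; ecall
result = score(preds, targets)
print(result["same_opcode_misses"], result["opcode_flips"])  # 1 0
```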
## Usage

```python
from reflex.demo import load, run_grounded

# Load the adapter checkpoint (the frozen backbone is downloaded from the Hub on first run).
model, tok, cfg = load("reflex.pt", device="cuda")

# Drive the emulator from a natural-language prompt for at most 200 cycles.
cpu, emitted, halted, err = run_grounded(
    model, tok, "multiply 7 and 8", device="cuda", max_cycles=200,
)

# This program leaves its result in the word at address 0x5000.
print(f"halted={halted} mem[0x5000]={cpu.mem_word(0x5000)}")
# halted=True mem[0x5000]=56
```

Or, interactively:

```bash
uv run demo --checkpoint reflex.pt
```

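To make the per-cycle control flow concrete, here is the same predict, write-at-PC, step, loop structure reduced to a toy. A hypothetical policy function stands in for the backbone, adapters and codebook decode, and an interpreter that only understands `ADDI` stands in for Unicorn. This is not the repo's `run_grounded`; treating `ECALL` as the halt signal is likewise an assumption made for the sketch.

```python
from typing import Callable, Dict, List, Tuple

ECALL = 0x00000073  # assumed "halt" signal for this toy loop
Policy = Callable[[List[int], int], int]  # (registers, pc) -> 32-bit instruction word

def sign_extend_12(value: int) -> int:
    """Interpret a 12-bit field as a signed integer."""
    return value - 0x1000 if value & 0x800 else value

def step_addi(regs: List[int], word: int) -> None:
    """Execute one ADDI, the only instruction this toy interpreter understands."""
    if (word & 0x7F) != 0x13 or ((word >> 12) & 0x7) != 0:
        raise NotImplementedError(f"toy interpreter only runs ADDI, got {word:#010x}")
    rd = (word >> 7) & 0x1F
    rs1 = (word >> 15) & 0x1F
    imm = sign_extend_12(word >> 20)
    if rd != 0:  # x0 is hard-wired to zero
        regs[rd] = (regs[rs1] + imm) & 0xFFFFFFFF

def run_toy_loop(policy: Policy, max_cycles: int = 200, pc: int = 0x1000
                 ) -> Tuple[List[int], Dict[int, int], List[int], bool]:
    regs = [0] * 32
    memory: Dict[int, int] = {}
    emitted: List[int] = []
    for _ in range(max_cycles):
        word = policy(regs, pc)  # stands in for backbone + adapters + codebook decode
        memory[pc] = word        # write the chosen word at PC
        emitted.append(word)
        if word == ECALL:
            return regs, memory, emitted, True
        step_addi(regs, word)    # execute exactly one instruction
        pc += 4
    return regs, memory, emitted, False

# Hypothetical scripted policy that adds 7 and 8: addi x10,x0,7 ; addi x10,x10,8 ; ecall
SCRIPT = [0x00700513, 0x00850513, ECALL]

def scripted_policy(regs: List[int], pc: int) -> int:
    return SCRIPT[(pc - 0x1000) // 4]

regs, memory, emitted, halted = run_toy_loop(scripted_policy)
print(halted, regs[10], len(emitted))  # True 15 3
```

The policy sees the live register file on every call, which is the point of the grounded design: each instruction is chosen after the previous one has actually executed.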
## Installation

```bash
git clone https://github.com/ilbertt/reflex
cd reflex
uv sync
huggingface-cli download ilbert/reflex-coder7b-jepa-riscv reflex.pt --local-dir .
```

On first run, Hugging Face automatically fetches the frozen backbone `Qwen/Qwen2.5-Coder-7B-Instruct` (~15 GB).

## Limitations

- **Display byte-constants are unreliable.** The model picks ASCII neighbours: `show 42` writes `'·0'` instead of `'42'`, and `print hello` writes `'hell·'`. These are same-opcode, ±1-immediate misses, not opcode flips.
- **Uncommon-literal arithmetic drifts.** `add 100+200` sometimes halts with 120; `double 100` yields 0 in some seeds. Failures concentrate on `ADDI`/`LUI` with rare immediate values.
- **Closed action space.** The codebook has exactly 691 rows, so instructions never seen in training have no row and cannot be emitted. This is ample for the 56 trained program families but bounds generalisation to genuinely unseen opcodes.
- **No domain-knowledge transfer.** Prompts like `"x5 is fever, display SICK"` fail. The adapters route the backbone's prior through only for the program-shaped prompts seen in training.
- **RV32I base ISA only.** No M, Zbb, or F extensions.

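In RV32I a literal lives inside the instruction word itself: `ADDI` carries a 12-bit signed immediate and `LUI` the upper 20 bits, so every distinct (register, literal) pair is a distinct 32-bit word. Assuming one codebook row per word, a literal that training programs rarely or never used maps to a rare or absent row, which is consistent with the `ADDI`/`LUI` drift and the closed action space described above. The sketch uses the usual assembler expansion of the `li` pseudo-instruction (`LUI` plus `ADDI`, rounding the upper part because `ADDI` sign-extends) and checks membership in a codebook word set; `li_words`, `emittable` and the toy word set are illustrative assumptions.

```python
from typing import AbstractSet, Iterable, List

def encode_i(opcode: int, rd: int, funct3: int, rs1: int, imm: int) -> int:
    """Pack an RV32I I-type word; imm is a signed 12-bit value."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def encode_u(opcode: int, rd: int, imm20: int) -> int:
    """Pack an RV32I U-type word (LUI/AUIPC); imm20 fills bits 31:12."""
    return ((imm20 & 0xFFFFF) << 12) | (rd << 7) | opcode

def li_words(rd: int, value: int) -> List[int]:
    """Expand `li rd, value` into LUI/ADDI words the way assemblers usually do."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:  # ADDI sign-extends, so borrow one from the upper part
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF
    if hi == 0:
        return [encode_i(0x13, rd, 0, 0, lo)]        # addi rd, x0, lo
    words = [encode_u(0x37, rd, hi)]                 # lui  rd, hi
    if lo != 0:
        words.append(encode_i(0x13, rd, 0, rd, lo))  # addi rd, rd, lo
    return words

def emittable(words: Iterable[int], codebook_words: AbstractSet[int]) -> bool:
    """True only if every word has a row in the (closed) codebook."""
    return all(w in codebook_words for w in words)

# Hypothetical codebook word set: addi x5,x0,56 ; lui x5,0x5
TOY_CODEBOOK_WORDS = {0x03800293, 0x000052B7}
print(emittable(li_words(5, 56), TOY_CODEBOOK_WORDS))   # True
print(emittable(li_words(5, 300), TOY_CODEBOOK_WORDS))  # False
```

For example, an address such as `0x5000` needs a single `LUI`, while the literal 300 needs an `ADDI` whose immediate field is 300; each is reachable only if that exact word has a row.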
## Files

- `reflex.pt`: adapter weights, state encoder, cross-attn adapters, embedding head, 691-row instruction codebook, instruction-word buffer, and config dict (`backbone_id`, `hidden`, `inject_every`, `adapter_mlp_ratio`, `max_instr_tokens`, `embed_dim`, `num_instrs`, `chat_template`, `context_prefix`).