🎼 ORCH Fusion (272M)

Orchestrated Recursive Code Hierarchy — a 272M-parameter decoder-only transformer trained from scratch on synthetic code data, designed to generate complete multi-file projects on a single consumer GPU.

TL;DR


Parameters	272,736,256 (~272.7M)
Architecture	Custom LLaMA-style decoder-only transformer
Training	From scratch — no base model, no fine-tuning
Vocabulary	2,103 (tiny custom tokenizer)
Context length	4,096 tokens
Hardware	NVIDIA RTX 3060 12GB
Format	Custom PyTorch (`.pt`) — not Hugging Face Transformers
License	Apache 2.0

Why this model exists

Most code-generation LLMs that get public attention fall into one of three groups:

huge (70B+), requiring a multi-GPU rig just to load,
proprietary (Copilot, Claude, ChatGPT), with subscription or per-token billing,
or fine-tuned from an existing base, inheriting everything the base learned.

ORCH Fusion goes the other direction: train a small model from scratch, with a custom tokenizer sized for code generation, on a single 12GB consumer GPU. The thesis is that for narrow tasks (generating well-structured multi-file projects), you don't need a frontier model — you need a model whose entire training signal is the task you care about.

Architecture

LLaMA-style decoder-only transformer, built from scratch:

Layers:                24
Hidden size:           1,024
Intermediate size:     2,816
Attention heads:       16
KV heads (GQA):        4
Max position:          4,096
RoPE theta:            10,000
Activation:            SwiGLU
Normalization:         RMSNorm
Tied embeddings:       yes
Vocab size:            2,103 (custom tokenizer)
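
The configuration above is enough to reproduce the headline parameter count. The sketch below assumes a standard LLaMA-style layout: bias-free linear layers, three SwiGLU projections (gate, up, down), and one weight vector per RMSNorm. The `OrchConfig` class and `count_parameters` function are illustrative names, not part of the ORCH codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrchConfig:
    # Values copied from the architecture listing; the class itself is hypothetical.
    vocab_size: int = 2103
    hidden_size: int = 1024
    intermediate_size: int = 2816
    num_layers: int = 24
    num_heads: int = 16
    num_kv_heads: int = 4
    tie_embeddings: bool = True


def count_parameters(cfg: OrchConfig) -> int:
    """Count weights assuming bias-free projections and weight-only RMSNorm."""
    head_dim = cfg.hidden_size // cfg.num_heads
    kv_dim = cfg.num_kv_heads * head_dim

    # Attention: Q and O are hidden x hidden; K and V are hidden x kv_dim (GQA).
    attn = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size
    # Two RMSNorm weight vectors per layer (pre-attention and pre-MLP).
    norms = 2 * cfg.hidden_size
    per_layer = attn + mlp + norms

    embed = cfg.vocab_size * cfg.hidden_size
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if cfg.tie_embeddings else embed
    final_norm = cfg.hidden_size

    return cfg.num_layers * per_layer + embed + lm_head + final_norm


total = count_parameters(OrchConfig())
print(f"{total:,}")  # 272,736,256
```

Under these assumptions the count matches the reported 272,736,256 exactly. Because the vocabulary is so small, the embedding matrix accounts for only about 2.2M of those parameters.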

Architectural features taken directly from the LLaMA family: RoPE (rotary position embeddings), GQA (grouped-query attention, where the 16 query heads share 4 KV heads, so the KV cache is smaller and inference is cheaper), SwiGLU activation, and RMSNorm. No pretrained weights from any other source are used.
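
To put a rough number on the GQA saving, the sketch below estimates the KV-cache size at the full 4,096-token context. It assumes a head dimension of 1,024 / 16 = 64 and a 16-bit cache (2 bytes per value). The cache precision is an assumption, since the card only says training used mixed precision, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of the key/value cache for one sequence."""
    # Leading factor 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


HEAD_DIM = 1024 // 16  # hidden size / attention heads = 64

gqa = kv_cache_bytes(num_layers=24, num_kv_heads=4, head_dim=HEAD_DIM, seq_len=4096)
mha = kv_cache_bytes(num_layers=24, num_kv_heads=16, head_dim=HEAD_DIM, seq_len=4096)

print(gqa / 2**20, mha / 2**20)  # 96.0 384.0 (MiB)
```

With 4 KV heads the cache is a quarter of what full multi-head attention would need, which leaves more headroom on a 12GB card.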

Training

Data: synthetic code data (project-level multi-file examples), tokenized with a custom 2,103-vocab tokenizer fitted to the domain
Framework: custom PyTorch implementation (not the Hugging Face Trainer)
Hardware: NVIDIA RTX 3060 12GB (consumer)
Precision: mixed
Final benchmark (ORCH-ProjectBench, custom):
- Overall score: 76.6
- Code parse rate: 95.3 (% of generated code that parses cleanly)
- Format correctness: 93.3 (% with correct project structure)
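
ORCH-ProjectBench is a custom benchmark and its scoring code is not described here. Purely as an illustration of what a "parse rate" metric can look like, the sketch below scores a generated project by the share of checkable files that parse. It only handles Python and JSON files using the standard library. A real harness for React or Next.js output would need a JavaScript/TypeScript parser, and `parse_rate` and `CHECKERS` are hypothetical names.

```python
import ast
import json
from pathlib import PurePosixPath

# Hypothetical mapping from file suffix to a parser that raises on invalid input.
CHECKERS = {".py": ast.parse, ".json": json.loads}


def parse_rate(files: dict[str, str]) -> float:
    """Percentage of checkable files whose contents parse without error."""
    checked = passed = 0
    for path, source in files.items():
        checker = CHECKERS.get(PurePosixPath(path).suffix)
        if checker is None:
            continue  # no parser available for this file type; skip it
        checked += 1
        try:
            checker(source)
            passed += 1
        except (SyntaxError, ValueError):  # JSONDecodeError subclasses ValueError
            pass
    return 100.0 * passed / checked if checked else 0.0


project = {
    "app/main.py": "def f(:\n    pass\n",   # syntax error
    "app/util.py": "x = 1\n",
    "package.json": '{"name": "demo"}',
    "README.md": "# demo",                  # not checked
}
rate = parse_rate(project)
print(round(rate, 1))
```

Here two of the three checkable files parse, so the rate is about 66.7.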

Usage

Because this model is in custom PyTorch format (not Hugging Face Transformers), you need the ORCH inference code.

import torch
from tokenizers import Tokenizer
from orch import OrchForCausalLM  # from github.com/raihan-js/orch

# Load model + tokenizer
model = OrchForCausalLM.from_pretrained("raihan-js/orch-fusion", subfolder="350m-project")
tokenizer = Tokenizer.from_file("orch-tokenizer.json")

# Encode the prompt and add a batch dimension: shape (1, seq_len)
prompt = "Create a React dashboard with authentication and dark mode"
ids = tokenizer.encode(prompt).ids
input_ids = torch.tensor([ids])

# Generate without tracking gradients, then decode the single returned sequence
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=2048, temperature=0.7)
print(tokenizer.decode(output[0].tolist()))

Output is a multi-file project structure rather than a single snippet.

Intended use

Generating well-structured multi-file project skeletons (configs, source files, schemas)
Research into pre-training small code models from scratch on consumer hardware
Educational reference for building LLaMA-style transformers without a base model

Out-of-scope / limitations

Tiny vocabulary (2,103): the tokenizer is heavily specialized; out-of-distribution prose and unusual identifiers split into many tokens, which uses up the 4,096-token context quickly.
Synthetic training data: the model has seen patterns of code, not the diversity of real-world codebases. Expect formulaic output.
No safety alignment: there is no RLHF / DPO / safety post-training. Treat outputs as raw generations.
Custom format: not loadable with AutoModelForCausalLM. Use the ORCH inference code.

Related models

raihan-js/orch-nextjs-350m-v2 — sibling 287M from-scratch model with a 16k vocab
raihan-js/orch-nextjs-3b — 3B from-scratch sibling, custom 32k vocab
raihan-js/orch-7b — same project family, but QLoRA fine-tune of DeepSeek Coder 6.7B Instruct
ORCH Studio — Gradio demo Space

Author

Akteruzzaman Raihan Sikder, AI/ML engineer, CTO at ClarioScope AI. Trains SLMs from scratch and applies QLoRA fine-tuning on larger bases.

Citation

@misc{sikder2025orchfusion,
  title  = {ORCH Fusion: A 272M-Parameter Code Generation Transformer Trained From Scratch on Consumer Hardware},
  author = {Sikder, Akteruzzaman Raihan},
  year   = {2025},
  url    = {https://huggingface.co/raihan-js/orch-fusion}
}


Evaluation results

All scores are self-reported on ORCH-ProjectBench.

Overall score	76.6
Code parse rate	95.3
Format correctness	93.3