🎼 ORCH Fusion (272M)

Orchestrated Recursive Code Hierarchy: a 272M-parameter decoder-only transformer trained from scratch on synthetic code data, designed to generate complete multi-file projects on a single consumer GPU.


TL;DR

Parameters:            272,736,256 (~272.7M)
Architecture:          Custom LLaMA-style decoder-only transformer
Training:              From scratch (no base model, no fine-tuning)
Vocabulary:            2,103 (tiny custom tokenizer)
Context length:        4,096 tokens
Hardware:              NVIDIA RTX 3060 12GB
Format:                Custom PyTorch (.pt), not Hugging Face Transformers
License:               Apache 2.0

Why this model exists

Most code-generation LLMs in the public conversation fall into one of three camps:

  • huge (70B+) and require a multi-GPU rig to even load,
  • proprietary (Copilot, Claude, ChatGPT) with per-token billing,
  • or fine-tuned from an existing base, which inherits everything the base learned.

ORCH Fusion goes the other direction: train a small model from scratch, with a custom tokenizer sized for code generation, on a single 12GB consumer GPU. The thesis is that for narrow tasks (generating well-structured multi-file projects), you don't need a frontier model; you need a model whose entire training signal is the task you care about.

Architecture

LLaMA-style decoder-only transformer, built from scratch:

Layers:                24
Hidden size:           1,024
Intermediate size:     2,816
Attention heads:       16
KV heads (GQA):        4
Max position:          4,096
RoPE theta:            10,000
Activation:            SwiGLU
Normalization:         RMSNorm
Tied embeddings:       yes
Vocab size:            2,103 (custom tokenizer)
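
The configuration above is enough to reproduce the headline parameter count by hand. The sketch below assumes the usual LLaMA layout, which the card does not spell out: no bias terms, separate Q/K/V/O projections, three MLP matrices (gate, up, down), two RMSNorm weight vectors per layer plus one final norm, and a single embedding matrix shared with the output head. The function name `count_parameters` is illustrative and is not part of the ORCH codebase.

```python
def count_parameters(
    layers=24,
    hidden=1024,
    intermediate=2816,
    heads=16,
    kv_heads=4,
    vocab=2103,
    tied_embeddings=True,
):
    """Count weights for a bias-free LLaMA-style decoder (assumed layout)."""
    head_dim = hidden // heads          # 64 for this config
    kv_dim = kv_heads * head_dim        # 256: K and V are narrower under GQA

    # Q and O projections are hidden x hidden; K and V are hidden x kv_dim.
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer (pre-attention, pre-MLP).
    norms = 2 * hidden
    per_layer = attention + mlp + norms

    embeddings = vocab * hidden
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if tied_embeddings else vocab * hidden
    final_norm = hidden

    return layers * per_layer + embeddings + lm_head + final_norm


total = count_parameters()
print(f"{total:,}")  # 272,736,256
```

Under these assumptions the result matches the stated 272,736,256 exactly. The embedding table contributes only about 2.15M of that (2,103 x 1,024), so nearly all capacity sits in the transformer layers.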

Architectural features pulled directly from the LLaMA family: RoPE (rotary position embeddings), GQA (grouped-query attention with 4 KV heads, which makes inference cheaper), SwiGLU activation, and RMSNorm. No pretrained weights from any other source.
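
To make the GQA saving concrete, the arithmetic below estimates the key/value cache size at the full 4,096-token context, first with 4 KV heads and then with a hypothetical 16 KV heads (one per attention head). It assumes a 16-bit cache (2 bytes per value) and batch size 1. The card only says "mixed" precision, so the byte width is an assumption, and `kv_cache_bytes` is an illustrative helper, not an ORCH function.

```python
def kv_cache_bytes(seq_len, layers=24, kv_heads=4, head_dim=64, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2 covers the separate key and value tensors in each layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


MIB = 1024 * 1024
gqa = kv_cache_bytes(4096)               # 4 KV heads, as in the config above
mha = kv_cache_bytes(4096, kv_heads=16)  # hypothetical: one KV head per query head

print(f"GQA cache: {gqa / MIB:.0f} MiB")  # 96 MiB
print(f"MHA cache: {mha / MIB:.0f} MiB")  # 384 MiB
```

With 16 query heads sharing 4 KV heads, the cache is a quarter of what full multi-head attention would need, which matters on a 12GB card.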

Training

  • Data: synthetic code data (project-level multi-file examples), tokenized with a custom 2,103-vocab tokenizer fitted to the domain
  • Framework: custom PyTorch implementation (not the Hugging Face Trainer)
  • Hardware: NVIDIA RTX 3060 12GB (consumer)
  • Precision: mixed
  • Final benchmark (ORCH-ProjectBench, custom):
    • Overall score: 76.6
    • Code parse rate: 95.3 (% of generated code that parses cleanly)
    • Format correctness: 93.3 (% with correct project structure)
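
A parse-rate metric of the kind reported above can be sketched as follows. The card does not describe how ORCH-ProjectBench is implemented, so this is an illustration of the idea only: it checks Python snippets with the standard-library `ast` module, whereas the real benchmark presumably uses language-appropriate parsers for whatever the model generates. The name `parse_rate` and the sample snippets are made up for the example.

```python
import ast


def parse_rate(snippets):
    """Return the percentage of snippets that parse as valid Python."""
    if not snippets:
        return 0.0
    parsed = 0
    for source in snippets:
        try:
            ast.parse(source)
            parsed += 1
        except (SyntaxError, ValueError):
            # SyntaxError: invalid code; ValueError: e.g. embedded null bytes.
            pass
    return 100.0 * parsed / len(snippets)


samples = [
    "def add(a, b):\n    return a + b\n",
    "x = [1, 2, 3]\n",
    "def broken(:\n    pass\n",  # deliberately invalid
    "print('ok')\n",
]
rate = parse_rate(samples)
print(f"{rate:.1f}")  # 75.0
```

Parsing only shows that code is syntactically valid; it says nothing about whether the generated project runs or does what the prompt asked.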

Usage

Because this model is in custom PyTorch format (not Hugging Face Transformers), you need the ORCH inference code.

import torch
from tokenizers import Tokenizer
from orch import OrchForCausalLM  # from github.com/raihan-js/orch

# Load model + tokenizer
model = OrchForCausalLM.from_pretrained("raihan-js/orch-fusion", subfolder="350m-project")
tokenizer = Tokenizer.from_file("orch-tokenizer.json")

prompt = "Create a React dashboard with authentication and dark mode"
ids = tokenizer.encode(prompt).ids
input_ids = torch.tensor([ids])
output = model.generate(input_ids, max_new_tokens=2048, temperature=0.7)
print(tokenizer.decode(output[0].tolist()))

Output is a multi-file project structure rather than a single snippet.
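
Because the context window is 4,096 tokens, the prompt and the generated tokens have to fit in it together. The card does not say whether `model.generate` enforces this itself, so a small guard like the one below may be useful before calling it. `clamp_new_tokens` is a hypothetical helper written for this example, not part of the ORCH inference code.

```python
def clamp_new_tokens(prompt_len, requested, context_len=4096):
    """Limit the generation budget so prompt + output fits in the context."""
    if prompt_len >= context_len:
        raise ValueError(
            f"prompt has {prompt_len} tokens; context holds {context_len}"
        )
    return min(requested, context_len - prompt_len)


# A short prompt leaves the requested budget untouched.
print(clamp_new_tokens(prompt_len=12, requested=2048))    # 2048
# A long prompt shrinks it to whatever room remains.
print(clamp_new_tokens(prompt_len=3000, requested=2048))  # 1096
```

In the usage snippet above, this would mean passing `max_new_tokens=clamp_new_tokens(len(ids), 2048)` to `generate`.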

Intended use

  • Generating well-structured multi-file project skeletons (configs, source files, schemas)
  • Research into pre-training small code models from scratch on consumer hardware
  • Educational reference for building LLaMA-style transformers without a base model

Out-of-scope / limitations

  • Tiny vocabulary (2,103): the tokenizer is heavily specialized; out-of-distribution prose or unusual identifiers compress poorly.
  • Synthetic training data: the model has seen patterns of code, not the diversity of real-world codebases. Expect formulaic output.
  • No safety alignment: there is no RLHF / DPO / safety post-training. Treat outputs as raw generations.
  • Custom format: not loadable with AutoModelForCausalLM. Use the ORCH inference code.
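
The tokenizer limitation can be quantified as characters per token: the lower the number, the worse the text compresses and the faster it eats into the 4,096-token context. The sketch below uses a toy greedy encoder with a five-entry vocabulary as a stand-in, since the real tokenizer file is not available here. The toy vocabulary, `make_toy_encoder`, and `chars_per_token` are all invented for illustration and say nothing about the actual ORCH vocabulary.

```python
def chars_per_token(encode, text):
    """Average number of characters covered by each token."""
    ids = encode(text)
    return len(text) / len(ids) if ids else 0.0


def make_toy_encoder(vocab):
    """Greedy longest-match encoder that falls back to single characters."""
    pieces = sorted(vocab, key=len, reverse=True)
    table = {piece: index for index, piece in enumerate(pieces)}

    def encode(text):
        ids = []
        pos = 0
        while pos < len(text):
            for piece in pieces:
                if text.startswith(piece, pos):
                    ids.append(table[piece])
                    pos += len(piece)
                    break
            else:
                # No vocabulary entry matched: spend one token on one character.
                ids.append(len(table) + ord(text[pos]))
                pos += 1
        return ids

    return encode


toy = make_toy_encoder(["import ", "return ", "const ", "function ", "export "])

in_domain = "export function "
unusual = "xq_zvk9_Wm"

print(chars_per_token(toy, in_domain))  # 8.0
print(chars_per_token(toy, unusual))    # 1.0
```

To run the same measurement against the real tokenizer from the usage section, pass `lambda s: tokenizer.encode(s).ids` as `encode`.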

Author

Akteruzzaman Raihan Sikder, AI/ML engineer and CTO at ClarioScope AI. Trains SLMs from scratch and applies QLoRA fine-tuning on larger bases. Portfolio · GitHub.

Citation

@misc{sikder2025orchfusion,
  title  = {ORCH Fusion: A 272M-Parameter Code Generation Transformer Trained From Scratch on Consumer Hardware},
  author = {Sikder, Akteruzzaman Raihan},
  year   = {2025},
  url    = {https://huggingface.co/raihan-js/orch-fusion}
}