🎼 ORCH Next.js 3B

A 3 billion parameter decoder-only transformer trained from scratch for generating complete, production-ready Next.js applications — pages, API routes, Prisma schemas, Tailwind components, configs.

TL;DR


Parameters	~3.0 Billion
Architecture	Custom LLaMA-style decoder-only transformer
Training	From scratch — no base model
Vocabulary	32,000 (custom)
Context length	16,384 tokens
Hardware	NVIDIA A40 48GB (RunPod)
Training duration	~2 hours (3 epochs, ~29,000 steps)
License	Apache 2.0

What this is

The largest model in the from-scratch ORCH lineup. Designed for full-stack Next.js generation: not just snippets, but complete project structures — TypeScript components, App Router pages, server actions, Prisma schemas, Tailwind utilities, and configuration files.

This is not a fine-tune of any pretrained model. The architecture is custom, and the weights are trained end-to-end on curated Next.js repositories.

Architecture

Layers:                32
Hidden size:           2,560
Intermediate size:     10,240
Attention heads:       32
KV heads (GQA):        8
Max position:          16,384
RoPE theta:            10,000
Activation:            SwiGLU
Normalization:         RMSNorm
Tied embeddings:       no
Vocab size:            32,000 (custom)
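
As a rough cross-check of the listing above, the parameter count can be estimated from these values. This is a minimal sketch, not the ORCH code: the `ArchSketch` class and `estimate_params` function are hypothetical names, and the estimate assumes a stock LLaMA-style layout (no bias terms, head dimension = hidden size / attention heads, two RMSNorm weights per layer plus one final norm). The actual ORCH implementation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSketch:
    """Values copied from the Architecture listing; the class itself is hypothetical."""
    layers: int = 32
    hidden: int = 2560
    intermediate: int = 10240
    heads: int = 32
    kv_heads: int = 8
    vocab: int = 32000
    tied_embeddings: bool = False


def estimate_params(cfg: ArchSketch) -> int:
    """Estimate parameters for a bias-free LLaMA-style decoder (assumed layout)."""
    head_dim = cfg.hidden // cfg.heads          # assumed: hidden / heads
    kv_dim = cfg.kv_heads * head_dim            # GQA shrinks the K and V projections
    attn = 2 * cfg.hidden * cfg.hidden          # Q and output projections
    attn += 2 * cfg.hidden * kv_dim             # K and V projections
    mlp = 3 * cfg.hidden * cfg.intermediate     # SwiGLU: gate, up, down
    norms = 2 * cfg.hidden                      # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embed = cfg.vocab * cfg.hidden              # input embedding table
    lm_head = 0 if cfg.tied_embeddings else embed  # untied output projection
    final_norm = cfg.hidden
    return cfg.layers * per_layer + embed + lm_head + final_norm


total = estimate_params(ArchSketch())
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the estimate lands near 3.2B, in the same range as the ~3.0 Billion quoted in the TL;DR. Treat it as a sanity check on the configuration rather than an official count.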

Standard LLaMA-style ingredients (RoPE, GQA, SwiGLU, RMSNorm), scaled to roughly 3B parameters. The 16,384-token context length lets the model keep a meaningful portion of a project in context during generation.
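
To illustrate why GQA matters at a 16,384-token context, the sketch below estimates the key/value cache size for one sequence. It is a back-of-the-envelope calculation, not part of the ORCH inference code: `kv_cache_bytes` is a hypothetical helper, and it assumes a BF16 cache (2 bytes per value), batch size 1, and a head dimension of hidden size / attention heads.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of a full KV cache for one sequence (assumes BF16 by default)."""
    # The leading 2 accounts for one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


HEAD_DIM = 2560 // 32   # hidden size / attention heads (assumed)
CONTEXT = 16_384        # max position from the Architecture listing

gqa = kv_cache_bytes(layers=32, kv_heads=8, head_dim=HEAD_DIM, seq_len=CONTEXT)
mha = kv_cache_bytes(layers=32, kv_heads=32, head_dim=HEAD_DIM, seq_len=CONTEXT)

print(f"GQA (8 KV heads):  {gqa / 2**30:.2f} GiB")
print(f"MHA (32 KV heads): {mha / 2**30:.2f} GiB")
```

With 8 KV heads instead of 32, the cache at full context is a quarter of what plain multi-head attention would need under the same assumptions.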

Training

Data: curated Next.js repositories from GitHub
Hardware: single NVIDIA A40 48GB (RunPod)
Duration: ~2 hours
Epochs: 3 (~29,000 steps)
Batch size: 1 with 16 gradient accumulation steps (effective batch of 16 sequences)
Sequence length: 512 (training) — the 16K context length applies at inference
Precision: BFloat16
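
The settings above imply a token budget that can be worked out directly. The sketch below is only arithmetic on the listed numbers; the card does not say whether the ~29,000 steps are optimizer steps or micro-batch steps, so both readings are shown, and sequences are assumed to be packed to the full 512 tokens. All names are illustrative.

```python
MICRO_BATCH = 1      # sequences per forward/backward pass
GRAD_ACCUM = 16      # micro-batches accumulated per optimizer update
SEQ_LEN = 512        # training sequence length
STEPS = 29_000       # approximate step count from the card
EPOCHS = 3

# Tokens contributing to a single optimizer update.
tokens_per_update = MICRO_BATCH * GRAD_ACCUM * SEQ_LEN

# Reading A (assumption): "steps" counts optimizer updates.
total_if_optimizer_steps = STEPS * tokens_per_update

# Reading B (assumption): "steps" counts micro-batches.
total_if_micro_steps = STEPS * MICRO_BATCH * SEQ_LEN

for label, total in [("optimizer steps", total_if_optimizer_steps),
                     ("micro-batch steps", total_if_micro_steps)]:
    per_epoch = total / EPOCHS
    print(f"{label}: {total:,} tokens seen, ~{per_epoch / 1e6:.1f}M per epoch")
```

Either reading puts the unique training data in the range of millions to tens of millions of tokens per epoch, which is useful context for the data-scale limitation noted later in this card.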

Generated artifacts

The model is trained to produce complete project structures:

Frontend: React components, App Router pages, layouts, hooks (TypeScript)
Backend: API routes, server actions, middleware
Database: Prisma schemas, query utilities
Styling: Tailwind CSS, shadcn/ui-style component patterns
Configuration: package.json, tsconfig.json, next.config.js

Usage

Custom PyTorch format — use the ORCH inference code:

import torch
from orch.model.config import OrchConfig
from orch.model.transformer import OrchForCausalLM

model = OrchForCausalLM.from_pretrained("raihan-js/orch-nextjs-3b")
# ... use the same tokenizer.json from the repo

Intended use

Full Next.js project bootstrapping from a natural language description
Research into scaling SLMs trained from scratch on domain-specific data
A from-scratch baseline to compare against fine-tuned models like ORCH-7B

Limitations

Training data scale: 3 epochs on curated Next.js repos. Don't expect the world knowledge of a 7B+ general-purpose model.
Sequence length during training (512): the model accepts up to 16,384 tokens at inference, but it only saw 512-token sequences during training, so output quality may degrade on longer contexts.
No safety alignment.
Custom format: requires the ORCH inference code; the model cannot be loaded with AutoModelForCausalLM.

Related models

raihan-js/orch-fusion — 272M sibling, tiny 2,103 vocab
raihan-js/orch-nextjs-350m-v2 — 287M sibling, 16k vocab
raihan-js/orch-7b — alternate approach: QLoRA fine-tune of DeepSeek Coder 6.7B
ORCH Studio — Gradio demo Space (currently runs ORCH-7B)

Author

Akteruzzaman Raihan Sikder — AI/ML engineer, CTO at ClarioScope AI. Portfolio · GitHub.

Citation

@misc{sikder2025orchnextjs3b,
  title  = {ORCH Next.js 3B: A 3-Billion-Parameter Decoder-Only Transformer Trained From Scratch for Full-Stack Next.js Code Generation},
  author = {Sikder, Akteruzzaman Raihan},
  year   = {2025},
  url    = {https://huggingface.co/raihan-js/orch-nextjs-3b}
}
