QuantumGPT-124M: Quantum Circuit Generation Model
QuantumGPT-124M is a GPT-2 architecture language model trained specifically for generating quantum circuits in OpenQASM 2.0 format from natural language descriptions.
Model Description
- Model Type: Causal Language Model (GPT-2 architecture)
- Parameters: 124 million
- Training Data: 8,129 quantum circuits across 92 categories (~373K tokens)
- Output Format: OpenQASM 2.0
- Specialty: Generates syntactically valid quantum circuits for 1-4 qubit systems
Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("merileijona/quantumgpt-124m")
tokenizer = AutoTokenizer.from_pretrained("merileijona/quantumgpt-124m")

prompt = "<|user|>Create a Bell state with two qubits<|end|>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode with special tokens kept so the literal <|assistant|>/<|end|> markers survive,
# then slice out the assistant reply. Note: splitting the raw text on <|end|> alone
# would return the user prompt, since the prompt's own <|end|> comes first.
text = tokenizer.decode(outputs[0], skip_special_tokens=False)
qasm = text.split("<|assistant|>", 1)[1].split("<|end|>", 1)[0].strip()
print(qasm)
```
Expected Output:

```
OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0],q[1];
measure q -> c;
```
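Since generated circuits should be validated before use (see Limitations below), a lightweight structural check can catch obviously malformed output before handing it to a real toolchain. The sketch below is illustrative only, using stdlib regexes; the function name and patterns are our own, and a proper parser (e.g. Qiskit's QASM loader) should be preferred for real validation:

```python
import re

# Minimal structural sanity check for generated OpenQASM 2.0 text.
# Illustrative only -- not a full parser; use a real QASM toolchain for actual validation.
LINE_PATTERNS = [
    re.compile(r"^OPENQASM 2\.0;$"),
    re.compile(r'^include "[\w.]+";$'),
    re.compile(r"^[qc]reg \w+\[\d+\];$"),
    re.compile(r"^\w+(\([^()]*\))? \w+(\[\d+\])?(\s*,\s*\w+(\[\d+\])?)*;$"),  # gate applications
    re.compile(r"^measure \w+(\[\d+\])? -> \w+(\[\d+\])?;$"),
]

def looks_like_qasm(text: str) -> bool:
    """Return True if every non-empty line matches one known QASM 2.0 statement shape."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("OPENQASM 2.0"):
        return False
    return all(any(p.match(ln) for p in LINE_PATTERNS) for ln in lines)

bell = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0],q[1];
measure q -> c;"""
print(looks_like_qasm(bell))      # True
print(looks_like_qasm("h q[0"))   # False
```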
Model Details
Architecture
- Base Model: GPT-2 (124M parameters)
- Layers: 12
- Attention Heads: 12
- Embedding Dimension: 768
- Context Length: 256 tokens
- Dropout: 0.2 (training)
- Activation Function: GELU (standard, not gelu_new)
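The "124M" figure follows from the hyperparameters above; a back-of-envelope count (assuming the standard GPT-2 vocabulary of 50,257 tokens and tied embeddings, per the implementation notes below) lands at roughly 124M:

```python
# Back-of-envelope GPT-2 parameter count from the architecture above.
# Assumes the standard GPT-2 vocabulary (50257) and wte tied with lm_head.
vocab_size, n_ctx, d_model, n_layers = 50257, 256, 768, 12

token_embed = vocab_size * d_model          # wte (shared with lm_head)
pos_embed = n_ctx * d_model                 # wpe
# Per transformer block: fused QKV projection + output projection, 4x-wide MLP, biases included.
attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
layer_norms = 2 * 2 * d_model               # two LayerNorms per block, weight + bias each
per_block = attn + mlp + layer_norms
final_ln = 2 * d_model

total = token_embed + pos_embed + n_layers * per_block + final_ln
print(f"{total / 1e6:.1f}M parameters")     # 123.8M parameters
```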
Implementation Notes
- Converted from a NanoGPT-style training checkpoint
- All Conv1D weights correctly transposed for HuggingFace compatibility
- Bias tensors injected for GPT2LMHeadModel compatibility
- Word embeddings are tied with lm_head
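The Conv1D transposition mentioned above matters because nanoGPT's nn.Linear stores weights as (out_features, in_features) while Hugging Face's GPT-2 Conv1D modules expect (in_features, out_features). A toy sketch of the axis swap, using plain lists rather than tensors (the helper name is ours, purely illustrative):

```python
# Illustrative weight-layout fix: nanoGPT nn.Linear stores (out_features, in_features);
# Hugging Face GPT-2 Conv1D expects (in_features, out_features). Shown with nested
# lists instead of tensors to keep the sketch dependency-free.
def transpose(matrix):
    """Swap the two axes of a nested-list matrix."""
    return [list(row) for row in zip(*matrix)]

linear_weight = [[1, 2, 3],   # shape (out=2, in=3), as saved by a nanoGPT checkpoint
                 [4, 5, 6]]
conv1d_weight = transpose(linear_weight)  # shape (in=3, out=2), as Conv1D expects
print(conv1d_weight)  # [[1, 4], [2, 5], [3, 6]]
```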
Training Configuration
- Dataset Size: 8,129 training samples
- Unique Circuits: 739 (with 11x augmentation via paraphrasing)
- Training Tokens: ~373,000
- Training Steps: 1,000 iterations
- Hardware: NVIDIA RTX 4070 12GB
- Training Time: ~0.5 hours
- Final Validation Loss: 0.2691
- Train/Val Gap: 0.044 (excellent generalization)
Dataset Composition
The model was trained on 92 distinct categories of quantum circuits:
Single-Qubit Operations (14 categories):
- Basic gates: H, X, Y, Z, S, T, Sdg, Tdg
- Parametric rotations: RX, RY, RZ
- Universal gates: U1, U2, U3
Two-Qubit Operations (11 categories):
- Bell states (all 4 variants)
- Entanglement: CNOT, CZ, SWAP, iSWAP
- Controlled rotations
Three-Qubit Operations (6 categories):
- GHZ states, W states
- Toffoli, Fredkin gates
Quantum Algorithms (15 categories):
- Deutsch-Jozsa, Grover's search
- Phase estimation, QFT variants
Variational Circuits (15 categories):
- VQE ansatzes, QAOA
- Hardware-efficient ansatzes
Plus: Arithmetic circuits, error correction codes, special states, and more.
Prompt Format
The model was trained using explicit conversation delimiters:
```
<|user|>{description}<|end|>
<|assistant|>{qasm}<|end|>
```
These markers are literal text tokens, not special tokenizer tokens.
Generation should begin from the <|assistant|> marker and stop at the first occurrence of <|end|>. If <|assistant|> is omitted from the prompt, generation quality may degrade.
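The format above can be wrapped in two small helpers, one to build the prompt and one to slice the assistant reply out of raw generated text. These helpers are illustrative (not part of the model repository):

```python
# Illustrative helpers for the training-time prompt format described above.
USER, ASSISTANT, END = "<|user|>", "<|assistant|>", "<|end|>"

def build_prompt(description: str) -> str:
    """Wrap a natural-language description in the markers the model was trained on."""
    return f"{USER}{description}{END}\n{ASSISTANT}"

def extract_qasm(generated: str) -> str:
    """Take everything after <|assistant|> and stop at the first <|end|>."""
    reply = generated.split(ASSISTANT, 1)[-1]
    return reply.split(END, 1)[0].strip()

prompt = build_prompt("Create a Bell state with two qubits")
fake_output = prompt + "OPENQASM 2.0;\nqreg q[2];<|end|>trailing text"
print(extract_qasm(fake_output))
```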
Performance
Accuracy by circuit type, from preliminary testing (approximate figures):
| Circuit Type | Accuracy | Notes |
|---|---|---|
| Basic gates (H, X, Y, Z) | 95-100% | Near-perfect on simple gates |
| 2-qubit entanglement | 90-95% | Strong on Bell states, CNOT patterns |
| 3-qubit states (GHZ, W) | 85-90% | Good semantic understanding |
| Arithmetic circuits | 75-85% | Moderate accuracy on adders/incrementers |
| Complex algorithms | 70-80% | Struggles with QFT, Grover's |
| 4+ qubit circuits | 60-70% | Limited training data for large systems |
Example Test Results (Step 600)
Perfect generations:
- Prompt: "Apply Hadamard gate to single qubit"
  Output: OPENQASM 2.0; ... h q[0]; measure q[0] -> c[0];
- Prompt: "Create Bell state with two qubits"
  Output: OPENQASM 2.0; ... h q[0]; cx q[0],q[1]; measure q -> c;
- Prompt: "Generate GHZ state with three qubits"
  Output: OPENQASM 2.0; ... h q[0]; cx q[0],q[1]; cx q[0],q[2]; measure q -> c;
Limitations
Qubit Count: Optimized for 1-3 qubit circuits. Performance degrades for 4+ qubits due to limited training data.
Complex Algorithms: May generate syntactically valid but semantically incorrect circuits for advanced algorithms (e.g., full quantum teleportation, complex QFT implementations).
Parametric Gates: Limited support for gates with specific angle parameters. May substitute similar gates (e.g., RY → Y, S → T).
No Execution Guarantee: Generated circuits are syntactically valid QASM 2.0 but not guaranteed to execute correctly on quantum hardware without validation.
Intended Use
Primary Use Cases
- Educational Tools: Generate example circuits for quantum computing education
- Rapid Prototyping: Quick circuit templates for experimentation
- Code Completion: Assist developers writing QASM code
- Benchmarking: Generate diverse circuits for compiler/simulator testing
Out of Scope
- Production Quantum Computing: Not suitable for critical quantum applications
- Large-Scale Circuits: Limited to small qubit counts (1-4 qubits)
- Hardware Deployment: Requires validation before running on actual quantum hardware
Training Data
The model was trained on a custom dataset of 8,129 quantum circuits:
- Source: Synthetically generated via xAI Grok API with extensive quality control
- Format: Natural language description → QASM 2.0 code pairs
- Quality Control: 100% QASM syntax validation, SHA256 hash deduplication
- Diversity: 11x augmentation via paraphrasing (10 variations + original)
- Dataset Availability: merileijona/quantum-circuits-8k
Data Generation Pipeline
- Master Generation: 739 unique circuits across 92 categories
- Hash Deduplication: SHA256 hashing ensures zero duplicates
- Description Augmentation: 10 paraphrased variations per circuit (11x total including the original)
- Validation: 100% QASM 2.0 syntax compliance
- Train/Val/Test Split: 70/15/15
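The hash-deduplication step above can be sketched with the stdlib hashlib module. This is an illustrative reconstruction, not the project's actual pipeline code: keep a circuit only if the SHA-256 digest of its normalized QASM text has not been seen before.

```python
import hashlib

# Illustrative sketch of the SHA-256 deduplication step: a circuit is kept only
# if the digest of its stripped QASM text has not been seen before.
def dedupe(circuits):
    seen, unique = set(), []
    for qasm in circuits:
        digest = hashlib.sha256(qasm.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(qasm)
    return unique

circuits = ["h q[0];", "x q[0];", "h q[0];"]  # one exact duplicate
print(len(dedupe(circuits)))  # 2
```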
Ethical Considerations
- No Safety Alignment: Model has not undergone safety fine-tuning
- Hallucination Risk: May generate plausible but incorrect quantum circuits
- Educational Purpose: Designed for learning, not production deployment
- Verification Required: Always validate generated circuits before use
Citation
If you use this model in your research, please cite:
```bibtex
@misc{quantumgpt124m,
  author       = {Merilehto, Juhani},
  title        = {QuantumGPT-124M: Quantum Circuit Generation with GPT-2},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/merileijona/quantumgpt-124m}},
  note         = {GPT-2 model trained on 8,129 quantum circuits for OpenQASM 2.0 generation}
}
```
Model Card Authors
Juhani Merilehto
- HuggingFace: @merileijona
- GitHub: @juhanimerilehto
- Affiliation(s): University of Vaasa, School of Management; University of Turku, Faculty of Technology
License
This model is released under the MIT License.
Acknowledgments
- Training Framework: Based on Andrej Karpathy's nanoGPT architecture
- Data Generation: Powered by xAI Grok API
- Tokenizer: Standard GPT-2 tokenizer (HuggingFace GPT2TokenizerFast)
- Infrastructure: Trained on NVIDIA RTX 4070 12GB
Additional Resources
- Dataset: merileijona/quantum-circuits-8k
- Training Code: Available in model repository
- Related Work: See papers on quantum circuit synthesis with LLMs
Model Version: 1.0
Release Date: February 2026
Last Updated: February 27, 2026