QuantumGPT-124M: Quantum Circuit Generation Model

QuantumGPT-124M is a GPT-2 architecture language model trained specifically for generating quantum circuits in OpenQASM 2.0 format from natural language descriptions.

Model Description

  • Model Type: Causal Language Model (GPT-2 architecture)
  • Parameters: 124 million
  • Training Data: 8,129 quantum circuits across 92 categories (~373K tokens)
  • Output Format: OpenQASM 2.0
  • Specialty: Generates syntactically valid quantum circuits for 1-4 qubit systems

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("merileijona/quantumgpt-124m")
tokenizer = AutoTokenizer.from_pretrained("merileijona/quantumgpt-124m")

prompt = "<|user|>Create a Bell state with two qubits<|end|>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)

text = tokenizer.decode(outputs[0], skip_special_tokens=False)
# The decoded text includes the prompt, whose user turn already contains
# an <|end|> marker, so split on <|assistant|> first, then stop at the
# assistant's own <|end|>.
qasm = text.split("<|assistant|>", 1)[1].split("<|end|>", 1)[0].strip()
print(qasm)

Expected Output:

OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0],q[1];
measure q -> c;
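
To sanity-check this expected output, the circuit can be run on a local simulator. A minimal sketch, assuming qiskit and qiskit-aer are installed (neither is required by the model itself); a Bell state should give roughly 50/50 counts of '00' and '11':

from qiskit import qasm2, transpile
from qiskit_aer import AerSimulator

# Parse the generated OpenQASM 2.0 text into a circuit.
bell = qasm2.loads("""OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0],q[1];
measure q -> c;""")

# Run 1,000 shots on the Aer simulator and inspect the counts.
sim = AerSimulator()
counts = sim.run(transpile(bell, sim), shots=1000).result().get_counts()
print(counts)  # e.g. {'00': 493, '11': 507}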

Model Details

Architecture

  • Base Model: GPT-2 (124M parameters)
  • Layers: 12
  • Attention Heads: 12
  • Embedding Dimension: 768
  • Context Length: 256 tokens
  • Dropout: 0.2 (training)
  • Activation Function: GELU (standard, not gelu_new)
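
For reference, a minimal sketch of the equivalent HuggingFace configuration, assuming the hyperparameters above map directly onto GPT2Config fields (the published checkpoint ships its own config, so this is illustrative only):

from transformers import GPT2Config

config = GPT2Config(
    n_layer=12,                  # transformer blocks
    n_head=12,                   # attention heads per block
    n_embd=768,                  # embedding dimension
    n_positions=256,             # context length
    resid_pdrop=0.2,             # dropout, active during training only
    embd_pdrop=0.2,
    attn_pdrop=0.2,
    activation_function="gelu",  # standard GELU, not "gelu_new"
)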

Implementation Notes

  • Converted from a NanoGPT-style training checkpoint
  • All Conv1D weights correctly transposed for HuggingFace compatibility
  • Bias tensors injected for GPT2LMHeadModel compatibility
  • Word embeddings are tied with lm_head
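
The Conv1D transposition can be illustrated with a hypothetical helper: nanoGPT stores attention and MLP weights as nn.Linear matrices of shape [out, in], while HuggingFace's GPT-2 uses Conv1D modules expecting [in, out]. The key names below are illustrative, not the exact conversion script:

import torch

def transpose_conv1d_weights(sd: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    # Weight matrices that live in Conv1D modules on the HuggingFace
    # side and therefore need a transpose relative to nn.Linear storage.
    conv1d_suffixes = (
        "attn.c_attn.weight", "attn.c_proj.weight",
        "mlp.c_fc.weight", "mlp.c_proj.weight",
    )
    return {
        key: t.t().contiguous() if key.endswith(conv1d_suffixes) else t
        for key, t in sd.items()
    }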

Training Configuration

  • Dataset Size: 8,129 training samples
  • Unique Circuits: 739 (with 11x augmentation via paraphrasing)
  • Training Tokens: ~373,000
  • Training Steps: 1,000 iterations
  • Hardware: NVIDIA RTX 4070 12GB
  • Training Time: ~0.5 hours
  • Final Validation Loss: 0.2691
  • Train/Val Gap: 0.044 (small gap, indicating little overfitting)

Dataset Composition

The model was trained on 92 distinct categories of quantum circuits:

Single-Qubit Operations (14 categories):

  • Basic gates: H, X, Y, Z, S, T, Sdg, Tdg
  • Parametric rotations: RX, RY, RZ
  • Universal gates: U1, U2, U3

Two-Qubit Operations (11 categories):

  • Bell states (all 4 variants)
  • Entanglement: CNOT, CZ, SWAP, iSWAP
  • Controlled rotations

Three-Qubit Operations (6 categories):

  • GHZ states, W states
  • Toffoli, Fredkin gates

Quantum Algorithms (15 categories):

  • Deutsch-Jozsa, Grover's search
  • Phase estimation, QFT variants

Variational Circuits (15 categories):

  • VQE ansatzes, QAOA
  • Hardware-efficient ansatzes

Plus: Arithmetic circuits, error correction codes, special states, and more.

Prompt Format

The model was trained using explicit conversation delimiters:

<|user|>{description}<|end|>
<|assistant|>{qasm}<|end|>

These markers are literal text tokens, not special tokenizer tokens.

Generation should begin with:

<|assistant|>

and stop at the first occurrence of:

<|end|>

If <|assistant|> is omitted, generation quality may degrade.
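
Because the markers are ordinary text spanning several BPE tokens, stopping generation exactly at <|end|> needs a custom criterion. A sketch reusing model, tokenizer, and inputs from the Quick Start; the StopOnEndMarker class is illustrative, not part of the model's API:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnEndMarker(StoppingCriteria):
    # Halts generation once the text produced after the prompt
    # contains the literal "<|end|>" string.
    def __init__(self, tokenizer, prompt_len, marker="<|end|>"):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len
        self.marker = marker

    def __call__(self, input_ids, scores, **kwargs):
        generated = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        return self.marker in generated

prompt_len = inputs["input_ids"].shape[1]
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    stopping_criteria=StoppingCriteriaList([StopOnEndMarker(tokenizer, prompt_len)]),
    pad_token_id=tokenizer.eos_token_id,
)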

Performance

Accuracy by circuit type, from preliminary testing (approximate figures only):

Circuit Type                Accuracy   Notes
Basic gates (H, X, Y, Z)    95-100%    Near-perfect on simple gates
2-qubit entanglement        90-95%     Strong on Bell states, CNOT patterns
3-qubit states (GHZ, W)     85-90%     Good semantic understanding
Arithmetic circuits         75-85%     Moderate accuracy on adders/incrementers
Complex algorithms          70-80%     Struggles with QFT, Grover's
4+ qubit circuits           60-70%     Limited training data for large systems

Example Test Results (Step 600)

Perfect Generation:

Prompt: "Apply Hadamard gate to single qubit"
Output: ✅ OPENQASM 2.0; ... h q[0]; measure q[0] -> c[0];

Prompt: "Create Bell state with two qubits"
Output: ✅ OPENQASM 2.0; ... h q[0]; cx q[0],q[1]; measure q -> c;

Prompt: "Generate GHZ state with three qubits"
Output: ✅ OPENQASM 2.0; ... h q[0]; cx q[0],q[1]; cx q[0],q[2]; measure q -> c;

Limitations

  1. Qubit Count: Optimized for 1-3 qubit circuits. Performance degrades for 4+ qubits due to limited training data.

  2. Complex Algorithms: May generate syntactically valid but semantically incorrect circuits for advanced algorithms (e.g., full quantum teleportation, complex QFT implementations).

  3. Parametric Gates: Limited support for gates with specific angle parameters. May substitute similar gates (e.g., RY → Y, S → T).

  4. No Execution Guarantee: Generated circuits are syntactically valid QASM 2.0 but not guaranteed to execute correctly on quantum hardware without validation.
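
For point 4, a minimal syntax-validation sketch, assuming a recent Qiskit install (the card does not prescribe a particular validator):

from qiskit import qasm2

def is_valid_qasm(src: str) -> bool:
    # Returns True when src parses as OpenQASM 2.0. Parsing catches
    # syntax errors only; it says nothing about semantic correctness.
    try:
        qasm2.loads(src)
        return True
    except qasm2.QASM2ParseError:
        return False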

Intended Use

Primary Use Cases

✅ Educational Tools: Generate example circuits for quantum computing education
✅ Rapid Prototyping: Quick circuit templates for experimentation
✅ Code Completion: Assist developers writing QASM code
✅ Benchmarking: Generate diverse circuits for compiler/simulator testing

Out of Scope

❌ Production Quantum Computing: Not suitable for critical quantum applications
❌ Large-Scale Circuits: Limited to small qubit counts (1-4 qubits)
❌ Hardware Deployment: Requires validation before running on actual quantum hardware

Training Data

The model was trained on a custom dataset of 8,129 quantum circuits:

  • Source: Synthetically generated via xAI Grok API with extensive quality control
  • Format: Natural language description → QASM 2.0 code pairs
  • Quality Control: 100% QASM syntax validation, SHA256 hash deduplication
  • Diversity: 11x augmentation via paraphrasing (10 variations + original)
  • Dataset Availability: merileijona/quantum-circuits-8k

Data Generation Pipeline

  1. Master Generation: 739 unique circuits across 92 categories
  2. Hash Deduplication: SHA256 hashing ensures zero duplicates (sketched below)
  3. Description Augmentation: 10 paraphrases per description (11x total including the original)
  4. Validation: 100% QASM 2.0 syntax compliance
  5. Train/Val/Test Split: 70/15/15
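
A sketch of the deduplication step (step 2), assuming each sample is a dict with a "qasm" field; the field name is illustrative:

import hashlib

def deduplicate(samples: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for sample in samples:
        # Hash the normalized circuit text; identical circuits collide
        # and all but the first occurrence are dropped.
        digest = hashlib.sha256(sample["qasm"].strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique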

Ethical Considerations

  • No Safety Alignment: Model has not undergone safety fine-tuning
  • Hallucination Risk: May generate plausible but incorrect quantum circuits
  • Educational Purpose: Designed for learning, not production deployment
  • Verification Required: Always validate generated circuits before use

Citation

If you use this model in your research, please cite:

@misc{quantumgpt124m,
  author = {Merilehto, Juhani},
  title = {QuantumGPT-124M: Quantum Circuit Generation with GPT-2},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/merileijona/quantumgpt-124m}},
  note = {GPT-2 model trained on 8,129 quantum circuits for OpenQASM 2.0 generation}
}

Model Card Authors

Juhani Merilehto

  • HuggingFace: @merileijona
  • GitHub: @juhanimerilehto
  • Affiliation(s): University of Vaasa, School of Management; University of Turku, Faculty of Technology

License

This model is released under the MIT License.

Acknowledgments

  • Training Framework: Based on Andrej Karpathy's nanoGPT architecture
  • Data Generation: Powered by xAI Grok API
  • Tokenizer: Standard GPT-2 tokenizer (HuggingFace GPT2TokenizerFast)
  • Infrastructure: Trained on NVIDIA RTX 4070 12GB

Model Version: 1.0
Release Date: February 2026
Last Updated: February 27, 2026
