QuantumGPT-124M-v2: Quantum Circuit Generation Model

QuantumGPT-124M-v2 is a GPT-2 architecture language model trained from scratch on quantum circuit description → OpenQASM 2.0 pairs. It is the second model in the QuantumGPT scaling series, trained on the expanded quantum-circuits-21k dataset (21,208 samples vs 8,129 in v1).

Compared to QuantumGPT-124M-v1, this model achieves pass@1 syntax validity of 95.8% (up from 67.2%) and pass@5 syntax validity of 100%, as measured on the QuantumGPT Benchmark v1.0. The improvement is statistically significant (Fisher exact test, p=0.0016).


Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("merileijona/quantumgpt-124m-v2")
tokenizer = AutoTokenizer.from_pretrained("merileijona/quantumgpt-124m-v2")

prompt = "<|user|>Create a Bell state with two qubits<|end|>\n<|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    repetition_penalty=1.1,
    pad_token_id=tokenizer.eos_token_id,
)

text = tokenizer.decode(outputs[0], skip_special_tokens=False)
response = text[len(prompt):]  # drop the echoed prompt
if "<|end|>" in response:     # the model emits a literal end marker; cut there
    response = response[:response.index("<|end|>")]
print(response.strip())

Expected Output:

OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
creg c[2];
h q[0];
cx q[0],q[1];
measure q -> c;
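
The expected circuit can be sanity-checked without any quantum SDK. Below is a minimal pure-Python statevector sketch (the little-endian qubit ordering is a convention chosen here, not something the model card specifies) that applies the same h and cx gates and recovers the Bell-state amplitudes:

```python
from math import sqrt

def apply_h(state, q):
    """Apply a Hadamard to qubit q (little-endian bit order)."""
    out = state[:]
    for i in range(len(state)):
        if not (i >> q) & 1:          # visit each |...0_q...>, |...1_q...> pair once
            j = i | (1 << q)
            a, b = state[i], state[j]
            out[i] = (a + b) / sqrt(2)
            out[j] = (a - b) / sqrt(2)
    return out

def apply_cx(state, control, target):
    """Apply CNOT: flip the target bit wherever the control bit is 1."""
    out = state[:]
    for i in range(len(state)):
        if (i >> control) & 1:
            out[i] = state[i ^ (1 << target)]
    return out

state = [1.0, 0.0, 0.0, 0.0]   # |00>
state = apply_h(state, 0)      # h q[0]
state = apply_cx(state, 0, 1)  # cx q[0],q[1]
print(state)                   # amplitudes 1/sqrt(2) on |00> and |11>
```

The result places equal amplitude on |00> and |11> and zero elsewhere, which is exactly the Bell state the prompt asks for.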

Model Details

Architecture

| Parameter | Value |
|---|---|
| Base architecture | GPT-2 |
| Parameters | 123.8M |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dimension | 768 |
| Context length | 256 tokens |
| Dropout (training) | 0.2 |
| Activation function | GELU (standard) |

Implementation Notes

  • Converted from NanoGPT-style training checkpoint
  • All Conv1D weights correctly transposed for HuggingFace compatibility
  • Bias tensors injected as zeros (bias-free architecture → HF GPT2LMHeadModel compatibility)
  • Word embeddings tied with lm_head (tie_word_embeddings: true)
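
As a sanity check, the reported parameter count can be tallied from the architecture table, assuming the bias-free layout and tied embeddings described above (exact totals depend on which bias/LayerNorm conventions are counted, so treat this as an estimate):

```python
# Dimensions from the architecture table above.
vocab, ctx, d, layers = 50257, 256, 768, 12

wte = vocab * d                        # token embeddings (tied with lm_head)
wpe = ctx * d                          # position embeddings (256-token context)
per_layer = (d * 3 * d                 # attention qkv projection
             + d * d                   # attention output projection
             + d * 4 * d + 4 * d * d   # MLP up + down projections
             + 2 * d)                  # two LayerNorm weight vectors (no biases)
total = wte + wpe + layers * per_layer + d   # + final LayerNorm

print(f"{total / 1e6:.1f}M")   # ~123.7M, within rounding of the reported 123.8M
```

Note the shortened 256-token context keeps the model slightly under the usual 124.4M of a stock GPT-2 small, since wpe holds 256 rather than 1024 position vectors.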

Training Configuration

| Parameter | Value |
|---|---|
| Training dataset | quantum-circuits-21k |
| Training samples | 21,208 |
| Unique base circuits | 1,928 |
| Estimated training tokens | ~1.75M |
| Training steps | 2,000 (best checkpoint at step 1700) |
| Learning rate | 3×10⁻⁴ (cosine decay) |
| Effective batch size | 64 sequences (16,384 tokens/step) |
| Hardware | NVIDIA RTX 4070 12GB |
| Best validation loss | 0.2502 |
| v1 validation loss | 0.2691 |
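
The throughput figures are self-consistent; a quick arithmetic check (the epoch count is an estimate derived here, not a reported number):

```python
seqs, ctx = 64, 256                      # effective batch size x context length
tokens_per_step = seqs * ctx             # 16,384 tokens/step, as reported
total_tokens = 2000 * tokens_per_step    # ~32.8M tokens seen over 2,000 steps
epochs = total_tokens / 1_750_000        # passes over the ~1.75M-token dataset
print(tokens_per_step, round(epochs, 1))
```

So the best checkpoint at step 1700 corresponds to roughly 19 passes over the training set, consistent with heavy data reuse on a small corpus.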

Benchmark Results

Evaluated on QuantumGPT Benchmark v1.0: 100 prompts (50 in-distribution, 50 out-of-distribution), 3 difficulty tiers, k=5 samples per prompt, seed=42.

Overall Performance

| Metric | QuantumGPT-124M-v1 | QuantumGPT-124M-v2 | Δ |
|---|---|---|---|
| Validation loss | 0.2691 | 0.2502 | −0.019 |
| Syntax valid pass@1 | 67.2% | 95.8% | +28.6pp |
| Syntax valid pass@3 | 87.2% | 99.9% | +12.7pp |
| Syntax valid pass@5 | 91.0% | 100.0% | +9.0pp |
| Semantic valid pass@1 | 23.4% | 46.2% | +22.8pp |
| Semantic valid pass@5 | 48.0% | 61.0% | +13.0pp |

Overall syntax improvement is statistically significant (Fisher exact, p=0.0016). Benchmark prompt suite hash: ee2da8a57e683af2464eb7a4eada0898.
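
With k=5 samples per prompt, fractional pass@k values like 99.9% suggest the standard unbiased estimator averaged over prompts (whether the benchmark harness uses exactly this formula is an assumption). Per prompt, with n samples of which c are valid:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one prompt: n samples drawn, c of them valid."""
    if n - c < k:   # fewer failures than k: any k-subset contains a valid sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. a prompt where 3 of the 5 samples were syntactically valid:
print(pass_at_k(5, 3, 1))  # 0.6
print(pass_at_k(5, 3, 5))  # 1.0
```

Averaging these per-prompt estimates over the 100 benchmark prompts yields the aggregate pass@k percentages in the table.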

By Category (syntax valid, pass@5)

| Category | v1 | v2 |
|---|---|---|
| Algorithm | 7/10 | 10/10 |
| Arithmetic | 10/10 | 10/10 |
| Error correction | 10/10 | 10/10 |
| Measurement | 10/10 | 10/10 |
| Multi-qubit complex | 9/10 | 10/10 |
| Single gate | 10/10 | 10/10 |
| State preparation | 8/10 | 10/10 |
| Three-qubit | 9/10 | 10/10 |
| Two-qubit basic | 9/10 | 10/10 |
| Variational | 9/10 | 10/10 |

By Difficulty Tier

| Tier | n | v1 | v2 |
|---|---|---|---|
| Easy | 18 | 94.4% | 100.0% |
| Medium | 43 | 93.0% | 100.0% |
| Hard | 39 | 87.2% | 100.0% (p=0.027) |
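
The hard-tier p-value is reproducible with a stdlib-only one-sided Fisher exact test, assuming the percentages above are per-prompt pass@5 counts (34/39 valid for v1 vs 39/39 for v2; that reading is an inference from 87.2% of n=39, not something the card states):

```python
from math import comb

def fisher_exact_less(a, b, c, d):
    """One-sided Fisher exact p-value, P(group-1 successes <= a),
    for the 2x2 table [[a, b], [c, d]] = [[succ1, fail1], [succ2, fail2]]."""
    n = a + b + c + d
    row1 = a + b   # group-1 size
    col1 = a + c   # total successes
    lo = max(0, row1 + col1 - n)
    denom = comb(n, row1)
    return sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(lo, a + 1)) / denom

# Hard tier, read as per-prompt pass@5 counts: v1 34/39, v2 39/39.
p = fisher_exact_less(34, 5, 39, 0)
print(round(p, 4))  # 0.0273
```

Under that reading the computed p ≈ 0.027 matches the table.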

Semantic Gap

Both models show a gap between syntactic and semantic validity: circuits that parse correctly but implement a trivial or incorrect unitary. This remains the primary open challenge at 124M scale.

| Model | Syntax valid | Semantic valid | Gap |
|---|---|---|---|
| v1 | 91% | 48% | 43pp |
| v2 | 100% | 61% | 39pp |

Prompt Format

The model was trained using explicit conversation delimiters:

<|user|>{natural language description}<|end|>
<|assistant|>{OpenQASM 2.0 circuit}<|end|>

These markers are literal text tokens, not special tokenizer tokens. Always include the full prefix including <|assistant|> and stop generation at the first <|end|>.
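
The formatting and stop-marker handling can be wrapped in two small helpers (hypothetical names, mirroring the Quick Start post-processing):

```python
def build_prompt(description: str) -> str:
    """Wrap a natural-language description in the trained delimiters.

    The markers are literal text, not special tokenizer tokens.
    """
    return f"<|user|>{description}<|end|>\n<|assistant|>"

def extract_circuit(generated: str, prompt: str) -> str:
    """Drop the echoed prompt and truncate at the first <|end|> marker."""
    response = generated[len(prompt):]
    end = response.find("<|end|>")
    return (response[:end] if end != -1 else response).strip()

prompt = build_prompt("Create a Bell state with two qubits")
fake = prompt + 'OPENQASM 2.0;\ninclude "qelib1.inc";<|end|>trailing junk'
print(extract_circuit(fake, prompt))
```

Because the markers are plain text, omitting the `<|assistant|>` prefix or the truncation step will produce malformed or run-on generations.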


Limitations

  1. Semantic correctness: 39pp gap between syntactic and semantic validity; circuits may parse correctly but implement incorrect unitaries. Always simulate before use.
  2. Context length: 256-token context limits very deep circuits.
  3. Synthetic training data: All training circuits generated by LLM (xAI Grok), not from real quantum programs.
  4. OOD generalisation: Improvement over v1 concentrates in in-distribution prompts; out-of-distribution generalisation remains a challenge.
  5. No hardware validation: Generated circuits require transpilation and validation before execution on real quantum hardware.

Intended Use

✅ Educational tools and quantum computing demonstrations
✅ Rapid circuit prototyping and exploration
✅ QASM code completion assistance
✅ Benchmarking quantum compilers and simulators
✅ Research baseline for quantum circuit generation

❌ Production quantum computing workflows
❌ Hardware deployment without independent validation
❌ Safety-critical quantum applications


Scaling Series

| Model | Dataset | Samples | pass@1 syntax | pass@5 syntax | Val loss |
|---|---|---|---|---|---|
| QuantumGPT-124M-v1 | quantum-circuits-8k | 8,129 | 67.2% | 91.0% | 0.2691 |
| QuantumGPT-124M-v2 (this model) | quantum-circuits-21k | 21,208 | 95.8% | 100.0% | 0.2502 |
| QuantumGPT-354M | quantum-circuits-21k | 21,208 | 92.2% | 99.0% | 0.2677 |

Citation

@misc{quantumgpt124mv2,
  author    = {Merilehto, Juhani},
  title     = {QuantumGPT-124M-v2: Data Scaling Study for Quantum Circuit Generation},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/merileijona/quantumgpt-124m-v2},
  note      = {GPT-2 124M trained on quantum-circuits-21k (21,208 samples).
               pass@1 syntax 95.8\%, pass@5 100.0\% on QuantumGPT Benchmark v1.0}
}

Model Card Authors

Juhani Merilehto

  • HuggingFace: @merileijona
  • GitHub: @juhanimerilehto
  • Affiliation: University of Vaasa, School of Management; University of Turku, Faculty of Technology

License

MIT License

Acknowledgments

  • Training framework: Andrej Karpathy's nanoGPT / nanochat architecture
  • Data generation: xAI Grok API (grok-4-2)
  • Tokenizer: Standard GPT-2 BPE (HuggingFace GPT2TokenizerFast)
  • Validation: Qiskit OpenQASM 2.0 parser
  • Hardware: NVIDIA RTX 4070 12GB / AMD Ryzen 9 5950X / 128GB RAM

Model Version: 2.0
Release Date: March 2026
