# NanoCodeGPT
A GPT-style language model built entirely from scratch using PyTorch — no pretrained weights, no APIs, just math and gradient descent.
Trained on 8,000 Python functions to complete code given a prompt.
## Model Architecture
| Property | Value |
|---|---|
| Type | Decoder-only Transformer (GPT) |
| Parameters | ~10M |
| Layers | 6 Transformer blocks |
| Attention heads | 6 per block |
| Embedding dim | 384 |
| Context length | 256 tokens |
| Tokenizer | GPT-2 BPE (50,257 vocab) |
| Activation | GELU |
| Training steps | 5,000 |
| Final val loss | 2.97 |
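
As a rough cross-check of the table, the sketch below counts parameters for the listed hyperparameters. It assumes a standard GPT-2-style block (fused QKV projection, 4× MLP, biases on every linear layer, LayerNorm with scale and shift) and learned positional embeddings; the README does not specify these details. Under those assumptions the six Transformer blocks hold about 10.6M parameters, which matches the ~10M figure if that figure excludes embeddings. The 50,257 × 384 token embedding adds roughly 19.3M more, for about 30M in total when the output head shares weights with the token embedding.

```python
# Rough parameter count for the configuration in the table above.
# ASSUMPTION: a GPT-2-style block with a fused QKV projection, a 4x-wide MLP,
# biases on every Linear layer, and LayerNorm with weight + bias.
# The README does not state these details; treat the result as an estimate.

VOCAB_SIZE = 50_257
N_LAYER = 6
N_HEAD = 6
N_EMBD = 384
BLOCK_SIZE = 256


def linear_params(fan_in: int, fan_out: int, bias: bool = True) -> int:
    """Weight matrix (fan_in * fan_out) plus an optional bias vector."""
    return fan_in * fan_out + (fan_out if bias else 0)


def layernorm_params(dim: int) -> int:
    """LayerNorm has a scale vector and a shift vector, each of size `dim`."""
    return 2 * dim


def block_params(d: int) -> int:
    """One Transformer block: attention + MLP + two LayerNorms."""
    attn = linear_params(d, 3 * d) + linear_params(d, d)     # QKV + output projection
    mlp = linear_params(d, 4 * d) + linear_params(4 * d, d)  # expand, then contract
    return attn + mlp + 2 * layernorm_params(d)


head_dim = N_EMBD // N_HEAD  # 64 dimensions per attention head
blocks = N_LAYER * block_params(N_EMBD) + layernorm_params(N_EMBD)  # + final LayerNorm
token_emb = VOCAB_SIZE * N_EMBD  # token embedding table
pos_emb = BLOCK_SIZE * N_EMBD    # learned positional embeddings (assumed)

print(f"head dim:             {head_dim}")
print(f"transformer blocks:   {blocks:,}")
print(f"token embedding:      {token_emb:,}")
print(f"total (tied lm_head): {blocks + token_emb + pos_emb:,}")
```

If the output head is instead a separate, untied `Linear(384, 50257)` layer, it adds another ~19.3M parameters on top of that total.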

## Training
- Dataset: `flytech/python-codes-25k` (first 8,000 examples)
- Hardware: Google Colab T4 GPU (free tier)
- Time: ~58 minutes
- Optimizer: AdamW (lr=3e-4)
- Batch size: 32
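
These settings, together with the final validation loss from the table, pin down a rough training budget. The sketch assumes each step processes 32 full 256-token windows with no padding, and that the reported 2.97 is mean cross-entropy in nats, which is what PyTorch's `torch.nn.functional.cross_entropy` returns by default; the README states neither. Under those assumptions the model saw about 41M tokens, and a loss of 2.97 corresponds to a perplexity of about 19.5.

```python
import math

# Values taken from the Training section and the architecture table.
BATCH_SIZE = 32
CONTEXT_LEN = 256
TRAIN_STEPS = 5_000
WALL_CLOCK_MIN = 58  # approximate, per the README
FINAL_VAL_LOSS = 2.97

# ASSUMPTION: every step sees BATCH_SIZE full-length windows (no padding).
tokens_per_step = BATCH_SIZE * CONTEXT_LEN
total_tokens = tokens_per_step * TRAIN_STEPS

# Average end-to-end throughput; dividing by the full wall-clock time
# also folds in any evaluation and logging overhead.
tokens_per_sec = total_tokens / (WALL_CLOCK_MIN * 60)

# ASSUMPTION: the loss is mean cross-entropy in nats, so perplexity = e^loss.
perplexity = math.exp(FINAL_VAL_LOSS)

print(f"tokens per step: {tokens_per_step:,}")
print(f"tokens seen:     {total_tokens:,}")
print(f"throughput:      ~{tokens_per_sec:,.0f} tokens/s")
print(f"val perplexity:  ~{perplexity:.1f}")
```

A perplexity near 19.5 means that, on held-out code, the model is on average about as uncertain as a uniform choice among roughly 19 or 20 tokens, out of a 50,257-token vocabulary.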
## Example Outputs
Prompt: `def fibonacci(n):`

```python
def fibonacci(n):
    if n < 0:
        print("Incorrect input")
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
```
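
The generated `fibonacci` uses a 1-indexed convention: `fibonacci(1)` returns 0 and `fibonacci(2)` returns 1. No branch handles `n == 0`, so it falls through to the recursive case; `fibonacci(-1)` and `fibonacci(-2)` only print a message and return `None`, and the addition then raises `TypeError`. For comparison, here is a hand-written iterative version with the same convention. It is a reference sketch, not model output, and the name `fibonacci_reference` is illustrative.

```python
def fibonacci_reference(n: int) -> int:
    """Return the n-th Fibonacci number, 1-indexed: fib(1) == 0, fib(2) == 1."""
    if n < 1:
        # Reject invalid input explicitly instead of recursing into negatives.
        raise ValueError("n must be a positive integer")
    a, b = 0, 1
    # Each iteration shifts the pair (fib(k), fib(k + 1)) forward by one,
    # so this runs in O(n) time instead of the exponential time of naive recursion.
    for _ in range(n - 1):
        a, b = b, a + b
    return a


print([fibonacci_reference(n) for n in range(1, 11)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```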

Prompt: `def binary_search(arr, target):`

```python
def binary_search(arr, target):
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        else:
            low = mid + 1
    return -1
```
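
Because the generated `binary_search` never moves `high`, it can step past the target: searching `[1, 3, 5, 7, 9]` for `1` returns `-1`. A hand-written reference for comparison adds the missing branch that narrows the search to the left half. It is not model output, and the name `binary_search_reference` is illustrative.

```python
from typing import Sequence


def binary_search_reference(arr: Sequence[int], target: int) -> int:
    """Return the index of `target` in sorted `arr`, or -1 if it is absent."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


data = [1, 3, 5, 7, 9]
print(binary_search_reference(data, 1))  # 0
print(binary_search_reference(data, 9))  # 4
print(binary_search_reference(data, 4))  # -1
```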