---
license: apache-2.0
language:
- en
tags:
- code
- python
- text-generation
- gpt2
- from-scratch
- small-model
- code-generation
pipeline_tag: text-generation

---
# 🌸 PyBlissa-Coder-40M
# !! 14.6% SCORE ON HumanEval PASS@1 !!

PyBlissa-Coder-40M is the second model in the PyBlissa-Coder family, which focuses on Python code generation.
Despite its small footprint (40M parameters, trained on 272M tokens), it scores 14.6% pass@1
on HumanEval and 4.4% pass@1 on MBPP, two standard code-generation benchmarks.
Its imperfections are worth keeping in mind alongside those scores:
the model can generate wrong, inefficient, or broken code, and how often it does so depends largely on the sampling temperature.


<p align="center">
  <img src="pyblissa_banner.png" alt="PyBlissa-Coder-40M" width="100%">
</p>

<p align="center">
  <img src="pyblissa_loss_curve.png" alt="Training curve" width="100%">
</p>


## Benchmarks

| Benchmark | Score | Protocol | Temp |
|-----------|-------|----------|------|
| HumanEval pass@1 | **14.6%** (24/164) | zero-shot, fenced-code extraction | 0.25 |
| MBPP pass@1 | **4.4%** (22/500) | official tests-in-prompt (Austin et al. 2021) | 0.05 |
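
The percentages in the table follow directly from the solved/total counts. A minimal sketch of that arithmetic, assuming one sample per problem so that pass@1 is simply the fraction solved; the helper name `pass_at_1` is illustrative and not part of any evaluation harness:

```python
def pass_at_1(solved: int, total: int) -> float:
    """Percentage of problems solved when each problem gets a single sample."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * solved / total


# Counts taken from the benchmark table.
print(f"HumanEval: {pass_at_1(24, 164):.1f}%")  # HumanEval: 14.6%
print(f"MBPP:      {pass_at_1(22, 500):.1f}%")  # MBPP:      4.4%
```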

### How PyBlissa compares on HumanEval

| Model | Params | HumanEval pass@1 |
|-------|--------|------------------|
| GPT-Neo | 125M | 0.75% |
| CodeParrot-small | 110M | 3.80% |
| PyCodeGPT | 110M | 8.33% |
| **PyBlissa-Coder** | **40M** | **14.6%** |

> PyBlissa is ~2.75× smaller than CodeParrot-small yet scores roughly 4× higher
> on HumanEval pass@1, trained on a single consumer GPU.

---

## Model details

| Property | Value |
|---|---|
| Architecture | Decoder-only transformer (GPT-2 style, nanoGPT lineage) |
| Parameters | 39.9M |
| Layers | 10 |
| Model dim (d_model) | 512 |
| Heads | 8 (head_dim 64) |
| FFN dim (d_ff) | 2048 |
| Context length | 512 tokens |
| Vocab size | 16,000 (custom ByteLevel BPE) |
| Tied embeddings | Yes |
| Precision | trained in bf16, released as F32 GGUF |
| Best val loss | 0.3615 |
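
The 39.9M figure can be reproduced from the other rows of the table. The sketch below is a back-of-the-envelope count, not the author's code. It assumes learned absolute position embeddings, no bias terms, and weight-only LayerNorm (the nanoGPT `bias=False` layout); the card does not state these details, but under them the total matches the reported number.

```python
def count_params(vocab: int, d_model: int, n_layers: int, d_ff: int, ctx: int) -> int:
    """Parameter count for a bias-free GPT-2-style decoder with tied embeddings."""
    token_emb = vocab * d_model      # shared with the output head (tied), counted once
    pos_emb = ctx * d_model          # assumption: learned absolute positions
    attn = 4 * d_model * d_model     # Q, K, V and output projections
    ffn = 2 * d_model * d_ff         # up-projection and down-projection
    norms = 2 * d_model              # two LayerNorm weight vectors per block
    per_layer = attn + ffn + norms
    final_norm = d_model             # LayerNorm before the output head
    return token_emb + pos_emb + n_layers * per_layer + final_norm


total = count_params(vocab=16_000, d_model=512, n_layers=10, d_ff=2048, ctx=512)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 39,922,176 parameters (~39.9M)
```

Roughly 8.2M of the total sits in the token embedding, which is why tying it to the output head matters at this scale.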

### Training

| Setting | Value |
|---|---|
| Hardware | 1 × NVIDIA RTX 5080 (16 GB) |
| Training tokens | 272M (train split) |
| Epochs | 5 |
| Optimizer | AdamW (β 0.9/0.95, wd 0.1) |
| LR schedule | cosine, 4e-4 → 4e-5, ~2% warmup |
| Batch size | 48 |
| Total steps | 55,405 |
| Wall-clock time | ~116 min |
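
The schedule row can be read as the function below. This is a sketch under stated assumptions: warmup is taken to be linear and exactly 2% of the total steps, and each batch item is assumed to be a full 512-token sequence. None of these details are confirmed by the card, and the names are illustrative.

```python
import math

MAX_LR, MIN_LR = 4e-4, 4e-5
TOTAL_STEPS = 55_405
WARMUP_STEPS = round(0.02 * TOTAL_STEPS)  # "~2% warmup"; the exact count is an assumption


def lr_at(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return MIN_LR + cosine * (MAX_LR - MIN_LR)


# Consistency check on the token budget (assumes 48 sequences x 512 tokens per step).
TOKENS_PER_STEP = 48 * 512
TOKENS_PER_EPOCH = TOKENS_PER_STEP * TOTAL_STEPS // 5
print(f"~{TOKENS_PER_EPOCH / 1e6:.1f}M tokens per epoch")  # ~272.3M tokens per epoch

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Under those assumptions the step count, batch size, and epoch count agree with the 272M-token training split.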

---

## Usage

### Ollama

```bash
ollama run hf.co/Rohanify/PyBlissa-Coder-40M:F32
```

The repo ships `template` and `params` files, so Ollama applies the correct
`PROMPT:`/`CODE:` format and sampling defaults automatically — no Modelfile
needed for remote runs.

To run a local GGUF instead, point the `FROM` line of a `Modelfile` at the downloaded file, then build and run it:

```bash
ollama create pyblissa-40m -f Modelfile
ollama run pyblissa-40m "write a function that checks if a number is prime"
```

### Prompt format

The model was trained on a plain-text wrapper. At inference, the prompt is
wrapped as:

```
PROMPT: {your instruction}
CODE:
```

The model then emits a fenced `python` code block. (When using Ollama, the
`template` file does this wrapping for you, so you can type a plain instruction.)
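
When calling the model outside Ollama, the wrapping and the code extraction have to be done by hand. A minimal sketch of both steps follows. The helper names are hypothetical, the trailing newline after `CODE:` is an assumption, and the fallback for an unclosed fence is a convenience rather than something the card specifies.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the example nests inside Markdown
CODE_BLOCK = re.compile(
    FENCE + r"(?:python)?[ \t]*\n(.*?)(?:" + FENCE + r"|\Z)", re.DOTALL
)


def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the PROMPT:/CODE: training format."""
    return f"PROMPT: {instruction.strip()}\nCODE:\n"


def extract_code(completion: str) -> str:
    """Return the body of the first fenced block, or the raw text if there is no fence."""
    match = CODE_BLOCK.search(completion)
    return match.group(1).strip() if match else completion.strip()


sample = "Here you go:\n" + FENCE + "python\ndef add(a, b):\n    return a + b\n" + FENCE
print(build_prompt("add two numbers"))
print(extract_code(sample))
```

Because the pattern also accepts end-of-text in place of a closing fence, output truncated by the 512-token context still yields whatever code was produced.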

### Recommended sampling

| Parameter | Value |
|-----------|-------|
| temperature | 0.25 – 0.3 |
| top_k | 10 |
| repeat_penalty | 1.25 |
| num_ctx | 512 |
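
These values can also be passed per request through Ollama's local REST API. The sketch below assumes a server on the default port (11434) and that the model has already been pulled. It uses only the standard library, and the function names are illustrative.

```python
import json
import urllib.request

MODEL = "hf.co/Rohanify/PyBlissa-Coder-40M:F32"
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(instruction: str, temperature: float = 0.25) -> dict:
    """Request body carrying the recommended sampling values."""
    return {
        "model": MODEL,
        "prompt": instruction,  # plain instruction; the repo template adds PROMPT:/CODE:
        "stream": False,        # ask for a single JSON object instead of a stream
        "options": {
            "temperature": temperature,
            "top_k": 10,
            "repeat_penalty": 1.25,
            "num_ctx": 512,
        },
    }


def generate(instruction: str) -> str:
    """Send one non-streaming request; needs a running Ollama server."""
    body = json.dumps(build_payload(instruction)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("write a function that checks if a number is prime")` returns the model's text, which still needs the usual review before it is run.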

---

## Limitations

PyBlissa is a 40M-parameter model trained primarily for **prompt → Python
generation**. Known limitations:

- It is a small model: it solves short, self-contained functions well but
  struggles with multi-step or library-heavy tasks.
- It sometimes omits `import` statements for stdlib modules it uses
  (`math`, `re`, `hashlib`, etc.).
- It can occasionally emit a short natural-language preamble before the code
  block on harder prompts.
- Code explanation and non-Python tasks are out of distribution — it may
  attempt them, but that is not what it was trained for.
- As with any code model, **review and test generated code before running it.**

---

## Training data & attribution

This model was trained on the following datasets. Per their licenses,
attribution is provided here:

- **nvidia/OpenCodeInstruct** — CC-BY-4.0
  https://huggingface.co/datasets/nvidia/OpenCodeInstruct
- **flytech/python-codes-25k** — MIT
  https://huggingface.co/datasets/flytech/python-codes-25k

No OpenAI-derived data was used in training.

---

## License

The model weights are released under **Apache-2.0**. Note that the training
data carries its own licenses (CC-BY-4.0 and MIT, see above), which require
attribution as provided.

---

## Citation

```bibtex
@misc{pyblissa2026,
  title  = {PyBlissa-Coder-40M: A from-scratch Python code model},
  author = {Rohan},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/Rohanify/PyBlissa-Coder-40M}}
}
```