metadata
license: apache-2.0
language:
  - en
tags:
  - code
  - python
  - text-generation
  - gpt2
  - from-scratch
  - small-model
  - code-generation
pipeline_tag: text-generation

🌸 PyBlissa-Coder-40M

!! 14.6% SCORE ON HumanEval PASS@1 !!

PyBlissa-Coder-40M is the second model in the PyBlissa-Coder family and focuses on Python code generation. Despite its small footprint (40M parameters, trained on 272M tokens), it scores 14.6% pass@1 on HumanEval and 4.4% pass@1 on MBPP, two standard code-generation benchmarks. Its imperfections are worth keeping in mind, though: the model can sometimes generate wrong, inefficient, or broken code, and how often it does so depends mostly on the sampling temperature.


Training curve

Benchmarks

| Benchmark | Score | Protocol | Temp |
|---|---|---|---|
| HumanEval pass@1 | 14.6% (24/164) | zero-shot, fenced-code extraction | 0.25 |
| MBPP pass@1 | 4.4% (22/500) | official tests-in-prompt (Austin et al. 2021) | 0.05 |

How PyBlissa compares on HumanEval

| Model | Params | HumanEval pass@1 |
|---|---|---|
| GPT-Neo | 125M | 0.75% |
| CodeParrot-small | 110M | 3.80% |
| PyCodeGPT | 110M | 8.33% |
| PyBlissa-Coder | 40M | 14.6% |

PyBlissa is ~2.75× smaller than CodeParrot-small yet scores roughly 3.8× higher on HumanEval pass@1 (14.6% vs. 3.80%), and it was trained on a single consumer GPU.
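The percentages and ratios quoted in this section follow from simple arithmetic on the table values. A minimal sketch for reproducing them (the helper name `pass_at_1` is illustrative and not part of any benchmark harness):

```python
def pass_at_1(passed: int, total: int) -> float:
    """Percentage of problems solved with a single sample per problem."""
    return 100.0 * passed / total


# Solved/total counts as listed in the Benchmarks table.
humaneval = pass_at_1(24, 164)
mbpp = pass_at_1(22, 500)

# Comparison against CodeParrot-small, using the table values.
size_ratio = 110 / 40        # parameters: CodeParrot-small / PyBlissa
score_ratio = 14.6 / 3.80    # HumanEval pass@1: PyBlissa / CodeParrot-small

print(f"HumanEval pass@1: {humaneval:.1f}%")
print(f"MBPP pass@1:      {mbpp:.1f}%")
print(f"size ratio: {size_ratio:.2f}x, score ratio: {score_ratio:.2f}x")
```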


Model details

| | |
|---|---|
| Architecture | Decoder-only transformer (GPT-2 style, nanoGPT lineage) |
| Parameters | 39.9M |
| Layers | 10 |
| Model dim (d_model) | 512 |
| Heads | 8 (head_dim 64) |
| FFN dim (d_ff) | 2048 |
| Context length | 512 tokens |
| Vocab size | 16,000 (custom ByteLevel BPE) |
| Tied embeddings | Yes |
| Precision | trained in bf16, released as F32 GGUF |
| Best val loss | 0.3615 |
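The parameter count can be roughly reproduced from the dimensions listed above. The sketch below is an estimate under assumptions the card does not state: learned position embeddings, no bias terms, and LayerNorm with a weight vector only (as in a nanoGPT-style config with biases disabled). The function name is illustrative.

```python
def gpt2_param_count(n_layer: int, d_model: int, d_ff: int,
                     vocab_size: int, n_ctx: int) -> int:
    """Estimate parameters of a GPT-2 style decoder with tied embeddings.

    Assumptions (not confirmed by the model card): learned position
    embeddings, no bias terms, LayerNorm weights only.
    """
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + position
    attention = 4 * d_model * d_model                    # Q, K, V, output proj
    ffn = 2 * d_model * d_ff                             # up and down proj
    norms = 2 * d_model                                  # two LayerNorms per block
    per_layer = attention + ffn + norms
    final_norm = d_model
    # The output head shares the token embedding matrix, so it adds nothing.
    return embeddings + n_layer * per_layer + final_norm


total = gpt2_param_count(n_layer=10, d_model=512, d_ff=2048,
                         vocab_size=16_000, n_ctx=512)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes to 39,922,176 parameters, which rounds to the 39.9M listed in the table. Adding bias terms would raise it slightly.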

Training

| | |
|---|---|
| Hardware | 1 × NVIDIA RTX 5080 (16 GB) |
| Training tokens | 272M (train split) |
| Epochs | 5 |
| Optimizer | AdamW (β 0.9/0.95, wd 0.1) |
| LR schedule | cosine, 4e-4 → 4e-5, ~2% warmup |
| Batch size | 48 |
| Total steps | 55,405 |
| Wall-clock time | ~116 min |
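The schedule and token budget above can be sketched as follows. This is an illustration, not the actual training script: the linear shape of the warmup, the exact warmup step count, and the assumption that every step consumes 48 full 512-token sequences are all guesses based on the table.

```python
import math

MAX_LR, MIN_LR = 4e-4, 4e-5
TOTAL_STEPS = 55_405
WARMUP_STEPS = int(0.02 * TOTAL_STEPS)  # "~2% warmup"; exact count is assumed


def lr_at(step: int) -> float:
    """Linear warmup to MAX_LR (assumed), then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + cosine * (MAX_LR - MIN_LR)


# Token budget, assuming batch size counts sequences of full context length.
tokens_per_step = 48 * 512
tokens_seen = tokens_per_step * TOTAL_STEPS

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
print(f"tokens seen: {tokens_seen / 1e9:.2f}B "
      f"(~{tokens_seen / 272e6:.1f} passes over 272M tokens)")
```

With these assumptions the total comes to about 1.36B tokens, which is consistent with 5 epochs over the 272M-token train split.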

Usage

Ollama

ollama run hf.co/Rohanify/PyBlissa-Coder-40M:F32

The repo ships template and params files, so Ollama applies the correct PROMPT:/CODE: format and sampling defaults automatically; no Modelfile is needed for remote runs.

To run a local GGUF instead:

ollama create pyblissa-40m -f Modelfile
ollama run pyblissa-40m "write a function that checks if a number is prime"
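The `ollama create` command above expects a Modelfile next to the GGUF. The sketch below generates a plausible one. Note the assumptions: the GGUF path is hypothetical, and the template is a guess that mirrors the PROMPT:/CODE: wrapper and the recommended sampling values described in this card; the template file actually shipped in the repo may differ.

```python
def build_modelfile(gguf_path: str) -> str:
    """Return Modelfile text for a local GGUF file (path is hypothetical)."""
    lines = [
        f"FROM {gguf_path}",
        # Assumed template: wraps the user's instruction in PROMPT:/CODE:.
        'TEMPLATE """PROMPT: {{ .Prompt }}',
        "CODE:",
        '"""',
        # Sampling defaults taken from the "Recommended sampling" values.
        "PARAMETER temperature 0.25",
        "PARAMETER top_k 10",
        "PARAMETER repeat_penalty 1.25",
        "PARAMETER num_ctx 512",
    ]
    return "\n".join(lines) + "\n"


modelfile_text = build_modelfile("./PyBlissa-Coder-40M-F32.gguf")
print(modelfile_text)
```

Saving the printed text as `Modelfile` in the same directory as the GGUF should be enough for `ollama create pyblissa-40m -f Modelfile`.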

Prompt format

The model was trained on a plain-text wrapper. At inference, the prompt is wrapped as:

PROMPT: {your instruction}
CODE:

The model then emits a fenced ```python code block. (When using Ollama, the template file does this wrapping for you, so you can just type a plain instruction.)
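If you call the model outside Ollama, you need to do the wrapping and the fenced-code extraction yourself. A minimal sketch, assuming a newline after `CODE:` and a fence that opens with an optional `python` tag; the function names are illustrative and this is not the extraction code used for the benchmark runs:

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example readable

# Matches the first fenced block, with or without a "python" language tag.
FENCE_RE = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the PROMPT:/CODE: training format."""
    return f"PROMPT: {instruction.strip()}\nCODE:\n"


def extract_code(completion: str) -> str:
    """Return the first fenced code block, or the raw text if none is found."""
    match = FENCE_RE.search(completion)
    if match:
        return match.group(1).strip()
    return completion.strip()


prompt = build_prompt("write a function that checks if a number is prime")
sample = f"Here you go:\n{FENCE}python\ndef f():\n    return 1\n{FENCE}\n"
print(prompt)
print(extract_code(sample))
```

The fallback to the raw text covers completions where the fence is missing or left unclosed, for example when generation stops at the 512-token context limit.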

Recommended sampling

| Parameter | Value |
|---|---|
| temperature | 0.25–0.3 |
| top_k | 10 |
| repeat_penalty | 1.25 |
| num_ctx | 512 |
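When calling the model through Ollama's local REST API rather than the CLI, these values can be passed explicitly in the `options` field. The sketch below assumes a default Ollama server on `localhost:11434` and the local model name `pyblissa-40m` from the Usage section; adjust both to your setup.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint

RECOMMENDED_OPTIONS = {
    "temperature": 0.25,
    "top_k": 10,
    "repeat_penalty": 1.25,
    "num_ctx": 512,
}


def build_request(instruction: str, model: str = "pyblissa-40m") -> urllib.request.Request:
    """Build a non-streaming generate request with the recommended sampling."""
    payload = {
        "model": model,
        "prompt": instruction,   # the model's template adds PROMPT:/CODE:
        "stream": False,
        "options": RECOMMENDED_OPTIONS,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(instruction: str, model: str = "pyblissa-40m") -> str:
    """Send the request; requires a running Ollama server."""
    with urllib.request.urlopen(build_request(instruction, model), timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, `generate("write a function that reverses a string")` should return the model's completion as a string.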

Limitations

PyBlissa is a 40M-parameter model trained primarily for prompt → Python generation. Known limitations:

  • It is a small model: it solves short, self-contained functions well but struggles with multi-step or library-heavy tasks.
  • It sometimes omits import statements for stdlib modules it uses (math, re, hashlib, etc.).
  • It can occasionally emit a short natural-language preamble before the code block on harder prompts.
  • Code explanation and non-Python tasks are out of distribution: it may attempt them, but that is not what it was trained for.
  • As with any code model, review and test generated code before running it.
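The missing-import limitation can often be patched in post-processing. The sketch below is one possible approach, not something shipped with the model: it looks for names used as `module.attr` that are never imported or defined, and prepends imports for those found in a small, illustrative allow-list of stdlib modules. It raises `SyntaxError` if the generated code does not parse.

```python
import ast

# Illustrative subset of stdlib modules; extend as needed.
KNOWN_STDLIB = {
    "math", "re", "hashlib", "os", "sys", "json", "random",
    "itertools", "collections", "functools", "string", "datetime",
}


def missing_stdlib_imports(source: str) -> list:
    """List known stdlib modules used as `name.attr` but never imported."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            used.add(node.value.id)
    return sorted((used - defined) & KNOWN_STDLIB)


def add_missing_imports(source: str) -> str:
    """Prepend `import <module>` lines for each missing stdlib module."""
    header = "".join(f"import {name}\n" for name in missing_stdlib_imports(source))
    return header + source


snippet = "def area(r):\n    return math.pi * r ** 2\n"
print(add_missing_imports(snippet))
```

This only repairs imports; the patched code still needs to be reviewed and tested like any other generated code.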

Training data & attribution

This model was trained on the following datasets. Per their licenses, attribution is provided here:

No OpenAI-derived data was used in training.


License

The model weights are released under Apache-2.0. Note that the training data carries its own licenses (CC-BY-4.0 and MIT, see above), which require attribution as provided.


@misc{pyblissa2026,
  title  = {PyBlissa-Coder-40M: A from-scratch Python code model},
  author = {Rohan},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/Rohanify/PyBlissa-Coder-40M}}
}