---
license: apache-2.0
language:
- en
tags:
- code
- python
- text-generation
- gpt2
- from-scratch
- small-model
- code-generation
pipeline_tag: text-generation
---
# 🌸 PyBlissa-Coder-40M
# !! 14.6% SCORE ON HumanEval PASS@1 !!
PyBlissa-Coder-40M is the second model in the PyBlissa-Coder family and focuses on Python code generation.
Despite its small footprint (40M parameters, trained on 272M tokens), it scores 14.6% pass@1
on HumanEval and 4.4% pass@1 on MBPP, two standard code-generation benchmarks.
Alongside these scores, its imperfections are worth keeping in mind.
The model can generate wrong, inefficient, or broken code, and how often it does so depends mostly on the sampling temperature.
<p align="center">
<img src="pyblissa_banner.png" alt="PyBlissa-Coder-40M" width="100%">
</p>
<p align="center">
<img src="pyblissa_loss_curve.png" alt="Training curve" width="100%">
</p>
## Benchmarks
| Benchmark | Score | Protocol | Temp |
|-----------|-------|----------|------|
| HumanEval pass@1 | **14.6%** (24/164) | zero-shot, fenced-code extraction | 0.25 |
| MBPP pass@1 | **4.4%** (22/500) | official tests-in-prompt (Austin et al. 2021) | 0.05 |
### How PyBlissa compares on HumanEval
| Model | Params | HumanEval pass@1 |
|-------|--------|------------------|
| GPT-Neo | 125M | 0.75% |
| CodeParrot-small | 110M | 3.80% |
| PyCodeGPT | 110M | 8.33% |
| **PyBlissa-Coder** | **40M** | **14.6%** |
> PyBlissa is ~2.75× smaller than CodeParrot-small yet scores roughly 4× higher
> on HumanEval pass@1, trained on a single consumer GPU.
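
The percentages and ratios quoted above can be re-derived from the counts and parameter sizes in the two tables. The sketch below does that arithmetic. The helper name `pass_at_1` is illustrative, and it assumes one generated sample per problem, so that pass@1 is simply solved/total (the card does not state how many samples were drawn).

```python
def pass_at_1(solved: int, total: int) -> float:
    """Percentage of problems solved, assuming one sample per problem."""
    return 100.0 * solved / total


# Counts taken from the Benchmarks table.
humaneval = pass_at_1(24, 164)
mbpp = pass_at_1(22, 500)

# Figures taken from the comparison table (parameters in millions).
size_ratio = 110 / 40        # CodeParrot-small vs. PyBlissa-Coder
score_ratio = 14.6 / 3.80    # PyBlissa-Coder vs. CodeParrot-small

print(f"HumanEval pass@1: {humaneval:.1f}%")
print(f"MBPP pass@1:      {mbpp:.1f}%")
print(f"Size ratio:       {size_ratio:.2f}x")
print(f"Score ratio:      {score_ratio:.2f}x")
```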
---
## Model details
| Property | Value |
|---|---|
| Architecture | Decoder-only transformer (GPT-2 style, nanoGPT lineage) |
| Parameters | 39.9M |
| Layers | 10 |
| Model dim (d_model) | 512 |
| Heads | 8 (head_dim 64) |
| FFN dim (d_ff) | 2048 |
| Context length | 512 tokens |
| Vocab size | 16,000 (custom ByteLevel BPE) |
| Tied embeddings | Yes |
| Precision | trained in bf16, released as F32 GGUF |
| Best val loss | 0.3615 |
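
As a sanity check on the 39.9M figure, the parameter count can be estimated from the dimensions in the table. This is a sketch under stated assumptions, not the author's code: it assumes a standard GPT-2 block layout, learned absolute position embeddings, and bias-free linear and LayerNorm layers (as with nanoGPT's `bias=False` option), none of which the card states explicitly. The function name `count_params` is hypothetical.

```python
def count_params(vocab: int = 16_000, d_model: int = 512, n_layer: int = 10,
                 d_ff: int = 2048, context: int = 512,
                 bias: bool = False, tied: bool = True) -> int:
    """Estimate parameters of a GPT-2 style decoder (layout is an assumption)."""
    # Token embedding plus learned position embedding.
    embed = vocab * d_model + context * d_model
    # Attention: fused QKV (3 * d^2) plus output projection (d^2).
    attn = 4 * d_model * d_model
    # Feed-forward: up-projection and down-projection.
    mlp = 2 * d_model * d_ff
    # Two LayerNorm weight vectors per block.
    norms = 2 * d_model
    per_layer = attn + mlp + norms
    if bias:
        # QKV, attention projection, FFN up, FFN down, and two LayerNorm biases.
        per_layer += 3 * d_model + d_model + d_ff + d_model + 2 * d_model
    total = embed + n_layer * per_layer + d_model  # plus final LayerNorm weight
    if bias:
        total += d_model  # final LayerNorm bias
    if not tied:
        total += vocab * d_model  # separate output head when embeddings are untied
    return total


print(f"bias-free: {count_params() / 1e6:.1f}M")
print(f"with bias: {count_params(bias=True) / 1e6:.1f}M")
```

Under these assumptions the bias-free count comes to 39,922,176, which rounds to the 39.9M in the table. The number of heads does not affect the total.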
### Training
| Setting | Value |
|---|---|
| Hardware | 1 × NVIDIA RTX 5080 (16 GB) |
| Training tokens | 272M (train split) |
| Epochs | 5 |
| Optimizer | AdamW (β 0.9/0.95, wd 0.1) |
| LR schedule | cosine, 4e-4 → 4e-5, ~2% warmup |
| Batch size | 48 |
| Total steps | 55,405 |
| Wall-clock time | ~116 min |
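
The figures in this table are mutually consistent, which the short calculation below illustrates. It assumes each optimizer step consumes one batch of 48 full-length (512-token) sequences with no gradient accumulation. That is an assumption, since the card does not describe the data loader.

```python
batch_size = 48
context_len = 512
total_steps = 55_405
epochs = 5
wall_clock_min = 116  # approximate, per the table

# Tokens processed over the whole run (assumes full-length sequences).
tokens_seen = batch_size * context_len * total_steps
# Unique tokens per epoch; should land near the 272M train split.
tokens_per_epoch = tokens_seen / epochs
# Average training throughput in tokens per second.
throughput = tokens_seen / (wall_clock_min * 60)

print(f"Tokens seen:      {tokens_seen:,}")
print(f"Tokens per epoch: {tokens_per_epoch / 1e6:.1f}M")
print(f"Throughput:       {throughput / 1e3:.0f}k tokens/s")
```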
---
## Usage
### Ollama
```bash
ollama run hf.co/Rohanify/PyBlissa-Coder-40M-GGUF:F32
```
The repo ships `template` and `params` files, so Ollama applies the correct
`PROMPT:`/`CODE:` format and sampling defaults automatically — no Modelfile
needed for remote runs.
To run a local GGUF instead:
```bash
ollama create pyblissa-40m -f Modelfile
ollama run pyblissa-40m "write a function that checks if a number is prime"
```
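
The `Modelfile` used by `ollama create` is not shown in this card. As a rough sketch of what one could contain, the helper below renders a Modelfile from the prompt format and sampling values recommended in this card. The GGUF path, the exact template whitespace, and the function name `render_modelfile` are assumptions. `FROM`, `TEMPLATE`, and `PARAMETER` are standard Ollama Modelfile directives.

```python
def render_modelfile(gguf_path: str = "./PyBlissa-V2-F32.gguf") -> str:
    """Return Modelfile text for a local GGUF (path and template are assumptions)."""
    # Go-template placeholder that Ollama fills with the user's instruction.
    template = 'TEMPLATE """PROMPT: {{ .Prompt }}\nCODE:\n"""'
    # Sampling values from the "Recommended sampling" table.
    params = {
        "temperature": 0.25,
        "top_k": 10,
        "repeat_penalty": 1.25,
        "num_ctx": 512,
    }
    lines = [f"FROM {gguf_path}", template]
    lines += [f"PARAMETER {name} {value}" for name, value in params.items()]
    return "\n".join(lines) + "\n"


print(render_modelfile())
```

Writing the returned text to a file named `Modelfile` next to the GGUF would make the `ollama create` command above usable as written.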
### Prompt format
The model was trained on a plain-text wrapper. At inference, the prompt is
wrapped as:
```
PROMPT: {your instruction}
CODE:
```
The model then emits a fenced `python` code block. (When using Ollama, the
`template` file does this wrapping for you, so you can just type a plain instruction.)
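
When calling the model outside Ollama, the wrapping and the extraction of the fenced block have to be done by hand. The sketch below shows one way to do both. The helper names are hypothetical, and the exact whitespace around `PROMPT:` and `CODE:` is an assumption based on the format shown above.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Match an opening fence (optionally tagged "python") up to the closing fence,
# or up to the end of the text if generation was cut off before the fence closed.
FENCE_RE = re.compile(
    FENCE + r"(?:python)?[ \t]*\n(.*?)(?:" + FENCE + r"|\Z)", re.DOTALL
)


def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the PROMPT:/CODE: format."""
    return f"PROMPT: {instruction.strip()}\nCODE:\n"


def extract_code(completion: str) -> str:
    """Return the body of the first fenced block, or the raw text if none is found."""
    match = FENCE_RE.search(completion)
    if match:
        return match.group(1).strip()
    return completion.strip()


print(build_prompt("write a function that checks if a number is prime"))
sample = FENCE + "python\ndef is_even(n):\n    return n % 2 == 0\n" + FENCE
print(extract_code(sample))
```

Falling back to the raw text when no fence is present keeps the helper usable if the model skips the fence entirely.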
### Recommended sampling
| Parameter | Value |
|-----------|-------|
| temperature | 0.25 – 0.3 |
| top_k | 10 |
| repeat_penalty | 1.25 |
| num_ctx | 512 |
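
These values can also be passed per request through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server on its default port and a model registered locally as `pyblissa-40m` (as in the `ollama create` example). The function names are illustrative, and the temperature uses the lower end of the recommended range.

```python
import json
import urllib.request


def build_request(instruction: str, model: str = "pyblissa-40m") -> dict:
    """Build an /api/generate payload using the recommended sampling values."""
    return {
        "model": model,
        "prompt": instruction,
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {
            "temperature": 0.25,
            "top_k": 10,
            "repeat_penalty": 1.25,
            "num_ctx": 512,
        },
    }


def generate(instruction: str, host: str = "http://localhost:11434") -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_request(instruction)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Printing the payload needs no server; generate() needs one running locally.
print(json.dumps(build_request("reverse a string"), indent=2))
```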
---
## Limitations
PyBlissa is a 40M-parameter model trained primarily for **prompt → Python
generation**. Known limitations:
- It is a small model: it solves short, self-contained functions well but
struggles with multi-step or library-heavy tasks.
- It sometimes omits `import` statements for stdlib modules it uses
(`math`, `re`, `hashlib`, etc.).
- It can occasionally emit a short natural-language preamble before the code
block on harder prompts.
- Code explanation and non-Python tasks are out of distribution — it may
attempt them, but that is not what it was trained for.
- As with any code model, **review and test generated code before running it.**
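
The missing-import limitation can be partly mitigated with a small post-processing step. The sketch below is one possible approach and not part of the model or its tooling. It parses generated code with `ast`, finds names used as `module.attribute` that are stdlib modules but were never imported or defined, and prepends the imports. It does not catch bare calls such as `sqrt(x)` that would need a `from ... import ...`, and the fallback module set for Python versions older than 3.10 is an arbitrary choice.

```python
import ast
import sys

# sys.stdlib_module_names exists on Python 3.10+; fall back to a small set otherwise.
STDLIB = getattr(sys, "stdlib_module_names",
                 frozenset({"math", "re", "hashlib", "os", "json", "random"}))


def missing_stdlib_imports(source: str) -> list[str]:
    """List stdlib modules used as `name.attr` without being imported or defined."""
    imported, bound, used = set(), set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            used.add(node.value.id)
    return sorted(n for n in used if n in STDLIB and n not in imported | bound)


def add_missing_imports(source: str) -> str:
    """Prepend `import X` lines for each missing stdlib module."""
    missing = missing_stdlib_imports(source)
    if not missing:
        return source
    header = "\n".join(f"import {name}" for name in missing)
    return header + "\n\n" + source


snippet = "def area(r):\n    return math.pi * r ** 2\n"
print(add_missing_imports(snippet))
```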
---
## Training data & attribution
This model was trained on the following datasets. Per their licenses,
attribution is provided here:
- **nvidia/OpenCodeInstruct** — CC-BY-4.0
https://huggingface.co/datasets/nvidia/OpenCodeInstruct
- **flytech/python-codes-25k** — MIT
https://huggingface.co/datasets/flytech/python-codes-25k
No OpenAI-derived data was used in training.
---
## License
The model weights are released under **Apache-2.0**. Note that the training
data carries its own licenses (CC-BY-4.0 and MIT, see above), which require
attribution as provided.
---
## Citation
```bibtex
@misc{pyblissa2026,
title = {PyBlissa-Coder-40M: A from-scratch Python code model},
author = {Rohan},
year = {2026},
howpublished = {\url{https://huggingface.co/Rohanify/PyBlissa-Coder-40M}}
}
```