Instructions to use shohuu/Pyroton with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use shohuu/Pyroton with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="shohuu/Pyroton",
	filename="pyroton-q4.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use shohuu/Pyroton with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf shohuu/Pyroton
# Run inference directly in the terminal:
llama cli -hf shohuu/Pyroton

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf shohuu/Pyroton
# Run inference directly in the terminal:
llama cli -hf shohuu/Pyroton

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf shohuu/Pyroton
# Run inference directly in the terminal:
./llama-cli -hf shohuu/Pyroton

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf shohuu/Pyroton
# Run inference directly in the terminal:
./build/bin/llama-cli -hf shohuu/Pyroton

Use Docker

docker model run hf.co/shohuu/Pyroton

LM Studio
Jan
Ollama
How to use shohuu/Pyroton with Ollama:
```
ollama run hf.co/shohuu/Pyroton
```

Unsloth Studio

How to use shohuu/Pyroton with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for shohuu/Pyroton to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for shohuu/Pyroton to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for shohuu/Pyroton to start chatting

How to use shohuu/Pyroton with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf shohuu/Pyroton

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "shohuu/Pyroton"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use shohuu/Pyroton with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf shohuu/Pyroton

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default shohuu/Pyroton

Run Hermes

hermes

Atomic Chat new
Docker Model Runner
How to use shohuu/Pyroton with Docker Model Runner:
```
docker model run hf.co/shohuu/Pyroton
```

Lemonade

How to use shohuu/Pyroton with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull shohuu/Pyroton

Run and chat with the model

lemonade run user.Pyroton-{{QUANT_TAG}}

List all available models

lemonade list

Pyroton / README.md

shohuu

Update README.md

7b61771 verified 5 days ago

preview code

Raw

History Blame Contribute Delete

5.44 kB

	---
	language:
	- en
	license: apache-2.0
	tags:
	- code
	- python
	- lora
	- qwen
	- fine-tuned
	- code-generation
	base_model: Qwen/Qwen2.5-Coder-0.5B-Instruct
	---

	<div align="center">

	# 🔥 Pyroton

	> A lightweight Python code generation model fine-tuned from Qwen2.5-Coder-0.5B-Instruct.

	![Python](https://img.shields.io/badge/Python-3.12-blue?logo=python)
	![License](https://img.shields.io/badge/License-Apache%202.0-green)
	![Status](https://img.shields.io/badge/Status-Active-brightgreen)
	![Downloads](https://img.shields.io/badge/Downloads-1.6k%2B-purple)

	</div>

	---

	## Overview

	Pyroton is a lightweight Python-focused code generation model fine-tuned from [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) using supervised fine-tuning (SFT) on Python instruction-style datasets.

	The goal is to create a small, efficient model that handles easy to medium Python tasks while remaining practical for free-tier GPUs and lightweight deployment including mobile phones.

	---

	## Model Variants

	### Base adapter
	- [shohuu/pyroton](https://huggingface.co/shohuu/pyroton) — general Python instruction-tuned adapter

	### Prime-fix patched adapter
	- [shohuu/pyroton-primefix-v3](https://huggingface.co/shohuu/pyroton-primefix-v3) — targeted repair finetuning for correctness bugs (recommended)

	---

	## Quick Start

	### Load latest patched adapter (recommended)

	```python
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	base = AutoModelForCausalLM.from_pretrained(
	"Qwen/Qwen2.5-Coder-0.5B-Instruct",
	dtype=torch.bfloat16,
	device_map="auto",
	)

	model = PeftModel.from_pretrained(base, "shohuu/pyroton-primefix-v3")
	tokenizer = AutoTokenizer.from_pretrained("shohuu/pyroton-primefix-v3")
	tokenizer.pad_token = tokenizer.eos_token

	prompt = "### Instruction:\nWrite a Python function to reverse a string\n\n### Response:\n"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	outputs = model.generate(
	**inputs,
	max_new_tokens=220,
	do_sample=False,
	repetition_penalty=1.1,
	)

	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	### Run GGUF locally (Ollama / LM Studio / PocketPal)

	```bash
	# Download pyroton-q4.gguf from Files tab
	# Then load in your preferred local LLM app
	```

	---

	## Recommended Inference Settings

	### Correctness mode (recommended)
	Most reliable for code generation:

	```python
	outputs = model.generate(
	**inputs,
	max_new_tokens=220,
	do_sample=False,
	repetition_penalty=1.1,
	)
	```

	### Sampled mode (more variety, less stable)

	```python
	outputs = model.generate(
	**inputs,
	max_new_tokens=220,
	do_sample=True,
	temperature=0.1,
	top_p=0.9,
	repetition_penalty=1.2,
	)
	```

	---

	## Example Output

	Prompt:
	```
	Write a Python function to check if a number is prime
	```

	Pyroton Output:
	```python
	import math

	def is_prime(n):
	"""Check if the given integer n is prime."""
	if n <= 1:
	return False
	for i in range(2, int(math.sqrt(n)) + 1):
	if n % i == 0:
	return False
	return True
	```

	---

	## Training Details

	\| Setting \| Value \|
	\|---\|---\|
	\| Base Model \| Qwen2.5-Coder-0.5B-Instruct \|
	\| Datasets \| python_code_instructions_18k_alpaca, CodeAlpaca-20k, code_instructions_122k_alpaca_style \|
	\| Total Samples \| ~95,362 (Python-filtered) \|
	\| Training Strategy \| Chunked SFT (5 chunks) \|
	\| LoRA Rank \| 16 \|
	\| LoRA Alpha \| 32 \|
	\| Batch Size \| 2 \|
	\| Gradient Accumulation \| 8 \|
	\| Learning Rate \| 1e-4 \|
	\| Precision \| BFloat16 \|
	\| Max Length \| 512 \|
	\| Final Training Loss \| ~0.712 \|

	### Repair finetuning
	After main training, targeted repair finetuning was applied to fix:
	- Missing `import math` / `math.sqrt` issues
	- Incorrect handling of edge cases (negative numbers, 0, 1)
	- Latest patched adapter: `shohuu/pyroton-primefix-v3`

	### Evaluation
	Tested against execution-based harness on `is_prime()` with inputs: `-1, 0, 1, 2, 3, 4, 6, 9, 17, 49`
	- Greedy decoding: 5/5 passing ✅
	- Sampled decoding: improved but less stable

	---

	## GGUF / Mobile Deployment

	Pyroton is available as a GGUF file for local deployment:

	\| File \| Quantization \| Size \|
	\|---\|---\|---\|
	\| `pyroton-q4.gguf` \| Q4_K_M \| ~397MB \|

	Compatible apps:
	- PocketPal AI (Android/iOS) — search `shohuu/Pyroton`
	- LM Studio (Desktop)
	- Ollama (Desktop)

	---

	## Known Limitations

	- 0.5B model — can degrade on harder tasks or complex reasoning
	- Greedy decoding is more reliable than sampling for correctness
	- Most thoroughly tested on short Python coding tasks
	- Broader evaluation across more libraries still needed

	---

	## GitHub

	[github.com/TunasTuna/pyroton](https://github.com/TunasTuna/pyroton)

	---

	## Requirements

	```
	transformers
	datasets
	trl
	peft
	bitsandbytes
	accelerate
	torchao
	```

	---

	## License

	Apache 2.0 — see [LICENSE](https://github.com/TunasTuna/pyroton/blob/main/LICENSE) for details.

	Base model (Qwen2.5-Coder) is also Apache 2.0. Attribution to Alibaba Cloud / Qwen Team.

	---

	## Acknowledgements

	- [Qwen Team](https://huggingface.co/Qwen) for the base model
	- [iamtarun](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca) for the original dataset
	- [Hugging Face](https://huggingface.co/) for the training ecosystem
	- My friend [Yumi](https://www.tiktok.com/@yumi_naomi6?_r=1&_t=ZS-96ok4qcUKij) for the name 🔥