How to use from llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf bytecodehr/qwen3-8b-rails:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf bytecodehr/qwen3-8b-rails:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf bytecodehr/qwen3-8b-rails:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf bytecodehr/qwen3-8b-rails:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf bytecodehr/qwen3-8b-rails:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf bytecodehr/qwen3-8b-rails:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf bytecodehr/qwen3-8b-rails:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf bytecodehr/qwen3-8b-rails:Q4_K_M
Use Docker
docker model run hf.co/bytecodehr/qwen3-8b-rails:Q4_K_M

qwen3-8b-rails

An 8B parameter dense model fine-tuned for Ruby on Rails code generation. Trained on 111,000 samples extracted from our own internal Rails projects. Small enough to run on a laptop.

Built by Bytecode.

Model Details

| Property | Value |
| --- | --- |
| Base model | Qwen3-8B |
| Architecture | Qwen3 dense (8B parameters) |
| Training method | QLoRA (rank 16) via Unsloth |
| Training data | 111K samples from internal Rails projects |
| Training cost | ~$21 (A100 80GB, ~17 hours) |
| Quantization | GGUF Q4_K_M (5.03 GB) |

What it does

This model writes idiomatic Ruby on Rails code following specific conventions:

  • Devise authentication
  • Namespaced concerns instead of service objects
  • Sidekiq instead of Solid Queue
  • State-as-records instead of boolean flags
  • DaisyUI drawer layouts instead of ActiveAdmin
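
To make one of these conventions concrete, here is a minimal plain-Ruby sketch of one common reading of "state-as-records instead of boolean flags". The card does not define the pattern, so the class names (`Subscription`, `Cancellation`) and the `cancel!` method are hypothetical; in a Rails app these would typically be ActiveRecord models with an association.

```ruby
# Hypothetical illustration, not taken from the model's training data.

# Boolean-flag style: only the current state is known.
FlaggedSubscription = Struct.new(:plan, :cancelled)

# State-as-records style: the state change is its own record,
# so it can carry a reason and a timestamp.
Cancellation = Struct.new(:reason, :created_at)

class Subscription
  attr_reader :plan, :cancellation

  def initialize(plan)
    @plan = plan
    @cancellation = nil
  end

  # Create a record for the state change instead of flipping a flag.
  # Calling this twice keeps the first cancellation.
  def cancel!(reason:, at: Time.now)
    @cancellation ||= Cancellation.new(reason, at)
  end

  # The predicate is derived from the presence of the record.
  def cancelled?
    !@cancellation.nil?
  end
end
```

With this shape, "is it cancelled?" and "when and why was it cancelled?" are answered by the same record, which a boolean column alone cannot do.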

The 8B model is the lightweight option: fast enough for inline code completion, small enough to run alongside your development server without swapping.

Usage with Ollama

# Download and run
ollama run bytecodehr/qwen3-8b-rails

# Example prompt
ollama run bytecodehr/qwen3-8b-rails "Write a Rails migration for a subscriptions table with plan, status, and billing cycle"
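
The same prompt can also be sent from Ruby. The sketch below assumes a local Ollama instance on its default port (11434) and uses Ollama's `/api/generate` endpoint with streaming disabled; the helper names `build_generate_request` and `generate` are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = URI("http://localhost:11434/api/generate")

# Build the POST request without sending it, so it can be inspected.
def build_generate_request(prompt, model: "bytecodehr/qwen3-8b-rails")
  request = Net::HTTP::Post.new(OLLAMA_URL, "Content-Type" => "application/json")
  request.body = JSON.generate(model: model, prompt: prompt, stream: false)
  request
end

# Send the request and return the generated text.
def generate(prompt)
  response = Net::HTTP.start(OLLAMA_URL.host, OLLAMA_URL.port) do |http|
    http.request(build_generate_request(prompt))
  end
  JSON.parse(response.body).fetch("response")
end
```

Calling `puts generate("Write a Rails migration for a subscriptions table with plan, status, and billing cycle")` should print the model's answer once the model has been pulled.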

Memory requirements

| Format | GGUF Size | Min RAM | Recommended RAM |
| --- | --- | --- | --- |
| Q4_K_M | 5.03 GB | 8 GB | 16 GB |

Fits on most modern laptops. Budget the GGUF file size plus 2–3 GB for the KV cache, roughly 7–8 GB in total.
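
As a quick check of that rule of thumb, the arithmetic can be written out. This is only a sketch: the 2–3 GB KV cache figure comes from the text above, the real figure depends on context length and runtime settings, and the function name is hypothetical.

```ruby
GGUF_SIZE_GB = 5.03 # Q4_K_M file size from the table above

# Return the [low, high] memory estimate in GB:
# model weights plus the KV cache allowance.
def estimated_memory_gb(gguf_gb, kv_cache_low_gb: 2.0, kv_cache_high_gb: 3.0)
  [(gguf_gb + kv_cache_low_gb).round(2), (gguf_gb + kv_cache_high_gb).round(2)]
end

low, high = estimated_memory_gb(GGUF_SIZE_GB)
puts "Estimated memory: #{low}-#{high} GB"
```

The upper end of the estimate sits right at the 8 GB minimum, which is consistent with 16 GB being the recommended configuration.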

Training

Trained with QLoRA (rank 16, alpha 16) on the attention projection layers. Only 0.78% of parameters were trained. The full training run took ~17 hours on a single A100 80GB GPU.

The dataset:

  1. Our internal Rails projects
  2. 15-step cleaning and deduplication pipeline
  3. 111K final training samples with contrastive pairs
  4. Source diversity cap at 20% per repository
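
The card does not say how the 20% source diversity cap was enforced. One possible way to do it is sketched below, assuming each sample is a hash with a `:repo` key (a hypothetical shape): lower a per-repository limit until no repository exceeds the allowed share of the output.

```ruby
# Keep at most `limit` samples per repository, where `limit` is the
# largest value for which no repository exceeds `max_share` of the
# resulting dataset. Returns an empty array if the cap cannot be met
# (for example, with fewer than five repositories at 20%).
def cap_per_repository(samples, max_share: 0.20)
  groups = samples.group_by { |sample| sample[:repo] }
  limit = groups.values.map(&:size).max.to_i

  while limit.positive?
    total = groups.values.sum { |group| [group.size, limit].min }
    break if limit <= max_share * total
    limit -= 1
  end

  groups.values.flat_map { |group| group.first(limit) }
end
```

This version takes the first samples from each repository; a real pipeline might instead sample at random or by quality score.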

Full details in our blog posts:

Why Ruby for LLMs?

Ruby uses 42–45% fewer tokens than TypeScript across every major LLM tokenizer. Fewer tokens means more code in the context window, faster generations, and lower costs. Read our analysis: Why Ruby Is the Better Language for LLM-Powered Development.
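
To put the token savings in terms of context capacity: if equivalent code needs a fraction r fewer tokens, a fixed context window holds 1 / (1 - r) times as much of it. The sketch below only works through that arithmetic for the figures quoted above; the function name is hypothetical.

```ruby
# How much more equivalent code fits in a fixed context window
# when the same code needs `token_reduction` (0.0...1.0) fewer tokens.
def context_capacity_multiplier(token_reduction)
  1.0 / (1.0 - token_reduction)
end

[0.42, 0.45].each do |reduction|
  puts format("%.0f%% fewer tokens -> %.2fx as much code per window",
              reduction * 100, context_capacity_multiplier(reduction))
end
# prints:
# 42% fewer tokens -> 1.72x as much code per window
# 45% fewer tokens -> 1.82x as much code per window
```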

Other models
