+---
+language:
+  - en
+license: apache-2.0
+library_name: transformers
+tags:
+  - ruby
+  - rails
+  - code-generation
+  - gguf
+  - fine-tuned
+  - lora
+  - unsloth
+pipeline_tag: text-generation
+base_model: Qwen/Qwen3-8B
+model-index:
+  - name: qwen3-8b-rails
+    results: []
+---
+# qwen3-8b-rails
+An 8B-parameter dense model fine-tuned for **Ruby on Rails code generation**, trained on 111,000 samples extracted from 45 Rails repositories. It is small enough to run on a laptop.
+Built by [Bytecode](https://bytecode.hr).
+## Model Details
+| Property | Value |
+|---|---|
+| Base model | [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) |
+| Architecture | Qwen3 dense (8B parameters) |
+| Training method | QLoRA (rank 16) via [Unsloth](https://github.com/unslothai/unsloth) |
+| Training data | 111K samples from 45 Rails repos |
+| Training cost | ~$21 (A100 80GB, ~17 hours) |
+| Quantization | GGUF Q4_K_M (5.03 GB) |
+## What it does
+This model writes idiomatic Ruby on Rails code following specific conventions:
+- Custom authentication with Identity and MagicLink models (not Devise)
+- Namespaced concerns instead of service objects
+- Solid Queue instead of Sidekiq
+- State-as-records instead of boolean flags
+- DaisyUI drawer layouts instead of ActiveAdmin
+The 8B model is the lightweight option: fast enough for inline code completion and small enough to run alongside your development server without swapping.
+## Usage with Ollama
+```bash
+# Download and run
+ollama run bytecodehr/qwen3-8b-rails
+# Example prompt
+ollama run bytecodehr/qwen3-8b-rails "Write a Rails migration for a subscriptions table with plan, status, and billing cycle"
+```
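To call the model from a Ruby script instead of the CLI, something like the following should work. This is a minimal sketch, assuming Ollama is serving its HTTP API at the default `http://localhost:11434` and the model has already been pulled. The helper names `build_generate_request` and `generate` are illustrative and not part of any library.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: Ollama's default local endpoint. Adjust if your server runs elsewhere.
OLLAMA_URI = URI("http://localhost:11434/api/generate")
MODEL_NAME = "bytecodehr/qwen3-8b-rails"

# Builds the POST request for a single, non-streaming completion.
def build_generate_request(prompt, model: MODEL_NAME, uri: OLLAMA_URI)
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(model: model, prompt: prompt, stream: false)
  request
end

# Sends the request and returns the generated text.
def generate(prompt, uri: OLLAMA_URI)
  response = Net::HTTP.start(uri.host, uri.port, read_timeout: 300) do |http|
    http.request(build_generate_request(prompt, uri: uri))
  end
  response.value # raises an exception on non-2xx responses
  JSON.parse(response.body).fetch("response")
end

# Example call (requires a running Ollama server):
#   puts generate("Write a Rails migration for a subscriptions table")
```

Setting `stream: false` returns the whole completion as one JSON object, which keeps the parsing trivial. For editor integrations you would more likely stream the response and read it line by line.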
+### Memory requirements
+| Format | GGUF Size | Min RAM | Recommended |
+|---|---|---|---|
+| Q4_K_M | 5.03 GB | 8 GB | 16 GB |
+Fits comfortably on a modern laptop with 16 GB of RAM; 8 GB is the minimum. Memory use is roughly the GGUF file size plus 2–3 GB for the KV cache.
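The KV cache allowance can be sanity-checked with a little arithmetic. The sketch below assumes the architecture values from the Qwen3-8B base model config (36 layers, 8 key/value heads, head dimension 128) and 16-bit cache entries. None of these numbers are stated in this card, and actual usage depends on the runtime and its context-length setting.

```ruby
# Assumed Qwen3-8B architecture values (from the base model's config, not from this card).
LAYERS          = 36
KV_HEADS        = 8
HEAD_DIM        = 128
BYTES_PER_VALUE = 2     # 16-bit cache entries (assumption)
GGUF_SIZE_GB    = 5.03  # Q4_K_M file size from the table above

# Bytes of KV cache per token: one key and one value vector per KV head, per layer.
def kv_bytes_per_token
  2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
end

# Approximate KV cache size in decimal gigabytes for a given context length.
def kv_cache_gb(context_tokens)
  kv_bytes_per_token * context_tokens / 1e9
end

[4_096, 8_192, 16_384].each do |ctx|
  cache = kv_cache_gb(ctx)
  puts format("%6d tokens: KV cache %.2f GB, model + cache %.2f GB", ctx, cache, GGUF_SIZE_GB + cache)
end
```

Under these assumptions the cache costs 147,456 bytes per token, which is about 2.4 GB at a 16,384-token context and so consistent with the 2–3 GB allowance. Shorter contexts need proportionally less.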
+## Training
+Trained with QLoRA (LoRA rank 16, alpha 16) on the attention projection layers. Only 0.78% of parameters were trainable. The full training run took ~17 hours on a single A100 80GB GPU.
+The dataset:
+1. 45 Rails repos (35 private + 10 open-source)
+2. 15-step cleaning and deduplication pipeline
+3. 111K final training samples with contrastive pairs
+4. Source diversity cap at 20% per repository
+Full details in our blog posts:
+- [Part 1: Dataset Engineering](https://bytecode.hr/posts/training-rails-llms-part-1-dataset-engineering)
+- [Part 2: Training, Quantization, and Deployment](https://bytecode.hr/posts/training-rails-llms-part-2-training-quantization-deployment)
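To make the 20% source diversity cap in step 4 concrete, here is one way such a cap could be computed. This is a hypothetical sketch, not the pipeline described in the blog posts: the helper `cap_source_share` and the repository names are made up for illustration. It trims the largest sources repeatedly until no single repository exceeds the allowed share of the total.

```ruby
# Hypothetical helper: trims per-repository sample counts until no repository
# contributes more than max_percent of the total. Integer math avoids rounding issues.
def cap_source_share(counts, max_percent: 20)
  capped = counts.dup
  loop do
    limit = capped.values.sum * max_percent / 100
    break if capped.values.all? { |n| n <= limit }
    # Trimming shrinks the total, so the limit is recomputed on the next pass.
    capped.transform_values! { |n| [n, limit].min }
  end
  capped
end

# Made-up counts: one dominant repository and five smaller ones.
counts = { "big_app" => 1000, "a" => 100, "b" => 100, "c" => 100, "d" => 100, "e" => 100 }
capped = cap_source_share(counts)
puts capped.inspect
puts "largest share: #{(100.0 * capped.values.max / capped.values.sum).round(1)}%"
```

With a 20% cap, at least five sources are needed for any samples to survive. With 45 repositories this is not a concern.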
+## Why Ruby for LLMs?
+Ruby uses 42–45% fewer tokens than TypeScript across every major LLM tokenizer. Fewer tokens means more code in the context window, faster generations, and lower costs. Read our analysis: [Why Ruby Is the Better Language for LLM-Powered Development](https://bytecode.hr/posts/why-ruby-is-the-better-language-for-llm-powered-development).
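The context-window effect follows directly from the token savings. As a rough illustration, assuming functionally equivalent programs in both languages, a language that needs a fraction `savings` fewer tokens fits `1 / (1 - savings)` times as much code into the same window. The method name below is illustrative.

```ruby
# If Ruby needs `savings` fewer tokens for the same program, a fixed context
# window holds 1 / (1 - savings) times as much Ruby code.
def context_capacity_multiplier(savings)
  1.0 / (1.0 - savings)
end

[0.42, 0.45].each do |savings|
  puts format("%.0f%% fewer tokens: about %.2fx as much code per context window",
              savings * 100, context_capacity_multiplier(savings))
end
```

For the 42–45% range quoted above, this works out to roughly 1.7x to 1.8x as much code per window.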
+## Other models
+- [bytecodehr/qwen3-coder-30b-rails](https://huggingface.co/bytecodehr/qwen3-coder-30b-rails) — 31B MoE flagship model (18–21 GB)
+- [bytecodehr/qwen2.5-coder-7b-rails](https://huggingface.co/bytecodehr/qwen2.5-coder-7b-rails) — 7B LoRA adapter
+- [bytecodehr/qwen2.5-coder-3b-rails](https://huggingface.co/bytecodehr/qwen2.5-coder-3b-rails) — 3B LoRA adapter