---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- ruby
- rails
- code-generation
- gguf
- fine-tuned
- lora
- unsloth
pipeline_tag: text-generation
base_model: Qwen/Qwen3-8B
model-index:
- name: qwen3-8b-rails
  results: []
---

# qwen3-8b-rails

An 8B-parameter dense model fine-tuned for **Ruby on Rails code generation**. It was trained on 111,000 samples extracted from our own internal Rails projects and is small enough to run on a laptop.

Built by [Bytecode](https://bytecode.hr).

## Model Details

| Property | Value |
|---|---|
| Base model | [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) |
| Architecture | Qwen3 dense (8B parameters) |
| Training method | QLoRA (rank 16) via [Unsloth](https://github.com/unslothai/unsloth) |
| Training data | 111K samples from internal Rails projects |
| Training cost | ~$21 (A100 80GB, ~17 hours) |
| Quantization | GGUF Q4_K_M (5.03 GB) |

## What it does

This model writes idiomatic Ruby on Rails code following specific conventions:

- Devise authentication
- Namespaced concerns instead of service objects
- Sidekiq instead of Solid Queue
- State-as-records instead of boolean flags
- DaisyUI drawer layouts instead of ActiveAdmin

The 8B model is the lightweight option: fast enough for inline code completion, and small enough to run alongside your development server without swapping.

## Usage with Ollama

```bash
# Download the model (on first use) and start an interactive session
ollama run bytecodehr/qwen3-8b-rails

# Run a single prompt and print the response
ollama run bytecodehr/qwen3-8b-rails "Write a Rails migration for a subscriptions table with plan, status, and billing cycle"
```
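
The same model can also be called from a script through Ollama's local HTTP API. The sketch below is a minimal example, assuming an Ollama server is running on its default port (11434) and the model has already been pulled. The helper names `build_request` and `generate` are ours, for illustration only, and are not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "bytecodehr/qwen3-8b-rails"


def build_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for one JSON object instead of streamed chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=300):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `print(generate("Write a Rails migration for a subscriptions table with plan, status, and billing cycle"))` prints the completion. Disabling streaming keeps the example short; an editor integration would more likely stream tokens as they arrive.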

### Memory requirements

| Format | GGUF size | Minimum RAM | Recommended RAM |
|---|---|---|---|
| Q4_K_M | 5.03 GB | 8 GB | 16 GB |

Fits comfortably on most modern laptops. As a rule of thumb, budget the GGUF file size plus 2-3 GB for the KV cache, or roughly 7-8 GB in total.
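
The memory rule of thumb is simple arithmetic, sketched here for convenience. The KV-cache estimate assumes Qwen3-8B's attention layout as we understand it (36 layers, 8 key/value heads, head dimension 128) and a 16-bit cache. These values are assumptions that this card does not state, so check them against the base model's `config.json`. The function names are illustrative.

```python
GGUF_SIZE_GB = 5.03  # Q4_K_M file size from the table above


def kv_cache_gb(context_tokens, n_layers=36, n_kv_heads=8, head_dim=128,
                bytes_per_value=2):
    """Approximate KV-cache size in GB (1 GB = 1e9 bytes).

    The defaults are assumed values for Qwen3-8B with a 16-bit cache.
    """
    # Factor 2: one key tensor and one value tensor per layer.
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return bytes_per_token * context_tokens / 1e9


def total_ram_gb(kv_gb, gguf_gb=GGUF_SIZE_GB):
    """Model weights plus KV cache, ignoring OS and runtime overhead."""
    return gguf_gb + kv_gb


for kv in (2.0, 3.0):
    print(f"KV cache {kv:.0f} GB -> about {total_ram_gb(kv):.2f} GB total")

print(f"16K-token context -> about {kv_cache_gb(16_384):.2f} GB of KV cache")
```

Under these assumptions a 16K-token context needs about 2.4 GB of cache, which is in line with the 2-3 GB allowance. Longer contexts grow the cache linearly, so the total rises accordingly.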

## Training

Trained with QLoRA (rank 16, alpha 16) on the attention projection layers. Only 0.78% of the parameters were trained. The full training run took ~17 hours on a single A100 80GB GPU.

How the dataset was built:
1. Source code from our internal Rails projects
2. A 15-step cleaning and deduplication pipeline
3. 111K final training samples with contrastive pairs
4. A source diversity cap of 20% per repository
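
As an illustration of the diversity cap, here is one way a per-repository limit could be enforced. This is a sketch under our own assumptions, not the pipeline used for this model: it assumes each sample is a dict with a `repo` key, and it lowers a per-repository sample limit until no repository exceeds 20% of what is kept. The function name `cap_per_source` is hypothetical.

```python
from collections import defaultdict


def cap_per_source(samples, max_share=0.20, key=lambda s: s["repo"]):
    """Trim samples so that no source exceeds max_share of the result."""
    groups = defaultdict(list)
    for sample in samples:
        groups[key(sample)].append(sample)

    if not groups:
        return []

    # With fewer than 1 / max_share sources the cap cannot be met at all.
    if len(groups) * max_share < 1:
        raise ValueError("not enough distinct sources to satisfy the cap")

    # Start from the largest source and lower the per-source limit until
    # the largest kept group fits within max_share of the kept total.
    limit = max(len(group) for group in groups.values())
    while limit > 1:
        kept_total = sum(min(len(group), limit) for group in groups.values())
        if limit <= max_share * kept_total:
            break
        limit -= 1

    # Keep the first `limit` samples of each source, in original order.
    return [s for group in groups.values() for s in group[:limit]]
```

With one repository of 100 samples and five repositories of 10 samples each, the large repository is trimmed to 12 samples (12 of 62, about 19%). This version keeps the first samples of each repository; a real pipeline would more likely sample at random.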

Full details in our blog posts:
- [Part 1: Dataset Engineering](https://bytecode.hr/posts/training-rails-llms-part-1-dataset-engineering)
- [Part 2: Training, Quantization, and Deployment](https://bytecode.hr/posts/training-rails-llms-part-2-training-quantization-deployment)

## Why Ruby for LLMs?

Ruby uses 42-45% fewer tokens than TypeScript across every major LLM tokenizer. Fewer tokens means more code in the context window, faster generations, and lower costs. Read our analysis: [Why Ruby Is the Better Language for LLM-Powered Development](https://bytecode.hr/posts/why-ruby-is-the-better-language-for-llm-powered-development).

## Other models

- [bytecodehr/qwen3-coder-30b-rails](https://huggingface.co/bytecodehr/qwen3-coder-30b-rails): 31B MoE flagship model (18-21 GB)
- [bytecodehr/qwen2.5-coder-7b-rails](https://huggingface.co/bytecodehr/qwen2.5-coder-7b-rails): 7B LoRA adapter
- [bytecodehr/qwen2.5-coder-3b-rails](https://huggingface.co/bytecodehr/qwen2.5-coder-3b-rails): 3B LoRA adapter