Update model card: simplify training data description
README.md CHANGED
@@ -20,7 +20,7 @@ model-index:

 # qwen3-coder-30b-rails

-A 31B parameter Mixture-of-Experts model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from …
+A 31B parameter Mixture-of-Experts model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from our own internal Rails projects.

 Built by [Bytecode](https://bytecode.hr).

@@ -31,7 +31,7 @@ Built by [Bytecode](https://bytecode.hr).

 | Base model | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) |
 | Architecture | Qwen3 MoE (31B total, 3B active) |
 | Training method | QLoRA (rank 16) via [Unsloth](https://github.com/unslothai/unsloth) |
-| Training data | 111K samples from … |
+| Training data | 111K samples from internal Rails projects |
 | Training cost | ~$32 (A100 80GB, ~26 hours) |
 | Quantization | GGUF Q4_K_M (18.6 GB), Q5_K_M (21.7 GB) |

@@ -71,7 +71,7 @@ Rule of thumb: GGUF file size + 2–4 GB for KV cache and overhead.

 Trained with LoRA (rank 16, alpha 16) on attention projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`). Only 0.78% of parameters were trained.

 The dataset pipeline:

-1. Extracted code from …
+1. Extracted code from our internal Rails projects
 2. 15-step cleaning and deduplication pipeline
 3. 111K final training samples
 4. Includes 29 contrastive pairs (wrong way vs right way)
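The "0.78% of parameters" figure in the card follows from how LoRA works: each adapted weight matrix of shape (d_out, d_in) gains two trainable low-rank factors, B (d_out × r) and A (r × d_in), while the base weights stay frozen. A minimal sketch of that arithmetic, using hypothetical 4096×4096 projection shapes rather than the actual Qwen3 layer dimensions:

```python
def lora_trainable_params(shapes, r=16):
    """Total LoRA parameters added for a list of adapted matrices.

    Each (d_out, d_in) matrix gains two factors: B (d_out x r) and
    A (r x d_in), i.e. r * (d_out + d_in) trainable parameters.
    """
    return sum(r * (d_out + d_in) for d_out, d_in in shapes)


# Hypothetical example: seven 4096x4096 projections in one block, rank 16.
shapes = [(4096, 4096)] * 7
print(lora_trainable_params(shapes, r=16))  # 16 * 8192 * 7 = 917504
```

Summing this over all transformer blocks and dividing by the model's total parameter count gives the trainable fraction; at rank 16 it lands well under 1%, consistent with the card's claim.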