Update model card: simplify training data description
README.md CHANGED
@@ -20,7 +20,7 @@ model-index:

 # qwen3-coder-30b-rails

-A 31B parameter Mixture-of-Experts model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from …
+A 31B parameter Mixture-of-Experts model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from our own internal Rails projects.

 Built by [Bytecode](https://bytecode.hr).

@@ -31,7 +31,7 @@ Built by [Bytecode](https://bytecode.hr).

 | Base model | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) |
 | Architecture | Qwen3 MoE (31B total, 3B active) |
 | Training method | QLoRA (rank 16) via [Unsloth](https://github.com/unslothai/unsloth) |
-| Training data | 111K samples from … |
+| Training data | 111K samples from internal Rails projects |
 | Training cost | ~$32 (A100 80GB, ~26 hours) |
 | Quantization | GGUF Q4_K_M (18.6 GB), Q5_K_M (21.7 GB) |

@@ -71,7 +71,7 @@ Rule of thumb: GGUF file size + 2–4 GB for KV cache and overhead.

 Trained with LoRA (rank 16, alpha 16) on attention projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`). Only 0.78% of parameters were trained.

 The dataset pipeline:

-1. Extracted code from …
+1. Extracted code from our internal Rails projects
 2. 15-step cleaning and deduplication pipeline
 3. 111K final training samples
 4. Includes 29 contrastive pairs (wrong way vs right way)
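The "0.78% of parameters" figure in the card follows from how LoRA works: each adapted weight matrix of shape (d_out, d_in) gains two trainable low-rank factors, B (d_out × r) and A (r × d_in), while the base weights stay frozen. A minimal sketch of that arithmetic, using hypothetical 4096×4096 projection shapes rather than the actual Qwen3 layer dimensions:

```python
def lora_trainable_params(shapes, r=16):
    """Total LoRA parameters added for a list of adapted matrices.

    Each (d_out, d_in) matrix gains two factors: B (d_out x r) and
    A (r x d_in), i.e. r * (d_out + d_in) trainable parameters.
    """
    return sum(r * (d_out + d_in) for d_out, d_in in shapes)


# Hypothetical example: seven 4096x4096 projections in one block, rank 16.
shapes = [(4096, 4096)] * 7
print(lora_trainable_params(shapes, r=16))  # 16 * 8192 * 7 = 917504
```

Summing this over all transformer blocks and dividing by the model's total parameter count gives the trainable fraction; at rank 16 it lands well under 1%, consistent with the card's claim.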