Update model card: simplify training data description
README.md
CHANGED
@@ -20,7 +20,7 @@ model-index:
 
 # qwen3-8b-rails
 
-An 8B parameter dense model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from
+An 8B parameter dense model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from our own internal Rails projects. Small enough to run on a laptop.
 
 Built by [Bytecode](https://bytecode.hr).
 
@@ -31,7 +31,7 @@ Built by [Bytecode](https://bytecode.hr).
 | Base model | [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) |
 | Architecture | Qwen3 dense (8B parameters) |
 | Training method | QLoRA (rank 16) via [Unsloth](https://github.com/unslothai/unsloth) |
-| Training data | 111K samples from
+| Training data | 111K samples from internal Rails projects |
 | Training cost | ~$21 (A100 80GB, ~17 hours) |
 | Quantization | GGUF Q4_K_M (5.03 GB) |
 
@@ -70,7 +70,7 @@ Fits comfortably on any modern laptop. GGUF file size + 2–3 GB for KV cache.
 Trained with LoRA (rank 16, alpha 16) on attention projection layers. Only 0.78% of parameters were trained. The full training run took ~17 hours on a single A100 80GB GPU.
 
 The dataset:
 
-1.
+1. Our internal Rails projects
 2. 15-step cleaning and deduplication pipeline
 3. 111K final training samples with contrastive pairs
 4. Source diversity cap at 20% per repository