vmarcetic committed (verified)
Commit 045f9f9 · 1 parent: 308c282

Update model card: simplify training data description

Files changed (1):
  1. README.md +3 -3
README.md CHANGED

@@ -20,7 +20,7 @@ model-index:
 
 # qwen3-8b-rails
 
-An 8B parameter dense model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from 45 Rails repositories. Small enough to run on a laptop.
+An 8B parameter dense model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from our own internal Rails projects. Small enough to run on a laptop.
 
 Built by [Bytecode](https://bytecode.hr).
 
@@ -31,7 +31,7 @@ Built by [Bytecode](https://bytecode.hr).
 | Base model | [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) |
 | Architecture | Qwen3 dense (8B parameters) |
 | Training method | QLoRA (rank 16) via [Unsloth](https://github.com/unslothai/unsloth) |
-| Training data | 111K samples from 45 Rails repos |
+| Training data | 111K samples from internal Rails projects |
 | Training cost | ~$21 (A100 80GB, ~17 hours) |
 | Quantization | GGUF Q4_K_M (5.03 GB) |
 
@@ -70,7 +70,7 @@ Fits comfortably on any modern laptop. GGUF file size + 2–3 GB for KV cache.
 Trained with LoRA (rank 16, alpha 16) on attention projection layers. Only 0.78% of parameters were trained. The full training run took ~17 hours on a single A100 80GB GPU.
 
 The dataset:
-1. 45 Rails repos (35 private + 10 open-source)
+1. Our internal Rails projects
 2. 15-step cleaning and deduplication pipeline
 3. 111K final training samples with contrastive pairs
 4. Source diversity cap at 20% per repository
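The card's step 4, the per-repository source cap, can be sketched as a one-pass filter: no single repository may contribute more than 20% of the sample total. This is a minimal sketch of the idea only; the actual pipeline, and the repository names used below, are illustrative assumptions, not taken from the card.

```python
import random

def cap_sources(samples_by_repo: dict[str, list], frac: float = 0.20,
                seed: int = 0) -> dict[str, list]:
    """Limit every repository to at most `frac` of the pre-cap sample total.

    One-pass approximation of a source-diversity cap: repositories over
    the limit are randomly downsampled; the rest are kept as-is.
    """
    rng = random.Random(seed)
    limit = int(frac * sum(len(s) for s in samples_by_repo.values()))
    return {
        repo: s if len(s) <= limit else rng.sample(s, limit)
        for repo, s in samples_by_repo.items()
    }

# Hypothetical per-repo sample counts, for illustration only.
data = {"app_a": list(range(500)),
        "app_b": list(range(300)),
        "app_c": list(range(200))}
capped = cap_sources(data)
print({repo: len(s) for repo, s in capped.items()})
```

With these toy counts the pre-cap total is 1,000, so the limit is 200 samples per repository and the over-represented `app_a` and `app_b` are downsampled to it.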
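The LoRA setup described above (rank 16 on the attention projections) can be sanity-checked with simple arithmetic: a rank-r adapter on a d_in × d_out linear layer adds two low-rank matrices, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) extra trainable parameters. A minimal sketch, where the projection shapes (hidden size 4096, grouped-query K/V projections down to 1024) are assumptions about a Qwen3-8B-like config, not figures stated in the card:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer:
    A is (r x d_in) and B is (d_out x r), so r * (d_in + d_out) in total."""
    return r * (d_in + d_out)

# Assumed (illustrative) attention projection shapes for a Qwen3-8B-like
# config: hidden size 4096, grouped-query K/V projections to 1024.
ATTN_PROJECTIONS = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}

RANK = 16
per_layer = sum(lora_params(i, o, RANK) for i, o in ATTN_PROJECTIONS.values())
print(f"LoRA params per transformer layer at rank {RANK}: {per_layer:,}")
```

Multiplied across all transformer layers, this stays a small fraction of the 8B base weights, which is why only the adapter gradients (and 4-bit frozen base weights, under QLoRA) need to fit in GPU memory.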
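The card's laptop-memory claim (Q4_K_M weights at 5.03 GB plus 2–3 GB of KV cache) can also be checked: the KV cache stores a key and a value vector per layer, per KV head, per token. A rough sketch, assuming a Qwen3-8B-like config (36 layers, 8 grouped-query KV heads, head dim 128, fp16 cache); these numbers are assumptions, as the card does not spell the config out:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache: a K and a V tensor per layer, head, and position."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_tokens

# Assumed Qwen3-8B-like config: 36 layers, 8 KV heads, head dim 128,
# fp16 (2-byte) cache entries, 16K-token context.
cache_gib = kv_cache_bytes(36, 8, 128, ctx_tokens=16_384) / 2**30
weights_gb = 5.03  # Q4_K_M GGUF size from the model card
print(f"KV cache at 16K context: ~{cache_gib:.2f} GiB")
print(f"Approx. working set: ~{weights_gb + cache_gib:.1f} GB")
```

Under these assumptions the 16K-context cache lands at ~2.25 GiB, inside the card's 2–3 GB range, for a total working set of roughly 7–8 GB.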