vmarcetic committed · verified
Commit ef08be0 · 1 parent: caefeda

Update model card: simplify training data description

Files changed (1): README.md (+3 −3)
README.md CHANGED

@@ -20,7 +20,7 @@ model-index:
 
 # qwen3-coder-30b-rails
 
-A 31B parameter Mixture-of-Experts model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from 45 Rails repositories (35 private client projects and 10 open-source codebases).
+A 31B parameter Mixture-of-Experts model fine-tuned for **Ruby on Rails code generation**. Trained on 111,000 samples extracted from our own internal Rails projects.
 
 Built by [Bytecode](https://bytecode.hr).
 
@@ -31,7 +31,7 @@ Built by [Bytecode](https://bytecode.hr).
 | Base model | [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) |
 | Architecture | Qwen3 MoE (31B total, 3B active) |
 | Training method | QLoRA (rank 16) via [Unsloth](https://github.com/unslothai/unsloth) |
-| Training data | 111K samples from 45 Rails repos |
+| Training data | 111K samples from internal Rails projects |
 | Training cost | ~$32 (A100 80GB, ~26 hours) |
 | Quantization | GGUF Q4_K_M (18.6 GB), Q5_K_M (21.7 GB) |
 
@@ -71,7 +71,7 @@ Rule of thumb: GGUF file size + 2–4 GB for KV cache and overhead.
 Trained with LoRA (rank 16, alpha 16) on attention projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`). Only 0.78% of parameters were trained.
 
 The dataset pipeline:
-1. Extracted code from 45 Rails repos (35 private + 10 open-source)
+1. Extracted code from our internal Rails projects
 2. 15-step cleaning and deduplication pipeline
 3. 111K final training samples
 4. Includes 29 contrastive pairs (wrong way vs right way)
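The card's rule of thumb for memory planning (GGUF file size plus 2–4 GB for KV cache and overhead) can be made concrete. A small sketch — the function name and structure here are illustrative, not part of the repo:

```python
def estimate_vram_gb(gguf_file_gb: float) -> tuple[float, float]:
    """Rule of thumb from the model card: GGUF file size plus
    2-4 GB for KV cache and runtime overhead."""
    return (gguf_file_gb + 2.0, gguf_file_gb + 4.0)

# Quantizations listed in the card's table:
for name, size_gb in [("Q4_K_M", 18.6), ("Q5_K_M", 21.7)]:
    low, high = estimate_vram_gb(size_gb)
    print(f"{name}: ~{low:.1f}-{high:.1f} GB VRAM")
```

So the Q4_K_M file (18.6 GB) lands around 20.6–22.6 GB of VRAM, which fits a single 24 GB card; Q5_K_M (21.7 GB) needs roughly 23.7–25.7 GB.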
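The card states that only 0.78% of parameters were trained with LoRA rank 16 on the listed projection modules. A quick sanity check of what that fraction means at 31B total parameters — purely illustrative arithmetic, not the repo's code:

```python
TOTAL_PARAMS = 31e9          # "31B total" from the card's table
TRAINABLE_FRACTION = 0.0078  # "Only 0.78% of parameters were trained"

trainable = TOTAL_PARAMS * TRAINABLE_FRACTION
print(f"~{trainable / 1e6:.0f}M trainable LoRA parameters")  # ~242M
```

Roughly a quarter-billion adapter parameters against the frozen 31B base, which is consistent with the low reported training cost.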
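Step 2 of the pipeline mentions cleaning and deduplication. The 15-step pipeline itself is not published, so as an assumption about one plausible step, here is a minimal sketch of exact-duplicate removal by normalized content hash, standard library only:

```python
import hashlib

def normalize(code: str) -> str:
    # Collapse trailing-whitespace differences so trivially
    # reformatted copies hash to the same value.
    return "\n".join(line.rstrip() for line in code.strip().splitlines())

def dedupe(samples: list[str]) -> list[str]:
    seen, unique = set(), []
    for code in samples:
        digest = hashlib.sha256(normalize(code).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(code)
    return unique

# Hypothetical Rails controller snippets as sample data:
samples = [
    "def index\n  @posts = Post.all\nend",
    "def index\n  @posts = Post.all  \nend\n",  # duplicate after normalization
    "def show\n  @post = Post.find(params[:id])\nend",
]
print(len(dedupe(samples)))  # 2
```

A real pipeline would likely add near-duplicate detection (e.g. token-level similarity) on top of exact hashing, but exact dedup is the usual first pass.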