---
language:
- en
license: gemma
base_model: google/gemma-4-31B-it
tags:
- gemma
- gemma-4
- fine-tuned
- lora
- qlora
- assistant
- orchestrator
- tendielabs
pipeline_tag: text-generation
datasets:
- microsoft/rStar-Coder
- Crownelius/Opus-4.6-Reasoning-3300x
- Crownelius/High-Coder-Reasoning-Multi-Turn
- NickyNicky/Code-290k
- Crownelius/Opus-4.5-Writing-Style-formatted
- Crownelius/GLM-5.0-25000x
---

# Capybara-31B

> **Beta / WIP.** This is an experimental release made to validate the fine-tuning process and test behavior on real hardware. It is not a production-ready model. Expect rough edges, and treat evaluation results as preliminary.

**TendieLabs/Capybara-31B** is a fine-tuned version of `google/gemma-4-31B-it`, trained to be a better local orchestrator and assistant. The primary goal was not to maximize raw code generation but to produce a model that reasons well, communicates clearly, knows when to delegate, and stays honest under pressure.

GGUF variants: [TendieLabs/Capybara-31B-GGUFS](https://huggingface.co/TendieLabs/Capybara-31B-GGUFS)

---

## Model Description

Capybara-31B is built for the front-desk orchestrator role in a multi-agent setup. It handles ordinary requests, summarizes messy context, routes and decomposes tasks, and delegates complex implementation to specialist agents. The personality target was Claude Sonnet, prioritizing directness, structure, and honesty over verbose performance.

This is not a coding model. It is an assistant model with sharpened coding judgment. The distinction matters: it should analyze and review code well, but it should route heavy implementation work instead of attempting it alone.

| Property | Value |
|---|---|
| Base model | `google/gemma-4-31B-it` |
| Model family | Gemma 4 (dense) |
| Fine-tune method | QLoRA (LoRA over 4-bit base) |
| Context window | 2048 tokens (first run, conservative) |
| Primary role | Local orchestrator / front-desk assistant |

---

## Intended Use

**Good fits:**
- Answering general assistant requests clearly and concisely
- Summarizing messy notes, project context, or requirements
- Decomposing tasks and routing work to appropriate specialists
- Code review, debugging analysis, and implementation advice
- Handling ambiguity by asking one focused clarifying question instead of guessing

**Not intended for:**
- Autonomous multi-file repo editing
- Large-scale code generation without a specialist downstream
- Replacing a dedicated coding model for implementation-heavy tasks

---
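
As an illustration of the routing role described above, here is a minimal, hypothetical dispatcher. It assumes the orchestrator emits either plain text (answer directly) or a JSON routing decision; the schema and specialist names are invented for this sketch, not an output format the model is guaranteed to produce.

```python
import json

# Hypothetical routing schema, for illustration only. Assume the orchestrator
# replies with plain text or with a JSON object like:
#   {"action": "delegate", "target": "coder", "task": "..."}
SPECIALISTS = {"coder", "summarizer", "researcher"}  # invented names

def route(model_reply: str) -> tuple[str, str]:
    """Return ("answer", text) or ("delegate:<target>", task)."""
    try:
        decision = json.loads(model_reply)
    except json.JSONDecodeError:
        # Not JSON: treat the reply as a direct answer.
        return ("answer", model_reply)
    if decision.get("action") == "delegate" and decision.get("target") in SPECIALISTS:
        return (f"delegate:{decision['target']}", decision.get("task", ""))
    # Malformed or unknown target: fall back to answering directly.
    return ("answer", model_reply)

print(route('{"action": "delegate", "target": "coder", "task": "refactor module"}'))
print(route("The capital of France is Paris."))
```

The fallback path matters for the "honesty under uncertainty" goal: an unparseable or malformed decision degrades to a direct answer instead of a silent misroute.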

## Training Details

### Dataset Mix

The training mix was weighted toward assistant behavior, routing, and summarization rather than code generation.

| Source | Role | Weight |
|---|---|---|
| Crownelius/Opus-4.6-Reasoning-3300x | Reasoning quality, structure, helpfulness | 18% |
| Crownelius/High-Coder-Reasoning-Multi-Turn | Debugging judgment, code analysis, multi-turn | 18% |
| microsoft/rStar-Coder | Harder reasoning and coding tasks | 15% |
| Custom routing / delegation set | Front-desk routing behavior | 15% |
| NickyNicky/Code-290k (filtered) | Code competence floor | 10% |
| Crownelius/Opus-4.5-Writing-Style-formatted | Tone and personality shaping | 10% |
| Custom summarization / context digestion set | Project-note compression, task extraction | 10% |
| Crownelius/GLM-5.0-25000x (filtered) | General reasoning filler | 4% |

Total training rows: 10K to 20K high-signal examples.
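
A mix like this is typically materialized by sampling each source in proportion to its weight. The sketch below is an illustration, not the actual data-preparation script: it computes the per-source row budget implied by the weight column for a chosen corpus size, with placeholder names standing in for the two custom sets.

```python
# Illustrative only: per-source row budget implied by the weight table above.
# The two "custom-*" keys are placeholder names, not published dataset IDs.
MIX = {
    "Crownelius/Opus-4.6-Reasoning-3300x": 18,
    "Crownelius/High-Coder-Reasoning-Multi-Turn": 18,
    "microsoft/rStar-Coder": 15,
    "custom-routing-delegation": 15,
    "NickyNicky/Code-290k": 10,
    "Crownelius/Opus-4.5-Writing-Style-formatted": 10,
    "custom-summarization": 10,
    "Crownelius/GLM-5.0-25000x": 4,
}

def row_budget(total_rows: int) -> dict[str, int]:
    assert sum(MIX.values()) == 100, "weights must cover the whole mix"
    return {name: total_rows * pct // 100 for name, pct in MIX.items()}

budget = row_budget(15_000)  # midpoint of the reported 10K-20K range
print(budget["microsoft/rStar-Coder"])  # 2250 rows at 15%
```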

### Training Configuration

| Hyperparameter | Value |
|---|---|
| Method | QLoRA |
| LoRA rank | 16-32 |
| LoRA alpha | 32-64 |
| Dropout | 0.0-0.05 |
| Learning rate | 1e-5 to 2e-5 |
| LR scheduler | Cosine |
| Warmup ratio | 2-3% |
| Epochs | 1 |
| Sequence length | 2048 |
| Batch size | 1 (gradient accumulation 4) |
| Gradient checkpointing | Unsloth |
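
To make the table's batch arithmetic concrete: with a per-device batch of 1 and gradient accumulation of 4, the optimizer sees an effective batch of 4 sequences per step. The back-of-envelope sketch below uses the midpoint of the reported row count and the top of the warmup range; it is illustrative math, not the published training script.

```python
# Back-of-envelope training math from the configuration table above
# (illustrative only; assumes 15K rows, the midpoint of the reported range).
per_device_batch = 1
grad_accum_steps = 4
epochs = 1
rows = 15_000
warmup_ratio = 0.03  # top of the reported 2-3% range

effective_batch = per_device_batch * grad_accum_steps  # sequences per optimizer step
optimizer_steps = rows * epochs // effective_batch     # steps over the full run
warmup_steps = int(optimizer_steps * warmup_ratio)     # steps spent warming up

print(effective_batch, optimizer_steps, warmup_steps)  # 4 3750 112
```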

---

## Hardware Requirements

Capybara-31B was developed and validated on an RTX 3090 (24 GB VRAM). At `IQ4_XS` quantization the model leaves approximately 3 GB of VRAM free on that card, making it a practical local-first deployment for a single consumer GPU.

| Quant | Approx. VRAM | Recommended for |
|---|---|---|
| IQ4_XS | ~21 GB | RTX 3090, 4090, single-GPU setups |
| Q4_K_M | ~22 GB | RTX 3090, 4090 |
| Q5_K_M | ~24 GB | 24 GB cards (tight) |
| Q8_0 | ~34 GB | Dual-GPU or large-VRAM server |
| F16 | ~62 GB | Server-grade hardware |
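
The table's figures can be sanity-checked with a weights-only estimate: parameter count × bits per weight ÷ 8. Runtime totals in the table are higher because they also include KV cache, activations, and framework overhead. The bits-per-weight values below are approximate averages for each llama.cpp quant family, not exact figures for this model.

```python
# Weights-only VRAM estimate for a 31B-parameter model. BPW values are
# approximate per quant family; real usage adds KV cache and runtime
# overhead, which is why the table above reports higher totals.
PARAMS = 31e9
BPW = {"IQ4_XS": 4.25, "Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5, "F16": 16.0}

def weights_gb(quant: str) -> float:
    return PARAMS * BPW[quant] / 8 / 1e9

for quant in BPW:
    print(f"{quant}: ~{weights_gb(quant):.1f} GB of weights")
```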

---

## GGUF Variants

Available at **TendieLabs/Capybara-31B-GGUFS**:

- `IQ4_XS` (recommended starting point)
- `IQ4_NL`
- `Q4_0`, `Q4_1`, `Q4_K_S`, `Q4_K_M`
- `Q5_0`, `Q5_1`, `Q5_K_S`, `Q5_K_M`
- `Q6_K`
- `Q8_0`, `Q8_1`
- `F16`

---
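A typical way to run a GGUF variant locally is llama.cpp. The filename below is an assumption about how the files are named in the GGUFS repo; check the repo's file listing before downloading.

```shell
# Download one quant (filename assumed; verify it in the repo's file list),
# then serve it with llama.cpp.
huggingface-cli download TendieLabs/Capybara-31B-GGUFS \
  Capybara-31B-IQ4_XS.gguf --local-dir ./models

# -c matches the 2048-token training context; -ngl 99 offloads all layers
# to the GPU (fits in 24 GB at IQ4_XS per the table above).
llama-server -m ./models/Capybara-31B-IQ4_XS.gguf -c 2048 -ngl 99
```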

## Evaluation

The model was evaluated across the following dimensions before release:

- **Delegation accuracy**: does it route implementation-heavy work correctly instead of attempting it?
- **Honesty under uncertainty**: does it admit when context is missing rather than hallucinating answers?
- **Long-context summarization**: does it compress messy project notes into useful summaries?
- **Code review quality**: does it identify real issues, risks, and next steps?
- **Tone**: does the output feel like a capable, direct assistant rather than a verbose language model?

The key failure mode screened for was a regression in substance: a model whose tone improved while its judgment and delegation behavior degraded.

---

## Limitations

- First-run adapter. Behavior targets are correct, but some edge cases may need refinement in future versions.
- Sequence length was kept conservative (2048). Long-document tasks may need to be chunked.
- Gemma 4 tooling was relatively new at training time. Some export or serving quirks may apply depending on your inference stack.
- Not designed for multimodal tasks despite the Gemma 4 family's vision capabilities. This is a text-only fine-tune.

---
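For the long-document limitation above, a simple pre-processing step is to chunk input before sending it to the model. The sketch below approximates the token budget by word count; the 1.3 tokens-per-word ratio is a crude English heuristic, not a property of the Gemma tokenizer, and the headroom value is an arbitrary allowance for the prompt and reply.

```python
# Naive word-based chunker for fitting long documents into a 2048-token
# window. Tokens are approximated as words * 1.3 (rough heuristic); a real
# tokenizer would be more accurate. Headroom is reserved for prompt + reply.
TOKENS_PER_WORD = 1.3

def chunk(text: str, ctx_tokens: int = 2048, headroom: int = 768) -> list[str]:
    budget_words = int((ctx_tokens - headroom) / TOKENS_PER_WORD)
    words = text.split()
    return [
        " ".join(words[i : i + budget_words])
        for i in range(0, len(words), budget_words)
    ]

parts = chunk("word " * 3000)
print(len(parts))  # number of summarization calls needed for a 3000-word doc
```

Each chunk can then be summarized independently and the summaries merged in a second pass, which fits the model's summarization-heavy training mix.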

## License

This model is derived from `google/gemma-4-31B-it` and is released under the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Usage is subject to those terms.

---