SQCU committed (verified)
Commit 3724307 · 1 parent: d751a55

Initial upload: Gemma-3-270M and Qwen-0.5B fully fine-tuned BTRM models

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+gemma_btrm/base_model/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+qwen_btrm/base_model/tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,142 @@
# brainrot-partition-BTRM+

Multi-head Bradley-Terry reward models for situated dialogue classification.

**Fully fine-tuned models**: both the base LLM weights and the scoring heads were trained.

## What is this?

These are **7-head reward models** that score text along multiple dimensions simultaneously:

| Head | What it detects |
|------|-----------------|
| `skyrim` | Nordic fantasy RPG prose (Elder Scrolls V style) |
| `oblivion` | Imperial fantasy RPG prose (Elder Scrolls IV style) |
| `fonv` | Post-apocalyptic Western prose (Fallout: New Vegas style) |
| `gallia` | Franco-Roman bureaucratic fantasy (synthetic setting) |
| `marmotte` | Alpine corporate dystopia (synthetic setting) |
| `multiturn_dialogue` | Raw multi-turn dialogue (quoted speech) |
| `brainrot_aesop` | Vocabulary teaching passages with embedded definitions |

## Available Models

### 1. `qwen_btrm/` (962 MB)
- Architecture: Qwen2.5-0.5B + 7 linear heads
- Fully fine-tuned (all ~500M parameters updated)
- Characteristics: tighter score distribution, faster convergence

### 2. `gemma_btrm/` (545 MB)
- Architecture: Gemma-3-270M-IT + 7 linear heads
- Fully fine-tuned (all ~270M parameters updated)
- Characteristics: **50% more dynamic range**, better hard-negative rejection

## Quick Start

```bash
pip install torch transformers pyyaml
python compare.py --text "Your text here" --model gemma
```

## Pairwise Comparison (Recommended)

The 7-dimensional score vector can be overwhelming. Compare **pairs of heads** for intuitive results:

```bash
# Is this more like Oblivion prose or raw dialogue?
python compare.py --text "The Imperial City gleamed in the morning light..." \
    --heads oblivion,multiturn_dialogue

# Is this vocabulary teaching or regular fantasy prose?
python compare.py --text "The word quality means an essential attribute..." \
    --heads brainrot_aesop,skyrim
```

Output:
```
Pairwise Comparison:
  oblivion:           +0.42  (Imperial fantasy RPG)
  multiturn_dialogue: -0.31  (Raw quoted dialogue)
  ─────────────────────
  Δ = +0.73 → leans oblivion
```

## Architecture

Each model consists of:
1. **Fine-tuned base LLM** - the entire transformer was trained, not frozen
2. **7 linear scoring heads** - each projects the hidden state → a scalar score

```
text → tokenize → fine_tuned_LLM → last_hidden_state[:,-1,:] → head_i → score_i
```

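The pipeline above can be sketched in a few lines of PyTorch. This is illustrative only: a random vector stands in for the fine-tuned LLM's last-token hidden state, and the head count and hidden size (640, matching Gemma-3-270M) are placeholders; in the real models both come from the checkpoint.

```python
import torch

# Stand-ins for the trained heads: one Linear(hidden_dim, 1) per dimension.
hidden_dim = 640
head_names = ["skyrim", "oblivion", "fonv"]
heads = {name: torch.nn.Linear(hidden_dim, 1) for name in head_names}

# Stand-in for last_hidden_state[:, -1, :] from the fine-tuned LLM.
last_token_hidden = torch.randn(1, hidden_dim)

# Each head maps the shared hidden state to one scalar score.
scores = {name: head(last_token_hidden).item() for name, head in heads.items()}
print(scores)
```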
### Why full fine-tuning?

We trained with `use_lora: false`, meaning all base-model parameters were updated via the Bradley-Terry loss. This lets the model learn internal representations optimized for multi-head discrimination, not just the surface-level features a linear probe can detect.

## Training Details

- **Loss**: `L = L_BT + 0.1 * log(r²)` (Bradley-Terry ranking + logsquare regularization)
- **Optimizer**: AdamW, lr=5e-5, 200-step linear warmup
- **Precision**: BF16 (stable for both Qwen and Gemma)
- **Data**: ~1100 positives across 7 heads, ~1200 hard/soft negatives

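Read literally, the loss above can be sketched as follows. This is one plausible reading, not the training code: `r_pos`/`r_neg` are taken as the chosen/rejected rewards of a Bradley-Terry pair, and the logsquare term is applied to their combined squared magnitude with an epsilon to avoid `log(0)`.

```python
import torch
import torch.nn.functional as F

def btrm_loss(r_pos, r_neg, logsquare_weight=0.1, eps=1e-8):
    """Illustrative reading of L = L_BT + 0.1 * log(r^2)."""
    # Bradley-Terry ranking term: -log sigmoid(r_pos - r_neg).
    l_bt = -F.logsigmoid(r_pos - r_neg).mean()
    # "logsquare" regularizer on reward magnitude (assumption: combined pair).
    reg = torch.log(r_pos.pow(2) + r_neg.pow(2) + eps).mean()
    return l_bt + logsquare_weight * reg

loss = btrm_loss(torch.tensor([1.2, 0.8]), torch.tensor([-0.5, 0.1]))
```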
### Key Finding: Base-Model Receptivity

**Different base models respond differently to BTRM gradients.**

| Metric | Qwen 0.5B | Gemma 270M |
|--------|-----------|------------|
| Final loss | -0.66 | -0.14 |
| Score range | 2.12 | **3.25** |
| Hard neg (wiki) | moderate rejection | **strong rejection** |

Lower loss ≠ better discrimination. Gemma's wider dynamic range produces better contrast despite its "worse" loss.

The base-model architecture affects the score distribution **more** than head count or training-data size does.

## File Structure

```
qwen_btrm/
  base_model/       # Fully fine-tuned Qwen2.5-0.5B
  btrm_heads.pt     # Trained head weights
  config.yaml       # Training config

gemma_btrm/
  base_model/       # Fully fine-tuned Gemma-3-270M-IT
  btrm_heads.pt     # Trained head weights
  config.yaml       # Training config

compare.py          # Inference script with pairwise comparison
README.md           # This file
```

## Example Outputs

```bash
$ python compare.py --text "Patrolling the Mojave almost makes you wish for a nuclear winter."

All Head Scores:
  fonv               +0.847 [──────────────│─────] Post-apocalyptic Western
  multiturn_dialogue +0.234 [───────────│────────] Raw quoted dialogue
  skyrim             -0.156 [─────────│──────────] Nordic fantasy RPG
  oblivion           -0.289 [────────│───────────] Imperial fantasy RPG
  brainrot_aesop     -0.412 [───────│────────────] Vocabulary teaching
  gallia             -0.523 [──────│─────────────] Franco-Roman bureaucratic
  marmotte           -0.891 [────│───────────────] Alpine corporate dystopia
```

## Citation

Part of the dialogue_yoinker project for extracting situated dialogue from Bethesda games.

Training details: `notes/2025-12-28-btrm-multihead-training.md`

## License

Code: MIT

Model weights inherit the base-model licenses:
- Qwen: Apache 2.0
- Gemma: [Gemma Terms of Use](https://ai.google.dev/gemma/terms)
compare.py ADDED
@@ -0,0 +1,276 @@
#!/usr/bin/env python3
"""
Pairwise BTRM comparison script.

Compare text against pairs of heads for intuitive interpretation.
Uses fully fine-tuned models (base LLM + heads trained together).

Usage:
    python compare.py --text "Your text here"
    python compare.py --text "Your text" --heads oblivion,skyrim
    python compare.py --file input.txt --model gemma
    echo "Some text" | python compare.py --stdin
"""

import argparse
import sys
import torch
import yaml
from pathlib import Path
from typing import Optional

# Head descriptions for pretty printing
HEAD_INFO = {
    "skyrim": "Nordic fantasy RPG",
    "oblivion": "Imperial fantasy RPG",
    "fonv": "Post-apocalyptic Western",
    "gallia": "Franco-Roman bureaucratic",
    "marmotte": "Alpine corporate dystopia",
    "multiturn_dialogue": "Raw quoted dialogue",
    "brainrot_aesop": "Vocabulary teaching",
}


class BTRMModel:
    """Load and run a fully fine-tuned BTRM model."""

    def __init__(self, model_name: str = "gemma", device: Optional[str] = None):
        """
        Args:
            model_name: "gemma" or "qwen"
            device: "cuda", "cpu", or None for auto
        """
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.model_name = model_name

        # Determine paths
        script_dir = Path(__file__).parent
        if model_name == "gemma":
            model_dir = script_dir / "gemma_btrm"
        elif model_name == "qwen":
            model_dir = script_dir / "qwen_btrm"
        else:
            raise ValueError(f"Unknown model: {model_name}. Use 'gemma' or 'qwen'")

        if not model_dir.exists():
            raise FileNotFoundError(f"Model directory not found: {model_dir}")

        # Load config
        with open(model_dir / "config.yaml") as f:
            self.config = yaml.safe_load(f)

        # Load the FINE-TUNED base model (not the original!)
        base_model_path = model_dir / "base_model"
        print(f"Loading fine-tuned {model_name} from {base_model_path}...", file=sys.stderr)

        from transformers import AutoModelForCausalLM, AutoTokenizer

        self.tokenizer = AutoTokenizer.from_pretrained(
            str(base_model_path), trust_remote_code=True
        )
        self.base_model = AutoModelForCausalLM.from_pretrained(
            str(base_model_path),
            trust_remote_code=True,
            torch_dtype=torch.bfloat16,
        ).to(self.device)
        self.base_model.eval()

        # Load heads
        heads_path = model_dir / "btrm_heads.pt"
        heads_data = torch.load(heads_path, map_location=self.device)
        self.head_names = heads_data["head_names"]
        self.heads = torch.nn.ModuleDict()

        hidden_dim = self.base_model.config.hidden_size
        for name in self.head_names:
            head = torch.nn.Linear(hidden_dim, 1)
            head.load_state_dict(heads_data["heads"][name])
            self.heads[name] = head.to(self.device)

        print(f"Loaded {len(self.head_names)} heads: {', '.join(self.head_names)}", file=sys.stderr)

    def score(self, text: str) -> dict[str, float]:
        """Get scores for all heads."""
        # Tokenize
        inputs = self.tokenizer(
            text, return_tensors="pt", truncation=True, max_length=2048
        ).to(self.device)

        # Get hidden state from the fine-tuned model
        with torch.no_grad():
            outputs = self.base_model(**inputs, output_hidden_states=True)
            hidden = outputs.hidden_states[-1][:, -1, :]  # Last token, last layer

        # Score each head
        scores = {}
        for name, head in self.heads.items():
            scores[name] = head(hidden).item()

        return scores

    def compare(self, text: str, head_a: str, head_b: str) -> dict:
        """Compare two specific heads."""
        if head_a not in self.head_names:
            raise ValueError(f"Unknown head: {head_a}. Available: {self.head_names}")
        if head_b not in self.head_names:
            raise ValueError(f"Unknown head: {head_b}. Available: {self.head_names}")

        scores = self.score(text)
        return {
            head_a: scores[head_a],
            head_b: scores[head_b],
            "delta": scores[head_a] - scores[head_b],
            "winner": head_a if scores[head_a] > scores[head_b] else head_b,
        }

    def top_contrasts(self, text: str, n: int = 5) -> list[dict]:
        """Find the pairs with the largest score differences."""
        scores = self.score(text)
        pairs = []
        names = list(scores.keys())
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                pairs.append({
                    "head_a": a,
                    "head_b": b,
                    "score_a": scores[a],
                    "score_b": scores[b],
                    "delta": abs(scores[a] - scores[b]),
                    "higher": a if scores[a] > scores[b] else b,
                })
        return sorted(pairs, key=lambda x: x["delta"], reverse=True)[:n]


def format_scores(scores: dict[str, float]) -> str:
    """Pretty-print scores with a bar chart."""
    lines = []
    sorted_scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    max_name_len = max(len(name) for name in scores.keys())

    for name, score in sorted_scores:
        # Create bar (scale: -2 to +2 mapped to 0-20 chars)
        bar_pos = int((score + 2) / 4 * 20)
        bar_pos = max(0, min(20, bar_pos))
        bar = "─" * bar_pos + "│" + "─" * (20 - bar_pos)

        desc = HEAD_INFO.get(name, "")
        lines.append(f"  {name:<{max_name_len}} {score:+.3f} [{bar}] {desc}")

    return "\n".join(lines)


def format_comparison(result: dict, head_a: str, head_b: str) -> str:
    """Pretty-print a pairwise comparison."""
    delta = result["delta"]
    winner = result["winner"]

    lines = [
        f"  {head_a}: {result[head_a]:+.3f} ({HEAD_INFO.get(head_a, '')})",
        f"  {head_b}: {result[head_b]:+.3f} ({HEAD_INFO.get(head_b, '')})",
        f"  ─────────────────────",
        f"  Δ = {delta:+.3f} → leans **{winner}**",
    ]
    return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser(
        description="Compare text using multi-head BTRM (fully fine-tuned models)",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Score all heads
  python compare.py --text "The Dragonborn climbed to High Hrothgar..."

  # Compare specific pair
  python compare.py --text "Breaking news today" --heads skyrim,fonv

  # Use Qwen model instead of Gemma
  python compare.py --text "Your text" --model qwen

  # Read from file
  python compare.py --file story.txt

  # Pipe from stdin
  echo "Your text" | python compare.py --stdin

Available heads:
  skyrim, oblivion, fonv, gallia, marmotte, multiturn_dialogue, brainrot_aesop
"""
    )
    parser.add_argument("--text", "-t", help="Text to analyze")
    parser.add_argument("--file", "-f", help="Read text from file")
    parser.add_argument("--stdin", action="store_true", help="Read from stdin")
    parser.add_argument("--model", "-m", default="gemma", choices=["gemma", "qwen"],
                        help="Model to use (default: gemma, recommended for better discrimination)")
    parser.add_argument("--heads", help="Comma-separated pair of heads to compare (e.g., oblivion,skyrim)")
    parser.add_argument("--contrasts", type=int, metavar="N",
                        help="Show top N pairwise contrasts")

    args = parser.parse_args()

    # Get input text
    if args.text:
        text = args.text
    elif args.file:
        text = Path(args.file).read_text()
    elif args.stdin or not sys.stdin.isatty():
        text = sys.stdin.read()
    else:
        parser.print_help()
        print("\n\nError: Provide text via --text, --file, or --stdin", file=sys.stderr)
        sys.exit(1)

    text = text.strip()
    if not text:
        print("Error: Empty input", file=sys.stderr)
        sys.exit(1)

    # Load model
    try:
        model = BTRMModel(args.model)
    except FileNotFoundError as e:
        print(f"Error: {e}", file=sys.stderr)
        print("Make sure you're running from the repo directory with model folders.", file=sys.stderr)
        sys.exit(1)

    print(f"\n{'='*60}", file=sys.stderr)
    print(f"Text: {text[:70]}{'...' if len(text) > 70 else ''}", file=sys.stderr)
    print(f"{'='*60}\n", file=sys.stderr)

    if args.heads:
        # Pairwise comparison
        parts = args.heads.split(",")
        if len(parts) != 2:
            print("Error: --heads requires exactly 2 comma-separated head names", file=sys.stderr)
            print(f"Available: {', '.join(HEAD_INFO.keys())}", file=sys.stderr)
            sys.exit(1)
        head_a, head_b = parts[0].strip(), parts[1].strip()

        try:
            result = model.compare(text, head_a, head_b)
        except ValueError as e:
            print(f"Error: {e}", file=sys.stderr)
            sys.exit(1)

        print("Pairwise Comparison:")
        print(format_comparison(result, head_a, head_b))

    elif args.contrasts:
        # Show top contrasts
        contrasts = model.top_contrasts(text, args.contrasts)
        print(f"Top {len(contrasts)} Pairwise Contrasts:")
        for c in contrasts:
            print(f"  {c['head_a']} vs {c['head_b']}: Δ={c['delta']:.3f} (higher: {c['higher']})")

    else:
        # Show all scores (default)
        scores = model.score(text)
        print("All Head Scores:")
        print(format_scores(scores))
        print(f"\nTip: Use --heads {list(scores.keys())[0]},{list(scores.keys())[1]} for pairwise comparison")


if __name__ == "__main__":
    main()
gemma_btrm/base_model/chat_template.jinja ADDED
@@ -0,0 +1,47 @@
{{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '

' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{'<start_of_turn>model
'}}
{%- endif -%}
gemma_btrm/base_model/config.json ADDED
@@ -0,0 +1,54 @@
{
  "_sliding_window_pattern": 6,
  "architectures": [
    "Gemma3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": null,
  "bos_token_id": 2,
  "dtype": "bfloat16",
  "eos_token_id": 1,
  "final_logit_softcapping": null,
  "head_dim": 256,
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 640,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_types": [
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "sliding_attention",
    "full_attention"
  ],
  "max_position_embeddings": 32768,
  "model_type": "gemma3_text",
  "num_attention_heads": 4,
  "num_hidden_layers": 18,
  "num_key_value_heads": 1,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_local_base_freq": 10000.0,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
  "sliding_window": 512,
  "transformers_version": "4.57.3",
  "use_bidirectional_attention": false,
  "use_cache": true,
  "vocab_size": 262144
}
gemma_btrm/base_model/generation_config.json ADDED
@@ -0,0 +1,11 @@
{
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.57.3"
}
gemma_btrm/base_model/model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bc16a9d4e5723cf004fd72baa78c2544bbda0bb37af27149513edb5ae01c5c98
size 536223056
gemma_btrm/base_model/special_tokens_map.json ADDED
@@ -0,0 +1,33 @@
{
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<eos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
gemma_btrm/base_model/tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7ddf8d949394a54aa836de565a77ee97e4e800252b8ab5c3f85eb6bc445354f7
size 33384821
gemma_btrm/base_model/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
gemma_btrm/btrm_heads.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2a8893797abac789a7075f386eb551d56b9225b6bad52dba7df2fc8c29cdf9e3
size 22653
gemma_btrm/config.yaml ADDED
@@ -0,0 +1,152 @@
amp_dtype: bfloat16
api_buffer_size: 200
api_games:
- oblivion
- falloutnv
- skyrim
api_url: http://127.0.0.1:8000
api_walks_per_batch: 2
base_model: google/gemma-3-270m-it
batch_size: 4
epochs: 10
gradient_checkpointing: true
heads:
- description: All prose from Skyrim - Nordic fantasy RPG
  name: skyrim
  positive_sources:
  - path: dialogue_data/prose/skyrim_training_fk.jsonl
    text_field: auto
    tier_filter: fk_normed
  - path: dialogue_data/prose/skyrim_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
  - path: dialogue_data/prose/skyrim_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
- description: All prose from Oblivion - Imperial fantasy RPG
  name: oblivion
  positive_sources:
  - path: dialogue_data/prose/oblivion_training_fk.jsonl
    text_field: auto
    tier_filter: fk_normed
  - path: dialogue_data/prose/oblivion_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
  - path: dialogue_data/prose/oblivion_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
- description: All prose from Fallout NV - Post-apocalyptic Western RPG
  name: fonv
  positive_sources:
  - path: dialogue_data/prose/falloutnv_training_fk.jsonl
    text_field: auto
    tier_filter: fk_normed
  - path: dialogue_data/prose/falloutnv_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
  - path: dialogue_data/prose/falloutnv_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
- description: Synthetic Gallia setting - Franco-Roman bureaucratic fantasy
  name: gallia
  positive_sources:
  - path: output/gallia_v9_training_fk.jsonl
    text_field: auto
    tier_filter: fk_normed
  - path: output/gallia_v9_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
  - path: output/gallia_v9_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
- description: Synthetic Marmotte setting - Alpine corporate dystopia
  name: marmotte
  positive_sources:
  - path: output/marmotte_v6_training_fk.jsonl
    text_field: auto
    tier_filter: fk_normed
  - path: output/marmotte_v6_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
  - path: output/marmotte_v6_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
- description: Raw multi-turn dialogue walks (quoted, not prose)
  name: multiturn_dialogue
  negative_sources:
  - neg_tier: soft_neg
    path: dialogue_data/prose/skyrim_training_fk.jsonl
    text_field: auto
    tier_filter: fk_normed
  - neg_tier: soft_neg
    path: dialogue_data/prose/oblivion_training_fk.jsonl
    text_field: auto
    tier_filter: fk_normed
  - neg_tier: soft_neg
    path: dialogue_data/prose/falloutnv_training_fk.jsonl
    text_field: auto
    tier_filter: fk_normed
  - neg_tier: soft_neg
    path: dialogue_data/prose/skyrim_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
  - neg_tier: soft_neg
    path: dialogue_data/prose/oblivion_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
  - neg_tier: soft_neg
    path: dialogue_data/prose/falloutnv_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
  positive_sources:
  - path: dialogue_data/prose/skyrim_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
  - path: dialogue_data/prose/oblivion_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
  - path: dialogue_data/prose/falloutnv_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
  - path: output/gallia_v9_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
  - path: output/marmotte_v6_training_fk.jsonl
    text_field: auto
    tier_filter: flattened
- description: Vocabulary teaching passages with embedded definitions
  name: brainrot_aesop
  positive_sources:
  - path: dialogue_data/prose/skyrim_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
  - path: dialogue_data/prose/oblivion_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
  - path: dialogue_data/prose/falloutnv_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
  - path: output/gallia_v9_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
  - path: output/marmotte_v6_training_aesops.jsonl
    text_field: auto
    tier_filter: brainrot_aesop
logit_cap: 10.0
logsquare_weight: 0.1
lora_alpha: 32
lora_r: 16
lr: 5.0e-05
max_batches: 2500
max_length: 2048
neg_samples_per_tier: 300
soft_neg_paths: []
use_amp: true
use_api_walks: true
use_fineweb: true
use_lora: false
use_meta_prompt: true
use_synth: true
use_wattpad: true
use_wikitext: true
warmup_steps: 200
qwen_btrm/base_model/README.md ADDED
@@ -0,0 +1,207 @@
---
base_model: Qwen/Qwen2.5-0.5B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-0.5B
- lora
- transformers
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.18.0
qwen_btrm/base_model/adapter_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+ "alora_invocation_tokens": null,
+ "alpha_pattern": {},
+ "arrow_config": null,
+ "auto_mapping": null,
+ "base_model_name_or_path": "Qwen/Qwen2.5-0.5B",
+ "bias": "none",
+ "corda_config": null,
+ "ensure_weight_tying": false,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 32,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "peft_version": "0.18.0",
+ "qalora_group_size": 16,
+ "r": 16,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "q_proj"
+ ],
+ "target_parameters": null,
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_qalora": false,
+ "use_rslora": false
+ }
qwen_btrm/base_model/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2c2ba66c9ab2bcfaffd81ff35499e79f1a647d1fe7d33d2847570412f50c7c74
+ size 4338000
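The large binaries in this commit are stored as Git LFS pointer files like the one above: a three-line `key value` text stub holding the spec version, a `sha256` object ID, and the byte size of the real file. A minimal sketch of parsing such a pointer (the `parse_lfs_pointer` helper is illustrative, not part of any library):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # byte size of the real blob
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:2c2ba66c9ab2bcfaffd81ff35499e79f1a647d1fe7d33d2847570412f50c7c74
size 4338000"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 4338000
```

Cloning without LFS installed fetches only these stubs, which is why the pointer contents (not the weights) appear in the diff.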
qwen_btrm/base_model/added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "</tool_call>": 151658,
+ "<tool_call>": 151657,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_start|>": 151652,
+ "<|vision_pad|>": 151654
+ }
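`added_tokens.json` maps token strings to IDs appended after the base BPE vocabulary; every ID must fall inside the model's embedding table (`vocab_size` in `config.json`, 151936 here). A small sanity-check sketch using an excerpt of the mapping above:

```python
# Excerpt of added_tokens.json; IDs sit at the tail of the vocabulary.
added_tokens = {
    "<|endoftext|>": 151643, "<|im_start|>": 151644, "<|im_end|>": 151645,
    "<tool_call>": 151657, "</tool_call>": 151658, "<|file_sep|>": 151664,
}
VOCAB_SIZE = 151936  # vocab_size from config.json

# Every added ID must index a valid row of the embedding table.
assert all(0 <= i < VOCAB_SIZE for i in added_tokens.values())
print(max(added_tokens.values()))  # 151664
```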
qwen_btrm/base_model/chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+ {{- '<|im_start|>system\n' }}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- messages[0]['content'] }}
+ {%- else %}
+ {{- 'You are a helpful assistant.' }}
+ {%- endif %}
+ {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+ {%- if messages[0]['role'] == 'system' %}
+ {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+ {%- else %}
+ {{- '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {{- '<|im_start|>' + message.role }}
+ {%- if message.content %}
+ {{- '\n' + message.content }}
+ {%- endif %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {{- '\n<tool_call>\n{"name": "' }}
+ {{- tool_call.name }}
+ {{- '", "arguments": ' }}
+ {{- tool_call.arguments | tojson }}
+ {{- '}\n</tool_call>' }}
+ {%- endfor %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- message.content }}
+ {{- '\n</tool_response>' }}
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
qwen_btrm/base_model/config.json ADDED
@@ -0,0 +1,55 @@
+ {
+ "architectures": [
+ "Qwen2ForCausalLM"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 151643,
+ "dtype": "bfloat16",
+ "eos_token_id": 151643,
+ "hidden_act": "silu",
+ "hidden_size": 896,
+ "initializer_range": 0.02,
+ "intermediate_size": 4864,
+ "layer_types": [
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention",
+ "full_attention"
+ ],
+ "max_position_embeddings": 32768,
+ "max_window_layers": 24,
+ "model_type": "qwen2",
+ "num_attention_heads": 14,
+ "num_hidden_layers": 24,
+ "num_key_value_heads": 2,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": null,
+ "rope_theta": 1000000.0,
+ "sliding_window": null,
+ "tie_word_embeddings": true,
+ "transformers_version": "4.57.3",
+ "use_cache": true,
+ "use_mrope": false,
+ "use_sliding_window": false,
+ "vocab_size": 151936
+ }
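The architecture fields above fully determine the parameter count, which is a useful cross-check against the ~988 MB bf16 checkpoint. A back-of-the-envelope estimate (standard Qwen2 layout: biased q/k/v projections, grouped-query attention, SwiGLU MLP, tied embeddings):

```python
# Estimate parameter count from the architecture fields in config.json.
cfg = dict(hidden_size=896, intermediate_size=4864, num_hidden_layers=24,
           num_attention_heads=14, num_key_value_heads=2, vocab_size=151936)

h = cfg["hidden_size"]
head_dim = h // cfg["num_attention_heads"]       # 64
kv_dim = cfg["num_key_value_heads"] * head_dim   # 128 (grouped-query attention)

attn = (h * h + h) + 2 * (kv_dim * h + kv_dim) + h * h  # q (w+b), k/v (w+b), o (w)
mlp = 3 * h * cfg["intermediate_size"]                  # gate, up, down projections
norms = 2 * h                                           # two RMSNorms per layer
per_layer = attn + mlp + norms

embed = cfg["vocab_size"] * h  # tie_word_embeddings: true, so counted once
total = embed + cfg["num_hidden_layers"] * per_layer + h  # + final norm
print(f"{total / 1e6:.0f}M")  # ≈ 494M, i.e. the advertised "0.5B"
```

At 2 bytes per bf16 parameter this gives roughly 988 MB, matching the `model.safetensors` size below.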
qwen_btrm/base_model/generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "bos_token_id": 151643,
+ "eos_token_id": 151643,
+ "max_new_tokens": 2048,
+ "transformers_version": "4.57.3"
+ }
qwen_btrm/base_model/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
qwen_btrm/base_model/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:317dc4b86f0bbab91360ac95d5b2c463ecb63c08119558afafd50a875bf00fc1
+ size 988097824
qwen_btrm/base_model/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
qwen_btrm/base_model/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e04081d680d5bb294b2e57aea5b3aa1256d9e06263e907917fc241c5adc2fbe4
+ size 11422163
qwen_btrm/base_model/tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
qwen_btrm/base_model/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
qwen_btrm/btrm_heads.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:746b809989c2607c267fc9e96a70be343e0a910a5bb99a322a37af851b902b10
+ size 30845
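`btrm_heads.pt` holds the seven scoring heads separately from the base model. Its ~30 KB size is consistent with each head being a single linear projection from the 896-dim hidden state to one logit, though the real shapes live in the checkpoint and may differ. A pure-Python sketch of multi-head scoring under that assumption (toy weights stand in for the checkpoint):

```python
# Sketch: applying 7 named scoring heads to a final hidden state.
# Assumes each head is one linear map (hidden_size -> 1); verify against
# the actual tensors in btrm_heads.pt before relying on this layout.
HEAD_NAMES = ["skyrim", "oblivion", "fonv", "gallia", "marmotte",
              "multiturn_dialogue", "brainrot_aesop"]
HIDDEN = 896  # Qwen2.5-0.5B hidden_size

def score_all(hidden, heads):
    """heads: name -> (weights, bias); returns name -> scalar logit."""
    return {name: sum(w * x for w, x in zip(ws, hidden)) + b
            for name, (ws, b) in heads.items()}

# Toy weights in place of the real checkpoint:
heads = {n: ([0.0] * HIDDEN, float(i)) for i, n in enumerate(HEAD_NAMES)}
scores = score_all([1.0] * HIDDEN, heads)
print(scores["fonv"])  # 2.0 (its toy bias)
```

Because all heads share one forward pass through the base model, scoring along all seven dimensions costs roughly the same as scoring along one.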
qwen_btrm/config.yaml ADDED
@@ -0,0 +1,152 @@
+ amp_dtype: bfloat16
+ api_buffer_size: 200
+ api_games:
+ - oblivion
+ - falloutnv
+ - skyrim
+ api_url: http://127.0.0.1:8000
+ api_walks_per_batch: 2
+ base_model: Qwen/Qwen2.5-0.5B
+ batch_size: 2
+ epochs: 10
+ gradient_checkpointing: true
+ heads:
+ - description: All prose from Skyrim - Nordic fantasy RPG
+ name: skyrim
+ positive_sources:
+ - path: dialogue_data/prose/skyrim_training_fk.jsonl
+ text_field: auto
+ tier_filter: fk_normed
+ - path: dialogue_data/prose/skyrim_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - path: dialogue_data/prose/skyrim_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - description: All prose from Oblivion - Imperial fantasy RPG
+ name: oblivion
+ positive_sources:
+ - path: dialogue_data/prose/oblivion_training_fk.jsonl
+ text_field: auto
+ tier_filter: fk_normed
+ - path: dialogue_data/prose/oblivion_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - path: dialogue_data/prose/oblivion_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - description: All prose from Fallout NV - Post-apocalyptic Western RPG
+ name: fonv
+ positive_sources:
+ - path: dialogue_data/prose/falloutnv_training_fk.jsonl
+ text_field: auto
+ tier_filter: fk_normed
+ - path: dialogue_data/prose/falloutnv_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - path: dialogue_data/prose/falloutnv_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - description: Synthetic Gallia setting - Franco-Roman bureaucratic fantasy
+ name: gallia
+ positive_sources:
+ - path: output/gallia_v9_training_fk.jsonl
+ text_field: auto
+ tier_filter: fk_normed
+ - path: output/gallia_v9_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - path: output/gallia_v9_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - description: Synthetic Marmotte setting - Alpine corporate dystopia
+ name: marmotte
+ positive_sources:
+ - path: output/marmotte_v6_training_fk.jsonl
+ text_field: auto
+ tier_filter: fk_normed
+ - path: output/marmotte_v6_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - path: output/marmotte_v6_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - description: Raw multi-turn dialogue walks (quoted, not prose)
+ name: multiturn_dialogue
+ negative_sources:
+ - neg_tier: soft_neg
+ path: dialogue_data/prose/skyrim_training_fk.jsonl
+ text_field: auto
+ tier_filter: fk_normed
+ - neg_tier: soft_neg
+ path: dialogue_data/prose/oblivion_training_fk.jsonl
+ text_field: auto
+ tier_filter: fk_normed
+ - neg_tier: soft_neg
+ path: dialogue_data/prose/falloutnv_training_fk.jsonl
+ text_field: auto
+ tier_filter: fk_normed
+ - neg_tier: soft_neg
+ path: dialogue_data/prose/skyrim_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - neg_tier: soft_neg
+ path: dialogue_data/prose/oblivion_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - neg_tier: soft_neg
+ path: dialogue_data/prose/falloutnv_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ positive_sources:
+ - path: dialogue_data/prose/skyrim_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - path: dialogue_data/prose/oblivion_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - path: dialogue_data/prose/falloutnv_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - path: output/gallia_v9_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - path: output/marmotte_v6_training_fk.jsonl
+ text_field: auto
+ tier_filter: flattened
+ - description: Vocabulary teaching passages with embedded definitions
+ name: brainrot_aesop
+ positive_sources:
+ - path: dialogue_data/prose/skyrim_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - path: dialogue_data/prose/oblivion_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - path: dialogue_data/prose/falloutnv_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - path: output/gallia_v9_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ - path: output/marmotte_v6_training_aesops.jsonl
+ text_field: auto
+ tier_filter: brainrot_aesop
+ logit_cap: 10.0
+ logsquare_weight: 0.1
+ lora_alpha: 32
+ lora_r: 16
+ lr: 5.0e-05
+ max_batches: 2500
+ max_length: 2048
+ neg_samples_per_tier: 300
+ soft_neg_paths: []
+ use_amp: true
+ use_api_walks: true
+ use_fineweb: true
+ use_lora: false
+ use_meta_prompt: true
+ use_synth: true
+ use_wattpad: true
+ use_wikitext: true
+ warmup_steps: 200
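The `logit_cap` and `logsquare_weight` entries suggest a Bradley-Terry pairwise objective with clamped logits and a penalty on logit magnitude. A minimal pure-Python sketch under that assumption (the exact loss used in training is not shown in this commit, so treat the clamping/penalty details as illustrative):

```python
import math

def bt_loss(pos_logit, neg_logit, logit_cap=10.0, logsquare_weight=0.1):
    """Pairwise Bradley-Terry loss: -log sigmoid(pos - neg), with logits
    clamped to [-logit_cap, logit_cap] and an L2-style magnitude penalty."""
    clamp = lambda z: max(-logit_cap, min(logit_cap, z))
    p, n = clamp(pos_logit), clamp(neg_logit)
    nll = math.log(1.0 + math.exp(-(p - n)))   # -log sigmoid(p - n)
    reg = logsquare_weight * (p * p + n * n)   # keeps scores from drifting
    return nll + reg

# An uninformative head (equal logits) pays exactly -log(1/2):
print(round(bt_loss(0.0, 0.0), 4))  # 0.6931
```

Each head is trained on its own positive/negative pairing drawn from the `positive_sources` / `negative_sources` lists above, so the seven heads share a backbone but have independent pairwise objectives.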