Update weights: W4 attention surgery (stepped 0.1/0.3/0.6/0.8) - 27/30 instruction following
Browse files

- README.md +79 -342
- config.json +5 -3
- generation_config.json +1 -1
- merge_config.json +30 -0
- model.safetensors +1 -1
- tokenizer.json +26 -1
- tokenizer_config.json +10 -170
README.md
CHANGED

@@ -1,375 +1,112 @@

Removed (previous model card):

---
library_name: transformers
license_link: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE
pipeline_tag: text-generation
extra_gated_prompt: >
  ### FAUST-1 NON-COMMERCIAL LICENSE AGREEMENT

  Version 1.0 — January 2025

  "Faust-1" refers to the language model weights, code, and documentation made
  available by Tabularis AI GmbH ("Tabularis") under this agreement.

  1. License Grant

  You are granted a non-exclusive, non-transferable, royalty-free license to
  use, copy, and modify Faust-1 for non-commercial research and personal
  purposes only.

  2. Non-Commercial Use

  "Non-commercial" means academic research, personal projects, and educational
  use. Any use intended to generate revenue, provide commercial services, or
  benefit a for-profit entity requires a separate commercial license.

  3. Commercial Licensing

  For commercial use, please contact: info@tabularis.ai

  4. Attribution

  You must include "Built with Faust-1 by Tabularis AI" in any derivative work
  or publication.

  5. No Warranty

  Faust-1 is provided "as is" without warranties of any kind.

  6. Termination

  This license terminates automatically if you violate any terms.

  ---

  ### Additional Access Requirement

  Access to this repository is approval-based.

  You must join our Discord server: https://discord.gg/7WqEKw652R
extra_gated_fields:
  Name: text
  Email: text
  Affiliation: text
  I have joined the Tabularis AI Discord server: checkbox
  I accept the Faust-1 Non-Commercial License Agreement: checkbox
extra_gated_description: |
  Faust-1 is for non-commercial use only.
  For commercial licensing contact info@tabularis.ai

  Approval requires Discord membership.
  Join: https://discord.gg/7WqEKw652R
extra_gated_button_content: Submit
language:
- de
- en
tags:
-
-
---

<!--
<img
  alt="Faust-1 Demo"
  src="https://img.shields.io/badge/%E2%9C%A8%20Faust--1%20Demo-2b2b2b?style=flat&logo=ai&logoColor=white"
  style="display: inline-block; vertical-align: middle;"
/>
</a> -->

<p align="center">
  <img src="./logo-faust.webp" alt="Faust-1 Logo" width="220">
</p>

#

> **Designed for local and cost-efficient deployment.**
> Faust-1 is deliberately sized and optimized to run on **consumer-grade hardware** and **does not require expensive data-center GPUs**.
>
> **Typical deployment examples:**
> - **Laptop / Desktop (CPU or small GPU):**
>   Runs on modern CPUs or entry-level GPUs (e.g. Apple Silicon, RTX 3060/4060, RX 6600) using optimized runtimes such as GGUF, MLX, or ONNX.
> - **Single-GPU workstation:**
>   Efficiently serves interactive workloads on a single consumer GPU with low VRAM requirements compared to larger multilingual models.
> - **On-device / privacy-sensitive setups:**
>   Suitable for local assistants, offline document analysis, and private RAG pipelines where data must not leave the machine.
>
> This makes Faust-1 practical for **researchers, developers, and small teams** who want strong German language performance without cloud dependency or high inference costs.

---

## Model summary

- Model type: decoder-only causal language model (MoE)
- Parameters: 1.6B
- Interface: conversational / instruction (chat template provided)
- Primary language: German (~90%)
- Custom state-of-the-art tokenizer for German
-
-

###

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tabularisai/Faust-1")
model = AutoModelForCausalLM.from_pretrained(
    "tabularisai/Faust-1",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Gib mir eine kurze Einführung in große Sprachmodelle (LLM)."}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.6,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Conditional Generation

```python
import sys
import os
import json
import time

import guidegen as gg
from pydantic import BaseModel, Field
from typing import Literal, List

# Hugging Face access token - set via environment variable or .env file
# You can set it with: export HUGGINGFACE_HUB_TOKEN=your_token_here
# Or create a .env file with: HUGGINGFACE_HUB_TOKEN=your_token_here

MODEL_NAME = "tabularisai/Faust-1"


# --- Schema ---
class EmailSummary(BaseModel):
    """Structured summary of an email."""
    Absender: str = Field(description="Der Name des Absenders.")
    Betreff: str = Field(description="Worum geht es in der E-Mail? (max 5 Wörter)")
    Zusammenfassung: str = Field(description="Kurze Zusammenfassung (max 2 Sätze).")
    Prioritaet: Literal["hoch", "mittel", "niedrig"] = Field(description="Wie wichtig die E-Mail ist.")
    # AntwortNoetig: bool = Field(description="Muss man auf die E-Mail antworten?")


# --- Input ---
email_text = """Hallo Jens,

wir hatten uns bei CampusFounders im Rahmen unserer Pre-Seed-Runde kennengelernt.
Seitdem haben wir große Fortschritte gemacht und bereiten aktuell unsere Seed-Runde vor.

Wir entwickeln eine Infrastruktur für hocheffiziente, lokal trainierbare KI-Modelle – vollständig ohne Cloud.
Sehr gern würden wir uns mit dir austauschen und prüfen, ob ein Intro zu US-VCs oder ein Gespräch mit Crestlight möglich wäre.

Anbei ein kurzer OnePager zur Weiterleitung.

Beste Grüße
Ricard"""


prompt = f"""
--- Beispiel ---
E-Mail-Text:
Sehr geehrte Damen und Herren, ich wollte nur nachfragen, ob meine Bestellung #12345 schon versandt wurde. Vielen Dank, Max Mustermann
JSON-Antwort:
{{
  "Absender": "Max Mustermann",
  "Betreff": "Bestellstatus Anfrage",
  "Zusammenfassung": "Anfrage zum Versandstatus der Bestellung #12345.",
  "Prioritaet": "mittel",
}}
--- Ende Beispiel ---

Jetzt analysiere die folgende E-Mail und erstelle das JSON-Objekt.

E-Mail-Text:
{email_text}
"""


def main():
    print("=" * 60)
    print("EMAIL SUMMARIZATION WITH GUIDEGEN")
    print("=" * 60)

    print(f"\nLoading model: {MODEL_NAME}")
    load_start = time.time()

    gen = gg.GuideGen(
        MODEL_NAME,
        verbose=True,
        use_chat_template=True,
        enable_thinking=False,
    )

    load_time = time.time() - load_start
    print(f"Model loaded in {load_time:.2f}s")

    # --- Generate ---
    print("\nGenerating structured summary...")
    gen_start = time.time()

    options = gg.GuideGenOptions(
        temperature=0.6,
        max_tokens=400,
        do_sample=False,
    )

    summary = gen.generate(prompt, EmailSummary, options=options)

    gen_time = time.time() - gen_start
    print(f"Generation complete in {gen_time:.2f}s")

    # --- Output ---
    print("\n--- Email Summary (JSON) ---")
    print(json.dumps(summary.model_dump(), indent=2, ensure_ascii=False))
    print(f"\nModel load: {load_time:.2f}s | Generation: {gen_time:.2f}s | Total: {load_time + gen_time:.2f}s")
```

---

## Training focus

### German-first data distribution

Faust-1 is trained from scratch with a German-dominant corpus. German syntax, compounding, morphology, and typical reasoning patterns are treated as the default operating regime rather than an edge case.

### Verified synthetic data

A substantial portion of the training signal comes from synthetic data. To keep this signal usable, generation is paired with explicit verification and filtering:

- LLM-as-judge style evaluations
- rule-based and programmatic checks
- consistency and self-agreement filtering

This allows broad coverage of instruction-following and reasoning patterns while maintaining quality control.

---

## Tokenizer optimized for German

Faust-1 uses a custom tokenizer optimized for German morphology and compounding. Token efficiency is treated as a deployment constraint, not just a preprocessing detail.

Lower token counts on German text translate directly into more usable context, lower inference cost, and less fragmentation on compound-heavy inputs.

<img src="tokenizer_faust.png" alt="Faust-1 vs OpenAI Tokenizers" width="800">

---

## German benchmark performance

Faust-1 is evaluated on a set of standard German-language benchmarks:

- ARC_de
- GSM8K_de
- HellaSwag_de
- MMLU_de
- TruthfulQA_de

The target is best-in-class performance within the 1–2B parameter range for German-focused models, using benchmarks that are easy to reproduce in Hugging Face-based evaluation pipelines.

---

## Deployment examples

Faust-1 can be deployed with common inference stacks that support decoder-only language models.

vLLM (OpenAI-compatible API)
```sh
vllm serve tabularisai/Faust-1 --dtype float16
```

SGLang
```sh
python -m sglang.launch_server \
  --model-path tabularisai/Faust-1 \
  --dtype float16
```

```sh
  -p "Erkläre kurz, was ein großes Sprachmodell ist."
```

---

## Intended use

-
-
-
-

---

## Roadmap

- Reasoning-focused variant (coming soon)
- Agent-oriented variant (coming soon)

---

## Citation

Added (new model card):

---
language:
- de
- en
license: apache-2.0
library_name: transformers
base_model:
- tabularisai/Faust-1
- Qwen/Qwen3-1.7B
tags:
- merge
- german
- medical
- instruction-following
- attention-surgery
pipeline_tag: text-generation
---

# Faust-1-Merged

**German language model with enhanced instruction following via attention surgery.**

## What is this?

This is [tabularisai/Faust-1](https://huggingface.co/tabularisai/Faust-1) (1.7B, Qwen3 architecture, custom German tokenizer) with attention layers partially replaced from the [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) base model to improve instruction following while preserving Faust's German language capabilities.

## Merge Method: Attention Surgery

Unlike traditional model merging (SLERP, TIES, DARE), this uses **targeted attention-only surgery** with a stepped alpha schedule:

| Layer Range | Alpha | Effect |
|-------------|-------|--------|
| 0-6 (early) | 0.1 | Light touch — protect embedding-adjacent layers |
| 7-13 (mid-early) | 0.3 | Moderate blend |
| 14-20 (mid-late) | 0.6 | Strong instruction signal |
| 21-27 (late) | 0.8 | Maximum instruction following |

**Key insight:** Only self-attention weights are modified. All MLP weights (which store factual knowledge and vocabulary) remain 100% Faust. This preserves German language quality while importing Qwen3's instruction-following behavior from its attention routing.
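The surgery described above amounts to a per-layer linear interpolation restricted to attention tensors. A minimal sketch, with plain Python lists standing in for weight tensors and the usual `model.layers.N.self_attn.` state-dict naming assumed (the real implementation is not published):

```python
# Sketch of stepped attention surgery: blend only self-attention weights,
# with the blend strength chosen by layer quartile. Lists of floats stand in
# for tensors so the arithmetic is easy to follow.

ALPHA_BY_QUARTILE = [0.1, 0.3, 0.6, 0.8]  # layers 0-6, 7-13, 14-20, 21-27


def alpha_for_layer(layer: int) -> float:
    """Map a layer index to its blend strength (stepped quartile schedule)."""
    return ALPHA_BY_QUARTILE[min(layer // 7, 3)]


def blend(base, donor, alpha):
    """Elementwise linear interpolation: (1 - alpha) * base + alpha * donor."""
    return [(1 - alpha) * b + alpha * d for b, d in zip(base, donor)]


def surgery(base_sd: dict, donor_sd: dict) -> dict:
    """Merged state dict: attention weights blended, everything else kept."""
    merged = {}
    for name, weight in base_sd.items():
        if ".self_attn." in name:
            # e.g. "model.layers.21.self_attn.q_proj.weight" -> layer 21
            layer = int(name.split(".")[2])
            merged[name] = blend(weight, donor_sd[name], alpha_for_layer(layer))
        else:
            merged[name] = weight  # MLP, norms, embeddings stay 100% base
    return merged
```

Note the asymmetry: the donor state dict is only consulted for attention tensors, so any vocabulary or MLP mismatch between the two models never enters the merge.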
## Evaluation Results

Tested on 30 instruction-following tasks (deterministic, temperature=0):

| Model | Score | Accuracy |
|-------|-------|----------|
| **Faust-1-Merged** | **27/30** | **90%** |
| Faust-1 (original) | 25/30 | 83% |

### Category Breakdown

| Category | Faust-1 | Faust-1-Merged |
|----------|---------|----------------|
| Format (lists, JSON, etc.) | 5/6 | 6/6 |
| Length control | 5/5 | 5/5 |
| Language (German, formal) | 3/4 | 4/4 |
| Constraints (forbidden words) | 4/5 | 4/5 |
| Structured output | 4/4 | 3/4 |
| Medical (Arztbrief) | 3/3 | 3/3 |
| Role playing | 2/3 | 2/3 |

### Improvements over baseline

- ✅ One-word answers (strict format compliance)
- ✅ No-English constraint (pure German output)
- ✅ Required word inclusion

### Known limitations

- ❌ "End with word" — both models struggle
- ❌ "Refuse off-topic" — requires SFT for proper role boundaries
- ❌ Markdown tables sometimes missing proper separators
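The task suite itself is not published; the sketch below only illustrates the kind of deterministic, rule-based checker that format tasks like these imply (a hypothetical example, not the actual harness):

```python
import re

# Hypothetical scorer for one format-task type: "answer with a numbered list
# of exactly N items". Rule-based checks like this keep the evaluation
# deterministic and reproducible, with no judge model in the loop.


def check_numbered_list(answer: str, expected_items: int) -> bool:
    """Pass iff the answer contains lines numbered exactly 1..N, in order."""
    items = [line for line in answer.splitlines()
             if re.match(r"^\s*\d+[.)]\s+\S", line)]
    numbers = [int(re.match(r"^\s*(\d+)", line).group(1)) for line in items]
    return numbers == list(range(1, expected_items + 1))
```

A checker per task, run at temperature 0, yields the pass/fail counts reported above without any scoring ambiguity.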
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tabularisai/Faust-1-Merged", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("tabularisai/Faust-1-Merged")

messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Nenne mir 5 deutsche Städte als nummerierte Liste."}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)  # greedy decoding (deterministic)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Technical Details

- **Architecture:** Qwen3 (1.7B parameters)
- **Tokenizer:** Custom Faust German tokenizer (unchanged)
- **Modified layers:** 168 self-attention parameter tensors
- **Unmodified:** All MLP layers, embeddings, lm_head, layer norms
- **Method:** Per-quartile linear interpolation of attention weights
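The figure of 168 modified tensors is consistent with this layout if one assumes 28 decoder layers with six attention weight tensors each (q/k/v/o projections plus Qwen3's per-head q/k norms); that tensor list is an assumption here, but a sketch that enumerates it is easy to check:

```python
# Enumerate the self-attention parameter tensors a surgery like this would
# touch, assuming the Qwen3 decoder layout: 28 layers, each with q/k/v/o
# projection weights plus q_norm/k_norm weights.

ATTN_TENSORS = [
    "q_proj.weight", "k_proj.weight", "v_proj.weight",
    "o_proj.weight", "q_norm.weight", "k_norm.weight",
]


def modified_tensor_names(num_layers: int = 28) -> list:
    return [
        f"model.layers.{layer}.self_attn.{tensor}"
        for layer in range(num_layers)
        for tensor in ATTN_TENSORS
    ]
```

Under these assumptions, 28 layers × 6 tensors = 168, matching the count above.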
## Citation

```bibtex
@misc{faust1merged2026,
  title={Faust-1-Merged: Attention Surgery for German Instruction Following},
  author={Tabularis.AI},
  year={2026},
  url={https://huggingface.co/tabularisai/Faust-1-Merged}
}
```

## About Tabularis.AI

University of Tübingen spin-off specializing in privacy-first AI for regulated industries.
Products include EU PII Safeguard, Faust German language models, and GDPR-compliant on-premises deployment.
config.json
CHANGED

@@ -50,11 +50,13 @@

```diff
     "num_key_value_heads": 8,
     "pad_token_id": 1,
     "rms_norm_eps": 1e-06,
-    "
+    "rope_parameters": {
+        "rope_theta": 1000000,
+        "rope_type": "default"
+    },
     "sliding_window": null,
     "tie_word_embeddings": true,
-    "transformers_version": "
+    "transformers_version": "5.2.0",
     "use_cache": false,
     "use_sliding_window": false,
     "vocab_size": 100000
```
generation_config.json
CHANGED

@@ -9,5 +9,5 @@

```diff
     "temperature": 0.6,
     "top_k": 20,
     "top_p": 0.95,
-    "transformers_version": "
+    "transformers_version": "5.2.0"
 }
```
merge_config.json
ADDED

@@ -0,0 +1,30 @@

```json
{
    "method": "attention_surgery",
    "base_model": "tabularisai/Faust-1",
    "donor_model": "Qwen/Qwen3-1.7B",
    "schedule": "stepped_quartile",
    "alphas_per_quartile": {
        "0-6": 0.1,
        "7-13": 0.3,
        "14-20": 0.6,
        "21-27": 0.8
    },
    "components_modified": [
        "self_attn"
    ],
    "components_preserved": [
        "mlp",
        "embed_tokens",
        "lm_head",
        "input_layernorm",
        "post_attention_layernorm",
        "model.norm"
    ],
    "eval_score": "27/30 (90%)",
    "baseline_score": "25/30 (83%)",
    "eval_settings": {
        "temperature": 0,
        "do_sample": false,
        "max_new_tokens": 300
    }
}
```
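The `alphas_per_quartile` keys encode layer ranges as `"start-end"` strings. A small sketch of resolving a layer's alpha from such a config, treating both bounds as inclusive (the format suggests but does not state this):

```python
# Resolve the blend strength for a layer index from a merge_config.json-style
# schedule. Range keys like "7-13" are treated as inclusive on both ends.


def resolve_alpha(alphas_per_quartile: dict, layer: int) -> float:
    for key, alpha in alphas_per_quartile.items():
        low, high = (int(part) for part in key.split("-"))
        if low <= layer <= high:
            return alpha
    raise ValueError(f"layer {layer} is outside the schedule")


schedule = {"0-6": 0.1, "7-13": 0.3, "14-20": 0.6, "21-27": 0.8}
```

For example, `resolve_alpha(schedule, 10)` falls in the `"7-13"` bucket.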
model.safetensors
CHANGED

@@ -1,3 +1,3 @@

```diff
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:0d9b6e280b9aecc623307361ef79f8c93031da8ab8425eebf0ca7c8a458723f3
 size 3228455704
```
tokenizer.json
CHANGED

@@ -184,7 +184,32 @@

```diff
       }
     ]
   },
-  "post_processor":
+  "post_processor": {
+    "type": "TemplateProcessing",
+    "single": [
+      { "Sequence": { "id": "A", "type_id": 0 } }
+    ],
+    "pair": [
+      { "Sequence": { "id": "A", "type_id": 0 } },
+      { "Sequence": { "id": "B", "type_id": 1 } }
+    ],
+    "special_tokens": {}
+  },
   "decoder": {
     "type": "ByteLevel",
     "add_prefix_space": true,
```
tokenizer_config.json
CHANGED

@@ -1,180 +1,20 @@

```diff
 {
-  "
-    "0": { "content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "1": { "content": "<|pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "2": { "content": "<|unk|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "3": { "content": "<|bos|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "4": { "content": "<|eos|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "5": { "content": "<|im_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "6": { "content": "<|im_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "7": { "content": "<|im_sep|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "8": { "content": "<|special_0|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "9": { "content": "<|special_1|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "10": { "content": "<|special_2|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "11": { "content": "<|special_3|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "12": { "content": "<|special_4|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "13": { "content": "<|special_5|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "14": { "content": "<|special_6|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "15": { "content": "<|special_7|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "16": { "content": "<|special_8|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true },
-    "17": { "content": "<|special_9|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true }
-  },
-  "additional_special_tokens": [
-    "<|im_start|>", "<|im_end|>", "<|im_sep|>",
-    "<|special_0|>", "<|special_1|>", "<|special_2|>", "<|special_3|>", "<|special_4|>",
-    "<|special_5|>", "<|special_6|>", "<|special_7|>", "<|special_8|>", "<|special_9|>"
-  ],
+  "backend": "tokenizers",
   "bos_token": "<|bos|>",
   "clean_up_tokenization_spaces": false,
   "eos_token": "<|im_end|>",
-  "
+  "is_local": false,
   "max_length": 2048,
+  "model_input_names": [ "input_ids", "attention_mask" ],
   "model_max_length": 8192,
   "pad_token": "<|pad|>",
+  "return_token_type_ids": false,
   "stride": 0,
-  "tokenizer_class": "
+  "tokenizer_class": "TokenizersBackend",
   "truncation_side": "right",
   "truncation_strategy": "longest_first",
-  "unk_token": "<|unk|>"
-  "model_input_names": [
-    "input_ids",
-    "attention_mask"
-  ]
-  }
+  "unk_token": "<|unk|>"
+}
```