---
library_name: transformers
datasets:
- openGPT-X/gsm8kx
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
---

# German R1

**Introducing German-R1. We are so back!!11**

- 🇩🇪 German R1 is a reasoning model almost equivalent to OpenAI's o3 or DeepSeek's R1 - but it thinks in German!
- 🇩🇪 German R1 is so efficient that it was built without any government funding.
- 🇩🇪 German R1 was trained only on legally imported H100s in less than five minutes.
- Please do NOT take this too seriously.

## Context

See LinkedIn: https://www.linkedin.com/posts/activity-7294337496023269376-ZBkI

## Usage

You can run the model as follows:

```python
from transformers import pipeline, set_seed
import json

set_seed(42)
pipe = pipeline("text-generation", "malteos/german-r1")

# from the gsm8k test set
question = "James beschließt, 3-mal pro Woche 3 Sprints zu laufen. Er läuft 60 Meter pro Sprint. Wie viele Meter läuft er insgesamt pro Woche?"
expected_answer = "540"

# XML reasoning and answer format
system_prompt = """
Antworte auf deutsch und in diesem Format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question},
]
response = pipe(messages, max_new_tokens=256)

print(json.dumps(response, indent=4, ensure_ascii=False))
```

The output should look like this (correct answer: 540):

```json
[
    {
        "generated_text": [
            {
                "role": "system",
                "content": "\nAntworte auf deutsch und in diesem Format:\n<reasoning>\n...\n</reasoning>\n<answer>\n...\n</answer>\n"
            },
            {
                "role": "user",
                "content": "James beschließt, 3-mal pro Woche 3 Sprints zu laufen. Er läuft 60 Meter pro Sprint. Wie viele Meter läuft er insgesamt pro Woche?"
            },
            {
                "role": "assistant",
                "content": "<reasoning>\nJames läuft 3 Sprints pro Tag, wobei jeder Sprint 60 Meter beträgt. Deshalb läuft er pro Tag 3 Sprints * 60 Meter/Sprint = 180 Meter. James läuft 3 Mal pro Woche, also läuft er insgesamt 180 Meter * 3 = 540 Meter pro Woche.\n</reasoning>\n<answer>\n540\n</answer>\n"
            }
        ]
    }
]
```

## Training details

- Base model: Qwen2.5-3B-Instruct
- Implementation: based on the [GRPO demo](https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb) by [Will Brown](https://x.com/willccbb), plus a language-identification reward
- Training data: German subset of [openGPT-X/gsm8kx](https://huggingface.co/datasets/openGPT-X/gsm8kx/) (machine translated from the English gsm8k)
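
The actual language-identification reward is not spelled out here. A minimal sketch of the idea, using a naive German-stopword heuristic in place of a real language classifier (the function name, the stopword list, and the scoring are illustrative assumptions, not taken from the training code):

```python
# Toy language-identification reward: score completions higher the more
# "German-looking" they are. A real setup would use a proper language
# identifier; this stopword-fraction heuristic is only for illustration.
GERMAN_STOPWORDS = {"der", "die", "das", "und", "er", "sie", "pro", "ist", "ein", "eine", "nicht"}

def german_reward(completion: str) -> float:
    words = completion.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in GERMAN_STOPWORDS)
    return hits / len(words)  # fraction of German stopwords, in [0, 1]

# A German completion scores higher than an English one:
de = "Er läuft pro Tag 180 Meter und pro Woche 540 Meter."
en = "He runs 180 meters per day and 540 meters per week."
print(german_reward(de) > german_reward(en))  # True
```

In GRPO this score would be added to the format and correctness rewards, nudging the policy toward completions that reason in German.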

## License

[Qwen research](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE)