---
library_name: transformers
model_name: Qemma-redux
tags:
- generated_from_trainer
- sft
- trl
licence: license
license: osl-3.0
datasets:
- O1-OPEN/OpenO1-SFT
- yahma/alpaca-cleaned
- Jackrong/gpt-oss-120b-reasoning-STEM-5K
language:
- en
base_model:
- reaperdoesntknow/Qemma-sft
pipeline_tag: text-generation
---

# Model Card for Qemma
**Redux:** This model underwent an additional merge between Qemma-sft and Qwen3-0.6B, and RoPE scaling was added.

**Qemma** is a Hugging Face-native hybrid model that merges **Gemma-3 (1B)** and **Qwen-3 (0.6B)** at the weight level (no adapters).
Design: Gemma MLP/body + Qwen attention/head, projected and aligned to Gemma’s hidden size. The model is then SFT-tuned for stepwise reasoning.

This variant uses YaRN-based RoPE scaling with a 1:1 ratio relative to `max_position_embeddings`.
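
One possible reading of the "1:1 ratio" is a YaRN scaling factor of 1.0, so the scaled context length equals the original `max_position_embeddings`. The sketch below illustrates that reading only. The dictionary follows the key names used by `rope_scaling` entries in `transformers` configs, but the values (including `32768`) are placeholders and are not read from this model's `config.json`; `scaled_context_length` is a hypothetical helper.

```python
# Hypothetical YaRN entry in the style of a transformers `rope_scaling` config.
# All values are illustrative placeholders, not taken from this model's config.json.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 1.0,  # assumed reading of the "1:1 ratio"
    "original_max_position_embeddings": 32768,  # placeholder value
}


def scaled_context_length(cfg: dict) -> int:
    """Return the context length implied by the scaling factor."""
    return int(cfg["factor"] * cfg["original_max_position_embeddings"])


# With a factor of 1.0 the scaled length equals the original length.
print(scaled_context_length(rope_scaling))
```

The values actually shipped with the model can be checked in the repository's `config.json`.
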
## Quick start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "reaperdoesntknow/Qemma-redux"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

text = "I notice that the sum involves the absolute values of three linear expressions of x."
# Tokenize without padding: padding a single prompt to a fixed length would
# make the model continue from pad tokens instead of from the prompt.
inputs = tokenizer(text, return_tensors="pt", max_length=64, truncation=True)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Sample up to 256 new tokens. `min_length` counts prompt plus generated tokens.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, min_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## What’s inside

* **Architecture:** Gemma-3 backbone (26 layers, hidden 1152, MLP 6912) with **Qwen-style attention** regrouped to Gemma’s 4×256 heads.
* **Tokenizer:** Gemma-3 tokenizer and chat template (see `chat_template.jinja`).
* **Training:** SFT for instruction following and stepwise reasoning.
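
The figures in the architecture bullet can be related with a little arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative and are not read from the model's config. It shows that 4 heads of size 256 give an attention width of 1024, which is narrower than the hidden size of 1152, so (assuming the layout follows Gemma-3's convention) the attention projections map between the two widths.

```python
# Figures quoted in the architecture bullet above; names are illustrative.
num_layers = 26
hidden_size = 1152
mlp_size = 6912
num_heads = 4
head_dim = 256

# Total width of the concatenated attention heads.
attention_width = num_heads * head_dim

# Ratio of the MLP inner size to the hidden size.
mlp_expansion = mlp_size / hidden_size

print(f"attention width: {attention_width} (hidden size: {hidden_size})")
print(f"MLP expansion factor: {mlp_expansion}")
```
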

## Intended use & limitations

* **Use:** research, instruction following, code/help, analysis, further SFT/RLHF.
* **Limits:** may hallucinate; not for safety-critical, medical, legal, or financial decisions. Follow dataset/model licenses.

## Training procedure

* ~512 warm-start steps (Alpaca-style data)
* 256 additional pretraining steps on `O1-OPEN/OpenO1-SFT`
* 128 SFT steps with `Jackrong/gpt-oss-120b-reasoning-STEM-5K`
* 256 SFT steps with `O1-OPEN/OpenO1-SFT`
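
The stages above can be written down as a simple schedule to see the overall step budget. This is only a bookkeeping sketch: the stage labels and the `stages` structure are illustrative, the first step count is approximate in the list above, and nothing here reflects the actual training scripts.

```python
# Training stages as listed above: (label, data description, steps).
# The first count is approximate ("~512") in the list above.
stages = [
    ("warm-start", "Alpaca-style data", 512),
    ("additional pretraining", "O1-OPEN/OpenO1-SFT", 256),
    ("SFT", "Jackrong/gpt-oss-120b-reasoning-STEM-5K", 128),
    ("SFT", "O1-OPEN/OpenO1-SFT", 256),
]

# Sum the per-stage step counts to get the approximate total budget.
total_steps = sum(steps for _, _, steps in stages)

for label, data, steps in stages:
    print(f"{label:<24} {data:<42} {steps:>4} steps ({steps / total_steps:.0%})")
print(f"total: ~{total_steps} steps")
```
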


### Framework versions

* TRL: 0.25.0
* Transformers: 4.57.1
* PyTorch: 2.8.0+cpu
* Datasets: 4.4.1
* Tokenizers: 0.22.1

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```