|
|
--- |
|
|
library_name: transformers |
|
|
tags: |
|
|
- trl |
|
|
- sft |
|
|
- gemma |
|
|
- qwen |
|
|
- merge |
|
|
- disc |
|
|
license: osl-3.0 |
|
|
datasets: |
|
|
- HuggingFaceH4/ultrachat_200k |
|
|
- TIGER-Lab/MathInstruct |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- Qwen/Qwen3-1.7B |
|
|
- google/gemma-3-1b-it |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
# Model Card for Qemma-Q-1.7B |
|
|
## Gap Envelope Integral |
|
|
* My mathematical formulation that uses space projections to "measure" the jump between points of discontinuity in non-differentiable functions.
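
The formulation itself isn't reproduced on this card. As a hedged point of reference only (not the author's exact construction), the classical "jump" that such a measure quantifies at a discontinuity point is:

```latex
% Standard jump of f at a discontinuity x_0, shown only as context for
% the gap-envelope idea above; the projection machinery is not spelled
% out on this card.
J_f(x_0) = \lim_{\epsilon \to 0^{+}} \left[ f(x_0 + \epsilon) - f(x_0 - \epsilon) \right]
```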
|
|
## Redux |
|
|
* This model underwent an additional merge between Qemma-redux and Qwen3-1.7B, and RoPE scaling was added.
|
|
### Additionally |
|
|
* Fusion logic was updated to aid per-layer fusion and post-fusion embedding alignment.
|
|
* **Qemma** is a Hugging Face-native hybrid model that merges **Gemma-3 (1B)** and **Qwen-3 (1.7B)** at the weight level (no adapters).

* Design: Gemma MLP/body + Qwen attention/head, projected and aligned to Gemma's hidden size (a hedged sketch of this projection step follows this list). The model is then SFT-tuned for stepwise reasoning.

* This variant uses YaRN-based RoPE scaling with a 1:* ratio from `max_position_embeddings = 242144`.
|
|
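For intuition, here is a minimal sketch of the projection step the design bullet refers to. Everything below is illustrative: `project_linear`, the random-projection choice, and the tensor shapes are hypothetical stand-ins, not the actual fusion code.

```python
import torch

def project_linear(weight: torch.Tensor, d_out: int, d_in: int) -> torch.Tensor:
    """Map a source weight matrix onto a target (d_out, d_in) shape.

    Hypothetical helper: fixed Gaussian projections, scaled so that
    magnitudes stay roughly stable after the mapping.
    """
    torch.manual_seed(0)  # deterministic projections for reproducibility
    p_out = torch.randn(d_out, weight.shape[0]) / weight.shape[0] ** 0.5
    p_in = torch.randn(weight.shape[1], d_in) / weight.shape[1] ** 0.5
    return p_out @ weight @ p_in

# Example: carry a Qwen-sized q_proj (2048 -> 2048) into Gemma's
# attention geometry (4 heads x 256 = 1024 out, hidden size 1152 in).
qwen_q = torch.randn(2048, 2048)
gemma_q = project_linear(qwen_q, d_out=4 * 256, d_in=1152)
print(gemma_q.shape)  # torch.Size([1024, 1152])
```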
|
|
## Quick start |
|
|
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "reaperdoesntknow/Qemma-Q1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()

# The model was tuned to open its answers with reasoning tags.
text = (
    "<|user|>"
    "What makes the sky blue?"
    "<|assistant|>"
    "<think><reasoning_step>"
)

inputs = tokenizer(text, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, min_length=32)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
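
The raw `<|user|>` string above mirrors how the model was tuned. Since the repo also ships the Gemma-3 chat template (see "What's inside" below), prompts can equally be built with the standard `apply_chat_template` call; this continues from the snippet above:

```python
# Continuing from the quick-start snippet: tokenizer and model are loaded.
messages = [{"role": "user", "content": "What makes the sky blue?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # open the assistant turn for generation
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```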
|
|
|
|
|
## What’s inside |
|
|
|
|
|
* **Architecture:**
  * **Gemma-3 backbone** (26 layers, hidden 1152, MLP 6912)
  * **Qwen-style attention** regrouped to Gemma's 4×256 heads (Qwen-3 source geometry: head_dim=128, hidden_size=2048, intermediate_size=6144, num_attention_heads=16, num_key_value_heads=8, num_hidden_layers=28)
|
|
* **Tokenizer:** Gemma-3 tokenizer and chat template (see `chat_template.jinja`). |
|
|
* **Training:** SFT for instruction following and stepwise reasoning. |
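
To check the merged geometry and the RoPE setup locally, the standard config fields can be inspected (values should match the numbers above; `rope_scaling` is read defensively since its exact shape depends on the export):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("reaperdoesntknow/Qemma-Q1.7B")

print(config.num_hidden_layers)        # transformer depth after the merge
print(config.hidden_size)              # Gemma-side hidden width
print(config.num_attention_heads)      # attention heads after regrouping
print(config.max_position_embeddings)  # extended context window
print(getattr(config, "rope_scaling", None))  # YaRN scaling dict, if set
```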
|
|
|
|
|
## Intended use & limitations |
|
|
|
|
|
**Use:** research, instruction following, code/help, analysis, further SFT/RLHF. |
|
|
**Limits:** may hallucinate; not for safety-critical, medical, legal, or financial decisions. Follow dataset/model licenses. |
|
|
|
|
|
## Training procedure |
|
|
|
|
|
* ~512 warm-start steps on HuggingFaceH4/ultrachat_200k. A small post-fusion training round (8 steps) was run to encourage embedding realignment.
|
|
* ~256 SFT steps on TIGER-Lab/MathInstruct + HuggingFaceH4/ultrachat_200k.
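
For readers who want to reproduce a comparable run, a minimal TRL setup looks like the sketch below. Hyperparameters other than the step count are illustrative assumptions, and the dataset mixing used for the second round is not shown:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Warm-start data; the MathInstruct mix for the SFT round is analogous.
dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

args = SFTConfig(
    output_dir="qemma-sft",
    max_steps=256,                  # matches the ~256 SFT steps above
    per_device_train_batch_size=2,  # assumption, not from the card
    learning_rate=2e-5,             # assumption, not from the card
)

trainer = SFTTrainer(
    model="reaperdoesntknow/Qemma-Q1.7B",
    args=args,
    train_dataset=dataset,
)
trainer.train()
```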
|
|
|
|
|
|
|
|
### Framework versions |
|
|
|
|
|
* TRL: 0.25.0 |
|
|
* Transformers: 4.57.1 |
|
|
* PyTorch: 2.8.0+cpu
|
|
* Datasets: 4.4.1 |
|
|
* Tokenizers: 0.22.1 |
|
|
|
|
|
## Citations |
|
|
|
|
|
|
|
|
|
|
|
Cite TRL as: |
|
|
|
|
|
```bibtex |
|
|
@misc{vonwerra2022trl, |
|
|
title = {{TRL: Transformer Reinforcement Learning}}, |
|
|
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec}, |
|
|
year = 2020, |
|
|
journal = {GitHub repository}, |
|
|
publisher = {GitHub}, |
|
|
howpublished = {\url{https://github.com/huggingface/trl}} |
|
|
} |
|
|
``` |