# Gemmagain Multimodal
Gemma3 multimodal model with layer looping support for the text decoder. This allows running the same physical text decoder layers multiple times in sequence, enabling parameter-efficient deep networks while leaving the vision tower unchanged.
## Features
- Layer looping for the text decoder only - the vision tower (SiglipVisionModel) is unchanged
- 100% weight compatible with `unsloth/gemma-3-4b-pt` and other Gemma3 multimodal models
- Supports generation with KV caching - cache slots are properly allocated for looped layers
- Flexible layer sequence format - specify which layers to loop and how many times
## Usage
```python
import torch
from transformers import AutoConfig, Gemma3ForConditionalGeneration

# Load config with layer looping
config = AutoConfig.from_pretrained('rpDungeon/gemmagain-mm', trust_remote_code=True)

# Configure layer looping: layers 0-9 once, layers 10-27 twice, layers 28-33 once
config.text_config.layer_sequence = [[0, 10], [10, 28, 2], [28, 34]]

# Import and create model
from modeling_gemmagain import GemmagainForConditionalGeneration
model = GemmagainForConditionalGeneration(config)

# Load weights from any Gemma3 multimodal checkpoint
orig = Gemma3ForConditionalGeneration.from_pretrained(
    'unsloth/gemma-3-4b-pt',
    torch_dtype=torch.bfloat16,
)
model.load_state_dict(orig.state_dict())
del orig

model = model.to(dtype=torch.bfloat16, device='cuda')
```
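
Once the weights are loaded, generation goes through the usual `transformers` API. A minimal text-only sketch, continuing from the snippet above and assuming the tokenizer from the same `unsloth/gemma-3-4b-pt` checkpoint:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('unsloth/gemma-3-4b-pt')

inputs = tokenizer('The capital of France is', return_tensors='pt').to('cuda')
with torch.no_grad():
    # KV caching works as usual; looped layers get their own cache slots
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```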
## Layer Sequence Format
The `layer_sequence` config accepts a flexible format:
| Format | Example | Meaning |
|---|---|---|
| Integer | `5` | Single layer 5 |
| 2-element list | `[4, 20]` | Layers 4-19 (end exclusive) |
| 3-element list | `[10, 28, 2]` | Layers 10-27, repeated 2 times |
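
For illustration, the three forms can be expanded into an explicit execution order. `expand_layer_sequence` below is a hypothetical helper (not part of the released code), and it assumes a repeated range runs through the whole block each time:

```python
def expand_layer_sequence(layer_sequence):
    """Expand layer_sequence entries into the flat list of physical layer
    indices executed in order (illustrative only)."""
    order = []
    for item in layer_sequence:
        if isinstance(item, int):          # single layer
            order.append(item)
        elif len(item) == 2:               # [start, end) run once
            order.extend(range(*item))
        else:                              # [start, end, repeats]
            start, end, repeats = item
            order.extend(list(range(start, end)) * repeats)
    return order

print(expand_layer_sequence([5]))                  # [5]
print(expand_layer_sequence([[4, 20]]))            # layers 4..19 once
print(len(expand_layer_sequence([[10, 28, 2]])))   # 36 executions of layers 10..27
```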
Example configurations:
```python
# Default: all 34 layers once
config.text_config.layer_sequence = [[0, 34, 1]]

# Loopstral-style: loop middle layers twice
# Physical: 34 layers, Effective: 52 layers
config.text_config.layer_sequence = [[0, 10], [10, 28, 2], [28, 34]]

# Loop all layers twice (2x depth, same params)
config.text_config.layer_sequence = [[0, 34, 2]]
```
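
As a sanity check, the effective depth is simply the total number of layer executions. A small illustrative helper (not part of the released code):

```python
def effective_depth(layer_sequence):
    """Total number of layer executions implied by layer_sequence (illustrative)."""
    total = 0
    for item in layer_sequence:
        if isinstance(item, int):
            total += 1
        else:
            start, end, *rest = item
            total += (end - start) * (rest[0] if rest else 1)
    return total

print(effective_depth([[0, 34, 1]]))                      # 34
print(effective_depth([[0, 10], [10, 28, 2], [28, 34]]))  # 10 + 18*2 + 6 = 52
print(effective_depth([[0, 34, 2]]))                      # 68
```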
## Architecture
```
GemmagainForConditionalGeneration
├── model (GemmagainModel)
│   ├── vision_tower (SiglipVisionModel)     # Unchanged from Gemma3
│   ├── multi_modal_projector                # Unchanged from Gemma3
│   └── language_model (GemmagainTextModel)  # Layer looping support
│       ├── embed_tokens
│       ├── layers[0..33]                    # Physical layers
│       ├── _layer_sequence                  # Execution order with loops
│       └── norm
└── lm_head
```
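
Conceptually, the looped text decoder walks the expanded execution order, reusing the same physical layer weights at every repeated step, while each execution step gets its own KV cache slot. A rough sketch of that idea, using a simplified decoder-layer interface (the actual implementation lives in `modeling_gemmagain.py` and may differ):

```python
import torch
import torch.nn as nn

class LoopedDecoderSketch(nn.Module):
    """Illustrative only: execute physical layers in the order given by an
    expanded layer sequence, with one KV cache slot per execution step."""

    def __init__(self, layers: nn.ModuleList, execution_order: list[int]):
        super().__init__()
        self.layers = layers                    # physical layers, e.g. 34
        self.execution_order = execution_order  # e.g. expanded layer_sequence

    def forward(self, hidden_states: torch.Tensor, kv_cache: list | None = None):
        for slot, layer_idx in enumerate(self.execution_order):
            layer = self.layers[layer_idx]      # the same weights may be reused
            # The cache is indexed by execution slot rather than physical layer,
            # so a looped layer does not overwrite the keys/values it produced
            # on an earlier pass through the block.
            past = kv_cache[slot] if kv_cache is not None else None
            hidden_states = layer(hidden_states, past)
        return hidden_states
```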
## License
Apache 2.0 (same as Gemma3)