
# Gemmagain Multimodal

A Gemma3 multimodal model with layer-looping support in the text decoder. Layer looping runs the same physical text decoder layers multiple times in sequence, enabling parameter-efficient deep networks while leaving the vision tower unchanged.
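As a rough illustration, the idea can be sketched with a toy decoder (an illustrative toy, not the actual `GemmagainTextModel` implementation): the model walks an expanded execution order that may visit the same physical layer more than once, so depth grows without adding parameters.

```python
import torch
import torch.nn as nn

class ToyLoopedDecoder(nn.Module):
    """Toy sketch of layer looping: weights are shared across repeated visits."""

    def __init__(self, num_layers=4, hidden=8, execution_order=None):
        super().__init__()
        # Physical layers: allocated exactly once.
        self.layers = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(num_layers)]
        )
        # Execution order: physical layer indices, possibly with repeats.
        self.execution_order = execution_order or list(range(num_layers))

    def forward(self, x):
        for idx in self.execution_order:
            # Repeated indices reuse the same weights on a new hidden state.
            x = torch.tanh(self.layers[idx](x))
        return x

# 4 physical layers, 6 effective layers (layers 1-2 run twice).
model = ToyLoopedDecoder(execution_order=[0, 1, 2, 1, 2, 3])
out = model(torch.randn(1, 8))
```

The parameter count stays that of 4 layers even though 6 layer applications happen per forward pass.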

## Features

- **Layer looping for the text decoder only** - the vision tower (`SiglipVisionModel`) is unchanged
- **100% weight compatible** with `unsloth/gemma-3-4b-pt` and other Gemma3 multimodal checkpoints
- **Generation with KV caching** - cache slots are properly allocated for looped layers
- **Flexible layer sequence format** - specify which layers to loop and how many times
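The KV-caching point deserves a word: two passes over the same physical layer see different hidden states, so they cannot share a cache slot. A hedged sketch of the bookkeeping (an assumption for illustration; the repo's exact implementation may differ):

```python
# Each step in the expanded execution order gets its own KV-cache slot,
# even when it reuses the weights of a physical layer that already ran.
execution_order = [0, 1, 2, 1, 2, 3]                  # physical layer per step
cache_slot = {step: step for step in range(len(execution_order))}

# Step 3 reruns physical layer 1, but reads/writes cache slot 3, not the
# slot written by step 1 -- the two passes attend over different states.
assert execution_order[3] == execution_order[1]
assert cache_slot[3] != cache_slot[1]
```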

## Usage

```python
import torch
from transformers import AutoConfig, Gemma3ForConditionalGeneration

# Load config with layer looping
config = AutoConfig.from_pretrained('rpDungeon/gemmagain-mm', trust_remote_code=True)

# Configure layer looping: layers 0-9 once, layers 10-27 twice, layers 28-33 once
config.text_config.layer_sequence = [[0, 10], [10, 28, 2], [28, 34]]

# Import and create model
from modeling_gemmagain import GemmagainForConditionalGeneration

model = GemmagainForConditionalGeneration(config)

# Load weights from any Gemma3 multimodal checkpoint
orig = Gemma3ForConditionalGeneration.from_pretrained(
    'unsloth/gemma-3-4b-pt',
    torch_dtype=torch.bfloat16,
)
model.load_state_dict(orig.state_dict())
del orig

model = model.to(dtype=torch.bfloat16, device='cuda')
```

## Layer Sequence Format

The `layer_sequence` config option accepts a flexible entry format:

| Format         | Example       | Meaning                        |
|----------------|---------------|--------------------------------|
| Integer        | `5`           | Single layer 5                 |
| 2-element list | `[4, 20]`     | Layers 4-19 (end exclusive)    |
| 3-element list | `[10, 28, 2]` | Layers 10-27, repeated 2 times |

Example configurations:

```python
# Default: all 34 layers once
config.text_config.layer_sequence = [[0, 34, 1]]

# Loopstral-style: loop middle layers twice
# Physical: 34 layers, Effective: 52 layers
config.text_config.layer_sequence = [[0, 10], [10, 28, 2], [28, 34]]

# Loop all layers twice (2x depth, same params)
config.text_config.layer_sequence = [[0, 34, 2]]
```
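To see how a spec turns into an execution order, here is a hypothetical helper (not part of this repo) that expands all three entry forms from the table above into a flat list of physical layer indices:

```python
def expand_layer_sequence(spec):
    """Expand a layer_sequence spec into the flat execution order."""
    order = []
    for entry in spec:
        if isinstance(entry, int):             # single layer
            order.append(entry)
        elif len(entry) == 2:                  # [start, end): run once
            order.extend(range(entry[0], entry[1]))
        else:                                  # [start, end, repeats]
            start, end, repeats = entry
            for _ in range(repeats):
                order.extend(range(start, end))
    return order

order = expand_layer_sequence([[0, 10], [10, 28, 2], [28, 34]])
print(len(order))  # 10 + 2*18 + 6 = 52 effective layers from 34 physical ones
```

This makes the "Physical: 34 layers, Effective: 52 layers" arithmetic for the Loopstral-style config explicit.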

## Architecture

```text
GemmagainForConditionalGeneration
β”œβ”€β”€ model (GemmagainModel)
β”‚   β”œβ”€β”€ vision_tower (SiglipVisionModel)      # Unchanged from Gemma3
β”‚   β”œβ”€β”€ multi_modal_projector                 # Unchanged from Gemma3
β”‚   └── language_model (GemmagainTextModel)   # Layer looping support
β”‚       β”œβ”€β”€ embed_tokens
β”‚       β”œβ”€β”€ layers[0..33]                     # Physical layers
β”‚       β”œβ”€β”€ _layer_sequence                   # Execution order with loops
β”‚       └── norm
└── lm_head
```

## License

Apache 2.0 (same as Gemma3)
