---
license: mit
tags:
- gemma3
- safetensors
- transformers
- tinygemma
- tinystories
- validation
- test-suite
---

# TinyStories Gemma3 Text Validation Artifact

This directory contains a tiny Gemma 3 text-only model trained with official
Hugging Face Transformers classes.

It is intended for inference-engine validation, not for production language
quality.

## Official classes used

- `Gemma3TextConfig`
- `Gemma3ForCausalLM`
- `Trainer`

No custom Gemma 3 modeling code is used.

## Key validation targets

- `model_type = gemma3_text`
- `architectures = Gemma3ForCausalLM`
- local/global attention pattern through `layer_types`
- sliding-window attention
- full attention
- GQA
- per-head `q_norm` / `k_norm`
- Gemma3 four-norm decoder layer structure
- gated MLP: `silu(gate_proj(x)) * up_proj(x)`
- tied output head through `model.embed_tokens.weight`

## Tiny architecture

- vocab_size: 1024
- hidden_size: 128
- intermediate_size: 512
- num_hidden_layers: 6
- num_attention_heads: 4
- num_key_value_heads: 1
- head_dim: 32
- sliding_window: 32
- layer_types: ['sliding_attention', 'sliding_attention', 'sliding_attention', 'sliding_attention', 'sliding_attention', 'full_attention']

## Files

- `hf/`: Hugging Face model/tokenizer artifact
- `reference/reference.pt`: deterministic reference tensors
- `reference/reference.json`: JSON summary of reference logits
- `gemma3_text_config_dump.json`: normalized config dump
- `safetensors_keys.json`: tensor names and shapes
- `artifact_metadata.json`: generation metadata

## Usage

```python
import torch
from transformers import Gemma3ForCausalLM, PreTrainedTokenizerFast

def main():
    repo_id = "shibatch/tinygemma3-2m"

    print("Loading tokenizer...")
    tokenizer = PreTrainedTokenizerFast.from_pretrained(repo_id, subfolder="hf")

    print("Loading Gemma3 model weights...")
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = Gemma3ForCausalLM.from_pretrained(
        repo_id,
        subfolder="hf",
        torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
    ).to(device)
    model.eval()

    prompt = "Once upon"
    print(f"\nInput prompt: {prompt}")

    input_ids = tokenizer.encode(prompt, add_special_tokens=False)
    input_ids = [tokenizer.bos_token_id] + input_ids
    input_ids = torch.tensor([input_ids], dtype=torch.long, device=device)

    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            max_new_tokens=100,
            do_sample=False,
            repetition_penalty=1.0,
            top_p=1.0,
            pad_token_id=tokenizer.pad_token_id or tokenizer.bos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Generated output: {generated_text}")

if __name__ == "__main__":
    main()
```