Gemma-3 270M (Trained from Scratch)

This is a 270M parameter language model based on the Gemma 3 architecture, trained from scratch on the TinyStories dataset.

Model Details

  • Architecture: Custom Gemma 3 (Sliding Window Attention)
  • Parameters: ~270M
  • Context Length: 32,768
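The sliding window attention mentioned above restricts each query position to a fixed-size span of recent positions instead of the full causal prefix. A minimal sketch of the attention mask (illustrative only, not the repo's implementation; the window size here is made up):

```python
# Illustrative sliding-window causal mask (not the repo's code).
# Each position i may attend to at most `window` positions:
# itself plus the window-1 positions immediately before it.

def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when position i may attend to position j."""
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=2)
# Position 4 sees only positions 3 and 4:
print([j for j, ok in enumerate(mask[4]) if ok])  # [3, 4]
```

This is what keeps attention cost linear in context length for long sequences: the number of visible keys per query is capped at the window size rather than growing with position.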

Training Details

  • Trained for: 110,000 steps
  • Final training loss: 1.7712
  • Final validation loss: 1.7916
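Since these are cross-entropy losses in nats, each corresponds to a perplexity of exp(loss). Plugging in the reported numbers (an illustration, not an additional evaluation):

```python
import math

# Perplexity = exp(cross-entropy loss in nats).
train_ppl = math.exp(1.7712)  # final training loss
val_ppl = math.exp(1.7916)    # final validation loss

print(f"train perplexity ~ {train_ppl:.2f}")  # ~ 5.88
print(f"val perplexity   ~ {val_ppl:.2f}")    # ~ 6.00
```

A validation perplexity near 6 means the model is, on average, about as uncertain as a uniform choice among 6 tokens at each step.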

Usage

Because this model uses a custom architecture, the standard AutoModel loaders will not work; you need to download the repository's model code (the src folder) alongside the weights to run it.

```python
# 1. Install dependencies
# pip install transformers torch huggingface_hub

import sys
import os
import torch
from transformers import AutoTokenizer
from huggingface_hub import snapshot_download

# 2. Download the repository (code + weights)
repo_path = snapshot_download(repo_id="Adx19/gemma-3-270m-tinystories")

# 3. Add the downloaded folder to the Python path so we can import 'src'
sys.path.append(repo_path)

# 4. Import the custom model
from src.model import Gemma3Model
from src.config import GEMMA3_CONFIG_270M

# 5. Load model & weights
device = "cuda" if torch.cuda.is_available() else "cpu"
model = Gemma3Model(GEMMA3_CONFIG_270M).to(device)

weights_path = os.path.join(repo_path, "pytorch_model.bin")
model.load_state_dict(torch.load(weights_path, map_location=device))
model.eval()

# 6. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Adx19/gemma-3-270m-tinystories")

# 7. Generate
input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"].to(device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=50)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
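Step 7 relies on the custom model exposing a generate method. In case a given checkout of the code lacks one, greedy decoding is simple to sketch framework-free: append the argmax token in a loop. The toy_model below is a hypothetical stand-in for the real forward pass, used only to make the sketch runnable:

```python
# Hypothetical fallback sketch: greedy decoding without model.generate().
# logits_fn stands in for a forward pass that returns one row of logits
# per position; the real model would be wrapped the same way.

def greedy_generate(logits_fn, input_ids, max_new_tokens):
    """Repeatedly append the highest-scoring next token."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        last_row = logits_fn(ids)[-1]  # logits for the next token
        next_id = max(range(len(last_row)), key=last_row.__getitem__)
        ids.append(next_id)
    return ids

# Toy "model": always prefers token (last_id + 1) mod vocab_size.
def toy_model(ids, vocab_size=5):
    row = [0.0] * vocab_size
    row[(ids[-1] + 1) % vocab_size] = 1.0
    return [row] * len(ids)

print(greedy_generate(toy_model, [0], 4))  # [0, 1, 2, 3, 4]
```

With the real model, logits_fn would run a forward pass under torch.no_grad() and return the logits tensor; sampling with temperature or top-k would replace the argmax for less repetitive stories.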