# Gemma-3 270M (Trained from Scratch)
This is a 270M parameter language model based on the Gemma 3 architecture, trained from scratch on the TinyStories dataset.
## Model Details
- Architecture: Custom Gemma 3 (Sliding Window Attention)
- Parameters: ~270M
- Context Length: 32,768
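Sliding window attention means each token attends only to the most recent `W` positions rather than the full 32K context. A minimal sketch of the boolean attention mask this implies (the window size here is illustrative, not the model's actual value):

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is allowed: causal AND within `window` tokens back.
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]              # queries see self and past only
    in_window = idx[:, None] - idx[None, :] < window   # at most `window - 1` tokens back
    return causal & in_window

# Each row is a query position; each column a key position it may attend to.
mask = sliding_window_mask(seq_len=8, window=4)
print(mask.int())
```

Memory for the attention matrix then grows with `seq_len * window` instead of `seq_len ** 2`, which is what makes the long context practical at this parameter count.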
## Training Details
- Trained for: 110,000 steps
- Final training loss: 1.7712
- Final validation loss: 1.7916
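Since these losses are cross-entropy in nats per token, perplexity is simply their exponential; a quick check of what the final numbers correspond to:

```python
import math

# perplexity = exp(cross-entropy loss in nats/token)
train_loss = 1.7712
val_loss = 1.7916

print(f"train perplexity ~ {math.exp(train_loss):.2f}")  # ~ 5.88
print(f"val perplexity   ~ {math.exp(val_loss):.2f}")    # ~ 6.00
```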
## Usage
Since this model uses a custom architecture, you need to download the model code (the `src` folder) alongside the weights to run it.
```python
# 1. Install dependencies
# pip install transformers torch huggingface_hub

import sys
import os

import torch
from transformers import AutoTokenizer
from huggingface_hub import snapshot_download

# 2. Download the repository (code + weights)
repo_path = snapshot_download(repo_id="Adx19/gemma-3-270m-tinystories")

# 3. Add the downloaded folder to the Python path so 'src' is importable
sys.path.append(repo_path)

# 4. Import the custom model
from src.model import Gemma3Model
from src.config import GEMMA3_CONFIG_270M

# 5. Load model & weights
device = "cuda" if torch.cuda.is_available() else "cpu"
model = Gemma3Model(GEMMA3_CONFIG_270M).to(device)
weights_path = os.path.join(repo_path, "pytorch_model.bin")
model.load_state_dict(torch.load(weights_path, map_location=device))
model.eval()

# 6. Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Adx19/gemma-3-270m-tinystories")

# 7. Generate
input_text = "Once upon a time"
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"].to(device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
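If you want sampled rather than greedy output, a top-k/temperature sampling step can be layered over the raw logits. This sketch assumes the model's forward pass exposes next-token logits; `sample_top_k` is a hypothetical helper, not part of the repository's `src` code:

```python
import torch

def sample_top_k(logits: torch.Tensor, k: int = 50, temperature: float = 0.8) -> torch.Tensor:
    # logits: (vocab,) scores for the next token at the last position.
    logits = logits / temperature                  # sharpen/flatten the distribution
    topk_vals, topk_idx = torch.topk(logits, k)    # keep only the k best candidates
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    return topk_idx[choice]                        # map back to the vocabulary index

# Demo on random scores; in a real loop you would append the token to
# input_ids and run the model's forward pass again.
torch.manual_seed(0)
fake_logits = torch.randn(100)
token = sample_top_k(fake_logits, k=5)
```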