GrownUpBaby-110M πŸ‘Άβž‘οΈπŸ‘¨β€πŸŽ“

"Bedtime stories grew up."

GrownUpBaby-110M is a compact, Gemma-style causal LLM (110,304,256 parameters) trained from scratch to be a capable storyteller and creative assistant, with strong coherence and thematic control despite its size. It's the grown-up counterpart to my earlier **Exquisique/BabyLangModel**, a 30M-parameter LLM trained from scratch on TinyStories to generate short, simple narratives for young readers. With more room to breathe, GrownUpBaby aims for richer voice, longer arcs, and cleaner pacing, built to read like a master storyteller on consumer hardware.

πŸ’» Model Details

  • Architecture: Custom Gemma-style (RoPE, RMSNorm, GeGLU Activations)
  • Parameters: 110,304,256 (110M)
  • Context Length: 1024 tokens
  • Vocabulary Size: 50,257 (GPT-2 Tokenizer)
  • Precision: Mixed Precision (BF16/FP32)
  • Checkpoint: Step 23,000 (Best performing checkpoint)

πŸ“š Training Data

The model was trained on the HuggingFaceTB/cosmopedia dataset, specifically the stories subset.

  • Source: Synthetic textbooks, stories, and educational content generated by Mixtral-8x7B.
  • Volume: ~2.6 Million sequences (Processed).
  • Tokens Trained: ~750 Million tokens.

πŸ› οΈ Training Procedure

This model was trained on a single NVIDIA GeForce RTX 4060 Laptop GPU (8GB VRAM) using a custom, highly optimized training loop.

Hyperparameters

  • Optimizer: 8-bit AdamW (bitsandbytes)
  • Learning Rate: 5e-4 (with Cosine Decay to 5e-5)
  • Batch Size: 4 per device
  • Gradient Accumulation: 8 steps (Effective Batch Size: 32)
  • Weight Decay: 0.1
  • Gradient Clipping: 1.0
  • Warmup Steps: 1500

Performance Metrics

  • Final Loss: 2.3446 (at step 23,000)
  • Training Time: ~31 Hours (Across 2 epochs)

πŸš€ How to Use

Since this model uses a custom architecture definition (model.py), you must pass trust_remote_code=True when loading it through the Auto classes, which executes that file from the repository. If you prefer not to trust remote code, review model.py and register the custom config and model classes explicitly instead.

Python Example

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Exquisique/GrownUpBaby"  # Hugging Face repo ID

# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load Model
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    trust_remote_code=True, 
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate
prompt = "Once upon a time, a little robot named Beep found a flower."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs, 
    max_new_tokens=150, 
    temperature=0.7, 
    top_k=50, 
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
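For intuition, the temperature=0.7 and top_k=50 settings in the generate call above reshape each step's token distribution before sampling. Here is a minimal pure-Python sketch of that transformation on a toy vocabulary (illustrative only, not the transformers implementation):

```python
import math
import random

def sample_next(logits, temperature=0.7, top_k=50):
    """Temperature-scale logits, keep the top_k ids, sample from the renormalized softmax."""
    scaled = [l / temperature for l in logits]
    # Keep only the top_k highest-scoring token ids.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]  # softmax numerators
    return random.choices(top, weights=weights, k=1)[0]

# Toy "vocabulary" of 5 tokens: sampling strongly favors the high-logit ids,
# while lower temperature sharpens the distribution further.
random.seed(0)
draws = [sample_next([4.0, 3.0, 0.0, -1.0, -2.0], top_k=3) for _ in range(1000)]
print({i: draws.count(i) for i in sorted(set(draws))})
```

Lowering temperature below 1.0 makes the distribution peakier (more deterministic), while top_k simply truncates the tail before renormalizing.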

⚠️ Limitations & Bias

  • Size: At 110M parameters, this model has limited "world knowledge" compared to 7B+ models. It is best suited for creative writing and simple instruction following.
  • Hallucinations: It may generate plausible-sounding but factually incorrect information.
  • Language: Trained primarily on English educational and story data.

πŸ‘¨β€πŸ’» Author

Trained by Exquisique
