GrownUpBaby-110M πŸ‘Άβž‘οΈπŸ‘¨β€πŸŽ“

"Bedtime stories grew up."

GrownUpBaby-110M is a compact, Gemma-style causal LLM (110,304,256 parameters) trained from scratch to be a capable storyteller and creative assistant, with strong coherence and thematic control despite its size. It's the grown-up counterpart to my earlier **Exquisique/BabyLangModel**, a 30M-parameter LLM trained from scratch on TinyStories to generate short, simple narratives for young readers. With more room to breathe, GrownUpBaby aims for richer voice, longer arcs, and cleaner pacing, built to read like a master storyteller on consumer hardware.

πŸ’» Model Details

  • Architecture: Custom Gemma-style (RoPE, RMSNorm, GeGLU Activations)
  • Parameters: 110,304,256 (110M)
  • Context Length: 1024 tokens
  • Vocabulary Size: 50,257 (GPT-2 Tokenizer)
  • Precision: Mixed Precision (BF16/FP32)
  • Checkpoint: Step 23,000 (Best performing checkpoint)

πŸ“š Training Data

The model was trained on the HuggingFaceTB/cosmopedia dataset, specifically the stories subset.

  • Source: Synthetic textbooks, stories, and educational content generated by Mixtral-8x7B.
  • Volume: ~2.6 Million sequences (Processed).
  • Tokens Trained: ~750 Million tokens.

πŸ› οΈ Training Procedure

This model was trained on a single NVIDIA GeForce RTX 4060 Laptop GPU (8GB VRAM) using a custom, highly optimized training loop.

Hyperparameters

  • Optimizer: 8-bit AdamW (bitsandbytes)
  • Learning Rate: 5e-4 (with Cosine Decay to 5e-5)
  • Batch Size: 4 per device
  • Gradient Accumulation: 8 steps (Effective Batch Size: 32)
  • Weight Decay: 0.1
  • Gradient Clipping: 1.0
  • Warmup Steps: 1500

Performance Metrics

  • Final Loss: 2.3446 (at step 23,000)
  • Training Time: ~31 Hours (Across 2 epochs)

πŸš€ How to Use

Since this model uses a custom architecture definition (model.py), you must pass trust_remote_code=True when loading it through the Auto classes, which executes that file from the repository. If you prefer not to trust remote code, review model.py and register the custom config and model classes explicitly instead.

Python Example

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Exquisique/GrownUpBaby"  # Hugging Face repo ID

# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load Model
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    trust_remote_code=True, 
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate
prompt = "Once upon a time, a little robot named Beep found a flower."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs, 
    max_new_tokens=150, 
    temperature=0.7, 
    top_k=50, 
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
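For intuition, the temperature=0.7 and top_k=50 settings in the generate call above reshape each step's token distribution before sampling. Here is a minimal pure-Python sketch of that transformation on a toy vocabulary (illustrative only, not the transformers implementation):

```python
import math
import random

def sample_next(logits, temperature=0.7, top_k=50):
    """Temperature-scale logits, keep the top_k ids, sample from the renormalized softmax."""
    scaled = [l / temperature for l in logits]
    # Keep only the top_k highest-scoring token ids.
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]  # softmax numerators
    return random.choices(top, weights=weights, k=1)[0]

# Toy "vocabulary" of 5 tokens: sampling strongly favors the high-logit ids,
# while lower temperature sharpens the distribution further.
random.seed(0)
draws = [sample_next([4.0, 3.0, 0.0, -1.0, -2.0], top_k=3) for _ in range(1000)]
print({i: draws.count(i) for i in sorted(set(draws))})
```

Lowering temperature below 1.0 makes the distribution peakier (more deterministic), while top_k simply truncates the tail before renormalizing.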

⚠️ Limitations & Bias

  • Size: At 110M parameters, this model has limited "world knowledge" compared to 7B+ models. It is best suited for creative writing and simple instruction following.
  • Hallucinations: It may generate plausible-sounding but factually incorrect information.
  • Language: Trained primarily on English educational and story data.

πŸ‘¨β€πŸ’» Author

Trained by Exquisique
