MiniGPT-30M-Wikipedia

A Small Language Model (SLM) with roughly 30 million parameters, built from scratch. It was trained as part of an architectural study of pre-LLM Transformer designs.

Model Details

  • Architecture: GPT-style Decoder-only Transformer
  • Parameters: ~30 Million
  • Layers: 6
  • Attention Heads: 8
  • Embedding Dimension: 384
  • Context Size: 512 tokens
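
The card does not include the model code, but the hyperparameters above pin down a standard decoder-only stack. The following is a minimal PyTorch sketch consistent with them; the GPT-2 vocabulary size (50,257) and weight tying between the embedding and output head are assumptions, not stated in the card, and are what brings the count to roughly 30M (without tying it would be closer to 49M).

```python
import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    """Decoder-only Transformer matching the card's hyperparameters.
    vocab_size=50257 (GPT-2 BPE) and weight tying are assumptions."""
    def __init__(self, vocab_size=50257, d_model=384, n_heads=8,
                 n_layers=6, context_size=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(context_size, d_model)
        block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            activation="gelu", batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        self.head.weight = self.tok_emb.weight  # weight tying (assumed)

    def forward(self, idx):
        # idx: (batch, seq) token ids, seq <= context_size
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask makes the encoder stack behave as a GPT-style decoder.
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))

model = MiniGPT()
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```

Under these assumptions the arithmetic checks out: the tied embedding contributes about 19.3M parameters, the 512-position embedding about 0.2M, and the six Transformer blocks about 1.77M each, for roughly 30.1M in total.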

Training Configuration

  • Dataset: WikiText-103
  • Hardware: 2x NVIDIA T4 GPUs (Kaggle)
  • Epochs: 1
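
The training script is likewise not part of the card. A minimal sketch of a one-epoch run under the configuration above might look like the following; the GPT-2 tokenizer, `nn.DataParallel` across the two T4s, the batch size, and the learning rate are all assumptions, since the card specifies none of them.

```python
import torch
import torch.nn as nn
from datasets import load_dataset
from transformers import GPT2TokenizerFast

# Assumptions: GPT-2 BPE tokenizer and nn.DataParallel over the two T4s;
# the card does not specify either. MiniGPT is the class sketched above.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
data = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def make_chunks(dataset, context_size=512):
    # Concatenate the corpus into contiguous fixed-length training chunks.
    ids = []
    for row in dataset:
        if row["text"]:
            ids.extend(tokenizer(row["text"]).input_ids)
    ids = torch.tensor(ids[: len(ids) // context_size * context_size])
    return ids.view(-1, context_size)

chunks = make_chunks(data)
device = "cuda"
model = nn.DataParallel(MiniGPT()).to(device)  # splits batches across both GPUs
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr is assumed
loss_fn = nn.CrossEntropyLoss()

model.train()
for batch in chunks.split(32):  # 1 epoch; batch size 32 is assumed
    batch = batch.to(device)
    logits = model(batch[:, :-1])  # predict each next token
    loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                   batch[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```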