Growing LLM Model Card

Model Description

The Growing LLM is a GPT-2-based language model that implements neural-plasticity-inspired dynamic growth during training. It starts from a pre-trained GPT-2 (124M parameters) and dynamically adds new transformer blocks while freezing the original parameters, allowing the model to acquire new knowledge without catastrophic forgetting.

Key Features

  • Dynamic Growth: Adds new transformer blocks during training
  • Knowledge Preservation: Freezes original parameters to retain pre-trained knowledge
  • Flexible Triggers: Supports fixed schedule and plateau detection growth triggers
  • Regularization Options: Supports Knowledge Distillation and Elastic Weight Consolidation (EWC)
  • Comprehensive Metrics: Tracks training, validation, growth events, and scaling analysis

Training Details

Training Data

  • Dataset: C4 (Colossal Clean Crawled Corpus) - 50k training samples
  • Max sequence length: 128 tokens

Training Configuration

  • Base model: GPT-2 (124M parameters)
  • Learning rate: 5e-5
  • Batch size: 8
  • Optimizer: AdamW with weight decay 0.01
  • Max steps: 2000
  • Growth frequency: Every 500 steps
  • Maximum growth events: 3
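The configuration above can be collected into a minimal settings object. This is only a sketch; the `GrowingConfig` name and the dataclass layout are illustrative, while the hyperparameter values are the ones listed in this card:

```python
from dataclasses import dataclass

@dataclass
class GrowingConfig:
    # Values taken from the Training Configuration section of this card
    base_model: str = "gpt2"     # 124M-parameter GPT-2
    learning_rate: float = 5e-5
    batch_size: int = 8
    weight_decay: float = 0.01   # used with AdamW
    max_steps: int = 2000
    growth_every: int = 500      # growth frequency in steps
    max_growth_events: int = 3
    max_seq_len: int = 128

config = GrowingConfig()
```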

Growth Mechanism

  1. Fixed Schedule: Grow every N training steps
  2. Plateau Detection: Grow when validation loss shows no improvement for Y steps
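The two triggers above can be sketched as a single decision function. The function name, the patience window, and the `min_delta` threshold are illustrative assumptions, not values from this card:

```python
def should_grow(step, val_losses, growth_events, *,
                growth_every=500, max_events=3,
                plateau_patience=3, min_delta=0.01):
    """Decide whether to trigger a growth event.

    Combines the two triggers described above: a fixed schedule
    (every `growth_every` steps) and plateau detection (no validation
    improvement larger than `min_delta` over the last
    `plateau_patience` evaluations).
    """
    if growth_events >= max_events:
        return False
    # Fixed schedule: grow every N training steps
    if step > 0 and step % growth_every == 0:
        return True
    # Plateau detection: recent evals show no meaningful improvement
    if len(val_losses) > plateau_patience:
        best_before = min(val_losses[:-plateau_patience])
        recent_best = min(val_losses[-plateau_patience:])
        if best_before - recent_best < min_delta:
            return True
    return False
```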

Regularization (Optional)

  • Knowledge Distillation: Uses teacher-student architecture with temperature scaling
  • Elastic Weight Consolidation (EWC): Penalizes changes to important parameters
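Both regularizers have standard formulations, sketched below. The temperature `T=2.0` and EWC weight `lam=0.4` are illustrative defaults, not values from this card:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Knowledge distillation: soften both distributions with
    temperature T and match the student to the frozen teacher via
    KL divergence (scaled by T^2, as is conventional)."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def ewc_penalty(model, fisher, old_params, lam=0.4):
    """EWC: quadratic penalty on drift of important parameters,
    weighted by their diagonal Fisher information."""
    loss = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * loss
```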

Model Architecture

  • Base: GPT-2 (12 layers, 12 heads, 768 hidden dim)
  • Growth: Added 3 new transformer blocks (one per growth event)
  • Final: 15 layers, 145.7M total parameters (+17% parameters)
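A growth event of this kind can be sketched as follows: freeze every existing parameter, then append one new trainable block. Initializing the new block as a copy of the last pre-trained block is an assumption here; the card does not specify the initialization scheme:

```python
import copy
from transformers import GPT2LMHeadModel

def grow_model(model: GPT2LMHeadModel) -> GPT2LMHeadModel:
    """Append one new transformer block and freeze all existing
    parameters, so only the new block is trainable."""
    # Freeze everything already in the model (knowledge preservation)
    for p in model.parameters():
        p.requires_grad = False
    # New block, cloned from the last pre-trained block (assumed init)
    new_block = copy.deepcopy(model.transformer.h[-1])
    for p in new_block.parameters():
        p.requires_grad = True
    model.transformer.h.append(new_block)
    model.config.n_layer += 1
    return model
```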

Training Results

Summary Metrics

| Metric                | Initial | Final  | Improvement |
|-----------------------|---------|--------|-------------|
| Training Loss         | 7.16    | 1.95   | 73% ↓       |
| Validation Loss       | 6.99    | 2.03   | 71% ↓       |
| Validation Perplexity | ~1000   | 7.42   | 99% ↓       |
| Total Parameters      | 124.4M  | 145.7M | +17%        |

Training Time

  • Total time: ~61 minutes (3660 seconds)
  • Best validation loss: 2.00
  • Best validation perplexity: 7.42
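The perplexity figures above follow directly from the loss: perplexity is the exponential of the mean cross-entropy loss (in nats), which is how a best validation loss of 2.00 maps to a perplexity of roughly 7.4:

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is exp(mean cross-entropy loss in nats)."""
    return math.exp(cross_entropy_loss)

print(round(perplexity(2.00), 2))  # roughly 7.39, matching the ~7.4 reported above
```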

Growth Events

| Growth # | Step | Layers | Parameters | Val Loss | Val Perplexity |
|----------|------|--------|------------|----------|----------------|
| Initial  | 0    | 12     | 124.4M     | 6.99     | ~1000          |
| 1        | 500  | 13     | 170.1M     | 2.00     | 7.42           |
| 2        | 1000 | 14     | 177.2M     | 2.01     | 7.45           |
| 3        | 1500 | 15     | 184.3M     | 2.02     | 7.52           |

Key Observation: The validation loss remains stable (~2.0) across all growth events, demonstrating successful knowledge retention. The model continues to learn new capabilities without catastrophic forgetting.

Loss Curves

  • Training loss decreased from 7.16 → 1.95 (73% reduction)
  • Validation loss decreased from 6.99 → 2.03 (71% reduction)
  • Perplexity improved from ~1000 → 7.42 (99% improvement)

Benchmark Results

WikiText-2 Perplexity

| Model       | Perplexity | Improvement |
|-------------|------------|-------------|
| Base GPT-2  | 56.0       | -           |
| Growing LLM | 33.0       | 41% ↓       |

Usage

from transformers import GPT2LMHeadModel, AutoTokenizer

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("aicinema69/gpt2-growing-large")
tokenizer = AutoTokenizer.from_pretrained("aicinema69/gpt2-growing-large")

# Generate text
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))

Limitations

  • Growth events may cause temporary performance dips that recover with continued training
  • Requires sufficient training data to benefit from additional parameters
  • More parameters = higher memory and compute requirements

License

This model is based on GPT-2, which is released under OpenAI's Modified MIT License.

Citation

If you use this model in your research, please cite:

@misc{growing_llm,
  author = {Satyam Singh},
  title = {Growing LLM: Dynamic Model Growth for Continual Learning},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aicinema69/gpt2-growing-large}}
}

Contact

For questions or issues, please open a GitHub issue or contact the model author.
