Model Card for Fu01978/gpt2-mega-wiki-logic

This model is a fine-tuned version of GPT-2 (Base) trained on a diverse "mini-pile" of 14 datasets, ranging from French history and Peruvian law to programming concepts and esoteric texts. It is designed as a versatile text-completion model that shifts style based on the input prompt.

Model Details

Model Description

This model was created to explore the limits of "Knowledge Density" in small language models. Its training mix combines fact-dense data (Wikipedia, news) with specialized technical data (programming, law) and historical texts, so the model acts as a "Jack-of-all-trades" completion engine.

  • Developed by: Fu01978
  • Model type: Causal Language Model (Transformer Decoder)
  • Language(s) (NLP): English (Primary), Spanish (Programming/Law)
  • License: MIT
  • Finetuned from model: openai-community/gpt2

Uses

Direct Use

The model is best used for constrained text completion. It excels when given a clear context or "trigger phrase" to help it navigate its diverse training data.
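An illustrative way to exercise this style-shifting from Python; the specific trigger phrases below are examples written for this card, not prompts the model is guaranteed to recognize:

from transformers import pipeline

generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")

# Domain-flavoured openings steer the completion toward the matching slice of the training mix
prompts = [
    "During the reign of Louis XIV, the French court",      # history
    "Under Peruvian civil law, a contract is valid when",   # law
    "In object-oriented programming, inheritance allows",   # programming
]
for p in prompts:
    print(generator(p, max_new_tokens=40, repetition_penalty=1.5)[0]["generated_text"])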

Out-of-Scope Use

  • Fact-Checking: This model should not be used as a primary source of historical or legal facts.
  • High-Stakes Advice: Do not use for legal or medical decision-making.

Bias, Risks, and Limitations

  • Temporal Hallucinations: Because the model was trained on 14 datasets covering vastly different time periods, it frequently mixes historical facts (e.g., placing the 1944 Battle of the Bulge in the 1898 Spanish-American War).
  • Small Context Window: As a GPT-2 base model, it has a 1,024-token context window and may lose coherence in long-form generation; see the truncation sketch below.
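A minimal sketch of keeping long inputs inside that 1,024-token window by truncating on the tokenizer side; the overly long prompt here is purely illustrative:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Fu01978/gpt2-mega-wiki-logic")

long_prompt = "background paragraph " * 2000  # stands in for an overly long input

# Truncate to the model's maximum context length (1,024 tokens for GPT-2)
inputs = tokenizer(long_prompt, truncation=True, max_length=tokenizer.model_max_length, return_tensors="pt")
print(inputs["input_ids"].shape)  # at most (1, 1024)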

Recommendations

Users should enable beam search (num_beams of 3 or more) and a high repetition penalty (1.5 or higher) to prevent the model from entering logic loops or mixing unrelated datasets.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import pipeline

# Load the fine-tuned checkpoint as a text-generation pipeline
generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")

# A domain "trigger phrase" plus the decoding settings recommended above
prompt = "In the Python programming language, a decorator is"
result = generator(prompt, max_new_tokens=50, repetition_penalty=1.5, num_beams=5)
print(result[0]["generated_text"])

Training Details

Training Data

The model was trained on a combined pool of 123,233 rows drawn from the 14 source datasets.
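The card does not enumerate the 14 source datasets, so the following is only a sketch of how such a pool could be assembled with the datasets library; the dataset identifiers and the shared "text" column are placeholders, not the actual sources:

from datasets import load_dataset, concatenate_datasets

# Placeholder identifiers -- the real 14 source datasets are not listed on this card
source_names = ["example/wiki-subset", "example/news-subset", "example/programming-subset"]

parts = [load_dataset(name, split="train") for name in source_names]

# Keep a shared "text" column, then merge and shuffle everything into one training pool
parts = [ds.select_columns(["text"]) for ds in parts]
pool = concatenate_datasets(parts).shuffle(seed=42)
print(len(pool))  # would be 123,233 rows for the full 14-dataset mix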

Training Metrics

Step | Training Loss
100  | 3.0273
200  | 2.8873
300  | 2.7780
400  | 2.8189
500  | 2.8553

Training Procedure

Training Hyperparameters

  • Steps: 500
  • Batch Size: 4 (with 4 Gradient Accumulation steps)
  • Learning Rate: 4e-5
  • Precision: fp16 (Mixed Precision)
  • Optimizer: AdamW with Weight Decay (0.01)
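A minimal sketch of these settings expressed as Hugging Face TrainingArguments; the card does not state which training script or Trainer was used, so this is an assumption about how the run could be reproduced:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-mega-wiki-logic",
    max_steps=500,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16
    learning_rate=4e-5,
    weight_decay=0.01,               # AdamW weight decay
    optim="adamw_torch",
    fp16=True,                       # mixed-precision training
    logging_steps=100,               # matches the loss reporting interval above
)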

Technical Specifications

Model Architecture and Objective

  • Architecture: Standard GPT-2 Transformer Decoder.
  • Objective: Causal Language Modeling (Next-token prediction).
  • Parameters: 124 Million.
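A quick check of the architecture and parameter count from the checkpoint itself; the printed total should come out to roughly 124 million for the GPT-2 base configuration:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Fu01978/gpt2-mega-wiki-logic")

# GPT-2 base: 12 layers, 12 attention heads, 768-dimensional hidden states
print(model.config.n_layer, model.config.n_head, model.config.n_embd)

# Total parameter count (~124M)
print(sum(p.numel() for p in model.parameters()))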