Model Card for Fu01978/gpt2-mega-wiki-logic
This model is a fine-tuned version of GPT-2 (Base) trained on a diverse "mini-pile" of 14 datasets ranging from French history and Peruvian law to programming concepts and esoteric texts. It is designed to be a versatile text completer that can shift styles based on the input prompt.
Model Details
Model Description
This model was created to explore the limits of "Knowledge Density" in small language models. By mixing high-fact density data (Wikipedia, News) with specialized technical data (Programming, Law) and historical texts, the model acts as a "Jack-of-all-trades" completion engine.
- Developed by: Fu01978
- Model type: Causal Language Model (Transformer Decoder)
- Language(s) (NLP): English (Primary), Spanish (Programming/Law)
- License: MIT
- Finetuned from model: openai-community/gpt2
Uses
Direct Use
The model is best used for constrained text completion. It excels when given a clear context or "trigger phrase" to help it navigate its diverse training data.
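For example, here is a minimal sketch of steering the model toward different training domains with the standard `transformers` pipeline. The trigger phrases below are illustrative assumptions, not documented prompt switches:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")

# Illustrative trigger phrases for three of the training domains; the exact
# phrasings are assumptions, not documented prompts.
prompts = [
    "In the Python programming language, a decorator is",   # programming
    "During the French Revolution, the National Assembly",  # French history
    "Under Peruvian contract law, the parties must",        # law
]
for prompt in prompts:
    out = generator(prompt, max_new_tokens=40, num_beams=3, repetition_penalty=1.5)
    print(out[0]["generated_text"])
```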
Out-of-Scope Use
- Fact-Checking: This model should not be used as a primary source of historical or legal facts.
- High-Stakes Advice: Do not use for legal or medical decision-making.
Bias, Risks, and Limitations
- Temporal Hallucinations: Because the model was trained on 14 datasets spanning vastly different time periods, it frequently mixes historical facts (e.g., placing the 1944 Battle of the Bulge in the 1898 Spanish-American War).
- Small Context Window: As a GPT-2 base model, it has a limited context window (1,024 tokens) and may lose coherence in long-form generation.
Recommendations
Users should enable beam search (`num_beams >= 3`) and a strong repetition penalty (`repetition_penalty` of 1.5 or higher) to prevent the model from entering logic loops or blending unrelated datasets; the example below uses these settings.
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import pipeline

generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")

prompt = "In the Python programming language, a decorator is"
# num_beams and repetition_penalty follow the decoding recommendations above.
result = generator(prompt, max_new_tokens=50, repetition_penalty=1.5, num_beams=5)
print(result[0]["generated_text"])
```
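The `repetition_penalty=1.5` and `num_beams=5` arguments match the decoding recommendations above; lowering them makes the model more prone to looping or drifting between unrelated training domains.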
Training Details
Training Data
The model was trained on a combined pool of 123,233 rows drawn from the 14 source datasets described above.
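A minimal sketch of how such a pool might be assembled with the `datasets` library; the 14 source datasets are not listed on this card, so the file names below are placeholders:

```python
from datasets import load_dataset, concatenate_datasets

# Placeholder file names; the actual 14 sources (Wikipedia, news, programming,
# law, history, etc.) are not published with this card.
source_files = ["wikipedia.jsonl", "news.jsonl", "programming.jsonl"]  # ... 14 in total

pools = [load_dataset("json", data_files=f, split="train") for f in source_files]
combined = concatenate_datasets(pools).shuffle(seed=42)
print(len(combined))  # the card reports 123,233 rows for the full pool
```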
Training Metrics
| Step | Training Loss |
|---|---|
| 100 | 3.0273 |
| 200 | 2.8873 |
| 300 | 2.7780 |
| 400 | 2.8189 |
| 500 | 2.8553 |
Training Procedure
Training Hyperparameters
- Steps: 500
- Batch Size: 4, with 4 gradient accumulation steps (effective batch size of 16)
- Learning Rate: 4e-5
- Precision: fp16 (Mixed Precision)
- Optimizer: AdamW with Weight Decay (0.01)
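As a rough guide, these settings map onto the `transformers` Trainer as follows. This is a sketch, assuming the standard `TrainingArguments` API; the output directory and logging cadence are assumptions (AdamW with weight decay is the Trainer default optimizer):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-mega-wiki-logic",  # assumed path
    max_steps=500,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,      # effective batch size of 16
    learning_rate=4e-5,
    weight_decay=0.01,                  # AdamW weight decay
    fp16=True,                          # mixed-precision training
    logging_steps=100,                  # matches the loss table above
)
```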
Technical Specifications
Model Architecture and Objective
- Architecture: Standard GPT-2 Transformer Decoder.
- Objective: Causal Language Modeling (Next-token prediction).
- Parameters: 124 Million.
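The objective can be verified directly: passing `labels=input_ids` to a `transformers` causal LM computes the shifted next-token cross-entropy loss. A minimal check:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Fu01978/gpt2-mega-wiki-logic")
model = AutoModelForCausalLM.from_pretrained("Fu01978/gpt2-mega-wiki-logic")

text = "In the Python programming language, a decorator is"
inputs = tokenizer(text, return_tensors="pt")

# With labels set, the model internally shifts the targets by one position
# and returns the average next-token cross-entropy loss.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())
```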