---
library_name: transformers
datasets:
- sleeping-ai/TEKGEN-Wiki
- lewiswatson/flag_injected_wikipedia_sample
- zekebass/tensor-logic-wikipedia
- Samiro1/Wiki-Corpus-Chat
- JoseHuman/dtConceptosProgramacion
- Parleatacoeur/leyesperuanasactualizadas
- beyarkay/pre-1950s-text
- emreisik/news
- ae5115242430e13/RSL_Maran
- lgrobol/openminuscule
- Dorian2B/french-history-5K
- MultivexAI/Everyday-Language-Corpus-deduped
- yashm/phrases
- 16dvnk/Spirit_Kings_Golden_Textbook
language:
- en
- es
base_model:
- openai-community/gpt2
---
# Model Card for Fu01978/gpt2-mega-wiki-logic
This model is a fine-tuned version of GPT-2 (base), trained on a diverse "mini-pile" of 14 datasets ranging from French history and Peruvian law to programming concepts and esoteric texts. It is designed as a versatile text-completion model that shifts style based on the input prompt.
## Model Details
### Model Description
This model was created to explore the limits of "Knowledge Density" in small language models. By mixing high-fact density data (Wikipedia, News) with specialized technical data (Programming, Law) and historical texts, the model acts as a "Jack-of-all-trades" completion engine.
- **Developed by:** Fu01978
- **Model type:** Causal Language Model (Transformer Decoder)
- **Language(s) (NLP):** English (Primary), Spanish (Programming/Law)
- **License:** MIT
- **Finetuned from model:** openai-community/gpt2
## Uses
### Direct Use
The model is best used for **constrained text completion**. It excels when given a clear context or "trigger phrase" to help it navigate its diverse training data.
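For example (illustrative prompts, not drawn from the training data), domain-specific trigger phrases in English and Spanish steer the model toward its programming and legal subsets:

```py
from transformers import pipeline

generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")

# Illustrative trigger phrases: a programming prompt in English and a legal prompt in Spanish
# ("Under Peruvian legislation, an employment contract").
prompts = [
    "In the Python programming language, a list comprehension is",
    "Según la legislación peruana, un contrato de trabajo",
]
for prompt in prompts:
    out = generator(prompt, max_new_tokens=40, repetition_penalty=1.5, num_beams=3)
    print(out[0]["generated_text"])
```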
### Out-of-Scope Use
- **Fact-Checking:** This model should **not** be used as a primary source of historical or legal facts.
- **High-Stakes Advice:** Do not use for legal or medical decision-making.
## Bias, Risks, and Limitations
- **Temporal Hallucinations:** Because the model was trained simultaneously on 14 datasets spanning vastly different time periods, it frequently mixes facts across eras (e.g., placing the 1944 Battle of the Bulge in the 1898 Spanish-American War).
- **Small Context Window:** As a GPT-2 base model, it has a limited context window and may lose coherence in long-form generation.
### Recommendations
Users should use beam search (`num_beams >= 3`) and a high repetition penalty (`repetition_penalty >= 1.5`) to prevent the model from entering logic loops or mixing unrelated datasets.
## How to Get Started with the Model
Use the code below to get started with the model.
```py
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")

# A domain-specific "trigger phrase" steers the completion style;
# beam search and a high repetition penalty follow the recommendations above.
prompt = "In the Python programming language, a decorator is"
print(generator(prompt, max_new_tokens=50, repetition_penalty=1.5, num_beams=5)[0]["generated_text"])
```
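For lower-level control, the same generation can be done directly with `AutoModelForCausalLM` and `model.generate`, applying the recommended decoding settings (a minimal sketch; the prompt is illustrative):

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Fu01978/gpt2-mega-wiki-logic")
model = AutoModelForCausalLM.from_pretrained("Fu01978/gpt2-mega-wiki-logic")

inputs = tokenizer("In the Python programming language, a decorator is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        num_beams=5,             # beam search, as recommended above
        repetition_penalty=1.5,  # discourages logic loops
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```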
## Training Details
### Training Data
The model was trained on a combined pool of 123,233 rows drawn from the 14 datasets listed in the metadata above.
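The exact mixing procedure is not documented in this card; the sketch below shows one plausible way such a pool could be assembled with the `datasets` library, assuming each source exposes a `train` split and a common `text` column (both assumptions):

```py
from datasets import concatenate_datasets, load_dataset

# Hypothetical reconstruction: load a few of the listed sources and pool their rows.
# Split and column names are assumptions; in practice each source would first need
# to be mapped onto a shared "text" field.
sources = [
    "sleeping-ai/TEKGEN-Wiki",
    "Dorian2B/french-history-5K",
    "JoseHuman/dtConceptosProgramacion",
]
parts = [load_dataset(name, split="train") for name in sources]
pool = concatenate_datasets([p.select_columns(["text"]) for p in parts])
print(len(pool))  # total number of rows in the combined pool
```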
### Training Metrics
| Step | Training Loss |
| :--- | :-----------: |
| 100 | 3.0273 |
| 200 | 2.8873 |
| 300 | 2.7780 |
| 400 | 2.8189 |
| 500 | 2.8553 |
### Training Procedure
#### Training Hyperparameters
- **Steps:** 500
- **Batch Size:** 4, with 4 gradient accumulation steps (effective batch size of 16)
- **Learning Rate:** 4e-5
- **Precision:** fp16 (Mixed Precision)
- **Optimizer:** AdamW with Weight Decay (0.01)
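Expressed as a Hugging Face `TrainingArguments` object, these hyperparameters correspond roughly to the sketch below; `output_dir` and `logging_steps` are assumptions, not values taken from the original run.

```py
from transformers import TrainingArguments

# Approximate reconstruction of the reported hyperparameters; directory and logging
# settings are assumptions rather than details from the original training script.
training_args = TrainingArguments(
    output_dir="gpt2-mega-wiki-logic",
    max_steps=500,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16
    learning_rate=4e-5,
    weight_decay=0.01,               # AdamW weight decay
    fp16=True,                       # mixed-precision training
    logging_steps=100,               # matches the loss table above
)
```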
## Technical Specifications
### Model Architecture and Objective
- **Architecture:** Standard GPT-2 Transformer Decoder.
- **Objective:** Causal Language Modeling (Next-token prediction).
- **Parameters:** 124 Million.
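As a quick sanity check of the parameter count, the checkpoint can be loaded and its weights summed directly:

```py
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Fu01978/gpt2-mega-wiki-logic")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~124M for the GPT-2 base architecture
```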