---
library_name: transformers
datasets:
- sleeping-ai/TEKGEN-Wiki
- lewiswatson/flag_injected_wikipedia_sample
- zekebass/tensor-logic-wikipedia
- Samiro1/Wiki-Corpus-Chat
- JoseHuman/dtConceptosProgramacion
- Parleatacoeur/leyesperuanasactualizadas
- beyarkay/pre-1950s-text
- emreisik/news
- ae5115242430e13/RSL_Maran
- lgrobol/openminuscule
- Dorian2B/french-history-5K
- MultivexAI/Everyday-Language-Corpus-deduped
- yashm/phrases
- 16dvnk/Spirit_Kings_Golden_Textbook
language:
- en
- es
base_model:
- openai-community/gpt2
---

# Model Card for Fu01978/gpt2-mega-wiki-logic
This model is a fine-tuned version of GPT-2 (base) trained on a diverse "mini-pile" of 14 datasets, ranging from French history and Peruvian law to programming concepts and esoteric texts. It is designed as a versatile text completer that shifts style based on the input prompt.

## Model Details

### Model Description
This model was created to explore the limits of "Knowledge Density" in small language models. By mixing high-fact density data (Wikipedia, News) with specialized technical data (Programming, Law) and historical texts, the model acts as a "Jack-of-all-trades" completion engine.
- **Developed by:** Fu01978
- **Model type:** Causal Language Model (Transformer Decoder)
- **Language(s) (NLP):** English (Primary), Spanish (Programming/Law)
- **License:** MIT
- **Finetuned from model:** openai-community/gpt2

## Uses
### Direct Use
The model is best used for **constrained text completion**. It excels when given a clear context or "trigger phrase" to help it navigate its diverse training data.

### Out-of-Scope Use
- **Fact-Checking:** This model should **not** be used as a primary source of historical or legal facts.
- **High-Stakes Advice:** Do not use for legal or medical decision-making.

## Bias, Risks, and Limitations
- **Temporal Hallucinations:** Because the training data spans vastly different time periods, the model frequently mixes historical facts (e.g., placing the 1944 Battle of the Bulge in the 1898 Spanish-American War).
- **Small Context Window:** As a GPT-2 base model, it has a limited context window and may lose coherence in long-form generation.

### Recommendations
Users should use beam search (`num_beams >= 3`) and a high repetition penalty (`1.5` or higher) to prevent the model from entering logic loops or mixing unrelated datasets.

## How to Get Started with the Model
Use the code below to get started with the model.
```py
from transformers import pipeline

generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")
prompt = "In the Python programming language, a decorator is"
print(generator(prompt, max_new_tokens=50, repetition_penalty=1.5, num_beams=5)[0]['generated_text'])
```

## Training Details

### Training Data
The model was trained on a combined pool of 123,233 rows drawn from the 14 datasets listed in the metadata.

### Training Metrics

| Step | Training Loss |
| :--- | :-----------: |
| 100  | 3.0273        |
| 200  | 2.8873        |
| 300  | 2.7780        |
| 400  | 2.8189        |
| 500  | 2.8553        |

### Training Procedure
#### Training Hyperparameters
- **Steps:** 500
- **Batch Size:** 4 (with 4 gradient accumulation steps, for an effective batch size of 16)
- **Learning Rate:** 4e-5
- **Precision:** fp16 (Mixed Precision)
- **Optimizer:** AdamW with Weight Decay (0.01)
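The hyperparameters above can be collected into a config sketch. This is a hypothetical reconstruction (the author's training script is not published); the keys mirror the names used by `transformers.TrainingArguments`, and only the values stated in this card are assumed.

```python
# Hypothetical reconstruction of the reported fine-tuning configuration;
# the exact training script is not published with this card.
train_config = {
    "max_steps": 500,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,  # optimizer steps see 4 * 4 samples
    "learning_rate": 4e-5,
    "weight_decay": 0.01,              # AdamW weight decay
    "fp16": True,                      # mixed-precision training
}

# Effective batch size seen by the optimizer:
effective_batch = (train_config["per_device_train_batch_size"]
                   * train_config["gradient_accumulation_steps"])
print(effective_batch)  # 16
```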

## Technical Specifications

### Model Architecture and Objective
- **Architecture:** Standard GPT-2 Transformer Decoder.
- **Objective:** Causal Language Modeling (Next-token prediction).
- **Parameters:** 124 Million.
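As a sanity check on the 124 million figure, the count can be reproduced from GPT-2 (base)'s published dimensions (12 layers, 768 hidden size, 3072 feed-forward size, 50,257-token vocabulary, 1,024 positions). This is standard architecture arithmetic, not a value taken from the checkpoint itself:

```python
# Parameter count for GPT-2 (base), computed from its published
# dimensions rather than by loading the weights.
vocab, n_pos, d, n_layers, d_ff = 50257, 1024, 768, 12, 3072

embeddings = vocab * d + n_pos * d            # token + position embeddings
attn = (d * 3 * d + 3 * d) + (d * d + d)      # fused qkv proj + output proj
mlp = (d * d_ff + d_ff) + (d_ff * d + d)      # two feed-forward layers
ln = 2 * 2 * d                                # two LayerNorms per block
per_block = attn + mlp + ln
total = embeddings + n_layers * per_block + 2 * d  # + final LayerNorm

print(f"{total / 1e6:.1f}M")  # 124.4M
```

The output embedding is tied to the input embedding in GPT-2, so it is not counted separately.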