|
|
--- |
|
|
library_name: transformers |
|
|
datasets: |
|
|
- sleeping-ai/TEKGEN-Wiki |
|
|
- lewiswatson/flag_injected_wikipedia_sample |
|
|
- zekebass/tensor-logic-wikipedia |
|
|
- Samiro1/Wiki-Corpus-Chat |
|
|
- JoseHuman/dtConceptosProgramacion |
|
|
- Parleatacoeur/leyesperuanasactualizadas |
|
|
- beyarkay/pre-1950s-text |
|
|
- emreisik/news |
|
|
- ae5115242430e13/RSL_Maran |
|
|
- lgrobol/openminuscule |
|
|
- Dorian2B/french-history-5K |
|
|
- MultivexAI/Everyday-Language-Corpus-deduped |
|
|
- yashm/phrases |
|
|
- 16dvnk/Spirit_Kings_Golden_Textbook |
|
|
language: |
|
|
- en |
|
|
- es |
|
|
base_model: |
|
|
- openai-community/gpt2 |
|
|
--- |
|
|
|
|
|
# Model Card for Fu01978/gpt2-mega-wiki-logic |
|
|
This model is a fine-tuned version of GPT-2 (Base) trained on a diverse "mini-pile" of 14 datasets ranging from French history and Peruvian law to programming concepts and esoteric texts. It is designed to be a versatile text-completer that can shift styles based on the input prompt. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
This model was created to explore the limits of "Knowledge Density" in small language models. By mixing high fact-density data (Wikipedia, news) with specialized technical data (programming, law) and historical texts, the model acts as a "Jack-of-all-trades" completion engine. |
|
|
- **Developed by:** Fu01978 |
|
|
- **Model type:** Causal Language Model (Transformer Decoder) |
|
|
- **Language(s) (NLP):** English (Primary), Spanish (Programming/Law) |
|
|
- **License:** MIT |
|
|
- **Finetuned from model:** openai-community/gpt2 |
|
|
|
|
|
## Uses |
|
|
### Direct Use |
|
|
The model is best used for **constrained text completion**. It excels when given a clear context or "trigger phrase" to help it navigate its diverse training data. |
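
As a minimal illustration, the snippet below runs a few trigger phrases aimed at different parts of the training mix. The prompts are made up for demonstration and are not drawn from the training data.

```py
from transformers import pipeline

generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")

# Illustrative trigger phrases targeting a few of the training domains
prompts = [
    "During the French Revolution, the National Assembly",  # French history
    "Under Peruvian labor law, an employment contract",     # Peruvian law
    "In the Python programming language, a list comprehension is",  # programming
]

for prompt in prompts:
    result = generator(prompt, max_new_tokens=40, num_beams=3, repetition_penalty=1.5)
    print(result[0]["generated_text"])
    print("---")
```

Outputs will vary; the point is that the opening clause pushes the model toward one slice of its training mixture.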
|
|
|
|
|
### Out-of-Scope Use |
|
|
- **Fact-Checking:** This model should **not** be used as a primary source of historical or legal facts. |
|
|
- **High-Stakes Advice:** Do not use for legal or medical decision-making. |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
- **Temporal Hallucinations:** Because the model was trained simultaneously on 14 datasets spanning vastly different time periods, it frequently mixes historical facts (e.g., placing the 1944 Battle of the Bulge in the 1898 Spanish-American War). |
|
|
- **Small Context Window:** As a GPT-2 base model, it has a limited context window (1,024 tokens) and may lose coherence in long-form generation. |
|
|
|
|
|
### Recommendations |
|
|
Users should use beam search (`num_beams >= 3`) and a high repetition penalty (`repetition_penalty` of 1.5 or higher) to prevent the model from entering logic loops or mixing unrelated datasets. |
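
A minimal sketch of the same settings with the lower-level `generate` API follows; the prompt is illustrative, and the parameter values simply mirror the recommendation above.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Fu01978/gpt2-mega-wiki-logic")
model = AutoModelForCausalLM.from_pretrained("Fu01978/gpt2-mega-wiki-logic")

inputs = tokenizer("The Treaty of Versailles was signed in", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    num_beams=3,             # beam search with at least 3 beams
    repetition_penalty=1.5,  # discourages logic loops
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```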
|
|
|
|
|
## How to Get Started with the Model |
|
|
Use the code below to get started with the model. |
|
|
```py |
 |
from transformers import pipeline |
 |
# Load the fine-tuned checkpoint as a text-generation pipeline |
generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic") |
 |
# A clear context or trigger phrase helps the model pick the right domain |
prompt = "In the Python programming language, a decorator is" |
 |
# Beam search and a repetition penalty of 1.5 follow the recommendations above |
output = generator(prompt, max_new_tokens=50, repetition_penalty=1.5, num_beams=5) |
print(output[0]["generated_text"]) |
 |
``` |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
The model was trained on a combined pool of 123,233 rows drawn from the 14 datasets listed in the metadata above. |
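
The exact mixing and preprocessing pipeline is not published here. Purely as a sketch, a pool like this could be assembled with 🤗 Datasets along the following lines; the split name, text columns, and per-dataset handling are assumptions and will differ across the 14 datasets.

```py
from datasets import load_dataset, concatenate_datasets

# Illustrative subset of the listed datasets; each may need its own column handling
names = [
    "Dorian2B/french-history-5K",
    "JoseHuman/dtConceptosProgramacion",
    "emreisik/news",
]

parts = []
for name in names:
    ds = load_dataset(name, split="train")  # split name is an assumption
    # Reduce every dataset to a single "text" column for causal LM training
    text_col = "text" if "text" in ds.column_names else ds.column_names[0]
    if text_col != "text":
        ds = ds.rename_column(text_col, "text")
    parts.append(ds.remove_columns([c for c in ds.column_names if c != "text"]))

combined = concatenate_datasets(parts).shuffle(seed=42)
print(combined)
```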
|
|
|
|
|
### Training Metrics |
|
|
|
|
|
| Step | Training Loss | |
|
|
| :--- | :-----------: | |
|
|
| 100 | 3.0273 | |
|
|
| 200 | 2.8873 | |
|
|
| 300 | 2.7780 | |
|
|
| 400 | 2.8189 | |
|
|
| 500 | 2.8553 | |
|
|
|
|
|
### Training Procedure |
|
|
#### Training Hyperparameters |
|
|
- **Steps:** 500 |
|
|
- **Batch Size:** 4 (with 4 gradient accumulation steps, i.e. an effective batch size of 16) |
|
|
- **Learning Rate:** 4e-5 |
|
|
- **Precision:** fp16 (Mixed Precision) |
|
|
- **Optimizer:** AdamW with Weight Decay (0.01) |
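
As a point of reference, the settings above correspond roughly to the following Hugging Face `TrainingArguments` sketch; the argument mapping and the output directory are assumptions, not the original training script.

```py
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-mega-wiki-logic",  # assumed output path
    max_steps=500,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,      # effective batch size of 16
    learning_rate=4e-5,
    weight_decay=0.01,                  # AdamW is the default optimizer
    fp16=True,                          # mixed-precision training
    logging_steps=100,                  # matches the loss log interval above
)
```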
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
### Model Architecture and Objective |
|
|
- **Architecture:** Standard GPT-2 Transformer Decoder. |
|
|
- **Objective:** Causal Language Modeling (Next-token prediction). |
|
|
- **Parameters:** 124 Million. |
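
For completeness, the next-token prediction objective is the standard causal LM cross-entropy over each training sequence $x_1, \dots, x_T$ (notation here is generic, not taken from an original write-up):

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)$$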