|
|
--- |
|
|
library_name: transformers |
|
|
datasets: |
|
|
- sleeping-ai/TEKGEN-Wiki |
|
|
- lewiswatson/flag_injected_wikipedia_sample |
|
|
- zekebass/tensor-logic-wikipedia |
|
|
- Samiro1/Wiki-Corpus-Chat |
|
|
- JoseHuman/dtConceptosProgramacion |
|
|
- Parleatacoeur/leyesperuanasactualizadas |
|
|
- beyarkay/pre-1950s-text |
|
|
- emreisik/news |
|
|
- ae5115242430e13/RSL_Maran |
|
|
- lgrobol/openminuscule |
|
|
- Dorian2B/french-history-5K |
|
|
- MultivexAI/Everyday-Language-Corpus-deduped |
|
|
- yashm/phrases |
|
|
- 16dvnk/Spirit_Kings_Golden_Textbook |
|
|
language: |
|
|
- en |
|
|
- es |
|
|
base_model: |
|
|
- openai-community/gpt2 |
|
|
--- |
|
|
|
|
|
# Model Card for Fu01978/gpt2-mega-wiki-logic |
|
|
This model is a fine-tuned version of GPT-2 (Base) trained on a diverse "mini-pile" of 14 datasets ranging from French history and Peruvian law to programming concepts and esoteric texts. It is designed to be a versatile text-completer that can shift styles based on the input prompt. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
This model was created to explore the limits of "Knowledge Density" in small language models. By mixing high fact-density data (Wikipedia, news) with specialized technical data (programming, law) and historical texts, the model acts as a "Jack-of-all-trades" completion engine. |
|
|
- **Developed by:** Fu01978 |
|
|
- **Model type:** Causal Language Model (Transformer Decoder) |
|
|
- **Language(s) (NLP):** English (Primary), Spanish (Programming/Law) |
|
|
- **License:** MIT |
|
|
- **Finetuned from model:** openai-community/gpt2 |
|
|
|
|
|
## Uses |
|
|
### Direct Use |
|
|
The model is best used for **constrained text completion**. It excels when given a clear context or "trigger phrase" to help it navigate its diverse training data. |
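
As a minimal illustration, the snippet below runs a few trigger phrases aimed at different parts of the training mix. The prompts are made up for demonstration and are not drawn from the training data.

```py
from transformers import pipeline

generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")

# Illustrative trigger phrases targeting a few of the training domains
prompts = [
    "During the French Revolution, the National Assembly",  # French history
    "Under Peruvian labor law, an employment contract",     # Peruvian law
    "In the Python programming language, a list comprehension is",  # programming
]

for prompt in prompts:
    result = generator(prompt, max_new_tokens=40, num_beams=3, repetition_penalty=1.5)
    print(result[0]["generated_text"])
    print("---")
```

Outputs will vary; the point is that the opening clause pushes the model toward one slice of its training mixture.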
|
|
|
|
|
### Out-of-Scope Use |
|
|
- **Fact-Checking:** This model should **not** be used as a primary source of historical or legal facts. |
|
|
- **High-Stakes Advice:** Do not use for legal or medical decision-making. |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
- **Temporal Hallucinations:** Because the model was trained simultaneously on 14 datasets spanning vastly different time periods, it frequently mixes historical facts (e.g., placing the 1944 Battle of the Bulge in the 1898 Spanish-American War). |
|
|
- **Small Context Window:** As a GPT-2 base model, it has a limited context window (1,024 tokens) and may lose coherence in long-form generation. |
|
|
|
|
|
### Recommendations |
|
|
Users should use beam search (`num_beams >= 3`) and a high repetition penalty (`repetition_penalty` of 1.5 or higher) to prevent the model from entering logic loops or mixing unrelated datasets. |
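
A minimal sketch of the same settings with the lower-level `generate` API follows; the prompt is illustrative, and the parameter values simply mirror the recommendation above.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Fu01978/gpt2-mega-wiki-logic")
model = AutoModelForCausalLM.from_pretrained("Fu01978/gpt2-mega-wiki-logic")

inputs = tokenizer("The Treaty of Versailles was signed in", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    num_beams=3,             # beam search with at least 3 beams
    repetition_penalty=1.5,  # discourages logic loops
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```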
|
|
|
|
|
## How to Get Started with the Model |
|
|
Use the code below to get started with the model. |
|
|
```py |
 |
from transformers import pipeline |
 |
# Load the fine-tuned checkpoint as a text-generation pipeline |
generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic") |
 |
# A clear context or trigger phrase helps the model pick the right domain |
prompt = "In the Python programming language, a decorator is" |
 |
# Beam search and a repetition penalty of 1.5 follow the recommendations above |
output = generator(prompt, max_new_tokens=50, repetition_penalty=1.5, num_beams=5) |
print(output[0]["generated_text"]) |
 |
``` |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
The model was trained on a combined pool of 123,233 rows drawn from the 14 datasets listed in the metadata above. |
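
The exact mixing and preprocessing pipeline is not published here. Purely as a sketch, a pool like this could be assembled with 🤗 Datasets along the following lines; the split name, text columns, and per-dataset handling are assumptions and will differ across the 14 datasets.

```py
from datasets import load_dataset, concatenate_datasets

# Illustrative subset of the listed datasets; each may need its own column handling
names = [
    "Dorian2B/french-history-5K",
    "JoseHuman/dtConceptosProgramacion",
    "emreisik/news",
]

parts = []
for name in names:
    ds = load_dataset(name, split="train")  # split name is an assumption
    # Reduce every dataset to a single "text" column for causal LM training
    text_col = "text" if "text" in ds.column_names else ds.column_names[0]
    if text_col != "text":
        ds = ds.rename_column(text_col, "text")
    parts.append(ds.remove_columns([c for c in ds.column_names if c != "text"]))

combined = concatenate_datasets(parts).shuffle(seed=42)
print(combined)
```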
|
|
|
|
|
### Training Metrics |
|
|
|
|
|
| Step | Training Loss | |
|
|
| :--- | :-----------: | |
|
|
| 100 | 3.0273 | |
|
|
| 200 | 2.8873 | |
|
|
| 300 | 2.7780 | |
|
|
| 400 | 2.8189 | |
|
|
| 500 | 2.8553 | |
|
|
|
|
|
### Training Procedure |
|
|
#### Training Hyperparameters |
|
|
- **Steps:** 500 |
|
|
- **Batch Size:** 4 (with 4 gradient accumulation steps, i.e. an effective batch size of 16) |
|
|
- **Learning Rate:** 4e-5 |
|
|
- **Precision:** fp16 (Mixed Precision) |
|
|
- **Optimizer:** AdamW with Weight Decay (0.01) |
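
As a point of reference, the settings above correspond roughly to the following Hugging Face `TrainingArguments` sketch; the argument mapping and the output directory are assumptions, not the original training script.

```py
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-mega-wiki-logic",  # assumed output path
    max_steps=500,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,      # effective batch size of 16
    learning_rate=4e-5,
    weight_decay=0.01,                  # AdamW is the default optimizer
    fp16=True,                          # mixed-precision training
    logging_steps=100,                  # matches the loss log interval above
)
```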
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
### Model Architecture and Objective |
|
|
- **Architecture:** Standard GPT-2 Transformer Decoder. |
|
|
- **Objective:** Causal Language Modeling (Next-token prediction). |
|
|
- **Parameters:** 124 Million. |
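
For completeness, the next-token prediction objective is the standard causal LM cross-entropy over each training sequence $x_1, \dots, x_T$ (notation here is generic, not taken from an original write-up):

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)$$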