---
library_name: transformers
datasets:
- sleeping-ai/TEKGEN-Wiki
- lewiswatson/flag_injected_wikipedia_sample
- zekebass/tensor-logic-wikipedia
- Samiro1/Wiki-Corpus-Chat
- JoseHuman/dtConceptosProgramacion
- Parleatacoeur/leyesperuanasactualizadas
- beyarkay/pre-1950s-text
- emreisik/news
- ae5115242430e13/RSL_Maran
- lgrobol/openminuscule
- Dorian2B/french-history-5K
- MultivexAI/Everyday-Language-Corpus-deduped
- yashm/phrases
- 16dvnk/Spirit_Kings_Golden_Textbook
language:
- en
- es
base_model:
- openai-community/gpt2
---

# Model Card for Fu01978/gpt2-mega-wiki-logic

This model is a fine-tuned version of GPT-2 (base) trained on a diverse "mini-pile" of 14 datasets ranging from French history and Peruvian law to programming concepts and esoteric texts. It is designed to be a versatile text completer that can shift styles based on the input prompt.

## Model Details

### Model Description

This model was created to explore the limits of "knowledge density" in small language models. By mixing high-fact-density data (Wikipedia, news) with specialized technical data (programming, law) and historical texts, the model acts as a jack-of-all-trades completion engine.

- **Developed by:** Fu01978
- **Model type:** Causal Language Model (Transformer Decoder)
- **Language(s) (NLP):** English (primary), Spanish (programming/law)
- **License:** MIT
- **Finetuned from model:** openai-community/gpt2

## Uses

### Direct Use

The model is best used for **constrained text completion**. It performs best when given clear context or a "trigger phrase" that steers it toward the relevant part of its diverse training data.

### Out-of-Scope Use

- **Fact-checking:** This model should **not** be used as a primary source of historical or legal facts.
- **High-stakes advice:** Do not use it for legal or medical decision-making.

## Bias, Risks, and Limitations

- **Temporal hallucinations:** Because the model was trained on 14 datasets spanning vastly different time periods, it frequently mixes historical facts (e.g., placing the 1944 Battle of the Bulge in the 1898 Spanish-American War).
- **Small context window:** As a GPT-2 base model, it has a limited context window (1,024 tokens) and may lose coherence in long-form generation.

### Recommendations

Users should enable beam search (`num_beams` ≥ 3) and a high repetition penalty (1.5 or higher) to prevent the model from entering logic loops or mixing unrelated datasets, as in the example below.

## How to Get Started with the Model

Use the code below to get started with the model.

```py
from transformers import pipeline

generator = pipeline("text-generation", model="Fu01978/gpt2-mega-wiki-logic")

prompt = "In the Python programming language, a decorator is"

# Beam search and a repetition penalty keep the model from looping or drifting between domains.
print(generator(prompt, max_new_tokens=50, repetition_penalty=1.5, num_beams=5)[0]['generated_text'])
```

## Training Details

### Training Data

The model was trained on a combined pool of 123,233 rows drawn from the 14 datasets listed above.

### Training Metrics

| Step | Training Loss |
| :--- | :-----------: |
| 100  | 3.0273        |
| 200  | 2.8873        |
| 300  | 2.7780        |
| 400  | 2.8189        |
| 500  | 2.8553        |

### Training Procedure

#### Training Hyperparameters

- **Steps:** 500
- **Batch size:** 4 (with 4 gradient accumulation steps, i.e., an effective batch size of 16)
- **Learning rate:** 4e-5
- **Precision:** fp16 (mixed precision)
- **Optimizer:** AdamW with weight decay (0.01)

A hedged sketch of these settings as a `TrainingArguments` configuration is given at the end of this card.

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Standard GPT-2 Transformer decoder.
- **Objective:** Causal language modeling (next-token prediction).
- **Parameters:** 124 million.
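These figures can be checked directly from the published checkpoint. The snippet below is a small inspection sketch; it only assumes the repository hosts a standard `GPT2LMHeadModel` checkpoint (the architecture shipped by `openai-community/gpt2`).

```py
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Fu01978/gpt2-mega-wiki-logic"

config = AutoConfig.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Standard GPT-2 base: 12 layers, 12 heads, 768-dim embeddings, 1024-token context.
print(config.n_layer, config.n_head, config.n_embd, config.n_positions)

# Total parameter count (roughly 124M for the base architecture).
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```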
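The exact training script is not published with this card, so the following is only a hedged sketch of how the hyperparameters listed under Training Procedure would map onto a `transformers` `TrainingArguments` configuration. The dataset mixing, tokenization, and collation steps are assumptions and are omitted.

```py
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default

# Assumed mapping of the reported hyperparameters; the real script may differ.
args = TrainingArguments(
    output_dir="gpt2-mega-wiki-logic",
    max_steps=500,                   # Steps: 500
    per_device_train_batch_size=4,   # Batch size: 4
    gradient_accumulation_steps=4,   # effective batch size of 16
    learning_rate=4e-5,              # Learning rate: 4e-5
    weight_decay=0.01,               # AdamW weight decay
    fp16=True,                       # mixed precision
    logging_steps=100,               # matches the loss table above
)

# The combined 123,233-row corpus would be tokenized and passed in here:
# trainer = Trainer(model=model, args=args, train_dataset=tokenized_dataset,
#                   data_collator=collator)
# trainer.train()
```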