---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini 🚀

## Description

**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It was trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content. The goal of this model is to adapt GPT-2 to generate more informative, educational-style text than the base model.

## Model Details

- **Developed by:** younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small)
- **Language:** English
- **Training Precision:** `bfloat16` (optimized for stability and speed)

## Training Data

The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset, which provides high-quality synthetic data covering a wide range of academic and general-knowledge topics.

## Training Hyperparameters

- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 (with Gradient Accumulation Steps: 8)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens

## How to use

You can use this model directly with a pipeline for text generation:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
# max_new_tokens bounds only the generated continuation, not the prompt
result = generator(prompt, max_new_tokens=100, num_return_sequences=1)

print(result[0]['generated_text'])
```

## Intended Use & Limitations

- **Intended Use:** Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** Because this is the small GPT-2 variant trained on a limited subset (30k samples), it may still produce hallucinations or repetitive text.
It is not intended for production-level academic advice.

## Training Results

The model was trained on a T4 GPU (or equivalent) using optimized settings.

- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130

---

**Note:** This model is part of a training experiment using the Cosmopedia dataset.
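As a quick sanity check on the numbers above, the effective batch size implied by the hyperparameters and the perplexity implied by the evaluation loss can be computed directly. This is a minimal sketch; the variable names are illustrative and not taken from the actual training script:

```python
import math

# Hyperparameters reported in the model card
per_device_batch_size = 2
gradient_accumulation_steps = 8

# Effective batch size per optimizer step (assuming a single GPU)
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16

# Perplexity is exp(cross-entropy loss), a common way to read LM losses
eval_loss = 2.686130
print(round(math.exp(eval_loss), 2))  # ≈ 14.67
```

A perplexity around 15 is plausible for a GPT-2 Small model after a short fine-tune on clean synthetic text.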