---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini 🚀
|
|
## Description
**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content.
|
|
The goal of this model is to adapt GPT-2's capabilities to generate more informative and educational-style text compared to the base model.
|
|
## Model Details
- **Developed by:** younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small, 124M parameters)
- **Language:** English
- **Training Precision:** `bfloat16` (optimized for stability and speed)
|
|
## Training Data
The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset. This dataset is known for its high-quality synthetic data covering various academic and general knowledge topics.
|
|
## Training Hyperparameters
- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 (with Gradient Accumulation Steps: 8)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
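With these settings, the effective batch size and the number of samples the optimizer actually sees follow from simple arithmetic (variable names below are illustrative, not taken from the training script):

```python
# Sanity-check the listed hyperparameters with basic arithmetic.
per_device_batch_size = 2
gradient_accumulation_steps = 8
max_steps = 1000
dataset_size = 30_000  # samples, as stated above

# Each optimizer step consumes batch_size * accumulation_steps samples.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
samples_seen = effective_batch_size * max_steps

print(effective_batch_size)  # 16
print(samples_seen)          # 16000
```

Note that 16,000 samples is less than the 30,000-sample subset, so `max_steps` ends training before a full pass over the data, even with epochs set to 1.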
|
|
## How to use
You can use this model directly with a pipeline for text generation:
|
|
```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
# max_new_tokens counts only generated tokens, excluding the prompt
result = generator(prompt, max_new_tokens=100, num_return_sequences=1)

print(result[0]["generated_text"])
```
|
|
## Intended Use & Limitations
- **Intended Use:** Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** Since this is a small model (GPT-2 Small) trained on a limited subset (30k samples), it may still hallucinate or produce repetitive text. It is not intended for production-level academic advice.
|
|
## Training Results
The model was trained on a T4 GPU (or equivalent) using optimized settings.
- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130
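For a causal language model, the evaluation loss is the mean cross-entropy per token, so it maps directly to perplexity via `exp(loss)`. A quick check using the evaluation loss reported above:

```python
import math

eval_loss = 2.686130  # evaluation loss reported above

# Perplexity is the exponential of the per-token cross-entropy loss.
perplexity = math.exp(eval_loss)

print(round(perplexity, 2))  # ~14.67
```

In other words, the fine-tuned model is, on average, about as uncertain as a uniform choice over ~15 tokens at each step on the evaluation split.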
|
|
---
**Note:** This model is part of a training experiment using the Cosmopedia dataset.