---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---
# CosmoGPT2-Mini 🚀
## Description
**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content.
The goal of this model is to adapt GPT-2's capabilities to generate more informative and educational-style text compared to the base model.
## Model Details
- **Developed by:** younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small)
- **Language:** English
- **Training Precision:** `bfloat16` (optimized for stability and speed)
## Training Data
The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset. This dataset is known for its high-quality synthetic data covering various academic and general knowledge topics.
## Training Hyperparameters
- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 (with Gradient Accumulation Steps: 8)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
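The hyperparameters above can be collected into a plain config for reference. This is an illustrative sketch only (the original training script is not published), but it shows how the effective batch size follows from the listed settings:

```python
# Hedged sketch of the hyperparameters listed above as a plain config dict;
# the actual training script is not published, so this is illustrative only.
config = {
    "epochs": 1,
    "max_steps": 1000,
    "per_device_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "learning_rate": 5e-5,
    "optimizer": "adamw_fused",
    "precision": "bf16",
    "max_seq_length": 512,
}

# The effective batch size seen by the optimizer is the
# per-device batch size times the gradient accumulation steps.
effective_batch = config["per_device_batch_size"] * config["gradient_accumulation_steps"]
print(effective_batch)  # 16
```

With gradient accumulation, each optimizer step therefore sees 16 samples even though only 2 fit on the device at once.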
## How to Use
You can use this model directly with a pipeline for text generation:
```python
from transformers import pipeline
generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")
prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_new_tokens=100, num_return_sequences=1)
print(result[0]['generated_text'])
```
## Intended Use & Limitations
- **Intended Use:** Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** Since this is a small model (GPT-2 Small) trained on a limited subset (30k samples), it may still hallucinate or produce repetitive text. It is not intended for production use or as a source of academic advice.
## Training Results
The model was trained on a single T4 GPU (or equivalent) using the settings listed above.
- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130
---
**Note:** This model is part of a training experiment using the Cosmopedia dataset.