---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini
## Description
CosmoGPT2-Mini is a fine-tuned version of the classic GPT-2 model. It has been trained on a subset of the Cosmopedia v2 dataset, which consists of synthetic textbooks, blog posts, and educational content.
The goal of this model is to adapt GPT-2's capabilities to generate more informative and educational-style text compared to the base model.
## Model Details
- Developed by: younes MA
- Model type: Causal Language Model
- Base Model: GPT-2 (Small)
- Language: English
- Training Precision: bfloat16 (optimized for stability and speed)
## Training Data
The model was trained on 30,000 samples from the Dhiraj45/cosmopedia-v2 dataset. This dataset is known for its high-quality synthetic data covering various academic and general knowledge topics.
## Training Hyperparameters
- Epochs: 1
- Max Steps: 1000
- Batch Size: 2 (with Gradient Accumulation Steps: 8)
- Learning Rate: 5e-5
- Optimizer: AdamW (fused)
- Precision: bf16
- Max Sequence Length: 512 tokens
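For reference, the settings above imply an effective batch size of 16 sequences per optimizer step. A quick sanity check of the numbers (an upper bound, since sequences may be shorter than the maximum length):

```python
# Back-of-the-envelope numbers implied by the hyperparameters above.
per_device_batch = 2
grad_accum_steps = 8
max_steps = 1000
max_seq_len = 512

# Effective batch size per optimizer step.
effective_batch = per_device_batch * grad_accum_steps

# Upper bound on tokens seen during training.
max_tokens_seen = max_steps * effective_batch * max_seq_len

print(effective_batch)   # 16
print(max_tokens_seen)   # 8192000
```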
## How to Use

You can use this model directly with a `pipeline` for text generation:
```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_length=100, num_return_sequences=1)
print(result[0]["generated_text"])
```
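Note that by default the pipeline's `generated_text` includes the prompt itself. A minimal stdlib sketch of keeping only the continuation (the `generated_text` string below is a stand-in for real model output, not an actual generation):

```python
prompt = "The concept of gravity can be explained as"

# Stand-in for result[0]["generated_text"]; the text-generation pipeline
# echoes the prompt at the start of the output by default.
generated_text = prompt + " the attraction between two masses."

# Keep only the newly generated continuation.
continuation = generated_text[len(prompt):].lstrip()
print(continuation)
```

Alternatively, the `transformers` text-generation pipeline accepts `return_full_text=False` to drop the prompt from the output directly.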
## Intended Use & Limitations
- Intended Use: Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- Limitations: Because this is the small GPT-2 variant fine-tuned on a limited subset (30k samples), it may still generate hallucinations or repetitive text. It is not intended for production use or as a source of reliable academic advice.
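As a rough illustration of the repetition limitation, here is a small stdlib sketch that flags repeated n-grams in generated text (`repeated_ngrams` is a hypothetical helper for this card, not part of the model's code):

```python
from collections import Counter

def repeated_ngrams(text: str, n: int = 3) -> list[str]:
    """Return word n-grams that occur more than once in the text."""
    words = text.split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [" ".join(g) for g, c in grams.items() if c > 1]

sample = "the model repeats itself the model repeats itself again"
print(repeated_ngrams(sample))  # ['the model repeats', 'model repeats itself']
```

In practice, repetition can also be mitigated at generation time via `generate()` arguments such as `no_repeat_ngram_size` or `repetition_penalty`.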
## Training Results
The model was trained on a single NVIDIA T4 GPU (or equivalent) with the settings listed above.
- Final Training Loss: 2.837890
- Evaluation Loss: 2.686130
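Assuming the reported losses are mean token-level cross-entropy in nats (the `transformers` Trainer default), the evaluation loss corresponds to a perplexity of roughly exp(2.686130) ≈ 14.7:

```python
import math

eval_loss = 2.686130

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # roughly 14.7
```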
Note: This model is part of a training experiment using the Cosmopedia dataset.