---
language:
  - en
license: mit
base_model: gpt2
tags:
  - text-generation
  - gpt2
  - cosmopedia
  - educational
  - synthetic-data
model_name: CosmoGPT2-Mini
datasets:
  - Dhiraj45/cosmopedia-v2
metrics:
  - loss
---

# CosmoGPT2-Mini 🚀

## Description

CosmoGPT2-Mini is a fine-tuned version of the classic GPT-2 model, trained on a subset of the Cosmopedia v2 dataset, which consists of synthetic textbooks, blog posts, and educational content.

The goal is to adapt GPT-2 to generate more informative, educational-style text than the base model.

## Model Details

- Developed by: younes MA
- Model type: Causal Language Model
- Base Model: GPT-2 (Small)
- Language: English
- Training Precision: bfloat16 (optimized for stability and speed)

## Training Data

The model was trained on 30,000 samples from the Dhiraj45/cosmopedia-v2 dataset. This dataset is known for its high-quality synthetic data covering various academic and general knowledge topics.

## Training Hyperparameters

- Epochs: 1
- Max Steps: 1000
- Batch Size: 2 (with Gradient Accumulation Steps: 8)
- Learning Rate: 5e-5
- Optimizer: AdamW (fused)
- Precision: bf16
- Max Sequence Length: 512 tokens
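
The hyperparameters above can be expressed as a Hugging Face `TrainingArguments` configuration (a minimal sketch; the output directory is a placeholder, and the 512-token limit is applied at tokenization time rather than here):

```python
from transformers import TrainingArguments

# Effective batch size = 2 (per device) x 8 (accumulation) = 16 sequences per step
training_args = TrainingArguments(
    output_dir="cosmogpt2-mini",       # placeholder path
    num_train_epochs=1,
    max_steps=1000,                    # max_steps takes precedence over epochs
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    optim="adamw_torch_fused",         # fused AdamW
    bf16=True,                         # bfloat16 training
)
```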

## How to use

You can use this model directly with a `pipeline` for text generation:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")
prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_new_tokens=100, num_return_sequences=1)

print(result[0]["generated_text"])
```

## Intended Use & Limitations

- Intended Use: Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- Limitations: Because this is the small GPT-2 variant trained on a limited subset (30k samples), it may still hallucinate or produce repetitive text. It is not intended for production-level academic advice.

## Training Results

The model was trained on a T4 GPU (or equivalent) with bf16 precision and the fused AdamW optimizer.

- Final Training Loss: 2.837890
- Evaluation Loss: 2.686130
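
Since these losses are mean per-token cross-entropy, exponentiating them gives perplexity, a quick sanity check rather than a separately reported metric:

```python
import math

final_train_loss = 2.837890
eval_loss = 2.686130

# Perplexity = exp(cross-entropy loss)
train_ppl = math.exp(final_train_loss)  # ~17.1
eval_ppl = math.exp(eval_loss)          # ~14.7

print(f"train perplexity = {train_ppl:.2f}, eval perplexity = {eval_ppl:.2f}")
```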

Note: This model is part of a training experiment using the Cosmopedia dataset.