---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini
## Description
CosmoGPT2-Mini is a fine-tuned version of the classic GPT-2 model. It has been trained on a subset of the Cosmopedia v2 dataset, which consists of synthetic textbooks, blog posts, and educational content.
The goal of this model is to adapt GPT-2's capabilities to generate more informative and educational-style text compared to the base model.
## Model Details
- Developed by: younes MA
- Model type: Causal Language Model
- Base Model: GPT-2 (Small)
- Language: English
- Training Precision: bfloat16 (optimized for stability and speed)
## Training Data
The model was trained on 30,000 samples from the Dhiraj45/cosmopedia-v2 dataset. This dataset is known for its high-quality synthetic data covering various academic and general knowledge topics.
## Training Hyperparameters
- Epochs: 1
- Max Steps: 1000
- Batch Size: 2 (with Gradient Accumulation Steps: 8)
- Learning Rate: 5e-5
- Optimizer: AdamW (fused)
- Precision: bf16
- Max Sequence Length: 512 tokens
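For reference, the settings above imply an effective batch size of 16 sequences per optimizer step. A quick sanity check of the numbers (an upper bound, since sequences may be shorter than the maximum length):

```python
# Back-of-the-envelope numbers implied by the hyperparameters above.
per_device_batch = 2
grad_accum_steps = 8
max_steps = 1000
max_seq_len = 512

# Effective batch size per optimizer step.
effective_batch = per_device_batch * grad_accum_steps

# Upper bound on tokens seen during training.
max_tokens_seen = max_steps * effective_batch * max_seq_len

print(effective_batch)   # 16
print(max_tokens_seen)   # 8192000
```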
## How to Use

You can use this model directly with a `pipeline` for text generation:
```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_length=100, num_return_sequences=1)
print(result[0]["generated_text"])
```
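Note that by default the pipeline's `generated_text` includes the prompt itself. A minimal stdlib sketch of keeping only the continuation (the `generated_text` string below is a stand-in for real model output, not an actual generation):

```python
prompt = "The concept of gravity can be explained as"

# Stand-in for result[0]["generated_text"]; the text-generation pipeline
# echoes the prompt at the start of the output by default.
generated_text = prompt + " the attraction between two masses."

# Keep only the newly generated continuation.
continuation = generated_text[len(prompt):].lstrip()
print(continuation)
```

Alternatively, the `transformers` text-generation pipeline accepts `return_full_text=False` to drop the prompt from the output directly.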
## Intended Use & Limitations
- Intended Use: Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- Limitations: Because this is the small GPT-2 variant fine-tuned on a limited subset (30k samples), it may still generate hallucinations or repetitive text. It is not intended for production use or as a source of reliable academic advice.
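As a rough illustration of the repetition limitation, here is a small stdlib sketch that flags repeated n-grams in generated text (`repeated_ngrams` is a hypothetical helper for this card, not part of the model's code):

```python
from collections import Counter

def repeated_ngrams(text: str, n: int = 3) -> list[str]:
    """Return word n-grams that occur more than once in the text."""
    words = text.split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [" ".join(g) for g, c in grams.items() if c > 1]

sample = "the model repeats itself the model repeats itself again"
print(repeated_ngrams(sample))  # ['the model repeats', 'model repeats itself']
```

In practice, repetition can also be mitigated at generation time via `generate()` arguments such as `no_repeat_ngram_size` or `repetition_penalty`.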
## Training Results
The model was trained on a single NVIDIA T4 GPU (or equivalent) with the settings listed above.
- Final Training Loss: 2.837890
- Evaluation Loss: 2.686130
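Assuming the reported losses are mean token-level cross-entropy in nats (the `transformers` Trainer default), the evaluation loss corresponds to a perplexity of roughly exp(2.686130) ≈ 14.7:

```python
import math

eval_loss = 2.686130

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # roughly 14.7
```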
Note: This model is part of a training experiment using the Cosmopedia dataset.