---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini 🚀
|
|
## Description
**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content.
|
|
The goal of this model is to adapt GPT-2's capabilities to generate more informative and educational-style text compared to the base model.
|
|
## Model Details
- **Developed by:** younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small, 124M parameters)
- **Language:** English
- **Training Precision:** `bfloat16` (optimized for stability and speed)
|
|
## Training Data
The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset. This dataset is known for its high-quality synthetic data covering various academic and general knowledge topics.
|
|
## Training Hyperparameters
- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 (with Gradient Accumulation Steps: 8)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
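With these settings, the effective batch size and the number of samples the optimizer actually sees follow from simple arithmetic (variable names below are illustrative, not taken from the training script):

```python
# Sanity-check the listed hyperparameters with basic arithmetic.
per_device_batch_size = 2
gradient_accumulation_steps = 8
max_steps = 1000
dataset_size = 30_000  # samples, as stated above

# Each optimizer step consumes batch_size * accumulation_steps samples.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps
samples_seen = effective_batch_size * max_steps

print(effective_batch_size)  # 16
print(samples_seen)          # 16000
```

Note that 16,000 samples is less than the 30,000-sample subset, so `max_steps` ends training before a full pass over the data, even with epochs set to 1.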
|
|
## How to use
You can use this model directly with a pipeline for text generation:
|
|
```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
# max_new_tokens counts only generated tokens, excluding the prompt
result = generator(prompt, max_new_tokens=100, num_return_sequences=1)

print(result[0]["generated_text"])
```
|
|
## Intended Use & Limitations
- **Intended Use:** Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** Since this is a small model (GPT-2 Small) trained on a limited subset (30k samples), it may still hallucinate or produce repetitive text. It is not intended for production-level academic advice.
|
|
## Training Results
The model was trained on a T4 GPU (or equivalent) using optimized settings.
- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130
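For a causal language model, the evaluation loss is the mean cross-entropy per token, so it maps directly to perplexity via `exp(loss)`. A quick check using the evaluation loss reported above:

```python
import math

eval_loss = 2.686130  # evaluation loss reported above

# Perplexity is the exponential of the per-token cross-entropy loss.
perplexity = math.exp(eval_loss)

print(round(perplexity, 2))  # ~14.67
```

In other words, the fine-tuned model is, on average, about as uncertain as a uniform choice over ~15 tokens at each step on the evaluation split.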
|
|
---
**Note:** This model is part of a training experiment using the Cosmopedia dataset.