---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini 🚀

## Description
**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content. 

The goal of this model is to adapt GPT-2's capabilities to generate more informative and educational-style text compared to the base model.

## Model Details
- **Developed by:** younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small)
- **Language:** English
- **Training Precision:** `bfloat16` (optimized for stability and speed)

## Training Data
The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset. This dataset is known for its high-quality synthetic data covering various academic and general knowledge topics.

## Training Hyperparameters
- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 (with Gradient Accumulation Steps: 8, for an effective batch size of 16)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
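The hyperparameters above can be expressed as a `transformers.TrainingArguments` configuration. This is a minimal sketch assuming the standard 🤗 `Trainer` workflow; the `output_dir` name and the exact argument set used for the original run are assumptions.

```python
from transformers import TrainingArguments

# Sketch of training arguments matching the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="cosmogpt2-mini",        # hypothetical output directory
    num_train_epochs=1,
    max_steps=1000,                     # caps training at 1000 optimizer steps
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,      # effective batch size: 2 * 8 = 16
    learning_rate=5e-5,
    optim="adamw_torch_fused",          # fused AdamW
    bf16=True,                          # bfloat16 mixed precision
)
```

The 512-token maximum sequence length is applied at tokenization time rather than here, e.g. `tokenizer(text, truncation=True, max_length=512)`.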

## How to use
You can use this model directly with a pipeline for text generation:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")
prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_new_tokens=80, num_return_sequences=1)

print(result[0]['generated_text'])
```
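For more control over decoding, the model and tokenizer can also be loaded directly. This is a sketch; the sampling parameters (`do_sample`, `top_p`) are illustrative assumptions, not the settings used to evaluate the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("iko-01/CosmoGPT2-Mini")
model = AutoModelForCausalLM.from_pretrained("iko-01/CosmoGPT2-Mini")
model.eval()

inputs = tokenizer("The concept of gravity can be explained as", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=80,
        do_sample=True,                       # sampling instead of greedy decoding
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```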

## Intended Use & Limitations
- **Intended Use:** Experimental purposes, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** As a small model (GPT-2 Small) fine-tuned on a limited subset (30k samples), it may still hallucinate facts or produce repetitive text. It is not intended for production use or as a reliable source of academic content.

## Training Results
The model was trained on a T4 GPU (or equivalent) using the `bf16` and fused-AdamW settings listed above.
- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130

---
**Note:** This model is part of a training experiment using the Cosmopedia dataset.