---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini 🚀

## Description
**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content.

The goal of this model is to adapt GPT-2 to generate more informative, educational-style text than the base model.

## Model Details
- **Developed by:** Younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small, 124M parameters)
- **Language:** English
- **Training Precision:** `bfloat16` (chosen for numerical stability and speed)

## Training Data
The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset. This dataset is known for its high-quality synthetic data covering various academic and general-knowledge topics.

## Training Hyperparameters
- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 per device (gradient accumulation steps: 8, effective batch size: 16)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
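
As a sanity check, the effective batch size and an upper bound on the data consumed in 1000 steps can be worked out directly from the settings above (a quick sketch; the exact counts depend on how the trainer batches and packs the data):

```python
# Derived from the hyperparameters listed above.
per_device_batch_size = 2
grad_accum_steps = 8
max_steps = 1000
max_seq_len = 512

# One optimizer update sees batch_size * grad_accum samples.
effective_batch_size = per_device_batch_size * grad_accum_steps
print(effective_batch_size)  # 16

# Upper bound on samples consumed by 1000 optimizer steps.
samples_seen = effective_batch_size * max_steps
print(samples_seen)  # 16000

# Upper bound on tokens per optimizer step (sequences may be shorter).
tokens_per_step = effective_batch_size * max_seq_len
print(tokens_per_step)  # 8192
```

Note that 1000 steps at 16 samples each is at most 16,000 samples, so the `max_steps` cap ends training before the full 30,000-sample subset is seen once.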

## How to use
You can use this model directly with a pipeline for text generation:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_new_tokens=100, num_return_sequences=1)

print(result[0]["generated_text"])
```

## Intended Use & Limitations
- **Intended Use:** Experimentation, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** As a small model (GPT-2) fine-tuned on a limited subset (30k samples), it can still hallucinate facts or produce repetitive text. It is not intended for production use or as a source of academic advice.

## Training Results
The model was trained on a T4 GPU (or equivalent) using the settings above.

- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130
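
Since a causal LM's cross-entropy loss is reported in nats per token, the evaluation loss above converts to perplexity via `exp(loss)` (a quick sketch; this assumes the reported value is the mean per-token cross-entropy):

```python
import math

eval_loss = 2.686130  # evaluation loss reported above (nats per token)
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ~14.67
```

A perplexity in this range is plausible for GPT-2 Small after a short fine-tune on in-domain synthetic text.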

---
**Note:** This model is part of a training experiment using the Cosmopedia dataset.