---
language:
- en
license: mit
base_model: gpt2
tags:
- text-generation
- gpt2
- cosmopedia
- educational
- synthetic-data
model_name: CosmoGPT2-Mini
datasets:
- Dhiraj45/cosmopedia-v2
metrics:
- loss
---

# CosmoGPT2-Mini 🚀

## Description
**CosmoGPT2-Mini** is a fine-tuned version of the classic [GPT-2](https://huggingface.co/gpt2) model. It has been trained on a subset of the **Cosmopedia v2** dataset, which consists of synthetic textbooks, blog posts, and educational content.

The goal of this model is to adapt GPT-2 to generate more informative, educational-style text than the base model.

## Model Details
- **Developed by:** Younes MA
- **Model type:** Causal Language Model
- **Base Model:** GPT-2 (Small, 124M parameters)
- **Language:** English
- **Training Precision:** `bfloat16` (chosen for numerical stability and speed)

## Training Data
The model was trained on **30,000 samples** from the `Dhiraj45/cosmopedia-v2` dataset. This dataset is known for its high-quality synthetic data covering various academic and general-knowledge topics.

## Training Hyperparameters
- **Epochs:** 1
- **Max Steps:** 1000
- **Batch Size:** 2 per device (gradient accumulation steps: 8, effective batch size: 16)
- **Learning Rate:** 5e-5
- **Optimizer:** AdamW (fused)
- **Precision:** `bf16`
- **Max Sequence Length:** 512 tokens
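
As a sanity check, the effective batch size and an upper bound on the data consumed in 1000 steps can be worked out directly from the settings above (a quick sketch; the exact counts depend on how the trainer batches and packs the data):

```python
# Derived from the hyperparameters listed above.
per_device_batch_size = 2
grad_accum_steps = 8
max_steps = 1000
max_seq_len = 512

# One optimizer update sees batch_size * grad_accum samples.
effective_batch_size = per_device_batch_size * grad_accum_steps
print(effective_batch_size)  # 16

# Upper bound on samples consumed by 1000 optimizer steps.
samples_seen = effective_batch_size * max_steps
print(samples_seen)  # 16000

# Upper bound on tokens per optimizer step (sequences may be shorter).
tokens_per_step = effective_batch_size * max_seq_len
print(tokens_per_step)  # 8192
```

Note that 1000 steps at 16 samples each is at most 16,000 samples, so the `max_steps` cap ends training before the full 30,000-sample subset is seen once.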

## How to use
You can use this model directly with a pipeline for text generation:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="iko-01/CosmoGPT2-Mini")

prompt = "The concept of gravity can be explained as"
result = generator(prompt, max_new_tokens=100, num_return_sequences=1)

print(result[0]["generated_text"])
```

## Intended Use & Limitations
- **Intended Use:** Experimentation, educational text generation, and studying fine-tuning on synthetic data.
- **Limitations:** As a small model (GPT-2) fine-tuned on a limited subset (30k samples), it can still hallucinate facts or produce repetitive text. It is not intended for production use or as a source of academic advice.

## Training Results
The model was trained on a T4 GPU (or equivalent) using the settings above.

- **Final Training Loss:** 2.837890
- **Evaluation Loss:** 2.686130
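
Since a causal LM's cross-entropy loss is reported in nats per token, the evaluation loss above converts to perplexity via `exp(loss)` (a quick sketch; this assumes the reported value is the mean per-token cross-entropy):

```python
import math

eval_loss = 2.686130  # evaluation loss reported above (nats per token)
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # ~14.67
```

A perplexity in this range is plausible for GPT-2 Small after a short fine-tune on in-domain synthetic text.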

---
**Note:** This model is part of a training experiment using the Cosmopedia dataset.