tags:
- tinyllama
- summarization
- question-answering
---

# Manoghn/tinyllama-lesson-synthesizer

## 📚 Model Description

This repository hosts `Manoghn/tinyllama-lesson-synthesizer`, a fine-tuned **TinyLlama/TinyLlama-1.1B-Chat-v1.0** model designed to generate comprehensive and engaging educational lessons. It's a key component of the larger SynthAI project, which aims to create multi-modal learning content including lessons, images, quizzes, and audio narration.

The model has been specifically adapted using **LoRA (Low-Rank Adaptation)** to excel at generating structured, informative text suitable for educational purposes across various domains.
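
A minimal loading sketch (assuming this repository hosts the LoRA adapter weights, so that `peft` can resolve the TinyLlama base model automatically; if the adapter has been merged into the base weights, a plain `AutoModelForCausalLM` load would apply instead):

```python
# Sketch: load the fine-tuned model for inference.
# Assumes this repo contains PEFT adapter weights on top of the TinyLlama base.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "Manoghn/tinyllama-lesson-synthesizer",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
```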

---

## 🎯 Objective

The primary objective of this fine-tuned model is to **automatically generate detailed educational lessons** on diverse topics. Given a topic, the model produces well-structured, Markdown-formatted content, serving as a foundation for broader educational material synthesis.
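
For example, continuing from the loading sketch above (the exact prompt format used during fine-tuning is not documented here, so a plain instruction through the TinyLlama chat template is assumed):

```python
# Sketch: generate a lesson for a given topic (prompt wording is an assumption).
messages = [{
    "role": "user",
    "content": "Create a detailed educational lesson on: Photosynthesis",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```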

---

## 📊 Training Data

The model was fine-tuned on a custom-curated dataset of **60 educational lessons**.

* **Data Collection:** Lessons were generated using the **Llama-3.1-8B-Instruct** model via the Hugging Face Inference Client. Each lesson was crafted in response to a detailed prompt instructing the model to act as an "expert educational content creator." A sketch of this pipeline appears after this list.
* **Content Structure:** The generated lessons adhered to a specific Markdown format, including:
    * A descriptive level-1 heading.
    * An introduction explaining the topic's importance.
    * 3-5 key concepts with clear explanations.
    * Real-world applications or examples.
    * Practical examples, formulas, or code snippets (if relevant).
    * A concise summary.
* **Domains Covered:** The dataset spans four educational domains:
    * Science (e.g., Photosynthesis, Newton's Laws of Motion)
    * Mathematics (e.g., Pythagorean Theorem, Quadratic Equations)
    * Computer Science (e.g., Binary Number System, Data Structures Overview)
    * Humanities (e.g., Renaissance Art Period, World War II Causes)
* **Dataset Size:** The final dataset comprised 60 high-quality lesson examples, split into training (70%), validation (15%), and test (15%) sets.
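
A sketch of how this generation-and-split pipeline might look (the prompt wording, topic list, and variable names are illustrative; only the generator model, the Inference Client, the required lesson structure, and the 70/15/15 ratios come from the description above):

```python
# Sketch of the data-generation and splitting flow described above.
from huggingface_hub import InferenceClient
from datasets import Dataset

client = InferenceClient("meta-llama/Llama-3.1-8B-Instruct")

PROMPT = (
    "You are an expert educational content creator. Write a Markdown lesson on "
    "'{topic}' with: a descriptive level-1 heading, an introduction explaining "
    "why the topic matters, 3-5 key concepts, real-world applications, practical "
    "examples or formulas if relevant, and a concise summary."
)

# Illustrative topics; the real dataset used 60 across the four domains.
topics = ["Photosynthesis", "Pythagorean Theorem",
          "Binary Number System", "Renaissance Art Period"]

lessons = []
for topic in topics:
    completion = client.chat_completion(
        messages=[{"role": "user", "content": PROMPT.format(topic=topic)}],
        max_tokens=1024,
    )
    lessons.append({"topic": topic,
                    "lesson": completion.choices[0].message.content})

# 70% train, then split the remaining 30% evenly into validation and test.
split = Dataset.from_list(lessons).train_test_split(test_size=0.30, seed=42)
holdout = split["test"].train_test_split(test_size=0.50, seed=42)
train_split, val_split, test_split = split["train"], holdout["train"], holdout["test"]
```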

---

## ⚙️ Fine-tuning Methodology

The `Manoghn/tinyllama-lesson-synthesizer` model was fine-tuned from `TinyLlama/TinyLlama-1.1B-Chat-v1.0` using Parameter-Efficient Fine-tuning (PEFT) with LoRA. A code sketch of the full setup follows the list below.

* **Base Model:** `TinyLlama/TinyLlama-1.1B-Chat-v1.0`
* **Quantization:** The base model was loaded with **8-bit quantization** using `BitsAndBytesConfig` to reduce the memory footprint and enable training in resource-constrained environments (Colab free-tier T4 GPU).
* **LoRA Configuration:**
    * `r=8`: LoRA rank
    * `lora_alpha=32`: scaling factor
    * `target_modules=["q_proj", "v_proj"]`: LoRA adapters applied to the query and value projection layers
    * `lora_dropout=0.05`
    * `bias="none"`
    * `task_type=TaskType.CAUSAL_LM`
* **Training Parameters (`transformers.TrainingArguments`):**
    * `output_dir`: `/content/drive/MyDrive/genai_synthesizer/results`
    * `per_device_train_batch_size=1`
    * `per_device_eval_batch_size=1`
    * `learning_rate=2e-4`
    * `num_train_epochs=1`
    * `logging_steps=10`
    * `fp16=True`
    * `report_to="none"`
* **Training Environment:** The fine-tuning was performed on a **Google Colab free-tier T4 GPU**.
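
Putting the pieces above together, the setup would look roughly like this (a sketch: `train_split` and `val_split` stand for tokenized versions of the splits from the data sketch earlier, and the `prepare_model_for_kbit_training` call and data collator are common additions that the list above does not spell out):

```python
# Sketch of the 8-bit + LoRA fine-tuning setup with the hyperparameters above.
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # ensure padding token

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit base weights
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # query and value projections
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/genai_synthesizer/results",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
    fp16=True,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_split,  # tokenized lesson texts, prepared elsewhere
    eval_dataset=val_split,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```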

---