Spaces:
Running
Running
Summarization Module ๐
Responsibility
This module handles text summarization and conversion to structured study notes.
Functionality
- Receive transcribed text from videos.
- Use Groq (Llama-3.3-70b-versatile) to analyze text and generate structured JSON notes.
- Produce clean Markdown output with:
- Source & Duration header
- Overall Summary
- Chronological Timeline (3-7 segments with Key Insight + Why It Matters)
- Conclusion
Files
1. schemas.py
- Purpose: Single source of truth for all Pydantic data models.
- Key Classes:
SummarySchemaโ Full structured output (title, detected_language, summary, segments, conclusion, topics).SegmentSchemaโ A timeline section (title, summary, key_insight, why_it_matters).
2. note_generator.py
- Purpose: Generate notes using Groq AI with strict JSON enforcement.
- Main Class:
NoteGenerator - Key Methods:
generateSummary(transcript, title)โ Generates structured JSON study notes.format_notes_to_markdown(json_notes)โ Converts JSON to clean Markdown.format_final_notes(notes, title, url, duration)โ Wraps Markdown with Source/Duration header.
3. segmenter.py
- Purpose: Split long texts into smaller segments for preprocessing.
- Main Class:
TranscriptSegmenter - Key Methods:
segment_by_time()โ Split by time intervals.clean_text()โ Remove filler words.
JSON Output Structure
{
"title": "...",
"detected_language": "English",
"summary": "Overall summary (3-5 sentences)",
"segments": [
{
"title": "Segment title",
"summary": "What this section covers",
"key_insight": "Most important point",
"why_it_matters": "Why this is valuable"
}
],
"conclusion": "Final takeaway",
"topics": ["Topic1", "Topic2"]
}
Note:
topicsis hidden metadata โ not rendered in markdown, used by downstream modules only.
Markdown Output Order
- Source โ video URL
- Duration โ video length
- Overall Summary โ one concise summary
- Timeline โ chronological segments (3-7), each with Key Insight + Why It Matters
- Conclusion โ final takeaway
Labels (Localized)
| Key | English | Arabic |
|---|---|---|
| source | Source | ุงูู ุตุฏุฑ |
| duration | Duration | ุงูู ุฏุฉ |
| summary | Overall Summary | ุงูู ูุฎุต ุงูุนุงู |
| timeline | Timeline | ุงูุชุณูุณู ุงูุฒู ูู |
| insight | Key Insight | ุฃูู ููุทุฉ |
| why | Why It Matters | ูู ุงุฐุง ููู ุ |
| conclusion | Conclusion | ุงูุฎูุงุตุฉ |
Testing
from src.summarization.note_generator import NoteGenerator
generator = NoteGenerator()
transcript = "Here is the complete video transcript..."
title = "Introduction to Python"
# Generate notes
summary_json = generator.generateSummary(transcript, title)
notes_md = generator.format_notes_to_markdown(summary_json)
print(notes_md)
Libraries Used
groqโ Communicate with Groq API (Llama-3.3-70b-versatile).pydanticโ Data validation and schema enforcement.