Spaces: xTHExBEASTx/pdf-summarizer

aladhefafalquran committed on Dec 26, 2025

Commit b79eb66
1 parent: 2ae8413

MAJOR UPGRADE: Enhanced Study Guide Generator

Massive improvements to output quality and user experience:

✨ NEW FEATURES:
- 3 Detail Levels: Concise / Detailed / Very Detailed
- Smart sentence-boundary chunking (no mid-sentence cuts)
- Real-time progress updates with emojis
- Bullet point formatting for easy scanning
- Text cleaning (removes page numbers, fixes hyphenation)
- Better gr.Blocks UI with organized layout
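
A minimal sketch of sentence-boundary chunking, to illustrate the idea behind the second bullet. The function name `chunk_by_sentence` and the `max_chars` parameter are hypothetical and are not part of app.py; the splitting regex is the same one the commit uses. Sentences are packed into a chunk until adding the next one would exceed the character budget, so no chunk ends mid-sentence.

```python
import re


def chunk_by_sentence(text, max_chars=80):
    """Group whole sentences into chunks of at most roughly max_chars characters."""
    # Split after ., ! or ? followed by whitespace (same pattern as in app.py)
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Budget exceeded: close the current chunk and start a new one
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "First point. Second point! Is this the third? Yes it is."
for chunk in chunk_by_sentence(sample, max_chars=30):
    print(chunk)
```

A single sentence longer than `max_chars` still becomes its own oversized chunk in this sketch, so a downstream length limit (such as `truncation=True` in the summarizer call) remains useful.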

📊 QUALITY IMPROVEMENTS:
- Intelligent text overlap between chunks for context
- Longer outputs: up to 500 tokens per section (Very Detailed)
- Extract key points as bullet lists
- Professional markdown formatting
- Document statistics (pages, words, sections)
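
The overlap between chunks can be sketched as follows, assuming (as the diff does) that overlap is counted in words taken from the end of the previous chunk. The name `add_word_overlap` and its `overlap_words` parameter are illustrative only and do not appear in app.py.

```python
def add_word_overlap(chunks, overlap_words=3):
    """Prefix each chunk (except the first) with the last words of the previous one."""
    overlapped = []
    for i, chunk in enumerate(chunks):
        if i > 0 and overlap_words > 0:
            # Take the tail of the ORIGINAL previous chunk, not the overlapped one,
            # so the carried-over context does not grow from chunk to chunk
            tail = chunks[i - 1].split()[-overlap_words:]
            chunk = " ".join(tail) + " " + chunk
        overlapped.append(chunk)
    return overlapped


print(add_word_overlap(["a b c d", "e f"], overlap_words=2))
```

Because the chunk size is measured in characters while the overlap is measured in words, the overlap value should stay well below the word count of a typical chunk; otherwise most of the previous chunk is repeated.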

📚 STUDY FEATURES:
- Comprehensive header with document info
- Organized sections with clear numbering
- Study synthesis section with tips
- "How to use this guide" instructions
- Study strategy recommendations

🎨 UX IMPROVEMENTS:
- Live progress indicators during processing
- Clear status messages with icons
- Better error handling
- Helpful tips in sidebar
- Resized output area (25-40 lines, previously 30-50)
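
The live progress updates rely on the handler being a generator: Gradio shows each yielded value in the output component as it arrives. The sketch below shows the pattern without Gradio so it runs on its own; `process_with_progress` and the `upper()` stand-in for the summarization call are hypothetical.

```python
def process_with_progress(items):
    """Yield a status line per step, then the final result as the last value."""
    if not items:
        # In a generator, an early exit must yield its message and then return;
        # a value passed to `return` is not delivered as an update
        yield "Nothing to process."
        return
    results = []
    total = len(items)
    for i, item in enumerate(items, 1):
        yield f"Processing section {i}/{total}..."
        results.append(item.upper())  # stand-in for the real summarization call
    # The last yielded value replaces the status text with the finished output
    yield "\n".join(results)


updates = list(process_with_progress(["alpha", "beta"]))
print(updates[-1])
```

Each yield overwrites the previous one in the output area, so the final yield has to carry the complete result rather than only the last piece.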

This transforms the tool from a basic summarizer into a
professional study guide generator perfect for exam prep!

Files changed (1)

app.py +229 -67


@@ -1,4 +1,5 @@
 import os
 import gradio as gr
 import fitz
 from transformers import pipeline
@@ -10,102 +11,263 @@ device = 0 if torch.cuda.is_available() else -1
 summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=device)
 print("Model ready!")
-def create_study_guide(pdf_file):
     if pdf_file is None:
-        return "Please upload a PDF file."
-    # Extract text
-    text = ""
     try:
         with fitz.open(pdf_file.name) as doc:
-            for page in doc:
                 text += page.get_text()
-    except Exception as e:
-        return f"Error reading PDF: {str(e)}"
-    if not text.strip():
-        return "PDF is empty or contains no text."
-    # Better chunking - split by paragraphs/sections (larger chunks for detailed recap)
-    chunk_size = 2500  # Larger chunks for more context
-    chunk_overlap = 500  # Overlap to maintain context
-    chunks = []
-    for i in range(0, len(text), chunk_size - chunk_overlap):
-        chunk = text[i:i + chunk_size]
-        if len(chunk.strip()) > 100:  # Only process meaningful chunks
-            chunks.append(chunk)
-    # Create detailed study notes for each section
-    study_notes = []
-    total_chunks = len(chunks)
-    for i, chunk in enumerate(chunks):
-        try:
-            # Generate longer, more detailed summaries (study guide style)
-            # max_length increased significantly for detailed recap
-            result = summarizer(
-                chunk,
-                max_length=400,  # Much longer output for detailed study notes
-                min_length=150,   # Ensure substantial content
-                do_sample=False,
-                truncation=True
-            )
-            section_note = result[0]['summary_text']
-            # Add section marker for organization
-            study_notes.append(f"### Section {i+1}/{total_chunks}\n{section_note}")
-        except Exception as e:
-            # If a chunk fails, continue with others
-            continue
-    # Combine all study notes
-    if not study_notes:
-        return "Could not generate study guide. Please try a different PDF."
-    # Create formatted study guide
-    study_guide = f"""# 📚 Study Guide
-**Total Sections:** {len(study_notes)}
 ---
 """
-    study_guide += "\n\n---\n\n".join(study_notes)
-    study_guide += f"""
 ---
-## 📖 Study Tips
-- This guide covers the important points from your document
-- Review each section carefully
-- Focus on key concepts highlighted above
-- Use this as your primary study material
-*Generated study guide - {len(study_notes)} sections extracted*
 """
-    return study_guide
-# Create interface
-demo = gr.Interface(
-    fn=create_study_guide,
-    inputs=gr.File(label="📄 Upload PDF Document"),
-    outputs=gr.Textbox(
-        label="📚 Detailed Study Guide",
-        lines=30,
-        max_lines=50
-    ),
-    title="📚 PDF Study Guide Generator",
-    description="Upload a PDF to generate a detailed study guide with important points for exam preparation. The guide will be comprehensive and cover all key concepts.",
-    examples=None,
-    allow_flagging="never"
-)
 if __name__ == "__main__":
     demo.launch()

 import os
+import re
 import gradio as gr
 import fitz
 from transformers import pipeline
 summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=device)
 print("Model ready!")
+def clean_text(text):
+    """Clean and normalize extracted text."""
+    # Remove standalone page-number lines first, while newlines still exist
+    text = re.sub(r'\n\s*\d+\s*\n', '\n', text)
+    # Rejoin words hyphenated across a line break (e.g. "exam-\nple")
+    text = re.sub(r'(\w)-\s*\n\s*(\w)', r'\1\2', text)
+    # Collapse all remaining whitespace, including newlines, to single spaces
+    text = re.sub(r'\s+', ' ', text)
+    return text.strip()
+def smart_chunk_text(text, chunk_size=3000, overlap=600):
+    """
+    Chunk text at sentence boundaries.
+
+    chunk_size is measured in characters; overlap is the number of words
+    carried over from the end of the previous chunk.
+    """
+    sentences = re.split(r'(?<=[.!?])\s+', text)
+    chunks = []
+    current_chunk = ""
+    for sentence in sentences:
+        if len(current_chunk) + len(sentence) < chunk_size:
+            current_chunk += sentence + " "
+        else:
+            if current_chunk:
+                chunks.append(current_chunk.strip())
+            current_chunk = sentence + " "
+    if current_chunk:
+        chunks.append(current_chunk.strip())
+    # Add overlap between chunks for context
+    overlapped_chunks = []
+    for i, chunk in enumerate(chunks):
+        if i > 0 and overlap > 0:
+            # Add last part of previous chunk for context
+            prev_words = chunks[i-1].split()[-overlap:]
+            chunk = " ".join(prev_words) + " " + chunk
+        overlapped_chunks.append(chunk)
+    return overlapped_chunks
+def extract_key_points(summary_text):
+    """Format summary as bullet points for better readability."""
+    # Split into sentences
+    sentences = re.split(r'(?<=[.!?])\s+', summary_text)
+    # Create bullet points
+    bullet_points = []
+    for sentence in sentences:
+        sentence = sentence.strip()
+        if len(sentence) > 20:  # Only substantial sentences
+            bullet_points.append(f"• {sentence}")
+    return "\n".join(bullet_points)
+def create_study_guide(pdf_file, detail_level="Detailed"):
     if pdf_file is None:
+        # This function is a generator, so the message must be yielded to reach the UI
+        yield "⚠️ Please upload a PDF file first."
+        return
     try:
+        # Extract text
+        yield "📄 Extracting text from PDF..."
+        text = ""
         with fitz.open(pdf_file.name) as doc:
+            total_pages = len(doc)
+            for page_num, page in enumerate(doc, 1):
                 text += page.get_text()
+                if page_num % 5 == 0:
+                    yield f"📄 Reading pages... {page_num}/{total_pages}"
+        if not text.strip():
+            yield "❌ PDF is empty or contains no readable text."
+            return
+        # Clean text
+        yield "🧹 Cleaning and processing text..."
+        text = clean_text(text)
+        word_count = len(text.split())
+        # Determine parameters based on detail level
+        if detail_level == "Very Detailed":
+            chunk_size = 3500
+            max_length = 500
+            min_length = 200
+        elif detail_level == "Detailed":
+            chunk_size = 3000
+            max_length = 400
+            min_length = 150
+        else:  # Concise
+            chunk_size = 2500
+            max_length = 300
+            min_length = 100
+        # Smart chunking
+        yield "📝 Dividing into logical sections..."
+        chunks = smart_chunk_text(text, chunk_size=chunk_size, overlap=100)
+        total_chunks = len(chunks)
+        # Process each chunk
+        study_sections = []
+        for i, chunk in enumerate(chunks, 1):
+            yield f"🤖 Generating study notes for section {i}/{total_chunks}..."
+            try:
+                # Generate detailed summary
+                result = summarizer(
+                    chunk,
+                    max_length=max_length,
+                    min_length=min_length,
+                    do_sample=False,
+                    truncation=True,
+                    early_stopping=True
+                )
+                section_summary = result[0]['summary_text']
+                # Format as bullet points for readability
+                formatted_section = extract_key_points(section_summary)
+                study_sections.append({
+                    'number': i,
+                    'content': formatted_section,
+                    'raw': section_summary
+                })
+            except Exception:
+                # Skip sections the model fails on; the footer reports how many succeeded
+                continue
+        if not study_sections:
+            yield "❌ Could not generate study guide. Please try a different PDF."
+            return
+        # Create comprehensive study guide
+        yield "✨ Formatting your study guide..."
+        study_guide = f"""# 📚 COMPREHENSIVE STUDY GUIDE
+**Document:** {os.path.basename(pdf_file.name)}
+**Pages:** {total_pages}
+**Words in Original:** {word_count:,}
+**Study Sections Generated:** {len(study_sections)}
+**Detail Level:** {detail_level}
 ---
+## 📖 KEY CONCEPTS AND IMPORTANT POINTS
+"""
+        # Add all sections
+        for section in study_sections:
+            study_guide += f"""
+### 📌 Section {section['number']} of {total_chunks}
+{section['content']}
+---
 """
+        # Add synthesis section if we have multiple sections
+        if len(study_sections) > 1:
+            study_guide += """
+## 🎯 STUDY SYNTHESIS
+This study guide extracted the most important points from your document. Each section above covers key concepts you need to understand.
+### 📋 How to Use This Guide:
+1. **First Read**: Go through all sections to get an overview
+2. **Deep Dive**: Study each section carefully, one at a time
+3. **Make Connections**: Link concepts between different sections
+4. **Active Recall**: Try to remember key points without looking
+5. **Review Regularly**: Come back to this guide before your exam
+### 💡 Study Tips:
+• Focus on understanding the concepts, not memorizing word-for-word
+• Create your own examples for each key point
+• Explain these concepts to someone else to test your understanding
+• Highlight or annotate the most important points for quick review
 ---
+"""
+        study_guide += f"""
+## ✅ STUDY GUIDE COMPLETE
+**Total Sections Processed:** {len(study_sections)}/{total_chunks}
+**Coverage:** Comprehensive overview of all important topics
+*This study guide was generated using AI to extract and organize the most relevant information for your studies. Use it as your primary study material alongside your class notes.*
+---
+📚 Good luck with your studies! 🎓
 """
+        yield study_guide
+    except Exception as e:
+        yield f"❌ Error: {str(e)}\n\nPlease try uploading the PDF again."
+# Create enhanced interface
+with gr.Blocks(title="PDF Study Guide Generator", theme=gr.themes.Soft()) as demo:
+    gr.Markdown("""
+    # 📚 AI-Powered Study Guide Generator
+    Upload your PDF and get a comprehensive, well-organized study guide perfect for exam preparation!
+    """)
+    with gr.Row():
+        with gr.Column():
+            pdf_input = gr.File(
+                label="📄 Upload Your PDF Document",
+                file_types=[".pdf"]
+            )
+            detail_level = gr.Radio(
+                choices=["Concise", "Detailed", "Very Detailed"],
+                value="Detailed",
+                label="📊 Detail Level",
+                info="Choose how comprehensive you want your study guide"
+            )
+            generate_btn = gr.Button("🚀 Generate Study Guide", variant="primary", size="lg")
+            gr.Markdown("""
+            ### 💡 Tips:
+            - **Concise**: Quick overview, main points only
+            - **Detailed**: Balanced coverage (recommended)
+            - **Very Detailed**: Comprehensive, thorough notes
+            """)
+        with gr.Column():
+            output = gr.Textbox(
+                label="📚 Your Study Guide",
+                lines=25,
+                max_lines=40,
+                placeholder="Your detailed study guide will appear here...\n\nIt will include:\n• Organized sections\n• Key concepts\n• Important points\n• Study tips"
+            )
+    generate_btn.click(
+        fn=create_study_guide,
+        inputs=[pdf_input, detail_level],
+        outputs=output
+    )
+    gr.Markdown("""
+    ---
+    ### 🎯 What You'll Get:
+    - ✅ Comprehensive coverage of all important topics
+    - ✅ Organized in easy-to-study sections
+    - ✅ Bullet points for quick scanning
+    - ✅ Study tips and guidance
+    - ✅ Perfect for exam preparation
+    """)
 if __name__ == "__main__":
     demo.launch()