import streamlit as st from pathlib import Path from layout import gray_container, tool_container, key_concept, research_question def render(): """Module 3: OCR Technology and Historical Documents""" st.title("Module 3: OCR Technology and Historical Documents") col1, col2 = st.columns([1, 1]) with col1: traditional_content = """

Traditional OCR Approaches

  1. Pattern Matching: Early OCR compared characters to templates
  2. Feature Extraction: Identifying key features of characters
  3. Statistical Models: Using probabilities to improve recognition
""" gray_container(traditional_content) modern_content = """

Modern AI-Enhanced OCR

  1. Neural Networks: Deep learning models trained on vast datasets
  2. Computer Vision: Advanced image processing techniques
  3. Language Models: Contextual understanding to resolve ambiguities
  4. Multimodal Models: Integration of text, layout, and visual understanding
""" gray_container(modern_content) with col2: challenges_content = """

Challenges with Historical Documents

Historical materials present unique difficulties:

""" gray_container(challenges_content) # Display OCR processing diagram st.image("https://cdn.dribbble.com/users/412119/screenshots/16353886/media/82e593c60a5e4d460db917236eab6ece.jpg", caption="OCR processing layers") # Key concept section concept_content = """

Vision-Enhanced OCR

Modern OCR systems like those based on Mistral-7B-Vision combine:

  1. Image understanding capabilities to process the visual aspects
  2. Text recognition to extract characters accurately
  3. Layout analysis to understand structure
  4. Contextual language processing for improved accuracy

This multimodal approach dramatically improves OCR results on historical documents compared to traditional OCR.

""" key_concept(concept_content) # Technical details in a tool container tech_content = """

Technical Evolution of OCR

Traditional OCR Pipeline:

  1. Preprocessing (binarization, noise removal)
  2. Layout analysis (segmentation)
  3. Character recognition (pattern matching)
  4. Post-processing (spell checking)

Modern LLM-Vision Pipeline:

  1. Image normalization
  2. Image embedding via vision encoder
  3. Integration with language model
  4. Joint inference across modalities
  5. Structured extraction of information
""" tool_container(tech_content) # Research question research_content = """

Consider This:

How might the capabilities of vision-language models change our approach to digitizing historical archives?

""" research_question(research_content) # Display history if available if 'processing_history' in st.session_state and st.session_state.processing_history: with st.expander("Your OCR Processing History"): st.markdown("You've already processed the following documents:") for item in st.session_state.processing_history: st.markdown(f"**{item['fileName']}**") col1, col2 = st.columns(2) with col1: st.write(f"**Topics:** {', '.join(item['result'].get('topics', ['Unknown']))}") with col2: st.write(f"**Vision model used:** {'Yes' if item['useVision'] else 'No'}")