Historical OCR Workshop

Unlock the potential of historical documents with modern OCR technology

Workshop Overview

This interactive workshop explores the application of OCR technology to historical documents, combining theoretical understanding with practical experience. Designed for historians, archivists, and digital humanities scholars, it offers both conceptual frameworks and hands-on skills.

What is OCR?

Optical Character Recognition (OCR) enables computers to extract text from images of documents. Modern OCR systems often use AI vision models that interpret the text together with its visual context, which makes them well suited to historical research and the digital humanities.

For Historians:

How might OCR technology transform our access to and interpretation of historical documents? What new research questions become possible when large archives become machine-readable?

Conceptual Understanding

- Text-image relationships in historical documents
- Evolution of OCR technology
- AI vision models for document analysis
- Historical typography challenges

Methodological Approaches

- Critical frameworks for OCR in historical research
- Hybrid computational-traditional methods
- Error analysis and interpretation
- Contextual reading strategies
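One common way to make "error analysis" concrete is the character error rate (CER): the edit distance between a trusted transcription and the OCR output, divided by the length of the transcription. The sketch below is illustrative only and is not taken from the workshop code; the function names `levenshtein` and `character_error_rate` and the sample strings are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalised by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, ocr_output) / len(reference)


# Hypothetical example: a long s in "Congress" misread as "f".
cer = character_error_rate("Congress", "Congrefs")
print(f"CER: {cer:.3f}")
```

A low CER does not guarantee that the errors are harmless for a given research question; a single misread character in a name or date can matter more than many errors in running prose, which is why the numbers still need interpretation.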

Practical Skills

- Processing historical documents with OCR
- Analyzing and structuring extracted information
- Integrating OCR into research workflows
- Building searchable archives
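As a rough illustration of what "building searchable archives" can mean in practice, the sketch below builds a small inverted index over OCR text and answers keyword queries. It is a minimal example under stated assumptions, not the workshop's implementation: the helper names `build_index` and `search`, the document identifiers, and the sample texts are all invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of page ids whose OCR text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the page ids that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Invented sample data standing in for OCR output.
pages = {
    "letter_001": "Received the shipment of cotton at the harbour",
    "ledger_014": "Cotton prices rose sharply in March",
}
index = build_index(pages)
print(sorted(search(index, "cotton")))
print(sorted(search(index, "cotton harbour")))
```

Exact-match lookup like this misses words that the OCR step misread, so a real archive would likely need fuzzy matching or corrected transcriptions on top of it.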

Module {i}

{module_names[i-1]}

Module {i} of the historical OCR workshop.

"The digital turn in historical research is not just about converting analog to digital; it's about transforming how we access, analyze, and interpret the past."

— Dr. Jane Winters, Professor of Digital Humanities

', unsafe_allow_html=True)

col1, col2, col3 = st.columns([1, 2, 1])
with col2:
    if st.button("Begin Workshop Journey", key="start_workshop", type="primary", use_container_width=True):
        st.session_state.workshop_started = True
        st.rerun()

st.markdown('

No installation required • Start immediately

', unsafe_allow_html=True)
st.markdown('

", unsafe_allow_html=True)

# Show a progress indicator
st.markdown(f"

Your Progress: Module {current_module} of 6

", unsafe_allow_html=True)
st.progress(current_module / 6)

# Module navigation buttons
st.markdown("

Modules

", unsafe_allow_html=True)
for i, name in enumerate(module_names, 1):
    btn_style = "primary" if i == current_module else "secondary"
    if st.button(f"{i}: {name}", key=f"nav_module_{i}", type=btn_style, use_container_width=True):
        st.session_state.current_module = i
        st.rerun()

# About the workshop in a collapsible section
with st.expander("About the Workshop"):
    st.markdown("""
    This interactive workshop explores OCR technology for historical documents.

    **How to use this workshop:**
    1. Navigate through modules sequentially
    2. Expand content sections to read more
    3. Try the interactive OCR experiment
    4. Reflect on research questions

    For help or more information, use the reference materials in Module 6.
    """)

# Processing history if available
if st.session_state.processing_history:
    with st.expander("Your Activity"):
        st.markdown(f"Documents processed: {len(st.session_state.processing_history)}", unsafe_allow_html=True)

        # Show the most recent document processed
        latest = st.session_state.processing_history[-1]
        st.markdown(f"""

Latest document: {latest['fileName']}
Processed with {' vision model' if latest['useVision'] else ' basic OCR'}

""", unsafe_allow_html=True)

# Render the current module content using the page wrapper
page_wrapper(module.render, current_module)

# At the bottom of the page, create the hidden navigation buttons for the fixed navigation bar.
# Note: st.button has no label_visibility parameter, so it is not passed here;
# hiding these buttons is assumed to be handled by the page's CSS.
if st.session_state.workshop_started:
    # Previous navigation button (hidden, activated by the fixed nav)
    if st.session_state.current_module > 1:
        if st.button("←", key=f"nav_prev_{st.session_state.current_module-1}"):
            st.session_state.current_module -= 1
            st.rerun()

    # Next navigation button (hidden, activated by the fixed nav)
    if st.session_state.current_module < 6:
        if st.button("→", key=f"nav_next_{st.session_state.current_module+1}"):
            st.session_state.current_module += 1
            st.rerun()

    # Module navigation dots (hidden, activated by the fixed nav)
    for i in range(1, 7):
        if st.button(f"{i}", key=f"nav_dot_{i}"):
            st.session_state.current_module = i
            st.rerun()