Eli-Iustus committed
Commit 2013cf0 · verified · 1 Parent(s): 3104519

Upload 321 files

This view is limited to 50 files because it contains too many changes.
README.md CHANGED
@@ -1,20 +1,66 @@
  ---
- title: Vision
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
- pinned: false
- short_description: Streamlit template space
- license: mit
  ---

- # Welcome to Streamlit!

- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:

- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
 
+ # ✨ Ink Vision: Advanced HTR Pipeline ✨
+
+ Welcome to **Ink Vision**, a Handwritten Text Recognition (HTR) system. It is more than a simple OCR wrapper: it is a modular, **3-Step Intelligent Pipeline** designed to handle messy, real-world handwriting.
+
+ ---
+
+ ## 🚀 The 3-Step Hybrid Architecture
+
+ To improve accuracy, the logic is split into three distinct, hot-swappable stages:
+
+ ### 1️⃣ Step 1: Pre-Processor (Computer Vision & DL)
+ Before the AI reads the text, we "clean" the image to remove noise, shadows, and artifacts.
+ - **OpenCV + LightCNN (Denoising)**: Denoising combines OpenCV and LightCNN. OpenCV handles adaptive thresholding, binarization, Green-Channel extraction (to make red ink "pop"), and non-local means denoising. LightCNN runs alongside OpenCV; its architecture targets image restoration (Noisy → Clean pairs). In its current form the CNN adds little value, but both are part of the denoising pipeline.
+ - **Deskewing**: Automatic rotation correction levels slanted handwriting before it reaches the OCR engine.
+
+ ### 2️⃣ Step 2: HTR Engine (Sequence Modeling)
+ The core recognition happens here. We use a **CRAFT + ResNet + LSTM** architecture:
+ - **Detection**: CRAFT identifies individual character regions and groups them into words.
+ - **Recognition**: A Deep Residual Network extracts visual features, which are then sequenced by an LSTM to follow the flow of handwriting.
+ - **Ensemble Strategy**: The app runs inference twice, once on the raw image and once on the cleaned image, so that text lost in one version can still be read from the other.
+
+ ### 3️⃣ Step 3: Post-Processor (NLP Semantic Judge)
+ Raw OCR output is often "noisy." This stage acts as a human-like editor:
+ - **Contextual Spellchecker**: Fixes common OCR typos while preserving original capitalization.
+ - **Merging Logic**: Automatically joins split words (e.g., `import dance` -> `importance`).
+ - **Semantic Judge (BERT Tiny)**: We've integrated a lightweight BERT model that scores sentences on how coherent they read as English. If the OCR produces a jumbled mess, the Semantic Judge selects the most grammatically coherent version.
+
  ---
+
+ ## 🧠 Training Your Own Models
+
+ We've provided a full suite of training scripts to keep the system evolving:
+
+ ### 🖼️ CNN Denoising Training
+ Located in `training/train_denoiser.py`.
+ - **The Why**: Denoising uses both OpenCV and LightCNN. Math-based filters (OpenCV) sometimes blur thin handwriting. A trained CNN "understands" what a stroke should look like and can reconstruct it.
+ - **How to use**: Run `generate_dataset.py` to create synthetic training data, then run `train_denoiser.py` to bake your own weights.
+
+ ### ✍️ NLP Corpus Training
+ Located in `training/train_nlp.py`.
+ - **The Why**: If you frequently write about specific topics (e.g., Medical, History), the NLP needs to know those specific "rare" words.
+ - **How to use**: Provide your own text corpus to the script, and it will tune the dictionary and semantic probabilities to favor your specific domain.
+
+ ---
+
+ ## 🛠️ Installation & Setup
+
+ 1. **Install Dependencies**:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 2. **Run the Application**:
+ ```bash
+ streamlit run app.py
+ ```
+
  ---

+ ## 📦 Core Technology Stack

+ - **OpenCV + LightCNN**: Denoising. OpenCV provides bitwise masking, adaptive thresholding, and non-local means; LightCNN adds DL-based denoising alongside it.
+ - **PyTorch**: Powers the CNN Denoiser and the BERT Semantic Judge.
+ - **Transformers**: Provides the contextual intelligence for the NLP layer.
+ - **Streamlit**: A high-performance, premium UI with Glassmorphism and animated gradients.

+ *Built with ❤️ by the RCO Team.*
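
Reviewer sketch for the README's "Merging Logic" bullet: the commit does not show how `pipeline/postprocessor.py` joins split words, so the snippet below is only one plausible approach under stated assumptions. The function name `merge_split_words` and the `vocabulary` set are hypothetical; the idea is to join two adjacent tokens whenever their direct concatenation is a known word.

```python
def merge_split_words(text, vocabulary):
    """Join adjacent tokens whose concatenation is a known word.

    `vocabulary` is a hypothetical set of lowercase words; the real
    post-processor may use a different dictionary and matching rule.
    """
    tokens = text.split()
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            candidate = tokens[i] + tokens[i + 1]
            if candidate.lower() in vocabulary:
                # Keep the joined word and skip the token we consumed.
                merged.append(candidate)
                i += 2
                continue
        merged.append(tokens[i])
        i += 1
    return " ".join(merged)


if __name__ == "__main__":
    # Illustrative vocabulary only.
    print(merge_split_words("neat hand writing", {"handwriting"}))
```

A greedy left-to-right pass like this merges at most two tokens at a time; words split into three or more fragments would need a repeated pass or a wider window.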
 
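Reviewer sketch for the dual-stream selection described in the README: `app.py` calls `nlp_corrector.judge_best_output(raw, clean)`, but its implementation is not part of the visible diff. The snippet below swaps the BERT-based score for a much simpler dictionary-hit ratio, purely to illustrate the "score both candidates, keep the better one" pattern. `vocabulary_score`, `pick_best_output`, and the vocabulary set are hypothetical names, not the project's API.

```python
def vocabulary_score(text, vocabulary):
    """Fraction of tokens found in `vocabulary` (0.0 for empty text)."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in vocabulary for t in tokens) / len(tokens)


def pick_best_output(raw_text, clean_text, vocabulary):
    """Return the higher-scoring candidate; ties favour the cleaned stream."""
    raw_score = vocabulary_score(raw_text, vocabulary)
    clean_score = vocabulary_score(clean_text, vocabulary)
    return raw_text if raw_score > clean_score else clean_text


if __name__ == "__main__":
    words = {"the", "quick", "brown", "fox"}
    print(pick_best_output("teh quick brwn fox", "the quick brown fox", words))
```

Keeping the scorer as a separate function means a model-based score could replace the dictionary ratio without touching the selection logic.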
__pycache__/crnn_model.cpython-311.pyc ADDED
Binary file (7.76 kB).
 
app.py ADDED
@@ -0,0 +1,306 @@
+ import streamlit as st
+ import torch
+ import torchvision.transforms as transforms
+ from PIL import Image
+ from pillow_heif import register_heif_opener
+ import numpy as np
+ import os
+ from io import BytesIO
+ from googletrans import Translator, LANGUAGES
+ from gtts import gTTS
+
+ # Register HEIC support for PIL
+ register_heif_opener()
+ from streamlit_cropper import st_cropper
+ import easyocr
+ st.set_page_config(page_title="INK VISION", page_icon="✨", layout="wide")
+
+ # Custom CSS for the stunning animated background and glassmorphic UI
+ st.markdown("""
+ <style>
+ @import url('https://fonts.googleapis.com/css2?family=Orbitron:wght@500;700&family=Poppins:wght@300;400;600&display=swap');
+
+ /* Animated Gradient Background */
+ .stApp {
+     background: linear-gradient(-45deg, #ee7752, #e73c7e, #23a6d5, #23d5ab);
+     background-size: 400% 400%;
+     animation: gradientBG 15s ease infinite;
+     font-family: 'Poppins', sans-serif;
+ }
+
+ @keyframes gradientBG {
+     0% { background-position: 0% 50%; }
+     50% { background-position: 100% 50%; }
+     100% { background-position: 0% 50%; }
+ }
+
+ /* Base text color to white for contrast against dark/bright backgrounds */
+ h1, h2, h3, p, label {
+     color: #ffffff !important;
+     text-shadow: 1px 1px 4px rgba(0,0,0,0.4);
+ }
+
+ /* Glassmorphism wrapper for header */
+ .glass-container {
+     background: rgba(255, 255, 255, 0.1);
+     border-radius: 16px;
+     box-shadow: 0 8px 32px 0 rgba(31, 38, 135, 0.37);
+     backdrop-filter: blur(8.5px);
+     -webkit-backdrop-filter: blur(8.5px);
+     border: 1px solid rgba(255, 255, 255, 0.18);
+     padding: 2rem;
+     margin-top: 1rem;
+     margin-bottom: 2rem;
+ }
+
+ /* Fancy Header Font */
+ h1 {
+     font-family: 'Orbitron', sans-serif !important;
+     font-size: 3rem !important;
+     text-align: center;
+     background: -webkit-linear-gradient(#fff, #f0f0f0);
+     -webkit-background-clip: text;
+     -webkit-text-fill-color: transparent;
+     margin-bottom: 0.5rem;
+ }
+
+ /* Stylish buttons */
+ div.stButton > button:first-child {
+     background: linear-gradient(90deg, #ff007f 0%, #7928ca 100%);
+     color: white;
+     border: none;
+     border-radius: 50px;
+     padding: 10px 24px;
+     font-weight: 600;
+     font-size: 1.1rem;
+     cursor: pointer;
+     transition: all 0.3s ease;
+     box-shadow: 0 4px 15px rgba(0,0,0,0.2);
+ }
+
+ div.stButton > button:first-child:hover {
+     transform: translateY(-2px);
+     box-shadow: 0 6px 20px rgba(0,0,0,0.3);
+     background: linear-gradient(90deg, #7928ca 0%, #ff007f 100%);
+     color: #ffffff !important;
+ }
+
+ /* File Uploader styling */
+ .stFileUploader > div > div {
+     background: rgba(255, 255, 255, 0.05);
+     border: 2px dashed rgba(255, 255, 255, 0.5);
+     border-radius: 10px;
+ }
+
+ /* Text area styling */
+ .stTextArea textarea {
+     background-color: rgba(255, 255, 255, 0.9) !important;
+     color: #333333 !important;
+     font-size: 1.5rem !important;
+     font-weight: 600 !important;
+     font-family: 'Poppins', sans-serif !important;
+     border-radius: 10px !important;
+     border: 2px solid transparent !important;
+ }
+ .stTextArea textarea:focus {
+     border-color: #ff007f !important;
+     box-shadow: 0 0 10px rgba(255,0,127,0.5) !important;
+ }
+
+ </style>
+
+ <div class="glass-container">
+     <h1>✨ HTR ✨</h1>
+     <p style="text-align: center; font-size: 1.2rem;">Experience the magic of handwritten word recognition.</p>
+ </div>
+ """, unsafe_allow_html=True)
+
+ from pipeline.preprocessor import DocumentPreprocessor
+ from pipeline.ocr_engine import HTREngine
+ from pipeline.postprocessor import NLPCorrector
+
+ # Initialise translator once
+ translator = Translator()
+
+ # Simple helpers for state
+ if "extracted_text" not in st.session_state:
+     st.session_state["extracted_text"] = ""
+ if "translated_text" not in st.session_state:
+     st.session_state["translated_text"] = ""
+ if "target_lang" not in st.session_state:
+     st.session_state["target_lang"] = "en"
+
+ @st.cache_resource(show_spinner="Booting up 3-Step HTR Pipeline (CV + OCR + NLP)...")
+ def load_pipeline():
+     p = DocumentPreprocessor()
+     e = HTREngine(languages=['en'])
+     n = NLPCorrector(use_ml=True)
+     return p, e, n
+
+ preprocessor, engine, nlp_corrector = load_pipeline()
+
+ col1, col2 = st.columns(2)
+
+ target_image = None
+
+ with col1:
+     st.markdown("### 📸 Input your masterpiece")
+     input_method = st.radio("Choose Input Method", ["Upload Image", "Take a Photo"], horizontal=True)
+
+     if input_method == "Upload Image":
+         uploaded_file = st.file_uploader("Upload a handwritten word image", type=["png", "jpg", "jpeg", "heic", "webp"])
+         if uploaded_file is not None:
+             raw_image = Image.open(uploaded_file).convert("RGB")
+
+             # Resize image to a standard width so both cropper and st.image match in size
+             target_width = 700
+             if raw_image.width != target_width:
+                 ratio = target_width / float(raw_image.width)
+                 raw_image = raw_image.resize((target_width, int(raw_image.height * ratio)))
+
+             if st.checkbox("✨ Crop Image", key="crop_upload"):
+                 st.markdown("✨ **Crop the word below:**")
+                 target_image = st_cropper(raw_image, realtime_update=True, box_color='#ff007f', key="upload_crop")
+             else:
+                 target_image = raw_image
+                 st.image(target_image, caption="Uploaded Image")
+     else:
+         camera_photo = st.camera_input("Take a picture of a handwritten word")
+         if camera_photo is not None:
+             raw_image = Image.open(camera_photo).convert("RGB")
+
+             # Resize image to a standard width so both cropper and st.image match in size
+             target_width = 700
+             if raw_image.width != target_width:
+                 ratio = target_width / float(raw_image.width)
+                 raw_image = raw_image.resize((target_width, int(raw_image.height * ratio)))
+
+             if st.checkbox("✨ Crop Image", key="crop_camera"):
+                 st.markdown("✨ **Crop the word below:**")
+                 target_image = st_cropper(raw_image, realtime_update=True, box_color='#ff007f', key="camera_crop")
+             else:
+                 target_image = raw_image
+                 st.image(target_image, caption="Captured Image")
+
+ with col2:
+     st.markdown("### 🪄 Magic Result")
+
+     extracted_text = st.session_state.get("extracted_text", "")
+     translated_text = st.session_state.get("translated_text", "")
+
+     if target_image is not None:
+         if st.button("✨ Extract Text"):
+             with st.spinner("Applying Deep Learning OCR algorithms..."):
+                 if engine is None:
+                     st.error("Pipeline failed to initialize.")
+                 else:
+                     # --- STREAM A: RAW OCR (No Preprocessing) ---
+                     try:
+                         raw_ocr_output = engine.extract_text(np.array(target_image))
+                         raw_stream_text = nlp_corrector.correct_spelling(raw_ocr_output)
+                     except Exception:
+                         raw_stream_text = ""
+
+                     # --- STREAM B: 3-STEP PIPELINE (Pre-Processed) ---
+                     try:
+                         # 1. Computer Vision Pre-Processing
+                         cleaned_image_array = preprocessor.process(target_image)
+                         # 2. Deep Learning OCR Engine
+                         p_ocr_output = engine.extract_text(cleaned_image_array)
+                         # 3. NLP Post-Processing
+                         clean_stream_text = nlp_corrector.correct_spelling(p_ocr_output)
+                     except Exception:
+                         clean_stream_text = ""
+
+                     # --- THE ENSEMBLE JUDGE ---
+                     # The judge picks the version that sounds most like real English
+                     extracted_text = nlp_corrector.judge_best_output(raw_stream_text, clean_stream_text)
+
+                     if extracted_text.strip() == "":
+                         st.warning("Oops! I couldn't find any text. Try a clearer image.")
+                         extracted_text = ""
+                     else:
+                         st.success("Ensemble Magic! Winner selected from Dual-Stream analysis.")
+                         with st.expander("Show AI Reasoning (Ensemble Comparison)"):
+                             st.write(f"**Stream A (Raw Image):** {raw_stream_text}")
+                             st.write(f"**Stream B (Cleaned Image):** {clean_stream_text}")
+
+                     st.session_state["extracted_text"] = extracted_text
+                     st.session_state["translated_text"] = ""
+
+         # Editable original text
+         st.session_state["extracted_text"] = st.text_area(
+             "You can edit the result here:",
+             value=st.session_state.get("extracted_text", ""),
+             height=150,
+         )
+
+         st.markdown("### 🌐 Translation & Voice")
+
+         # Language selection
+         lang_keys = sorted(LANGUAGES.keys())
+         default_index = lang_keys.index(st.session_state.get("target_lang", "en"))
+         target_lang = st.selectbox(
+             "Choose target language",
+             options=lang_keys,
+             index=default_index,
+             format_func=lambda k: LANGUAGES[k].title(),
+         )
+         st.session_state["target_lang"] = target_lang
+
+         with st.expander("Show available languages"):
+             st.write(", ".join(f"{code} – {name.title()}" for code, name in LANGUAGES.items()))
+
+         col_translate, col_speak = st.columns(2)
+
+         with col_translate:
+             if st.button("🌍 Translate into other language"):
+                 if st.session_state["extracted_text"].strip():
+                     try:
+                         result = translator.translate(
+                             st.session_state["extracted_text"],
+                             dest=target_lang,
+                         )
+                         st.session_state["translated_text"] = result.text
+                     except Exception as e:
+                         st.error(f"Translation failed: {e}")
+                 else:
+                     st.warning("Please extract or type some text first.")
+
+         with col_speak:
+             if st.button("🔊 Speak text (original & translated)"):
+                 original = st.session_state.get("extracted_text", "").strip()
+                 translated = st.session_state.get("translated_text", "").strip()
+
+                 if not original and not translated:
+                     st.warning("Nothing to speak. Please extract or translate text first.")
+                 else:
+                     # Speak original (English assumed)
+                     if original:
+                         try:
+                             buf = BytesIO()
+                             gTTS(text=original, lang="en").write_to_fp(buf)
+                             buf.seek(0)
+                             st.audio(buf.read(), format="audio/mp3")
+                         except Exception as e:
+                             st.error(f"Failed to generate audio for original text: {e}")
+
+                     # Speak translated
+                     if translated:
+                         try:
+                             buf_tr = BytesIO()
+                             gTTS(text=translated, lang=target_lang).write_to_fp(buf_tr)
+                             buf_tr.seek(0)
+                             st.audio(buf_tr.read(), format="audio/mp3")
+                         except Exception as e:
+                             st.error(f"Failed to generate audio for translated text: {e}")
+
+         if st.session_state.get("translated_text", "").strip():
+             st.text_area(
+                 "Translated text:",
+                 value=st.session_state["translated_text"],
+                 height=150,
+             )
+
+     else:
+         st.info("Waiting for an image to work my magic...")
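
Reviewer sketch: the upload and camera branches in `app.py` repeat the same "resize to a 700 px width" arithmetic. One way to share it is a small pure helper that only computes the target size. `scaled_size` is a hypothetical name, not something in this commit, and the `max(1, ...)` guard is an addition the original code does not have.

```python
def scaled_size(width, height, target_width=700):
    """Return (new_width, new_height) scaled to `target_width`.

    Mirrors the ratio arithmetic used in app.py and keeps the aspect
    ratio. The height is clamped to at least 1 pixel, which the
    original code does not do.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width == target_width:
        return width, height
    ratio = target_width / float(width)
    return target_width, max(1, int(height * ratio))


if __name__ == "__main__":
    print(scaled_size(1400, 1000))
```

Both branches could then call `raw_image.resize(scaled_size(raw_image.width, raw_image.height))`, keeping the size logic testable without loading an image.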
dataset/clean/000.jpg ADDED
dataset/clean/001.jpg ADDED
dataset/clean/002.jpg ADDED
dataset/clean/003.jpg ADDED
dataset/clean/004.jpg ADDED
dataset/clean/005.jpg ADDED
dataset/clean/006.jpg ADDED
dataset/clean/007.jpg ADDED
dataset/clean/008.jpg ADDED
dataset/clean/009.jpg ADDED
dataset/clean/010.jpg ADDED
dataset/clean/011.jpg ADDED
dataset/clean/012.jpg ADDED
dataset/clean/013.jpg ADDED
dataset/clean/014.jpg ADDED
dataset/clean/015.jpg ADDED
dataset/clean/016.jpg ADDED
dataset/clean/017.jpg ADDED
dataset/clean/018.jpg ADDED
dataset/clean/019.jpg ADDED
dataset/clean/020.jpg ADDED
dataset/clean/021.jpg ADDED
dataset/clean/022.jpg ADDED
dataset/clean/023.jpg ADDED
dataset/clean/024.jpg ADDED
dataset/clean/025.jpg ADDED
dataset/clean/026.jpg ADDED
dataset/clean/027.jpg ADDED
dataset/clean/028.jpg ADDED
dataset/clean/029.jpg ADDED
dataset/clean/030.jpg ADDED
dataset/clean/031.jpg ADDED
dataset/clean/032.jpg ADDED
dataset/clean/033.jpg ADDED
dataset/clean/034.jpg ADDED
dataset/clean/035.jpg ADDED
dataset/clean/036.jpg ADDED
dataset/clean/037.jpg ADDED
dataset/clean/038.jpg ADDED
dataset/clean/039.jpg ADDED
dataset/clean/040.jpg ADDED
dataset/clean/041.jpg ADDED
dataset/clean/042.jpg ADDED
dataset/clean/043.jpg ADDED
dataset/clean/044.jpg ADDED
dataset/clean/045.jpg ADDED
dataset/clean/046.jpg ADDED