diff --git a/README.md b/README.md
index bd9ae759eefd3ad1c6f18283a1bac215ecefb299..b26ef6c813a5ab314160a506162593d6c1007113 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,66 @@
+# ✨ Ink Vision: Advanced HTR Pipeline ✨
+
+Welcome to **Ink Vision**, a Handwritten Text Recognition (HTR) system. Rather than a thin OCR wrapper, it is a modular **3-Step Pipeline** (pre-processing, recognition, post-processing) designed to handle messy, real-world handwriting.
+
+---
+
+## 🚀 The 3-Step Hybrid Architecture
+
+To improve accuracy, we split the logic into three distinct stages that can be swapped independently:
+
+### 1️⃣ Step 1: Pre-Processor (Computer Vision & DL)
+Before the AI reads the text, we "clean" the image to remove noise, shadows, and artifacts.
+- **OpenCV + LightCNN (Denoising)**: OpenCV handles adaptive thresholding, binarization, green-channel extraction (to make red ink "pop"), and non-local means denoising. LightCNN sits alongside it as an image-restoration architecture (Noisy → Clean pairs); in its current form the CNN contributes little to the result, but both are part of our denoising pipeline.
+- **Deskewing**: Automatic rotation correction levels slanted handwriting before it reaches the OCR engine.
+
+### 2️⃣ Step 2: HTR Engine (Sequence Modeling)
+The core recognition happens here. We utilize a **CRAFT + ResNet + LSTM** architecture:
+- **Detection**: CRAFT identifies individual character regions and groups them into words.
+- **Recognition**: A Deep Residual Network extracts visual features, which are then sequenced by an LSTM to understand the flow of handwriting.
+- **Ensemble Strategy**: The app runs inference twice, once on the raw image and once on the cleaned image, so that text damaged by pre-processing can still be recovered from the raw stream.
+
+### 3️⃣ Step 3: Post-Processor (NLP Semantic Judge)
+Raw OCR output is often "noisy." This stage acts as a human-like editor:
+- **Contextual Spellchecker**: Fixes common OCR typos while preserving original capitalization.
+- **Merging Logic**: Automatically joins split words (e.g., `import ance` -> `importance`).
+- **Semantic Judge (BERT Tiny)**: A lightweight BERT model scores candidate sentences by how plausible they are as English. When the raw and cleaned OCR streams disagree, the Semantic Judge selects the more coherent version.
+
---
-title: Vision
-emoji: 🚀
-colorFrom: red
-colorTo: red
-sdk: docker
-app_port: 8501
-tags:
-- streamlit
-pinned: false
-short_description: Streamlit template space
-license: mit
+
+## 🧠 Training Your Own Models
+
+We've provided a full suite of training scripts to keep the system evolving:
+
+### 🖼️ CNN Denoising Training
+Located in `training/train_denoiser.py`.
+- **The Why**: Denoising uses both OpenCV and LightCNN. Math-based filters (OpenCV) sometimes blur thin handwriting. A trained CNN "understands" what a stroke should look like and can reconstruct it.
+- **How to use**: Run `generate_dataset.py` to create synthetic training data, then run `train_denoiser.py` to bake your own weights.
+
+### ✍️ NLP Corpus Training
+Located in `training/train_nlp.py`.
+- **The Why**: If you frequently write about specific topics (e.g., Medical, History), the NLP needs to know those specific "rare" words.
+- **How to use**: Provide your own text corpus to the script, and it will tune the dictionary and semantic probabilities to favor your specific domain.
+
+---
+
+## 🛠️ Installation & Setup
+
+1. **Install Dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+2. **Run the Application**:
+   ```bash
+   streamlit run app.py
+   ```
+
---
-# Welcome to Streamlit!
+## 📦 Core Technology Stack
-Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
+- **OpenCV + LightCNN**: Denoising. OpenCV handles bitwise masking, adaptive thresholding, and non-local means; LightCNN adds DL-based denoising alongside it.
+- **PyTorch**: Powers the CNN Denoiser and the BERT Semantic Judge.
+- **Transformers**: Provides the contextual intelligence for the NLP layer.
+- **Streamlit**: The web UI, styled with glassmorphism and animated gradients.
-If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
-forums](https://discuss.streamlit.io).
+*Built with ❤️ by the RCO Team.*
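
The "Merging Logic" bullet in the README's Step 3 can be illustrated with a small, dependency-free sketch. It is not the project's `NLPCorrector` code, which this diff does not show; the function name `merge_split_words`, the toy `VOCAB` set, and the merge rule (join two neighbours only when the joined form is a known word and at least one fragment is not) are all assumptions made for illustration.

```python
def merge_split_words(text, vocabulary):
    """Join adjacent tokens that look like one word split by the OCR engine.

    Hypothetical rule: merge only when the concatenation is in the vocabulary
    and at least one of the two fragments is not a known word on its own.
    """
    tokens = text.split()
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            left, right = tokens[i], tokens[i + 1]
            joined = left + right
            both_known = left.lower() in vocabulary and right.lower() in vocabulary
            if joined.lower() in vocabulary and not both_known:
                merged.append(joined)  # keeps the original capitalization
                i += 2
                continue
        merged.append(tokens[i])
        i += 1
    return " ".join(merged)


# Toy vocabulary, purely illustrative.
VOCAB = {"the", "import", "importance", "of", "hand", "writing", "handwriting", "dance"}

if __name__ == "__main__":
    print(merge_split_words("the import ance of hand writing", VOCAB))
```

Requiring that one fragment be unknown keeps valid pairs such as `hand writing` untouched; a real corrector would probably need word frequencies or a language model to decide such cases.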
diff --git a/__pycache__/crnn_model.cpython-311.pyc b/__pycache__/crnn_model.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..07f663d2920e825c5f51c850ca04b2b3f4bc6f48
Binary files /dev/null and b/__pycache__/crnn_model.cpython-311.pyc differ
diff --git a/app.py b/app.py
new file mode 100644
index 0000000000000000000000000000000000000000..783ce6e61247feec702a3d1952360f1eff423650
--- /dev/null
+++ b/app.py
@@ -0,0 +1,306 @@
+import streamlit as st
+import torch
+import torchvision.transforms as transforms
+from PIL import Image
+from pillow_heif import register_heif_opener
+import numpy as np
+import os
+from io import BytesIO
+from googletrans import Translator, LANGUAGES
+from gtts import gTTS
+
+# Register HEIC support for PIL
+register_heif_opener()
+from streamlit_cropper import st_cropper
+import easyocr
+st.set_page_config(page_title="INK VISION", page_icon="✨", layout="wide")
+
+# Custom CSS for the stunning animated background and glassmorphic UI
+st.markdown("""
+
+
+
+
+✨ HTR ✨
+
+Experience the magic of handwritten word recognition.
+
+""", unsafe_allow_html=True)
+
+from pipeline.preprocessor import DocumentPreprocessor
+from pipeline.ocr_engine import HTREngine
+from pipeline.postprocessor import NLPCorrector
+
+# Initialise translator once
+translator = Translator()
+
+# Simple helpers for state
+if "extracted_text" not in st.session_state:
+    st.session_state["extracted_text"] = ""
+if "translated_text" not in st.session_state:
+    st.session_state["translated_text"] = ""
+if "target_lang" not in st.session_state:
+    st.session_state["target_lang"] = "en"
+
+@st.cache_resource(show_spinner="Booting up 3-Step HTR Pipeline (CV + OCR + NLP)...")
+def load_pipeline():
+    p = DocumentPreprocessor()
+    e = HTREngine(languages=['en'])
+    n = NLPCorrector(use_ml=True)
+    return p, e, n
+
+preprocessor, engine, nlp_corrector = load_pipeline()
+
+col1, col2 = st.columns(2)
+
+target_image = None
+
+with col1:
+    st.markdown("### 📸 Input your masterpiece")
+    input_method = st.radio("Choose Input Method", ["Upload Image", "Take a Photo"], horizontal=True)
+
+    if input_method == "Upload Image":
+        uploaded_file = st.file_uploader("Upload a handwritten word image", type=["png", "jpg", "jpeg", "heic", "webp"])
+        if uploaded_file is not None:
+            raw_image = Image.open(uploaded_file).convert("RGB")
+
+            # Resize image to a standard width so both cropper and st.image match in size
+            target_width = 700
+            if raw_image.width != target_width:
+                ratio = target_width / float(raw_image.width)
+                raw_image = raw_image.resize((target_width, int(raw_image.height * ratio)))
+
+            if st.checkbox("✨ Crop Image", key="crop_upload"):
+                st.markdown("✨ **Crop the word below:**")
+                target_image = st_cropper(raw_image, realtime_update=True, box_color='#ff007f', key="upload_crop")
+            else:
+                target_image = raw_image
+                st.image(target_image, caption="Uploaded Image")
+    else:
+        camera_photo = st.camera_input("Take a picture of a handwritten word")
+        if camera_photo is not None:
+            raw_image = Image.open(camera_photo).convert("RGB")
+
+            # Resize image to a standard width so both cropper and st.image match in size
+            target_width = 700
+            if raw_image.width != target_width:
+                ratio = target_width / float(raw_image.width)
+                raw_image = raw_image.resize((target_width, int(raw_image.height * ratio)))
+
+            if st.checkbox("✨ Crop Image", key="crop_camera"):
+                st.markdown("✨ **Crop the word below:**")
+                target_image = st_cropper(raw_image, realtime_update=True, box_color='#ff007f', key="camera_crop")
+            else:
+                target_image = raw_image
+                st.image(target_image, caption="Captured Image")
+
+with col2:
+    st.markdown("### 🪄 Magic Result")
+
+    extracted_text = st.session_state.get("extracted_text", "")
+    translated_text = st.session_state.get("translated_text", "")
+
+    if target_image is not None:
+        if st.button("✨ Extract Text"):
+            with st.spinner("Applying Deep Learning OCR algorithms..."):
+                if engine is None:
+                    st.error("Pipeline failed to initialize.")
+                else:
+                    # --- STREAM A: RAW OCR (No Preprocessing) ---
+                    try:
+                        raw_ocr_output = engine.extract_text(np.array(target_image))
+                        raw_stream_text = nlp_corrector.correct_spelling(raw_ocr_output)
+                    except Exception:
+                        raw_stream_text = ""
+
+                    # --- STREAM B: 3-STEP PIPELINE (Pre-Processed) ---
+                    try:
+                        # 1. Computer Vision Pre-Processing
+                        cleaned_image_array = preprocessor.process(target_image)
+                        # 2. Deep Learning OCR Engine
+                        p_ocr_output = engine.extract_text(cleaned_image_array)
+                        # 3. NLP Post-Processing
+                        clean_stream_text = nlp_corrector.correct_spelling(p_ocr_output)
+                    except Exception:
+                        clean_stream_text = ""
+
+                    # --- THE ENSEMBLE JUDGE ---
+                    # The judge picks the version that sounds most like real English
+                    extracted_text = nlp_corrector.judge_best_output(raw_stream_text, clean_stream_text)
+
+                    if extracted_text.strip() == "":
+                        st.warning("Oops! I couldn't find any text. Try a clearer image.")
+                        extracted_text = ""
+                    else:
+                        st.success("Ensemble Magic! Winner selected from Dual-Stream analysis.")
+                        with st.expander("Show AI Reasoning (Ensemble Comparison)"):
+                            st.write(f"**Stream A (Raw Image):** {raw_stream_text}")
+                            st.write(f"**Stream B (Cleaned Image):** {clean_stream_text}")
+
+                    st.session_state["extracted_text"] = extracted_text
+                    st.session_state["translated_text"] = ""
+
+        # Editable original text
+        st.session_state["extracted_text"] = st.text_area(
+            "You can edit the result here:",
+            value=st.session_state.get("extracted_text", ""),
+            height=150,
+        )
+
+        st.markdown("### 🌐 Translation & Voice")
+
+        # Language selection
+        lang_keys = sorted(LANGUAGES.keys())
+        default_index = lang_keys.index(st.session_state.get("target_lang", "en"))
+        target_lang = st.selectbox(
+            "Choose target language",
+            options=lang_keys,
+            index=default_index,
+            format_func=lambda k: LANGUAGES[k].title(),
+        )
+        st.session_state["target_lang"] = target_lang
+
+        with st.expander("Show available languages"):
+            st.write(", ".join(f"{code} – {name.title()}" for code, name in LANGUAGES.items()))
+
+        col_translate, col_speak = st.columns(2)
+
+        with col_translate:
+            if st.button("🌍 Translate into other language"):
+                if st.session_state["extracted_text"].strip():
+                    try:
+                        result = translator.translate(
+                            st.session_state["extracted_text"],
+                            dest=target_lang,
+                        )
+                        st.session_state["translated_text"] = result.text
+                    except Exception as e:
+                        st.error(f"Translation failed: {e}")
+                else:
+                    st.warning("Please extract or type some text first.")
+
+        with col_speak:
+            if st.button("🔊 Speak text (original & translated)"):
+                original = st.session_state.get("extracted_text", "").strip()
+                translated = st.session_state.get("translated_text", "").strip()
+
+                if not original and not translated:
+                    st.warning("Nothing to speak. Please extract or translate text first.")
+                else:
+                    # Speak original (English assumed)
+                    if original:
+                        try:
+                            buf = BytesIO()
+                            gTTS(text=original, lang="en").write_to_fp(buf)
+                            buf.seek(0)
+                            st.audio(buf.read(), format="audio/mp3")
+                        except Exception as e:
+                            st.error(f"Failed to generate audio for original text: {e}")
+
+                    # Speak translated
+                    if translated:
+                        try:
+                            buf_tr = BytesIO()
+                            gTTS(text=translated, lang=target_lang).write_to_fp(buf_tr)
+                            buf_tr.seek(0)
+                            st.audio(buf_tr.read(), format="audio/mp3")
+                        except Exception as e:
+                            st.error(f"Failed to generate audio for translated text: {e}")
+
+        if st.session_state.get("translated_text", "").strip():
+            st.text_area(
+                "Translated text:",
+                value=st.session_state["translated_text"],
+                height=150,
+            )
+
+    else:
+        st.info("Waiting for an image to work my magic...")
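
The dual-stream flow in `app.py` (Stream A on the raw image, Stream B on the cleaned image, then a judge) can be sketched without Streamlit or any model. This is a minimal illustration under stated assumptions: `run_stream`, `vocabulary_score`, and `pick_best_output` are hypothetical names, and the vocabulary-hit ratio is a deliberately crude stand-in for whatever `NLPCorrector.judge_best_output` actually computes, which this diff does not show.

```python
def run_stream(steps, image):
    """Run a chain of callables on the input; return "" if any step raises."""
    try:
        value = image
        for step in steps:
            value = step(value)
        return value
    except Exception:
        return ""


def vocabulary_score(text, vocabulary):
    """Fraction of words found in the vocabulary (stand-in for a BERT score)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)


def pick_best_output(raw_text, clean_text, vocabulary):
    """Choose between the two streams; an empty stream always loses."""
    if not clean_text.strip():
        return raw_text
    if not raw_text.strip():
        return clean_text
    raw_score = vocabulary_score(raw_text, vocabulary)
    clean_score = vocabulary_score(clean_text, vocabulary)
    # Ties go to the cleaned stream, an arbitrary choice for this sketch.
    return raw_text if raw_score > clean_score else clean_text


def failing_preprocessor(image):
    """Simulates a pre-processing crash so the fallback path is exercised."""
    raise ValueError("simulated failure")


if __name__ == "__main__":
    vocab = {"hello", "world"}
    fake_ocr = lambda image: "hello world"  # placeholder for a real OCR call
    raw = run_stream([fake_ocr], image=None)
    clean = run_stream([failing_preprocessor, fake_ocr], image=None)
    print(pick_best_output(raw, clean, vocab))
```

Wrapping each stream separately means a failure in pre-processing still leaves the raw stream's result available, which mirrors the two `try`/`except` blocks in `app.py`.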
diff --git a/dataset/clean/000.jpg b/dataset/clean/000.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..606789ffe2818bde7f16bdcfaaeacfde3da087c9
Binary files /dev/null and b/dataset/clean/000.jpg differ
diff --git a/dataset/clean/001.jpg b/dataset/clean/001.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..5de4369d9ceeabe4607d51229cce6ffc71bbee0e
Binary files /dev/null and b/dataset/clean/001.jpg differ
diff --git a/dataset/clean/002.jpg b/dataset/clean/002.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3f369608e532947a027596325363401b14c7938c
Binary files /dev/null and b/dataset/clean/002.jpg differ
diff --git a/dataset/clean/003.jpg b/dataset/clean/003.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..dbb9c5f7f6f0cc93a5ed61361c9c30189bac0564
Binary files /dev/null and b/dataset/clean/003.jpg differ
diff --git a/dataset/clean/004.jpg b/dataset/clean/004.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..02fd87222694ef60b880cdef4c312a8c29c122d9
Binary files /dev/null and b/dataset/clean/004.jpg differ
diff --git a/dataset/clean/005.jpg b/dataset/clean/005.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d49dd3c26a24271e29f0a2e66527e4dbfd4561cc
Binary files /dev/null and b/dataset/clean/005.jpg differ
diff --git a/dataset/clean/006.jpg b/dataset/clean/006.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..9dbe1986293b9b008ee597e254bf2cd341f54dc3
Binary files /dev/null and b/dataset/clean/006.jpg differ
diff --git a/dataset/clean/007.jpg b/dataset/clean/007.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..658ae5383209768201abc0ad7f7d8279d4d24f7e
Binary files /dev/null and b/dataset/clean/007.jpg differ
diff --git a/dataset/clean/008.jpg b/dataset/clean/008.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4c6e55e8d958befc7ed5bca535592514eb6d2f4d
Binary files /dev/null and b/dataset/clean/008.jpg differ
diff --git a/dataset/clean/009.jpg b/dataset/clean/009.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a88d1c488991a403f0b5682f844bc78ee9631e49
Binary files /dev/null and b/dataset/clean/009.jpg differ
diff --git a/dataset/clean/010.jpg b/dataset/clean/010.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f9f0e7e31c5497e966800ebad05bc42200235ac4
Binary files /dev/null and b/dataset/clean/010.jpg differ
diff --git a/dataset/clean/011.jpg b/dataset/clean/011.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e2437a5c8e950f203b7a9f6c5a2049769d64888f
Binary files /dev/null and b/dataset/clean/011.jpg differ
diff --git a/dataset/clean/012.jpg b/dataset/clean/012.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..91cab13f035279e4e587561b74fa3e688e29c093
Binary files /dev/null and b/dataset/clean/012.jpg differ
diff --git a/dataset/clean/013.jpg b/dataset/clean/013.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8cc98e3d4a280931f423a697c4f3512f62acb519
Binary files /dev/null and b/dataset/clean/013.jpg differ
diff --git a/dataset/clean/014.jpg b/dataset/clean/014.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2f0023cd6521d07e22472c58fda8c84bcad569df
Binary files /dev/null and b/dataset/clean/014.jpg differ
diff --git a/dataset/clean/015.jpg b/dataset/clean/015.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..fc4b1de18e8c604caba975ca9ea72c87c9fec806
Binary files /dev/null and b/dataset/clean/015.jpg differ
diff --git a/dataset/clean/016.jpg b/dataset/clean/016.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..19f2ee2a6042c35a9f3e47c40333dac89c265c1b
Binary files /dev/null and b/dataset/clean/016.jpg differ
diff --git a/dataset/clean/017.jpg b/dataset/clean/017.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f51b9e1041a572dbaf2aff6a3a565f5e81d10b6e
Binary files /dev/null and b/dataset/clean/017.jpg differ
diff --git a/dataset/clean/018.jpg b/dataset/clean/018.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2079398e5a4dfde9012472eb4307858d75a83088
Binary files /dev/null and b/dataset/clean/018.jpg differ
diff --git a/dataset/clean/019.jpg b/dataset/clean/019.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6fa7e2e8bb1fb6844aacb90938ae1e43dd4d6807
Binary files /dev/null and b/dataset/clean/019.jpg differ
diff --git a/dataset/clean/020.jpg b/dataset/clean/020.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e4a2142fc256f195e2a20e7d382658a74cdf54f7
Binary files /dev/null and b/dataset/clean/020.jpg differ
diff --git a/dataset/clean/021.jpg b/dataset/clean/021.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a3c50487efef482e996b2995b23f28a11dbbcfa5
Binary files /dev/null and b/dataset/clean/021.jpg differ
diff --git a/dataset/clean/022.jpg b/dataset/clean/022.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3336118fa1679c668da9839347b6404becc5ba96
Binary files /dev/null and b/dataset/clean/022.jpg differ
diff --git a/dataset/clean/023.jpg b/dataset/clean/023.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d2ff5f71cd98ebe3bc009999dbb669512aadfe04
Binary files /dev/null and b/dataset/clean/023.jpg differ
diff --git a/dataset/clean/024.jpg b/dataset/clean/024.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c30e5d2dc1860dbade80e93b5a0e2dbbbc1a658f
Binary files /dev/null and b/dataset/clean/024.jpg differ
diff --git a/dataset/clean/025.jpg b/dataset/clean/025.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..022d08b370cbc1c4940c9408bd8e2df8a1f6ba6a
Binary files /dev/null and b/dataset/clean/025.jpg differ
diff --git a/dataset/clean/026.jpg b/dataset/clean/026.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..5f004e46fad4badf6f83988a117bc0a809fd445e
Binary files /dev/null and b/dataset/clean/026.jpg differ
diff --git a/dataset/clean/027.jpg b/dataset/clean/027.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b52f63c7f5fbfee27a11612a732190a1201a1a19
Binary files /dev/null and b/dataset/clean/027.jpg differ
diff --git a/dataset/clean/028.jpg b/dataset/clean/028.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d758f74c8ff2abd63affeddbaa7f2f4125f720e0
Binary files /dev/null and b/dataset/clean/028.jpg differ
diff --git a/dataset/clean/029.jpg b/dataset/clean/029.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3fac68ac4c3300494ba06c68163f48689054b3c1
Binary files /dev/null and b/dataset/clean/029.jpg differ
diff --git a/dataset/clean/030.jpg b/dataset/clean/030.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..232829d628716175cbe8b0fe23e9cc0cc8bb00e5
Binary files /dev/null and b/dataset/clean/030.jpg differ
diff --git a/dataset/clean/031.jpg b/dataset/clean/031.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8542081782c92d2aa36f8c5b16c7d401e0c12824
Binary files /dev/null and b/dataset/clean/031.jpg differ
diff --git a/dataset/clean/032.jpg b/dataset/clean/032.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..017041d390b8aa703ffaa6637e3eee5d32a82c0e
Binary files /dev/null and b/dataset/clean/032.jpg differ
diff --git a/dataset/clean/033.jpg b/dataset/clean/033.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..583584fa5497f2dee08329657b48d32a5c420055
Binary files /dev/null and b/dataset/clean/033.jpg differ
diff --git a/dataset/clean/034.jpg b/dataset/clean/034.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8feb44b86633c6773fdad004deb2ef0942f425f5
Binary files /dev/null and b/dataset/clean/034.jpg differ
diff --git a/dataset/clean/035.jpg b/dataset/clean/035.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..599318ca4e4502310df8d783175b7c8b3b2b2a09
Binary files /dev/null and b/dataset/clean/035.jpg differ
diff --git a/dataset/clean/036.jpg b/dataset/clean/036.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3954d7ccb385862db404ced34559ca3f51bdfac2
Binary files /dev/null and b/dataset/clean/036.jpg differ
diff --git a/dataset/clean/037.jpg b/dataset/clean/037.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8fbd2335f331d78e42a002c46b17b546af9c7a7b
Binary files /dev/null and b/dataset/clean/037.jpg differ
diff --git a/dataset/clean/038.jpg b/dataset/clean/038.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2701f160c2a900723bcab6785b62bb16dd23d3e8
Binary files /dev/null and b/dataset/clean/038.jpg differ
diff --git a/dataset/clean/039.jpg b/dataset/clean/039.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b95a790e1bdb71cb03d16e3b0200d81d56b18c18
Binary files /dev/null and b/dataset/clean/039.jpg differ
diff --git a/dataset/clean/040.jpg b/dataset/clean/040.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..429ea9c870b5cf2fd815de0a17a79a9a0805bbde
Binary files /dev/null and b/dataset/clean/040.jpg differ
diff --git a/dataset/clean/041.jpg b/dataset/clean/041.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..37f81862b79854a64161f3a87e7284c4e184f4fc
Binary files /dev/null and b/dataset/clean/041.jpg differ
diff --git a/dataset/clean/042.jpg b/dataset/clean/042.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..127e21f3f345b396756b1aa43646ba33373bbb5c
Binary files /dev/null and b/dataset/clean/042.jpg differ
diff --git a/dataset/clean/043.jpg b/dataset/clean/043.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..9390a824a5606968894559ffc79d81aac2ea465a
Binary files /dev/null and b/dataset/clean/043.jpg differ
diff --git a/dataset/clean/044.jpg b/dataset/clean/044.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..12e4d0963862076f176fa9d3ad2e46c8824082f1
Binary files /dev/null and b/dataset/clean/044.jpg differ
diff --git a/dataset/clean/045.jpg b/dataset/clean/045.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6f35577c276735b3c06674ae9f68e332964a1ec2
Binary files /dev/null and b/dataset/clean/045.jpg differ
diff --git a/dataset/clean/046.jpg b/dataset/clean/046.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..20bf414d016639e947b00692e297b1340030c573
Binary files /dev/null and b/dataset/clean/046.jpg differ
diff --git a/dataset/clean/047.jpg b/dataset/clean/047.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e1bf2122d17f245f72d4cb47671ed842df582d9c
Binary files /dev/null and b/dataset/clean/047.jpg differ
diff --git a/dataset/clean/048.jpg b/dataset/clean/048.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..1b839574e57729a411942d76d33c99e2f8d9fa13
Binary files /dev/null and b/dataset/clean/048.jpg differ
diff --git a/dataset/clean/049.jpg b/dataset/clean/049.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7f5d5fdbe0865c192114b03bbe1b74e9d6f12b76
Binary files /dev/null and b/dataset/clean/049.jpg differ
diff --git a/dataset/clean/050.jpg b/dataset/clean/050.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..5b9bb31588809f0a25acb0340ffe6e8c0c2d8e9d
Binary files /dev/null and b/dataset/clean/050.jpg differ
diff --git a/dataset/clean/051.jpg b/dataset/clean/051.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6c8c87161d50a0f2e40725438f27b60ad7e7cfe1
Binary files /dev/null and b/dataset/clean/051.jpg differ
diff --git a/dataset/clean/052.jpg b/dataset/clean/052.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e73c99ccee7900d1a0dbdc2b98804d453fab788c
Binary files /dev/null and b/dataset/clean/052.jpg differ
diff --git a/dataset/clean/053.jpg b/dataset/clean/053.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..050e4e0635a6d7d4d56a5936837865d3125b7209
Binary files /dev/null and b/dataset/clean/053.jpg differ
diff --git a/dataset/clean/054.jpg b/dataset/clean/054.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..01483b1ea4c7afb3d16a4d3265db89f6f2d21edc
Binary files /dev/null and b/dataset/clean/054.jpg differ
diff --git a/dataset/clean/055.jpg b/dataset/clean/055.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3851b74d7075a8b7ab71fc83d1b0ab8b225e53e0
Binary files /dev/null and b/dataset/clean/055.jpg differ
diff --git a/dataset/clean/056.jpg b/dataset/clean/056.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b34d0459296e86ecc9e9a6fa99b7f6449adaa8ac
Binary files /dev/null and b/dataset/clean/056.jpg differ
diff --git a/dataset/clean/057.jpg b/dataset/clean/057.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..08341bddd260000c6d292cfcf009787a8a23e686
Binary files /dev/null and b/dataset/clean/057.jpg differ
diff --git a/dataset/clean/058.jpg b/dataset/clean/058.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..269acc1d0335245c28aa97107c41aa2ff64909ea
Binary files /dev/null and b/dataset/clean/058.jpg differ
diff --git a/dataset/clean/059.jpg b/dataset/clean/059.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d508e58798fc67bd307b24a7314ce2ee4e7403c5
Binary files /dev/null and b/dataset/clean/059.jpg differ
diff --git a/dataset/clean/060.jpg b/dataset/clean/060.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..fd33d0e0bc8ca398d2a785373fffd5b0c36fad9c
Binary files /dev/null and b/dataset/clean/060.jpg differ
diff --git a/dataset/clean/061.jpg b/dataset/clean/061.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4a43d8e905dfa1dd53ab8cbb94ef5ae422576186
Binary files /dev/null and b/dataset/clean/061.jpg differ
diff --git a/dataset/clean/062.jpg b/dataset/clean/062.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..158a2883789a5eac5a502e36ab4c06aff56658e1
Binary files /dev/null and b/dataset/clean/062.jpg differ
diff --git a/dataset/clean/063.jpg b/dataset/clean/063.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..050d88999ed88b23f7b8e708e64929b4dd632f6b
Binary files /dev/null and b/dataset/clean/063.jpg differ
diff --git a/dataset/clean/064.jpg b/dataset/clean/064.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8d0d43ced526bede0c98f38e35217d3f3582ccd2
Binary files /dev/null and b/dataset/clean/064.jpg differ
diff --git a/dataset/clean/065.jpg b/dataset/clean/065.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..41875d7d23b59fe7789107ab5bae89e6ff94bb86
Binary files /dev/null and b/dataset/clean/065.jpg differ
diff --git a/dataset/clean/066.jpg b/dataset/clean/066.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d6db8026eec7b4fc1fecbc6c79c5d07f72e52964
Binary files /dev/null and b/dataset/clean/066.jpg differ
diff --git a/dataset/clean/067.jpg b/dataset/clean/067.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b3a4dec453524bb981f94b519eac91d97e2452ae
Binary files /dev/null and b/dataset/clean/067.jpg differ
diff --git a/dataset/clean/068.jpg b/dataset/clean/068.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7b311a5fb00c4c7f03de8586413f5bc7116c85e4
Binary files /dev/null and b/dataset/clean/068.jpg differ
diff --git a/dataset/clean/069.jpg b/dataset/clean/069.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..5e04a6fa1e4af03f4df11337fd9183667f8ef02d
Binary files /dev/null and b/dataset/clean/069.jpg differ
diff --git a/dataset/clean/070.jpg b/dataset/clean/070.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3eddf407a24db59ade5b9895dcf4ab1b797d13c7
Binary files /dev/null and b/dataset/clean/070.jpg differ
diff --git a/dataset/clean/071.jpg b/dataset/clean/071.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..107b0be4677eafc0543537e4b51401fa40630d0f
Binary files /dev/null and b/dataset/clean/071.jpg differ
diff --git a/dataset/clean/072.jpg b/dataset/clean/072.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..1c8fc0b4eaab34a71a0717f175aa93c5f8f41659
Binary files /dev/null and b/dataset/clean/072.jpg differ
diff --git a/dataset/clean/073.jpg b/dataset/clean/073.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c809636e1486910762d2c61ee8e52b43f7edca31
Binary files /dev/null and b/dataset/clean/073.jpg differ
diff --git a/dataset/clean/074.jpg b/dataset/clean/074.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..37e0dd18cb768a5866e1dde41803a45860f8049d
Binary files /dev/null and b/dataset/clean/074.jpg differ
diff --git a/dataset/clean/075.jpg b/dataset/clean/075.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c984d0aa189fbadd1d6928c2ee89284b5254d1b3
Binary files /dev/null and b/dataset/clean/075.jpg differ
diff --git a/dataset/clean/076.jpg b/dataset/clean/076.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b4c0527c90a84763651864b0bc583ca11bd05783
Binary files /dev/null and b/dataset/clean/076.jpg differ
diff --git a/dataset/clean/077.jpg b/dataset/clean/077.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d23e651fa622efba394b7610c90be6348d81ce97
Binary files /dev/null and b/dataset/clean/077.jpg differ
diff --git a/dataset/clean/078.jpg b/dataset/clean/078.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4d710811bf2ed2eef0f7ee1d6f612200640ddd00
Binary files /dev/null and b/dataset/clean/078.jpg differ
diff --git a/dataset/clean/079.jpg b/dataset/clean/079.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..eb6810bd4797325bae5cd7024beee3ec790486e9
Binary files /dev/null and b/dataset/clean/079.jpg differ
diff --git a/dataset/clean/080.jpg b/dataset/clean/080.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..818eabc1d651bfc25a43b394b52e4135db6dd63b
Binary files /dev/null and b/dataset/clean/080.jpg differ
diff --git a/dataset/clean/081.jpg b/dataset/clean/081.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a33f6f8c8a619fa02dac6842e7ecc134d9d95cbf
Binary files /dev/null and b/dataset/clean/081.jpg differ
diff --git a/dataset/clean/082.jpg b/dataset/clean/082.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..671ca29ad408e394f8107ef8d245652b91ce1aae
Binary files /dev/null and b/dataset/clean/082.jpg differ
diff --git a/dataset/clean/083.jpg b/dataset/clean/083.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..9abde83763510cb8833b93747106279dd4e2ee1d
Binary files /dev/null and b/dataset/clean/083.jpg differ
diff --git a/dataset/clean/084.jpg b/dataset/clean/084.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8b75925fc83f88fd5dce12d57e4a9d11431e12a3
Binary files /dev/null and b/dataset/clean/084.jpg differ
diff --git a/dataset/clean/085.jpg b/dataset/clean/085.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..436372f417c12a77c38675e6dc7743f0938b95d8
Binary files /dev/null and b/dataset/clean/085.jpg differ
diff --git a/dataset/clean/086.jpg b/dataset/clean/086.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3c03a3bd5602a3154aa7c7dde350ba0fc3db097b
Binary files /dev/null and b/dataset/clean/086.jpg differ
diff --git a/dataset/clean/087.jpg b/dataset/clean/087.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..01a4628c25bbd90b4cbb9235492139ac1923592f
Binary files /dev/null and b/dataset/clean/087.jpg differ
diff --git a/dataset/clean/088.jpg b/dataset/clean/088.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b801e70c81f1c4903ec2f0cc43fd56def50c3aef
Binary files /dev/null and b/dataset/clean/088.jpg differ
diff --git a/dataset/clean/089.jpg b/dataset/clean/089.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..59dc51bcfcc617eff9f40adc40a627cc5ad4228c
Binary files /dev/null and b/dataset/clean/089.jpg differ
diff --git a/dataset/clean/090.jpg b/dataset/clean/090.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..62ba34696b0a7e7b4d53e5f6d4df83a48bb1ee0f
Binary files /dev/null and b/dataset/clean/090.jpg differ
diff --git a/dataset/clean/091.jpg b/dataset/clean/091.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e4bea952975ff37119169ae923bc77b7bb61b66d
Binary files /dev/null and b/dataset/clean/091.jpg differ
diff --git a/dataset/clean/092.jpg b/dataset/clean/092.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a4cd4ac7d7f9e614c9379b67e00d7f0342d8b039
Binary files /dev/null and b/dataset/clean/092.jpg differ
diff --git a/dataset/clean/093.jpg b/dataset/clean/093.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..10a6e995738761ca3ec1faaf187bc7f6b409d7df
Binary files /dev/null and b/dataset/clean/093.jpg differ
diff --git a/dataset/clean/094.jpg b/dataset/clean/094.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..73aef3817d4a0f12229e824697da3dfde11673e7
Binary files /dev/null and b/dataset/clean/094.jpg differ
diff --git a/dataset/clean/095.jpg b/dataset/clean/095.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4e45ee24a4e4b69ae6aecbd3662f85f6b80bce91
Binary files /dev/null and b/dataset/clean/095.jpg differ
diff --git a/dataset/clean/096.jpg b/dataset/clean/096.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..969ab02cf56b71f39a5e089e132f52d58ee29a06
Binary files /dev/null and b/dataset/clean/096.jpg differ
diff --git a/dataset/clean/097.jpg b/dataset/clean/097.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4896e145cae2f25803f6114fb23e420b892759f7
Binary files /dev/null and b/dataset/clean/097.jpg differ
diff --git a/dataset/clean/098.jpg b/dataset/clean/098.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f37360c22a7b8e92dd0f5e5bd33bec970fa45d90
Binary files /dev/null and b/dataset/clean/098.jpg differ
diff --git a/dataset/clean/099.jpg b/dataset/clean/099.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d78dc6598ac47e2e57319cfbe1fc10609eed1e2c
Binary files /dev/null and b/dataset/clean/099.jpg differ
diff --git a/dataset/clean/100.jpg b/dataset/clean/100.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e70c2f6874f5779708c3f8ea7384dad840e40f19
Binary files /dev/null and b/dataset/clean/100.jpg differ
diff --git a/dataset/clean/101.jpg b/dataset/clean/101.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..5829f49d8cbfc3cc0dc6acdfe73739cbaf6bd447
Binary files /dev/null and b/dataset/clean/101.jpg differ
diff --git a/dataset/clean/102.jpg b/dataset/clean/102.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6a0cfad84278e89522495d73096301bf2a9c7ba8
Binary files /dev/null and b/dataset/clean/102.jpg differ
diff --git a/dataset/clean/103.jpg b/dataset/clean/103.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c94ffbd80b78b3d8e349e7673b68c52096568057
Binary files /dev/null and b/dataset/clean/103.jpg differ
diff --git a/dataset/clean/104.jpg b/dataset/clean/104.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..9dd59bcafd8818676ffc05db7f3cfc721201c380
Binary files /dev/null and b/dataset/clean/104.jpg differ
diff --git a/dataset/clean/105.jpg b/dataset/clean/105.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c337afd8e6ecacb963b3227285fcb130289b6015
Binary files /dev/null and b/dataset/clean/105.jpg differ
diff --git a/dataset/clean/106.jpg b/dataset/clean/106.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..40360afb25115263128118117dfa91faa92cdc65
Binary files /dev/null and b/dataset/clean/106.jpg differ
diff --git a/dataset/clean/107.jpg b/dataset/clean/107.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6ba3ea3e17efb14211598aa05f86d98292cc61a5
Binary files /dev/null and b/dataset/clean/107.jpg differ
diff --git a/dataset/clean/108.jpg b/dataset/clean/108.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..afe0aed628684ac523a23d12b0b6544f92e9049b
Binary files /dev/null and b/dataset/clean/108.jpg differ
diff --git a/dataset/clean/109.jpg b/dataset/clean/109.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b094481461036bc98a89a2f263c9191b1e0a8194
Binary files /dev/null and b/dataset/clean/109.jpg differ
diff --git a/dataset/clean/110.jpg b/dataset/clean/110.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b2391fb00b10e3d40bed85d3b9f38e6ff7d4caa5
Binary files /dev/null and b/dataset/clean/110.jpg differ
diff --git a/dataset/clean/111.jpg b/dataset/clean/111.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d99b1e7bee1054995a6f840a9eb34fca04582c02
Binary files /dev/null and b/dataset/clean/111.jpg differ
diff --git a/dataset/clean/112.jpg b/dataset/clean/112.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..5c8496d65c4808ba88930872d9da2865d107a7c4
Binary files /dev/null and b/dataset/clean/112.jpg differ
diff --git a/dataset/clean/113.jpg b/dataset/clean/113.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..376f2cc5572344c6bbeecd86e1be64c28ea10602
Binary files /dev/null and b/dataset/clean/113.jpg differ
diff --git a/dataset/clean/114.jpg b/dataset/clean/114.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..02abbabc104026ffbb4471211c2cb60aaa93a74e
Binary files /dev/null and b/dataset/clean/114.jpg differ
diff --git a/dataset/clean/115.jpg b/dataset/clean/115.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..860a5b3422a66146f546b82dbb9eef1746dd3742
Binary files /dev/null and b/dataset/clean/115.jpg differ
diff --git a/dataset/clean/116.jpg b/dataset/clean/116.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8255b4e877afdfe873936ee5b31eb7a08c7d9855
Binary files /dev/null and b/dataset/clean/116.jpg differ
diff --git a/dataset/clean/117.jpg b/dataset/clean/117.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b0a0ae8d5ebe67cf731c2c732ce7e9d914841db3
Binary files /dev/null and b/dataset/clean/117.jpg differ
diff --git a/dataset/clean/118.jpg b/dataset/clean/118.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c44f7e2e2d71901e32db845d7d2fd28c194f015e
Binary files /dev/null and b/dataset/clean/118.jpg differ
diff --git a/dataset/clean/119.jpg b/dataset/clean/119.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..ff44d13085e01795e9bfd16258534fd6b859b8b2
Binary files /dev/null and b/dataset/clean/119.jpg differ
diff --git a/dataset/clean/120.jpg b/dataset/clean/120.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b4214c3689f3be100cf374583065fcdc6e5f17db
Binary files /dev/null and b/dataset/clean/120.jpg differ
diff --git a/dataset/clean/121.jpg b/dataset/clean/121.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6d35fab280b21e4c04a46de9890e4fed605472ee
Binary files /dev/null and b/dataset/clean/121.jpg differ
diff --git a/dataset/clean/122.jpg b/dataset/clean/122.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b8bcc670cde9ee2e8533b3296c5fbedc23dd1120
Binary files /dev/null and b/dataset/clean/122.jpg differ
diff --git a/dataset/clean/123.jpg b/dataset/clean/123.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..0c546fe344495c0bf3d5fd2bc0ceda90e1fbda09
Binary files /dev/null and b/dataset/clean/123.jpg differ
diff --git a/dataset/clean/124.jpg b/dataset/clean/124.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..ecfd2ec13ad346c8f05c5c562acd0a3670edaed8
Binary files /dev/null and b/dataset/clean/124.jpg differ
diff --git a/dataset/clean/125.jpg b/dataset/clean/125.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7e52e9c7059eb315cf5f9d73b361d88b91bf2689
Binary files /dev/null and b/dataset/clean/125.jpg differ
diff --git a/dataset/clean/126.jpg b/dataset/clean/126.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..531e6537085b7d953a4afd13515503303d38881a
Binary files /dev/null and b/dataset/clean/126.jpg differ
diff --git a/dataset/clean/127.jpg b/dataset/clean/127.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e0d526874989ce230163e02507aa78aece179ced
Binary files /dev/null and b/dataset/clean/127.jpg differ
diff --git a/dataset/clean/128.jpg b/dataset/clean/128.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..24b8bb94ee6aea669564d8573584f3cd387ab580
Binary files /dev/null and b/dataset/clean/128.jpg differ
diff --git a/dataset/clean/129.jpg b/dataset/clean/129.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e03d2926eff1afcfd2611436c30a0f5138909d0b
Binary files /dev/null and b/dataset/clean/129.jpg differ
diff --git a/dataset/clean/130.jpg b/dataset/clean/130.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2093cfadba0e44b1309c44df25c99aa2dd3de3b7
Binary files /dev/null and b/dataset/clean/130.jpg differ
diff --git a/dataset/clean/131.jpg b/dataset/clean/131.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7646570277b7ce25c1e05653cd77108434f0a05c
Binary files /dev/null and b/dataset/clean/131.jpg differ
diff --git a/dataset/clean/132.jpg b/dataset/clean/132.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..da3306402182447b7e5e4e8c539daf4074a09929
Binary files /dev/null and b/dataset/clean/132.jpg differ
diff --git a/dataset/clean/133.jpg b/dataset/clean/133.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..0c27d367923da2a9175d8833f2d23c9932642300
Binary files /dev/null and b/dataset/clean/133.jpg differ
diff --git a/dataset/clean/134.jpg b/dataset/clean/134.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7d3de9a78f15d5626c091102350d6a6cbef84fb6
Binary files /dev/null and b/dataset/clean/134.jpg differ
diff --git a/dataset/clean/135.jpg b/dataset/clean/135.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c0061b13251378b311d554883e39e4c5b0e7046e
Binary files /dev/null and b/dataset/clean/135.jpg differ
diff --git a/dataset/clean/136.jpg b/dataset/clean/136.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b30a247019389fe84fc2a421ed855192e4c5252f
Binary files /dev/null and b/dataset/clean/136.jpg differ
diff --git a/dataset/clean/137.jpg b/dataset/clean/137.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..bd62749a0e32898d3b5b9cdf72bbe95c7e646d58
Binary files /dev/null and b/dataset/clean/137.jpg differ
diff --git a/dataset/clean/138.jpg b/dataset/clean/138.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..955d3f1bab4d78080e2c80b68539c25a0841565c
Binary files /dev/null and b/dataset/clean/138.jpg differ
diff --git a/dataset/clean/139.jpg b/dataset/clean/139.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4a6ed164a0ca42b69b75ca8aa12c94e0ced2eb1a
Binary files /dev/null and b/dataset/clean/139.jpg differ
diff --git a/dataset/clean/140.jpg b/dataset/clean/140.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..0dde00f4614f0fa3979231ab855f35a24021dbea
Binary files /dev/null and b/dataset/clean/140.jpg differ
diff --git a/dataset/clean/141.jpg b/dataset/clean/141.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..073c8e4e501b58dc331de1bfe0dea2ebbf5e1305
Binary files /dev/null and b/dataset/clean/141.jpg differ
diff --git a/dataset/clean/142.jpg b/dataset/clean/142.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c8da0fafe41d895eafe56a8ba94b521c28f306a3
Binary files /dev/null and b/dataset/clean/142.jpg differ
diff --git a/dataset/clean/143.jpg b/dataset/clean/143.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..fbaf1a79e9e5ea57a313d0ca719f3321e7552426
Binary files /dev/null and b/dataset/clean/143.jpg differ
diff --git a/dataset/clean/144.jpg b/dataset/clean/144.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..85244e4d0bdf251b14eb15a41bd1ae60c6f68573
Binary files /dev/null and b/dataset/clean/144.jpg differ
diff --git a/dataset/clean/145.jpg b/dataset/clean/145.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4c060b8d6323c228a625702dceff313b70ad4c0b
Binary files /dev/null and b/dataset/clean/145.jpg differ
diff --git a/dataset/clean/146.jpg b/dataset/clean/146.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..0f7594cab71d7c9483e485555a73b604aa558e94
Binary files /dev/null and b/dataset/clean/146.jpg differ
diff --git a/dataset/clean/147.jpg b/dataset/clean/147.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e3bb0c57d2be168fe4a09d385ff4fcec58612f9e
Binary files /dev/null and b/dataset/clean/147.jpg differ
diff --git a/dataset/clean/148.jpg b/dataset/clean/148.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..88b20ac320d77e314439b284918c9c6cd8d0e8c0
Binary files /dev/null and b/dataset/clean/148.jpg differ
diff --git a/dataset/clean/149.jpg b/dataset/clean/149.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..5e6c2f17a6db654a1bc03e8f94d2b09df2ab257c
Binary files /dev/null and b/dataset/clean/149.jpg differ
diff --git a/dataset/noisy/000.jpg b/dataset/noisy/000.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f884ebd22102e0e2e5e823cf9a76d7a8586a035a
Binary files /dev/null and b/dataset/noisy/000.jpg differ
diff --git a/dataset/noisy/001.jpg b/dataset/noisy/001.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b16e357a2dcaf1885d78e5a9b007a06a31062168
Binary files /dev/null and b/dataset/noisy/001.jpg differ
diff --git a/dataset/noisy/002.jpg b/dataset/noisy/002.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..842cb29e95e79fc2309841557603bd9d67a10347
Binary files /dev/null and b/dataset/noisy/002.jpg differ
diff --git a/dataset/noisy/003.jpg b/dataset/noisy/003.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..96d2bd053414041fc8edaa03e0383362a2790c48
Binary files /dev/null and b/dataset/noisy/003.jpg differ
diff --git a/dataset/noisy/004.jpg b/dataset/noisy/004.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..63de6da62a8453c38597ab3665a5b3f9d4dd9868
Binary files /dev/null and b/dataset/noisy/004.jpg differ
diff --git a/dataset/noisy/005.jpg b/dataset/noisy/005.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..bccea79b0da24ad2b23c4673e1bf19a87388c77f
Binary files /dev/null and b/dataset/noisy/005.jpg differ
diff --git a/dataset/noisy/006.jpg b/dataset/noisy/006.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..0900fb901b0d5491ce4dd4f75ca0c8cda1645129
Binary files /dev/null and b/dataset/noisy/006.jpg differ
diff --git a/dataset/noisy/007.jpg b/dataset/noisy/007.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d60670bea3aeec4cb1841c3fb40b7704b4a0524d
Binary files /dev/null and b/dataset/noisy/007.jpg differ
diff --git a/dataset/noisy/008.jpg b/dataset/noisy/008.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..46e315338928039ad3696c6a64c11c92a7d2b629
Binary files /dev/null and b/dataset/noisy/008.jpg differ
diff --git a/dataset/noisy/009.jpg b/dataset/noisy/009.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..355f4ac7679e9d004e71ee116ed292bfa0dd7c2c
Binary files /dev/null and b/dataset/noisy/009.jpg differ
diff --git a/dataset/noisy/010.jpg b/dataset/noisy/010.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8cd23bd38b366bdcc7d0c7c248c92b2681880067
Binary files /dev/null and b/dataset/noisy/010.jpg differ
diff --git a/dataset/noisy/011.jpg b/dataset/noisy/011.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..12985804fcf64ad1c0572238e465c00677a8d426
Binary files /dev/null and b/dataset/noisy/011.jpg differ
diff --git a/dataset/noisy/012.jpg b/dataset/noisy/012.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e1a5209e11f71513ad344579c65bfa1c3ab447fe
Binary files /dev/null and b/dataset/noisy/012.jpg differ
diff --git a/dataset/noisy/013.jpg b/dataset/noisy/013.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..40865f962714ba71d9964bf589a9cc0be8b40b3b
Binary files /dev/null and b/dataset/noisy/013.jpg differ
diff --git a/dataset/noisy/014.jpg b/dataset/noisy/014.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3c64f8d3c46f17c7d7d0c4fa35813536f536124e
Binary files /dev/null and b/dataset/noisy/014.jpg differ
diff --git a/dataset/noisy/015.jpg b/dataset/noisy/015.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..cee50f8d7742512f360e0cbd2e7e203e2765c002
Binary files /dev/null and b/dataset/noisy/015.jpg differ
diff --git a/dataset/noisy/016.jpg b/dataset/noisy/016.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f6a24f8dd014db1e75e79fba0fa5b070e2848100
Binary files /dev/null and b/dataset/noisy/016.jpg differ
diff --git a/dataset/noisy/017.jpg b/dataset/noisy/017.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b3a8a68c138d9abc3d41933ab66569d6ea2d082a
Binary files /dev/null and b/dataset/noisy/017.jpg differ
diff --git a/dataset/noisy/018.jpg b/dataset/noisy/018.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..529f7341ec8001504cef945a45d2436239cee053
Binary files /dev/null and b/dataset/noisy/018.jpg differ
diff --git a/dataset/noisy/019.jpg b/dataset/noisy/019.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..42413ba01f5274c91c87d0d788a28305fceec830
Binary files /dev/null and b/dataset/noisy/019.jpg differ
diff --git a/dataset/noisy/020.jpg b/dataset/noisy/020.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7cbac3c58e0cb88e7b315ced983fca34a2636849
Binary files /dev/null and b/dataset/noisy/020.jpg differ
diff --git a/dataset/noisy/021.jpg b/dataset/noisy/021.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..00cf23350c559d654b8cc4fb39d4e7bed5974391
Binary files /dev/null and b/dataset/noisy/021.jpg differ
diff --git a/dataset/noisy/022.jpg b/dataset/noisy/022.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c1e2066460af7c64d3477dcb35fa975240b5c154
Binary files /dev/null and b/dataset/noisy/022.jpg differ
diff --git a/dataset/noisy/023.jpg b/dataset/noisy/023.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6831d8ff0e37790fb6458c54ad31caba198164ad
Binary files /dev/null and b/dataset/noisy/023.jpg differ
diff --git a/dataset/noisy/024.jpg b/dataset/noisy/024.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..1cfce593daf0eb5af793dbeb9c75b573acf014b4
Binary files /dev/null and b/dataset/noisy/024.jpg differ
diff --git a/dataset/noisy/025.jpg b/dataset/noisy/025.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..05d104f9ec719e33564218a15b5c8c359be3baad
Binary files /dev/null and b/dataset/noisy/025.jpg differ
diff --git a/dataset/noisy/026.jpg b/dataset/noisy/026.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4acf7c2371cad1b47d53fbe24893a51af0eef8f4
Binary files /dev/null and b/dataset/noisy/026.jpg differ
diff --git a/dataset/noisy/027.jpg b/dataset/noisy/027.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8f1e5b80e0da25e0644058004730e6d8c86766b2
Binary files /dev/null and b/dataset/noisy/027.jpg differ
diff --git a/dataset/noisy/028.jpg b/dataset/noisy/028.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..1b2c03ff158aba463bf783d9373a4f853c272bce
Binary files /dev/null and b/dataset/noisy/028.jpg differ
diff --git a/dataset/noisy/029.jpg b/dataset/noisy/029.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..eb934a662efea4001edcd4633d132f9a39d9ead6
Binary files /dev/null and b/dataset/noisy/029.jpg differ
diff --git a/dataset/noisy/030.jpg b/dataset/noisy/030.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..70f3230551f7e8a295bce2d74e19bbc452af8816
Binary files /dev/null and b/dataset/noisy/030.jpg differ
diff --git a/dataset/noisy/031.jpg b/dataset/noisy/031.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..edb9b9c78cc1bbd50607a857a0fd1b04da88b998
Binary files /dev/null and b/dataset/noisy/031.jpg differ
diff --git a/dataset/noisy/032.jpg b/dataset/noisy/032.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..866c09bdc14878be0e72d7f6ce2cb4deb1fb434c
Binary files /dev/null and b/dataset/noisy/032.jpg differ
diff --git a/dataset/noisy/033.jpg b/dataset/noisy/033.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7c21da06f6f94389db9bd28a36a22235fde576ee
Binary files /dev/null and b/dataset/noisy/033.jpg differ
diff --git a/dataset/noisy/034.jpg b/dataset/noisy/034.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a10bf87d7194145edcc8cbfe5533f6917380eca4
Binary files /dev/null and b/dataset/noisy/034.jpg differ
diff --git a/dataset/noisy/035.jpg b/dataset/noisy/035.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..470de8411da8f4a5cfcbde90e1e938d83230bd66
Binary files /dev/null and b/dataset/noisy/035.jpg differ
diff --git a/dataset/noisy/036.jpg b/dataset/noisy/036.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c72687db092085c6917fb5236585d4c86b3d3738
Binary files /dev/null and b/dataset/noisy/036.jpg differ
diff --git a/dataset/noisy/037.jpg b/dataset/noisy/037.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..0eb221ee4b127adfb80cc5668ee72c1b350c159d
Binary files /dev/null and b/dataset/noisy/037.jpg differ
diff --git a/dataset/noisy/038.jpg b/dataset/noisy/038.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..dfd5c43df5a938366c8e1ce557c43182af7afcaa
Binary files /dev/null and b/dataset/noisy/038.jpg differ
diff --git a/dataset/noisy/039.jpg b/dataset/noisy/039.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2b5789eaa90d7446c817bbda673cd5be64a2c540
Binary files /dev/null and b/dataset/noisy/039.jpg differ
diff --git a/dataset/noisy/040.jpg b/dataset/noisy/040.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..ce73a11f4fe8e809b9f4ff03161c6840619aede7
Binary files /dev/null and b/dataset/noisy/040.jpg differ
diff --git a/dataset/noisy/041.jpg b/dataset/noisy/041.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3f0d20d3dc4740662ce95d4e6c4e0cbc267ae409
Binary files /dev/null and b/dataset/noisy/041.jpg differ
diff --git a/dataset/noisy/042.jpg b/dataset/noisy/042.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e9f840bc4407a23089ad99f02afd5495bbb2656f
Binary files /dev/null and b/dataset/noisy/042.jpg differ
diff --git a/dataset/noisy/043.jpg b/dataset/noisy/043.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..bf5cf082ccfd286c819af0d84cc9451a23dc34dc
Binary files /dev/null and b/dataset/noisy/043.jpg differ
diff --git a/dataset/noisy/044.jpg b/dataset/noisy/044.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..db782b801bcfbe097d9a3ead7bffeddb75e747a2
Binary files /dev/null and b/dataset/noisy/044.jpg differ
diff --git a/dataset/noisy/045.jpg b/dataset/noisy/045.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..5d4e68faced3e13c4b32ddfde5c2f518ce0d96ba
Binary files /dev/null and b/dataset/noisy/045.jpg differ
diff --git a/dataset/noisy/046.jpg b/dataset/noisy/046.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..9850f735771296210cac52e441183898284fc9e3
Binary files /dev/null and b/dataset/noisy/046.jpg differ
diff --git a/dataset/noisy/047.jpg b/dataset/noisy/047.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e430dce689c201b22bd896c2be3670b458015edb
Binary files /dev/null and b/dataset/noisy/047.jpg differ
diff --git a/dataset/noisy/048.jpg b/dataset/noisy/048.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3752ba8f7441f34cedc70dd6afb3db9d2f0a5131
Binary files /dev/null and b/dataset/noisy/048.jpg differ
diff --git a/dataset/noisy/049.jpg b/dataset/noisy/049.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..59aa100101f8582cd5fde424e78ea8bb3663ba2f
Binary files /dev/null and b/dataset/noisy/049.jpg differ
diff --git a/dataset/noisy/050.jpg b/dataset/noisy/050.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..870fd2e41f21579fd175fa31db7bc140d76c8eee
Binary files /dev/null and b/dataset/noisy/050.jpg differ
diff --git a/dataset/noisy/051.jpg b/dataset/noisy/051.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d7210f0e55b0848d5506b0e74ec6d80001fecf23
Binary files /dev/null and b/dataset/noisy/051.jpg differ
diff --git a/dataset/noisy/052.jpg b/dataset/noisy/052.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f03ac7c4f1fe244eef66eced9fd36765500acc45
Binary files /dev/null and b/dataset/noisy/052.jpg differ
diff --git a/dataset/noisy/053.jpg b/dataset/noisy/053.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..fcc7150df19029c3f1ea50d1d86e4e06ca763bdc
Binary files /dev/null and b/dataset/noisy/053.jpg differ
diff --git a/dataset/noisy/054.jpg b/dataset/noisy/054.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3adfc41d6c02e4f31c32a96114b50fd5fe0ec603
Binary files /dev/null and b/dataset/noisy/054.jpg differ
diff --git a/dataset/noisy/055.jpg b/dataset/noisy/055.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b3fd6a4dcdbe888a1bfcaedc909e0c8bf3fc19e2
Binary files /dev/null and b/dataset/noisy/055.jpg differ
diff --git a/dataset/noisy/056.jpg b/dataset/noisy/056.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d813d8bddcf9ef89a7952565a642c3b2f03e3581
Binary files /dev/null and b/dataset/noisy/056.jpg differ
diff --git a/dataset/noisy/057.jpg b/dataset/noisy/057.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..ad2814c0d6a3ee8927c464f5c08ac0fab8ff91ff
Binary files /dev/null and b/dataset/noisy/057.jpg differ
diff --git a/dataset/noisy/058.jpg b/dataset/noisy/058.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..233c4b214952b13490c903f60761726e81c5a67b
Binary files /dev/null and b/dataset/noisy/058.jpg differ
diff --git a/dataset/noisy/059.jpg b/dataset/noisy/059.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..52e93cf45d3363b85b6452926a888064e17cc8ec
Binary files /dev/null and b/dataset/noisy/059.jpg differ
diff --git a/dataset/noisy/060.jpg b/dataset/noisy/060.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2a9288e9c879cad2c76b6c21f84c48b0cd85030c
Binary files /dev/null and b/dataset/noisy/060.jpg differ
diff --git a/dataset/noisy/061.jpg b/dataset/noisy/061.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d544821c5d61f831f49bb1257a9371bfa46653e3
Binary files /dev/null and b/dataset/noisy/061.jpg differ
diff --git a/dataset/noisy/062.jpg b/dataset/noisy/062.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..081bda5523d26646471ce70ffc0fc04ef55daecc
Binary files /dev/null and b/dataset/noisy/062.jpg differ
diff --git a/dataset/noisy/063.jpg b/dataset/noisy/063.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7b165e949c2b76803c6bae24200df60ba64f30ff
Binary files /dev/null and b/dataset/noisy/063.jpg differ
diff --git a/dataset/noisy/064.jpg b/dataset/noisy/064.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2a72cbde804436a3d6cea4830249a538366169ed
Binary files /dev/null and b/dataset/noisy/064.jpg differ
diff --git a/dataset/noisy/065.jpg b/dataset/noisy/065.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..be358add9df0b25acf4d243b3701447e5d5d8b3f
Binary files /dev/null and b/dataset/noisy/065.jpg differ
diff --git a/dataset/noisy/066.jpg b/dataset/noisy/066.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..89654fb6b87b945121cebddc2fdd76d90e92c614
Binary files /dev/null and b/dataset/noisy/066.jpg differ
diff --git a/dataset/noisy/067.jpg b/dataset/noisy/067.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7335e9be4f8f77bf60296451ecc7a634713c7a4d
Binary files /dev/null and b/dataset/noisy/067.jpg differ
diff --git a/dataset/noisy/068.jpg b/dataset/noisy/068.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..33dd5fbc619a21c01fa23ac2234dbe6944785ce1
Binary files /dev/null and b/dataset/noisy/068.jpg differ
diff --git a/dataset/noisy/069.jpg b/dataset/noisy/069.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..fb25181bdf8e92203748ac04f89bbf1cd9a195c3
Binary files /dev/null and b/dataset/noisy/069.jpg differ
diff --git a/dataset/noisy/070.jpg b/dataset/noisy/070.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..92442afb078829f74b3079286f8bb60179d6d32f
Binary files /dev/null and b/dataset/noisy/070.jpg differ
diff --git a/dataset/noisy/071.jpg b/dataset/noisy/071.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..fab7a24c9fe45c50f34f55a444ce34038d93a188
Binary files /dev/null and b/dataset/noisy/071.jpg differ
diff --git a/dataset/noisy/072.jpg b/dataset/noisy/072.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e5c508fa532fcfe2624996127ee421b42d6b3bdc
Binary files /dev/null and b/dataset/noisy/072.jpg differ
diff --git a/dataset/noisy/073.jpg b/dataset/noisy/073.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..bbd893e01c5f07fefaa98994d9bf8404e8644422
Binary files /dev/null and b/dataset/noisy/073.jpg differ
diff --git a/dataset/noisy/074.jpg b/dataset/noisy/074.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..9d27c7e9dde5601bde3d5bb89f97a62b62ce730a
Binary files /dev/null and b/dataset/noisy/074.jpg differ
diff --git a/dataset/noisy/075.jpg b/dataset/noisy/075.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..957fc1edd9e77909ae90b46f94802074061081d4
Binary files /dev/null and b/dataset/noisy/075.jpg differ
diff --git a/dataset/noisy/076.jpg b/dataset/noisy/076.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4a056ff3cb0e39dcd76a4caa731ccd1f5a25bc4e
Binary files /dev/null and b/dataset/noisy/076.jpg differ
diff --git a/dataset/noisy/077.jpg b/dataset/noisy/077.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..0743bd6a0528624218da7da3c8eff31f55994175
Binary files /dev/null and b/dataset/noisy/077.jpg differ
diff --git a/dataset/noisy/078.jpg b/dataset/noisy/078.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8baeb86aea2f6d35699cf5750d34e5cdda3ea250
Binary files /dev/null and b/dataset/noisy/078.jpg differ
diff --git a/dataset/noisy/079.jpg b/dataset/noisy/079.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..761a3e13fbab66877e56f04457616642e1b88515
Binary files /dev/null and b/dataset/noisy/079.jpg differ
diff --git a/dataset/noisy/080.jpg b/dataset/noisy/080.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e634e57db90401735d19591d668bfc411b7e0673
Binary files /dev/null and b/dataset/noisy/080.jpg differ
diff --git a/dataset/noisy/081.jpg b/dataset/noisy/081.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4e992fe718dabb28fe93c24ee0d7cd9e6b650e04
Binary files /dev/null and b/dataset/noisy/081.jpg differ
diff --git a/dataset/noisy/082.jpg b/dataset/noisy/082.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a28e33e5f4de8b8744d315446bf691a7feadfd05
Binary files /dev/null and b/dataset/noisy/082.jpg differ
diff --git a/dataset/noisy/083.jpg b/dataset/noisy/083.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..882f004727fa308ee9f17be957bd07b8ffbbab4f
Binary files /dev/null and b/dataset/noisy/083.jpg differ
diff --git a/dataset/noisy/084.jpg b/dataset/noisy/084.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..06dfac34d66b3ca5151ca59a54a95030c7681983
Binary files /dev/null and b/dataset/noisy/084.jpg differ
diff --git a/dataset/noisy/085.jpg b/dataset/noisy/085.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d0f969e15394ca6bd491fcf0086494347ac6ff94
Binary files /dev/null and b/dataset/noisy/085.jpg differ
diff --git a/dataset/noisy/086.jpg b/dataset/noisy/086.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6ee3fe5aa8843b7e1d78571ec6510299a0166f79
Binary files /dev/null and b/dataset/noisy/086.jpg differ
diff --git a/dataset/noisy/087.jpg b/dataset/noisy/087.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8a427b9882dc0688ece1e591b36f465d0e2dcc20
Binary files /dev/null and b/dataset/noisy/087.jpg differ
diff --git a/dataset/noisy/088.jpg b/dataset/noisy/088.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8bab3c37c9aa01ca7291c6ede4cfd54976a91d15
Binary files /dev/null and b/dataset/noisy/088.jpg differ
diff --git a/dataset/noisy/089.jpg b/dataset/noisy/089.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..c8e6027d9a01921a2ead495b6a01cc5cd92502ae
Binary files /dev/null and b/dataset/noisy/089.jpg differ
diff --git a/dataset/noisy/090.jpg b/dataset/noisy/090.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2598a2def2bb79b65a7c11b6fb73101506546cb2
Binary files /dev/null and b/dataset/noisy/090.jpg differ
diff --git a/dataset/noisy/091.jpg b/dataset/noisy/091.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..0e4073db0102277e6e77e4a6acf91d5d08a0f47a
Binary files /dev/null and b/dataset/noisy/091.jpg differ
diff --git a/dataset/noisy/092.jpg b/dataset/noisy/092.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..64ecd9296b44a1bd6b25ceeddf4e7b93c301e637
Binary files /dev/null and b/dataset/noisy/092.jpg differ
diff --git a/dataset/noisy/093.jpg b/dataset/noisy/093.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..cde9bb11e860a9d980ed6389c4c6a5374e38405d
Binary files /dev/null and b/dataset/noisy/093.jpg differ
diff --git a/dataset/noisy/094.jpg b/dataset/noisy/094.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2492b410bc451d8a9a3505939385038c55d97fdd
Binary files /dev/null and b/dataset/noisy/094.jpg differ
diff --git a/dataset/noisy/095.jpg b/dataset/noisy/095.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..422668c391ab76cb3c8bdc4b78b2537224b6c469
Binary files /dev/null and b/dataset/noisy/095.jpg differ
diff --git a/dataset/noisy/096.jpg b/dataset/noisy/096.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..454db93bfed40765081ea3d59dc9d39892810d32
Binary files /dev/null and b/dataset/noisy/096.jpg differ
diff --git a/dataset/noisy/097.jpg b/dataset/noisy/097.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f5df891e661c8999948d6b6677c63e2194f393ee
Binary files /dev/null and b/dataset/noisy/097.jpg differ
diff --git a/dataset/noisy/098.jpg b/dataset/noisy/098.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b6acc2edc18b718a92c36b7bf52075a2cb487586
Binary files /dev/null and b/dataset/noisy/098.jpg differ
diff --git a/dataset/noisy/099.jpg b/dataset/noisy/099.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4b773777a9971e2771289b4c8d409e51b439bdb3
Binary files /dev/null and b/dataset/noisy/099.jpg differ
diff --git a/dataset/noisy/100.jpg b/dataset/noisy/100.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..9135195ab2590c07fe1dd9a94cc8ef0d6cbe95e8
Binary files /dev/null and b/dataset/noisy/100.jpg differ
diff --git a/dataset/noisy/101.jpg b/dataset/noisy/101.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d93472ed6d970e8fcaaac522a42fd67ea3796dad
Binary files /dev/null and b/dataset/noisy/101.jpg differ
diff --git a/dataset/noisy/102.jpg b/dataset/noisy/102.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..94c3c14e68513b66b935a89f1c7a27696bcaaaf4
Binary files /dev/null and b/dataset/noisy/102.jpg differ
diff --git a/dataset/noisy/103.jpg b/dataset/noisy/103.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..eb0a9ef895cdb62782ffbbcd3a9f392151561acb
Binary files /dev/null and b/dataset/noisy/103.jpg differ
diff --git a/dataset/noisy/104.jpg b/dataset/noisy/104.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e7707670cf938d99baa641a5ee93eba773d3709a
Binary files /dev/null and b/dataset/noisy/104.jpg differ
diff --git a/dataset/noisy/105.jpg b/dataset/noisy/105.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..014ad76651c694b520d03e26daeaa6ea385ee3ef
Binary files /dev/null and b/dataset/noisy/105.jpg differ
diff --git a/dataset/noisy/106.jpg b/dataset/noisy/106.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..fdfcf036d4e22bf1e9bda00d6d2b6187ac081b7a
Binary files /dev/null and b/dataset/noisy/106.jpg differ
diff --git a/dataset/noisy/107.jpg b/dataset/noisy/107.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..1ddec85f6a10d4aa17cd8876c4918368ef82d140
Binary files /dev/null and b/dataset/noisy/107.jpg differ
diff --git a/dataset/noisy/108.jpg b/dataset/noisy/108.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..1223d54acfb5e2d7c1933a3e3be51393e27bcd84
Binary files /dev/null and b/dataset/noisy/108.jpg differ
diff --git a/dataset/noisy/109.jpg b/dataset/noisy/109.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b9d3f6ae6b74eac36a151bd8ea93d8e047f7ae90
Binary files /dev/null and b/dataset/noisy/109.jpg differ
diff --git a/dataset/noisy/110.jpg b/dataset/noisy/110.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a8ad5aac7b4089a1f8959faabcaea7eacaf73f3c
Binary files /dev/null and b/dataset/noisy/110.jpg differ
diff --git a/dataset/noisy/111.jpg b/dataset/noisy/111.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..68bc6073e5f431f677701f92ec14baa79fa647f9
Binary files /dev/null and b/dataset/noisy/111.jpg differ
diff --git a/dataset/noisy/112.jpg b/dataset/noisy/112.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..ccd55e7717c24f7c95ad20b245741bcc61713f4e
Binary files /dev/null and b/dataset/noisy/112.jpg differ
diff --git a/dataset/noisy/113.jpg b/dataset/noisy/113.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..156582c095caf0f7b923de37a36f039afaedd48d
Binary files /dev/null and b/dataset/noisy/113.jpg differ
diff --git a/dataset/noisy/114.jpg b/dataset/noisy/114.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..25b555ec1b56dbb08d5587ce20b800f9bb158d2d
Binary files /dev/null and b/dataset/noisy/114.jpg differ
diff --git a/dataset/noisy/115.jpg b/dataset/noisy/115.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..bea8bba82912245eec99359acad0a8b2b03a5e51
Binary files /dev/null and b/dataset/noisy/115.jpg differ
diff --git a/dataset/noisy/116.jpg b/dataset/noisy/116.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..501ba4afa09578d79b78966368258c7713f9e463
Binary files /dev/null and b/dataset/noisy/116.jpg differ
diff --git a/dataset/noisy/117.jpg b/dataset/noisy/117.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..74d630b47d33de0dc9258d665e3c326c844b2864
Binary files /dev/null and b/dataset/noisy/117.jpg differ
diff --git a/dataset/noisy/118.jpg b/dataset/noisy/118.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..56cb2bcdd104f765dade36840b3dc0e684ef01df
Binary files /dev/null and b/dataset/noisy/118.jpg differ
diff --git a/dataset/noisy/119.jpg b/dataset/noisy/119.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..ea929f3bedb8c2a34892cd15edc86f80d309a766
Binary files /dev/null and b/dataset/noisy/119.jpg differ
diff --git a/dataset/noisy/120.jpg b/dataset/noisy/120.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..ec7193479efd17b750b3fddf13a23944ea0e58fd
Binary files /dev/null and b/dataset/noisy/120.jpg differ
diff --git a/dataset/noisy/121.jpg b/dataset/noisy/121.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..812941705d06069c0be77a41f4b60be6f5874785
Binary files /dev/null and b/dataset/noisy/121.jpg differ
diff --git a/dataset/noisy/122.jpg b/dataset/noisy/122.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2acc7f3df5b30992604be32ac721f832d76dfbf6
Binary files /dev/null and b/dataset/noisy/122.jpg differ
diff --git a/dataset/noisy/123.jpg b/dataset/noisy/123.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7da04f70ac6d8c8b1f22b15fbad9f6ff7db9939e
Binary files /dev/null and b/dataset/noisy/123.jpg differ
diff --git a/dataset/noisy/124.jpg b/dataset/noisy/124.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..9886b22901271cebb719196de89315bf2ba38181
Binary files /dev/null and b/dataset/noisy/124.jpg differ
diff --git a/dataset/noisy/125.jpg b/dataset/noisy/125.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..26783c0fed516690ad214eeda7d59b04cd2bd54f
Binary files /dev/null and b/dataset/noisy/125.jpg differ
diff --git a/dataset/noisy/126.jpg b/dataset/noisy/126.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..4eef38392ad6c9a89bfa663202362888f8b475b2
Binary files /dev/null and b/dataset/noisy/126.jpg differ
diff --git a/dataset/noisy/127.jpg b/dataset/noisy/127.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..8db066d0b129ae322f2e20406f1ccfd734feec18
Binary files /dev/null and b/dataset/noisy/127.jpg differ
diff --git a/dataset/noisy/128.jpg b/dataset/noisy/128.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6a4bcb3a83e1b715bf33d7e1b803d0d870390ad9
Binary files /dev/null and b/dataset/noisy/128.jpg differ
diff --git a/dataset/noisy/129.jpg b/dataset/noisy/129.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6789c227dcb265989a23ee4a1dacc2d76ed4917a
Binary files /dev/null and b/dataset/noisy/129.jpg differ
diff --git a/dataset/noisy/130.jpg b/dataset/noisy/130.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..ecc3a50c0a65e0c3333b6a298efd8f668c4b9bbb
Binary files /dev/null and b/dataset/noisy/130.jpg differ
diff --git a/dataset/noisy/131.jpg b/dataset/noisy/131.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2047124c9dfc4adb9318dfec5f649dacc60acca7
Binary files /dev/null and b/dataset/noisy/131.jpg differ
diff --git a/dataset/noisy/132.jpg b/dataset/noisy/132.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6a956ac163d317bb4479c4dd41e633d4a0728d2c
Binary files /dev/null and b/dataset/noisy/132.jpg differ
diff --git a/dataset/noisy/133.jpg b/dataset/noisy/133.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d8f84e98b3c52ba2ca555e48b1b8866e4cac4e0c
Binary files /dev/null and b/dataset/noisy/133.jpg differ
diff --git a/dataset/noisy/134.jpg b/dataset/noisy/134.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..2d61f1bde8d153098d3bedb3acc56c79e2b637a6
Binary files /dev/null and b/dataset/noisy/134.jpg differ
diff --git a/dataset/noisy/135.jpg b/dataset/noisy/135.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..3282159202bcf5bfe42039c5458b2cbe7ab03021
Binary files /dev/null and b/dataset/noisy/135.jpg differ
diff --git a/dataset/noisy/136.jpg b/dataset/noisy/136.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7c833f50937b60249f8602c0084a0caeba838cbc
Binary files /dev/null and b/dataset/noisy/136.jpg differ
diff --git a/dataset/noisy/137.jpg b/dataset/noisy/137.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d49081cf47e53fd9830b3fe697a928d5fd3695d6
Binary files /dev/null and b/dataset/noisy/137.jpg differ
diff --git a/dataset/noisy/138.jpg b/dataset/noisy/138.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..0a575af0aca2ebf0578c691d87fef01e59c22f4e
Binary files /dev/null and b/dataset/noisy/138.jpg differ
diff --git a/dataset/noisy/139.jpg b/dataset/noisy/139.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..96e8aad13dae0da675cdf60591e3ab344b349b1f
Binary files /dev/null and b/dataset/noisy/139.jpg differ
diff --git a/dataset/noisy/140.jpg b/dataset/noisy/140.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..e4c26b626097b2b5a3e9f2308c2670a279153821
Binary files /dev/null and b/dataset/noisy/140.jpg differ
diff --git a/dataset/noisy/141.jpg b/dataset/noisy/141.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..90a125da379dad5816c5b9689234bab72a02ef0d
Binary files /dev/null and b/dataset/noisy/141.jpg differ
diff --git a/dataset/noisy/142.jpg b/dataset/noisy/142.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..a29a503333209be7339be3ee21456093459ddb96
Binary files /dev/null and b/dataset/noisy/142.jpg differ
diff --git a/dataset/noisy/143.jpg b/dataset/noisy/143.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..d781aa36b86deb88933cbb0b436bb949c7706859
Binary files /dev/null and b/dataset/noisy/143.jpg differ
diff --git a/dataset/noisy/144.jpg b/dataset/noisy/144.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..f1cdce9c3a422cd5af5530353f456742e2141665
Binary files /dev/null and b/dataset/noisy/144.jpg differ
diff --git a/dataset/noisy/145.jpg b/dataset/noisy/145.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..985cd4752f1d92ea88cf0bbbb74f6745cd54cff4
Binary files /dev/null and b/dataset/noisy/145.jpg differ
diff --git a/dataset/noisy/146.jpg b/dataset/noisy/146.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..b25c9f0699c601047a03e3c1ace6ce0ee73855a1
Binary files /dev/null and b/dataset/noisy/146.jpg differ
diff --git a/dataset/noisy/147.jpg b/dataset/noisy/147.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..7f68c48e3b1d1e47a2a6e953a5a5a054e2b70716
Binary files /dev/null and b/dataset/noisy/147.jpg differ
diff --git a/dataset/noisy/148.jpg b/dataset/noisy/148.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..6ec453c6a55438dcd4c45bc33a47b4db7833dbee
Binary files /dev/null and b/dataset/noisy/148.jpg differ
diff --git a/dataset/noisy/149.jpg b/dataset/noisy/149.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..170f228660a01fa10e898896fcc6cd6d28ed1628
Binary files /dev/null and b/dataset/noisy/149.jpg differ
diff --git a/ink_vision_engine.py b/ink_vision_engine.py
new file mode 100644
index 0000000000000000000000000000000000000000..eff9bf12d1696d448d207cdb159f9750d5c734f3
--- /dev/null
+++ b/ink_vision_engine.py
@@ -0,0 +1,86 @@
+import torch
+import torch.nn as nn
+
+class CRAFT_Demonstration(nn.Module):
+
+    def __init__(self):
+        super().__init__()
+        # The real CRAFT detector uses a VGG16-BN backbone with U-Net-style upsampling.
+        self.feature_extractor = nn.Conv2d(3, 64, kernel_size=3, padding=1)
+        self.heatmap_predictor = nn.Conv2d(64, 2, kernel_size=1)
+
+    def forward(self, image):
+        features = self.feature_extractor(image)
+        # Returns [Region Score, Affinity Score]
+        return self.heatmap_predictor(features)
+
+
+class VGG_FeatureExtractor(nn.Module):
+
+    def __init__(self, input_channel=1, output_channel=256):
+        super(VGG_FeatureExtractor, self).__init__()
+        self.ConvNet = nn.Sequential(
+            nn.Conv2d(input_channel, 64, 3, 1, 1), nn.ReLU(True),
+            nn.MaxPool2d(2, 2),
+            nn.Conv2d(64, 128, 3, 1, 1), nn.ReLU(True),
+            nn.MaxPool2d(2, 2),
+            nn.Conv2d(128, 256, 3, 1, 1), nn.ReLU(True),
+            nn.Conv2d(256, 256, 3, 1, 1), nn.ReLU(True),
+            nn.MaxPool2d((2, 1), (2, 1)),
+            nn.Conv2d(256, output_channel, 3, 1, 1, bias=False),
+            nn.BatchNorm2d(output_channel), nn.ReLU(True)
+        )
+
+    def forward(self, input):
+        return self.ConvNet(input)
+
+
+class BidirectionalLSTM(nn.Module):
+
+    def __init__(self, input_size, hidden_size, output_size):
+        super(BidirectionalLSTM, self).__init__()
+        self.rnn = nn.LSTM(input_size, hidden_size, bidirectional=True, batch_first=True)
+        self.linear = nn.Linear(hidden_size * 2, output_size)
+
+    def forward(self, input):
+        recurrent, _ = self.rnn(input)
+        output = self.linear(recurrent)  # Contextual Features mapped to Classes
+        return output
+
+
+class CRNN_Model(nn.Module):
+
+    def __init__(self, num_classes=97):
+        super(CRNN_Model, self).__init__()
+
+        self.FeatureExtraction = VGG_FeatureExtractor(input_channel=1, output_channel=256)
+        self.AdaptiveAvgPool = nn.AdaptiveAvgPool2d((None, 1))
+
+        self.SequenceModeling = nn.Sequential(
+            BidirectionalLSTM(256, 256, 256),
+            BidirectionalLSTM(256, 256, 256)
+        )
+
+        self.Prediction = nn.Linear(256, num_classes)
+
+    def forward(self, image_tensor):
+        visual_feature = self.FeatureExtraction(image_tensor)
+        visual_feature = self.AdaptiveAvgPool(visual_feature.permute(0, 3, 1, 2)).squeeze(3)
+
+        contextual_feature = self.SequenceModeling(visual_feature)
+
+        prediction = self.Prediction(contextual_feature.contiguous())
+        return prediction
+
+
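+def CTCDecoder(predictions):
+    # Greedy CTC decoding of the first sequence; predictions has shape [batch, time, classes].
+    max_probs = torch.argmax(predictions, dim=2)[0]
+
+    final_string = []
+    for i in range(len(max_probs)):
+        if max_probs[i] != 0 and (i == 0 or max_probs[i] != max_probs[i-1]):  # skip blank (0), collapse repeats
+            final_string.append(str(max_probs[i].item()))  # class index, not yet a character
+
+    return "".join(final_string)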
+
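Review note on `CTCDecoder` in `ink_vision_engine.py`: the decoder emits class indices, so a character lookup is still needed to get readable text. The following is a minimal sketch of greedy CTC decoding with such a lookup, not the project's implementation. It assumes index 0 is the CTC blank (as in the diff) and uses a hypothetical `CHARSET`; the input is a plain list of per-timestep argmax indices so the sketch runs without PyTorch.

```python
# Hypothetical character set for illustration; class 0 is reserved for the CTC blank.
CHARSET = "abcdefghijklmnopqrstuvwxyz "
BLANK = 0


def ctc_greedy_decode(indices, charset=CHARSET, blank=BLANK):
    """Collapse consecutive repeats, drop blanks, and map class indices to characters."""
    chars = []
    previous = None
    for index in indices:
        if index != blank and index != previous:
            chars.append(charset[index - 1])  # classes 1..N map to charset[0..N-1]
        previous = index
    return "".join(chars)


# Per-timestep argmax for one sequence: h h - e l - l o, with "-" as the blank.
print(ctc_greedy_decode([8, 8, 0, 5, 12, 0, 12, 15]))
```

The blank between the two `l` timesteps is what lets a genuine double letter survive the repeat-collapsing step.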
diff --git a/pipeline/README.md b/pipeline/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4cfa4555dfef3c8e3096d1ffa190a8b26259fc87
--- /dev/null
+++ b/pipeline/README.md
@@ -0,0 +1,29 @@
+# Modular 3-Step HTR Pipeline
+
+This directory contains a highly modular Machine Learning pipeline for Handwritten Text Recognition (HTR). It is designed to be easily extensible, allowing you to plug and play your own trained PyTorch/TensorFlow models into any of the 3 steps.
+
+## Pipeline Architecture
+
+1. **Step 1: The Preprocessor (`preprocessor.py`)**
+   - **Current Logic:** Denoising combines OpenCV and LightCNN: OpenCV handles non-local means denoising, adaptive Gaussian thresholding, and green-channel extraction, and LightCNN runs alongside it. Deskewing uses contour-based `minAreaRect` rotation.
+   - **How to Swap:** To use a custom Deep Learning model (like a trained UNet for binarization or a CNN for deskewing), open `preprocessor.py`. Initialize your PyTorch/Keras model inside the `__init__()` function. Then, inside `binarize_and_denoise()` or `deskew()`, replace the `cv2` logic with your model's forward inference pass (e.g., `return my_unet_model(image)`).
+
+2. **Step 2: The OCR Engine (`ocr_engine.py`)**
+   - **Current Logic:** Wraps the `EasyOCR` library, tuned for handwriting (lowered text threshold, increased magnification ratio, paragraph mode).
+   - **How to Swap:** You can point EasyOCR to your own fine-tuned weights by passing `model_storage_directory='path/to/models'` when initializing the reader (a custom recognition network additionally needs `user_network_directory` and `recog_network`). To use an entirely different architecture (e.g., Microsoft TrOCR), replace the `self.reader` initialization with your TrOCR Hugging Face setup and update `extract_text()` to call your model's generation function instead.
+
+3. **Step 3: The Postprocessor (`postprocessor.py`)**
+   - **Current Logic:** Uses a dictionary-based `SpellChecker` (edit-distance candidates) combined with regex cleaning, split-word merging, and a heuristic grammatical pass. A small BERT model (`google/bert_uncased_L-2_H-128_A-2`) is loaded as a semantic judge when `transformers` is installed.
+   - **How to Swap:** If you train a custom Transformer on OCR errors (e.g., fine-tuning T5 or BART to map "he110 th3re" -> "hello there"), load it in `__init__()` from your local Hugging Face directory, e.g. `pipeline("text2text-generation", model="path/to/my/finetuned/model")`, and call it from `correct_spelling()`.
+
+## Execution
+
+Ensure your dependencies are installed:
+```bash
+pip install -r requirements.txt
+```
+
+Run a test image through the pipeline:
+```bash
+python pipeline/main.py --image path/to/handwritten_test.jpg
+```
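Review note on the "How to Swap" instructions in `pipeline/README.md`: the plug-and-play idea can be illustrated with a small sketch in which each step is just a callable. This is an assumption-laden illustration, not the structure of `pipeline/main.py`; `HTRPipeline` and the three `fake_*` stages are hypothetical stand-ins so the sketch runs without OpenCV, EasyOCR, or transformers.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class HTRPipeline:
    """Three swappable stages; each stage is any callable with a matching signature."""
    preprocess: Callable[[Any], Any]  # image path -> cleaned image
    recognize: Callable[[Any], str]   # cleaned image -> raw text
    correct: Callable[[str], str]     # raw text -> corrected text

    def run(self, image_path):
        return self.correct(self.recognize(self.preprocess(image_path)))


# Hypothetical stand-in stages, used only to make the sketch runnable.
def fake_preprocess(path):
    return f"cleaned:{path}"


def fake_recognize(image):
    return "he110 th3re"


def fake_correct(text):
    return text.replace("110", "llo").replace("3", "e")


pipeline = HTRPipeline(fake_preprocess, fake_recognize, fake_correct)
print(pipeline.run("note.jpg"))

# Swapping one stage leaves the other two untouched.
pipeline.correct = str.upper
print(pipeline.run("note.jpg"))
```

Keeping the stage boundaries as plain callables means a trained model only has to match the input and output types of the step it replaces.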
diff --git a/pipeline/__init__.py b/pipeline/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..6574e109d95dbe9873281c783c72b526b503c1a0
--- /dev/null
+++ b/pipeline/__init__.py
@@ -0,0 +1,3 @@
+"""
+HTR Pipeline Package
+"""
diff --git a/pipeline/__pycache__/__init__.cpython-311.pyc b/pipeline/__pycache__/__init__.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..b643cf3cb72637e1855feedf93074da15cd03b65
Binary files /dev/null and b/pipeline/__pycache__/__init__.cpython-311.pyc differ
diff --git a/pipeline/__pycache__/ocr_engine.cpython-311.pyc b/pipeline/__pycache__/ocr_engine.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..e3fb1b66e6ca0bc22f378b6fa45b0f0634921677
Binary files /dev/null and b/pipeline/__pycache__/ocr_engine.cpython-311.pyc differ
diff --git a/pipeline/__pycache__/postprocessor.cpython-311.pyc b/pipeline/__pycache__/postprocessor.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..c725b699ea88863e6e33aa82eace6cbe22c8d112
Binary files /dev/null and b/pipeline/__pycache__/postprocessor.cpython-311.pyc differ
diff --git a/pipeline/__pycache__/preprocessor.cpython-311.pyc b/pipeline/__pycache__/preprocessor.cpython-311.pyc
new file mode 100644
index 0000000000000000000000000000000000000000..21bc5622f4f6c3fadc0c39322f56d982e89d1e4d
Binary files /dev/null and b/pipeline/__pycache__/preprocessor.cpython-311.pyc differ
diff --git a/pipeline/main.py b/pipeline/main.py
new file mode 100644
index 0000000000000000000000000000000000000000..b37a3b9b811ed03f29d956480fd9882694e5cdb1
--- /dev/null
+++ b/pipeline/main.py
@@ -0,0 +1,64 @@
+import sys
+import os
+
+# Add the parent directory to the path so the pipeline module can be imported
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+from pipeline.preprocessor import DocumentPreprocessor
+from pipeline.ocr_engine import HTREngine
+from pipeline.postprocessor import NLPCorrector
+from PIL import Image
+
+def run_pipeline(image_path):
+    """
+    Executes the 3-Step Modular HTR Pipeline on a test image.
+    """
+    print("==================================================")
+    print(f"Starting HTR Pipeline for: {image_path}")
+    print("==================================================")
+
+    # Boot up the modules (This usually happens once on server start)
+    preprocessor = DocumentPreprocessor()
+    engine = HTREngine(languages=['en'])
+    nlp_corrector = NLPCorrector(use_ml=True)
+
+    print("\n[STEP 1] Running Computer Vision Pre-Processing...")
+    try:
+        cleaned_image_array = preprocessor.process(image_path)
+        print(" -> Image binarized, denoised, and deskewed successfully.")
+    except Exception as e:
+        print(f" -> ERROR in Preprocessing: {e}")
+        return
+
+    print("\n[STEP 2] Running Deep Learning OCR Engine...")
+    try:
+        raw_text = engine.extract_text(cleaned_image_array)
+        print(f" -> Raw Output: '{raw_text}'")
+    except Exception as e:
+        print(f" -> ERROR in OCR Engine: {e}")
+        return
+
+    print("\n[STEP 3] Running NLP Post-Processing Contextual Correction...")
+    try:
+        final_text = nlp_corrector.correct_spelling(raw_text)
+        print("==================================================")
+        print(f"FINAL POLISHED RESULT: '{final_text}'")
+        print("==================================================")
+    except Exception as e:
+        print(f" -> ERROR in NLP Correction: {e}")
+
+if __name__ == "__main__":
+    # Test execution script
+    import argparse
+    parser = argparse.ArgumentParser(description="Run the Modular HTR Pipeline")
+    parser.add_argument("--image", type=str, required=True, help="Path to the handwritten image file.")
+
+    args, unknown = parser.parse_known_args()
+
+    if args.image:
+        if os.path.exists(args.image):
+            run_pipeline(args.image)
+        else:
+            print(f"Image not found at path: {args.image}")
+    else:
+        print("Please provide an image using python pipeline/main.py --image path/to/image.jpg")
diff --git a/pipeline/ocr_engine.py b/pipeline/ocr_engine.py
new file mode 100644
index 0000000000000000000000000000000000000000..33eb13270f40ad182c7ba225ea6693c0919ae43a
--- /dev/null
+++ b/pipeline/ocr_engine.py
@@ -0,0 +1,84 @@
+import easyocr
+import numpy as np
+
+class HTREngine:
+    """
+    STEP 2: THE ENGINE (EasyOCR)
+    This module integrates EasyOCR to extract raw text and digits from the cleaned images.
+
+    HOT-SWAP ML MODELS HERE:
+    To swap locally trained weights for EasyOCR, place your .pth and .yaml config
+    in a directory and initialize the reader with: `model_storage_directory='path/to/models'`.
+    Alternatively, replace EasyOCR here completely with another engine (like TrOCR or PaddleOCR).
+    """
+
+    def __init__(self, languages=('en',)):
+        # The reader itself is created with default parameters; the handwriting-specific
+        # tuning is applied per call in extract_text().
+        print(f"Initializing EasyOCR Engine for {list(languages)}...")
+        self.reader = easyocr.Reader(list(languages))
+
+    def extract_text(self, image_input):
+        """
+        Extracts text from the preprocessed image array.
+        """
+        # Read the image in paragraph mode with parameters tuned for handwriting:
+        # - text_threshold: lowered to catch faint red ink handwriting.
+        # - link_threshold: governs how readily neighbouring characters are linked together.
+        # - mag_ratio: enlarges the image before detection.
+        # - paragraph: True to ensure correct top-to-bottom reading flow.
+        results = self.reader.readtext(
+            image_input,
+            text_threshold=0.35,  # Aggressively catch faint strokes
+            link_threshold=0.4,
+            mag_ratio=1.5,
+            paragraph=True
+        )
+
+        if not results:
+            return ""
+
+        # NATIVE PARAGRAPH MODE:
+        # In paragraph mode, results is a list of [bbox, text] pairs.
+        # To get a more reliable reading order (Top-to-Bottom then Left-to-Right),
+        # we re-sort the boxes spatially.
+
+        if len(results) > 1:
+            # 1. Sort by top-left Y coordinate as a primary pass
+            results.sort(key=lambda x: x[0][0][1])
+
+            lines = []
+            for res in results:
+                bbox = res[0]
+                # Use the vertical center of the box for most stable line-matching
+                y_center = (bbox[0][1] + bbox[2][1]) / 2
+
+                # Check if this word can join an existing line
+                joined = False
+                for line in lines:
+                    # Representative height of the line
+                    line_h = np.mean([item[0][2][1] - item[0][0][1] for item in line])
+                    line_y_avg = np.mean([(item[0][0][1] + item[0][2][1]) / 2 for item in line])
+
+                    # If the word's center is within 50% of the line's average height, join it.
+                    if abs(y_center - line_y_avg) < (line_h * 0.5):
+                        line.append(res)
+                        joined = True
+                        break
+
+                if not joined:
+                    lines.append([res])
+
+            # 2. Sort words within each clustered line by X-coordinate
+            for line in lines:
+                line.sort(key=lambda x: x[0][0][0])
+
+            # 3. Sort the lines themselves by the top-left Y-coordinate of their FIRST word
+            # (which is usually the start of the line)
+            lines.sort(key=lambda line: line[0][0][0][1])
+
+            # 4. Flatten back into results
+            results = [word for line in lines for word in line]
+
+        raw_text = " ".join([text for (bbox, text) in results])
+        return raw_text
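Review note on the reading-order sort in `extract_text()`: the line clustering can be exercised without EasyOCR. The following is a minimal sketch, assuming results in paragraph-mode `[bbox, text]` form where each bbox is four corner points starting at the top-left and going clockwise. `reading_order`, `box`, and the sample words are hypothetical, and `statistics.mean` stands in for `np.mean` to keep the sketch dependency-free.

```python
from statistics import mean


def reading_order(results):
    """Group [bbox, text] pairs into lines by vertical centre, then read left to right."""
    lines = []
    for res in sorted(results, key=lambda r: r[0][0][1]):  # primary pass: top-left y
        bbox = res[0]
        y_center = (bbox[0][1] + bbox[2][1]) / 2
        for line in lines:
            line_h = mean(item[0][2][1] - item[0][0][1] for item in line)
            line_y = mean((item[0][0][1] + item[0][2][1]) / 2 for item in line)
            if abs(y_center - line_y) < line_h * 0.5:
                line.append(res)
                break
        else:
            lines.append([res])  # no existing line was close enough
    for line in lines:
        line.sort(key=lambda r: r[0][0][0])        # top-left x within a line
    lines.sort(key=lambda line: line[0][0][0][1])  # top-left y of each line's first word
    return " ".join(text for line in lines for _, text in line)


def box(x, y, w=40, h=20):
    """Hypothetical helper: axis-aligned box as four corners, clockwise from top-left."""
    return [[x, y], [x + w, y], [x + w, y + h], [x, y + h]]


words = [
    [box(60, 12), "world"],
    [box(10, 50), "second"],
    [box(10, 10), "hello"],
    [box(60, 52), "line"],
]
print(reading_order(words))
```

Small vertical jitter (here 2 pixels) stays within half the line height, so words on the same handwritten line are grouped before the left-to-right sort.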
diff --git a/pipeline/postprocessor.py b/pipeline/postprocessor.py
new file mode 100644
index 0000000000000000000000000000000000000000..ec23e1544472a8614c33af43a68feac5c18bf60d
--- /dev/null
+++ b/pipeline/postprocessor.py
@@ -0,0 +1,230 @@
+from spellchecker import SpellChecker
+import re
+import os
+
+try:
+    from transformers import AutoTokenizer, AutoModelForSequenceClassification
+    import torch
+    HAS_TRANSFORMERS = True
+except ImportError:
+    HAS_TRANSFORMERS = False
+
+class NLPCorrector:
+    """
+    STEP 3: THE POST-PROCESSOR (NLP)
+    This module takes the raw, potentially misspelled output from the OCR engine
+    and uses a Natural Language Processing (NLP) technique to fix the text.
+
+    HOT-SWAP ML MODELS HERE:
+    Currently, this uses an offline, dictionary-based SpellChecker (edit distance) for fast results.
+    To swap in a heavy Deep Learning LM (like T5 or BART):
+    1. Load your local model via `AutoModelForSeq2SeqLM.from_pretrained('path/to/model')`
+    2. Replace the `self.corrector` logic below.
+    """
+
+    def __init__(self, use_ml=True):
+        self.use_ml = use_ml
+        if self.use_ml:
+            print("Initializing NLP Post-Processor (Dictionary & Context Model)...")
+            self.corrector = SpellChecker()
+
+        # 4. INITIALIZE SEMANTIC JUDGE (BERT Tiny)
+        # This model is very small (~17MB); its classification head is untrained unless fine-tuned weights are loaded
+        if HAS_TRANSFORMERS:
+            try:
+                print("Initializing Semantic Judge (BERT Tiny)...")
+                # Using a tiny model to avoid CPU/Memory overload
+                self.model_name = "google/bert_uncased_L-2_H-128_A-2"
+                self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
+                self.semantic_model = AutoModelForSequenceClassification.from_pretrained(self.model_name)
+                self.semantic_model.eval()
+            except Exception as e:
+                print(f"Semantic Judge failed to load: {e}")
+                self.semantic_model = None
+        else:
+            self.semantic_model = None
+
+    def basic_clean(self, text):
+        """ Quick regex cleanup for common OCR artifact characters. """
+        # Collapse runs of whitespace into a single space
+        text = re.sub(r'\s+', ' ', text)
+        # Remove weird symbols except standard punctuation
+        text = re.sub(r'[^\w\s\.,!\?-]', '', text)
+        return text.strip()
+
+ def correct_spelling(self, ocr_text):
+ """
+ Takes raw OCR text and attempts contextual reconstruction.
+ """
+ cleaned = self.basic_clean(ocr_text)
+
+ if not cleaned:
+ return ""
+
+ if not self.use_ml:
+ return cleaned
+
+ # Instead of a heavy LLM, we use a rapid NLP deterministic approach
+ words = cleaned.split()
+ fixed_words = []
+ for word in words:
+ # We ignore single digits or characters that are fine
+ if len(word) <= 1 or word.isdigit():
+ fixed_words.append(word)
+ continue
+
+ # 1. Start with raw lowercase
+ clean_word = word.lower()
+
+ # 2. Skip if it's a known correctly spelled word or short
+ if clean_word in self.corrector.word_frequency or len(clean_word) < 2:
+ fixed_words.append(word)
+ continue
+
+ # 3. HIGH ERROR HANDLING: Find candidates with edit distance
+ # If the OCR is messy, we look for the most likely candidates
+ candidates = self.corrector.candidates(clean_word)
+
+ # If we have strong candidates, pick the best one
+ if candidates and len(candidates) > 0:
+ # We prefer the 'correction' but can also inspect full candidate list
+ correction = self.corrector.correction(clean_word)
+ else:
+ correction = clean_word
+
+ if correction and correction != clean_word:
+ # Retain the user's original capitalization rules!
+ if word.isupper():
+ fixed_words.append(correction.upper())
+ elif word.istitle():
+ fixed_words.append(correction.capitalize())
+ else:
+ fixed_words.append(correction)
+ else:
+ fixed_words.append(word)
+
+ # 4. SPLIT WORD MERGING (rejoin a word that OCR split into two fragments)
+ # Second pass: join adjacent words when the result is a dictionary word and at least one piece is not.
+ final_pass = []
+ skip_next = False
+ for i in range(len(fixed_words)):
+ if skip_next:
+ skip_next = False
+ continue
+
+ if i + 1 < len(fixed_words):
+ joined = (fixed_words[i] + fixed_words[i+1]).lower()
+ # If the joined version exists in the dictionary but the individual ones were questionable
+ if joined in self.corrector.word_frequency:
+ # Check if the originals were also in dictionary
+ orig_1 = fixed_words[i].lower() in self.corrector.word_frequency
+ orig_2 = fixed_words[i+1].lower() in self.corrector.word_frequency
+
+ if not (orig_1 and orig_2): # If at least one was a 'broken' piece
+ final_pass.append(joined.upper() if fixed_words[i].isupper() else joined)
+ skip_next = True
+ continue
+
+ final_pass.append(fixed_words[i])
+
+ fixed_text = " ".join(final_pass)
+
+ # 5. GRAMMATICAL PASS (Helping Verbs & Articles)
+ # Rejoin common small words that OCR split apart ('i s' -> 'is', 't he' -> 'the')
+ # and collapse doubled helping verbs using a handful of regex rules.
+ fixed_text = self.grammatical_pass(fixed_text)
+
+ return fixed_text
+
+ def grammatical_pass(self, text):
+ """
+ Lightweight heuristic pass to repair broken grammar patterns
+ common in OCR (e.g., 'has is' -> 'has its', 'i s' -> 'is').
+ """
+ # Fix common helping verb fragments
+ replacements = {
+ r'\bha s\b': 'has',
+ r'\bi s\b': 'is',
+ r'\ba re\b': 'are',
+ r'\bwa s\b': 'was',
+ r'\bha ve\b': 'have',
+ r'\bha d\b': 'had',
+ r'\bt he\b': 'the',
+ r'\bh as\b': 'has',
+ r'\bi t s\b': 'its',
+ r'\ba n d\b': 'and'
+ }
+ for pattern, replacement in replacements.items():
+ text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
+
+ # Deduplicate common double helping verbs (OCR often double-reads)
+ text = re.sub(r'\b(has|is|was|were|had|have)\b\s+\b\1\b', r'\1', text, flags=re.IGNORECASE)
+
+ # Rewrite the specific misread "has is" as "has its"
+ # (this handles the user's reported "has is" error case)
+ text = re.sub(r'\bhas is\b', 'has its', text, flags=re.IGNORECASE)
+
+ return text.strip()
+
+ def score_semantic(self, text):
+ """
+ Returns a rough 'coherence score' for a sentence from the BERT classifier.
+ The checkpoint ships no fine-tuned classification head, so treat it as a weak signal.
+ """
+ if not self.semantic_model or not text.strip():
+ return 0.5 # Neutral fallback
+
+ try:
+ inputs = self.tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+ with torch.no_grad():
+ outputs = self.semantic_model(**inputs)
+ # Max softmax probability over the classifier outputs, used as a rough confidence proxy
+ score = torch.softmax(outputs.logits, dim=1).max().item()
+ return score
+ except Exception:
+ return 0.5
+
+ def judge_best_output(self, text_a, text_b):
+ """
+ ENSEMBLE JUDGE (Advanced)
+ Combines Dictionary Density and BERT-based Semantic Meaning.
+ Stream A (Raw Image) is slightly preferred when scores are close.
+ Final returned text has all digit characters removed (text-only output).
+ """
+ def calculate_score(text):
+ if not text or text.strip() == "":
+ return 0
+
+ words = text.split()
+
+ # 1. Dictionary Match (Lexical check)
+ matches = sum(1 for w in words if w.lower() in self.corrector.word_frequency)
+ density = matches / len(words) if words else 0
+
+ # 2. Semantic Coherence (Meaning check)
+ semantic_weight = self.score_semantic(text)
+
+ # 3. Length Bonus
+ length_factor = min(len(text) / 50.0, 1.0)
+
+ # Weighted average
+ # Emphasize dictionary and length, de-emphasize semantic model
+ total_score = (density * 0.5) + (semantic_weight * 0.2) + (length_factor * 0.3)
+ return total_score
+
+ score_a = calculate_score(text_a)
+ score_b = calculate_score(text_b)
+
+ print(f"Ensemble Judge (Semantic) -> Stream A: {score_a:.2f}, Stream B: {score_b:.2f}")
+
+ # If both non-empty and scores are close, prefer the raw-image stream (Stream A)
+ if score_a > 0 and score_b > 0 and score_a >= score_b * 0.9:
+ chosen = text_a
+ else:
+ chosen = text_a if score_a > score_b else text_b
+
+ # Remove all digit characters from the final output (text-only)
+ chosen_no_digits = re.sub(r'\d+', '', chosen)
+ # Normalize extra spaces after removing digits
+ chosen_no_digits = re.sub(r'\s+', ' ', chosen_no_digits).strip()
+ return chosen_no_digits
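The judge's weighting and its preference for Stream A can be tried in isolation. The following is a minimal sketch, not the module itself: `combined_score`, `pick_stream`, and the `KNOWN` word set are hypothetical stand-ins, and the semantic term is fixed at the neutral fallback of 0.5 so that no model is needed.

```python
def combined_score(text, known_words, semantic=0.5):
    """Blend the three signals: 50% dictionary density, 20% semantic score,
    30% length factor (which saturates at 50 characters)."""
    if not text or not text.strip():
        return 0.0
    words = text.split()
    density = sum(1 for w in words if w.lower() in known_words) / len(words)
    length_factor = min(len(text) / 50.0, 1.0)
    return density * 0.5 + semantic * 0.2 + length_factor * 0.3


def pick_stream(text_a, text_b, known_words):
    """Keep stream A whenever its score is at least 90% of stream B's score."""
    score_a = combined_score(text_a, known_words)
    score_b = combined_score(text_b, known_words)
    if score_a > 0 and score_b > 0 and score_a >= score_b * 0.9:
        return text_a
    return text_a if score_a > score_b else text_b


# Hypothetical stand-in for the spellchecker dictionary.
KNOWN = {"the", "quick", "brown", "fox"}

# A misspelled word drops stream A below the 90% band, so stream B wins.
print(pick_stream("the quick brwn fox", "the quick brown fox", KNOWN))
# Stream A scores slightly lower here but stays inside the band, so it is kept.
print(pick_stream("the quick brown fox", "the quick brown fox the", KNOWN))
```

Because the length factor carries 30% of the weight, a longer but noisier reading can outscore a shorter clean one, which is worth keeping in mind when tuning the weights.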
diff --git a/pipeline/preprocessor.py b/pipeline/preprocessor.py
new file mode 100644
index 0000000000000000000000000000000000000000..5cfeee53356a8de8dadfa6f0bd6f6631af14558b
--- /dev/null
+++ b/pipeline/preprocessor.py
@@ -0,0 +1,156 @@
+import cv2
+import numpy as np
+from PIL import Image
+import torch
+import torch.nn as nn
+import torchvision.transforms as transforms
+
+# ---------------------------------------------------------
+# OPTIONAL: LIGHTWEIGHT CNN FOR DL-BASED PREPROCESSING
+# ---------------------------------------------------------
+class LightCNN_Denoiser(nn.Module):
+ """
+ A lightweight convolutional autoencoder for image denoising.
+ It is an optional alternative to the OpenCV path in DocumentPreprocessor.
+ It must first be trained on pairs of (Messy Image -> Clean Image);
+ without trained weights its output is not meaningful.
+ """
+ def __init__(self):
+ super(LightCNN_Denoiser, self).__init__()
+ # Simple AutoEncoder style CNN
+ self.encoder = nn.Sequential(
+ nn.Conv2d(3, 16, kernel_size=3, padding=1),
+ nn.ReLU(),
+ nn.MaxPool2d(2)
+ )
+ self.decoder = nn.Sequential(
+ nn.ConvTranspose2d(16, 3, kernel_size=2, stride=2),
+ nn.Sigmoid() # Scale pixels between 0 and 1
+ )
+
+ def forward(self, x):
+ x = self.encoder(x)
+ x = self.decoder(x)
+ return x
+
+# ---------------------------------------------------------
+# MAIN PREPROCESSOR MODULE
+# ---------------------------------------------------------
+class DocumentPreprocessor:
+ """
+ STEP 1: THE PRE-PROCESSOR (Computer Vision / Deep Learning)
+ This module cleans messy, handwritten images before they hit the OCR engine.
+ Denoising runs through OpenCV by default, or through LightCNN when enabled.
+
+ HOT-SWAP ML MODELS HERE:
+ Uses OpenCV by default. Set `use_dl_cnn=True` to switch to the LightCNN path
+ (requires trained weights; the load_state_dict call below is commented out).
+ """
+
+ def __init__(self, use_dl_cnn=False):
+ self.use_dl_cnn = use_dl_cnn
+ if self.use_dl_cnn:
+ print("Loading LightCNN Deep Learning Preprocessor...")
+ self.cnn_model = LightCNN_Denoiser()
+ # self.cnn_model.load_state_dict(torch.load('path/to/lightcnn_weights.pth'))
+ self.cnn_model.eval()
+ self.transform = transforms.Compose([
+ transforms.ToTensor()
+ ])
+
+ def binarize_and_denoise(self, image):
+ """
+ Denoises with LightCNN when use_dl_cnn is True, otherwise with OpenCV.
+ Removes shadows and enhances contrast.
+ Specifically optimized for Red Ink on White Paper.
+ """
+ if self.use_dl_cnn:
+ # DL APPROACH (Showcase only)
+ tensor_img = self.transform(image).unsqueeze(0)
+ with torch.no_grad():
+ cleaned_tensor = self.cnn_model(tensor_img)
+ cleaned_img = cleaned_tensor.squeeze().permute(1, 2, 0).numpy()
+ cleaned_img = (cleaned_img * 255).astype(np.uint8)
+ return cleaned_img
+ else:
+ # CV2 ENHANCED APPROACH
+ # 1. Use the GREEN channel for grayscale.
+ # (Red ink looks darkest in the green channel, so it contrasts most with white paper)
+ gray = image[:,:,1] if len(image.shape) == 3 else image
+
+ # --- NOISE/GHOSTING FILTER (ADAPTIVE) ---
+ # Increase block size to 101 to ignore local ink thickness and focus on global lighting.
+ # Reduce C to 10 (less aggressive subtraction) to keep ink solid.
+ thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
+ cv2.THRESH_BINARY, 101, 10)
+
+ # OR with the mask: paper pixels (255 in thresh) become pure white,
+ # while ink pixels (0 in thresh) keep their original gray level.
+ filtered = cv2.bitwise_or(gray, thresh)
+
+ # 2. Apply CLAHE (Contrast Limited Adaptive Histogram Equalization)
+ clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
+ enhanced = clahe.apply(filtered)
+
+ # 3. Light Denoising
+ denoised = cv2.fastNlMeansDenoising(enhanced, None, h=5, templateWindowSize=7, searchWindowSize=21)
+
+ return cv2.cvtColor(denoised, cv2.COLOR_GRAY2BGR)
+
+ def deskew(self, image):
+ """
+ Automatically straightens the image/text without rotating it sideways.
+ """
+ # Convert to grayscale for bounding box detection
+ gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
+
+ # Invert the image (text is white, background is black for rotation angles)
+ gray_inv = cv2.bitwise_not(gray)
+
+ # Threshold to get text coordinates
+ thresh = cv2.threshold(gray_inv, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
+
+ # Grab all non-zero pixels safely
+ coords = np.column_stack(np.where(thresh > 0))
+ if len(coords) == 0:
+ return image
+
+ # Find minimum bounding rectangle which gives us the angle
+ rect = cv2.minAreaRect(coords)
+ angle = rect[-1]
+
+ # Correct the angle specifically for OpenCV 4.5+ which returns [0, 90)
+ # We don't want to rotate a horizontal box 90 degrees!
+ if angle > 45:
+ angle = angle - 90
+
+ # Rotate the original colored image to deskew
+ (h, w) = image.shape[:2]
+ center = (w // 2, h // 2)
+ M = cv2.getRotationMatrix2D(center, angle, 1.0)
+
+ # Use white border for padding during rotation
+ rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_CONSTANT, borderValue=(255, 255, 255))
+
+ return rotated
+
+ def process(self, image_input):
+ """
+ Main pipeline entry point for CV operations.
+ Accepts a PIL Image, filepath, or NumPy array and returns a cleaned NumPy array.
+ """
+ if isinstance(image_input, Image.Image):
+ img = np.array(image_input)
+ img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR) # Convert to CV2 standard
+ elif isinstance(image_input, str):
+ img = cv2.imread(image_input)
+ else:
+ img = image_input
+
+ # 1. Binarize and Remove Shadows (Non-destructive)
+ cleaned = self.binarize_and_denoise(img)
+
+ # We skip explicit CV2 deskewing because EasyOCR's CRAFT detector
+ # is natively capable of detecting angled text, and global MinAreaRect
+ # breaks if the text forms a vertical column.
+ return cleaned
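The mask polarity in the OpenCV path is easy to get backwards, so here is a small pure-Python sketch of the per-pixel OR. It assumes, as with `cv2.THRESH_BINARY`, that ink ends up as 0 and paper as 255 in the mask; `or_mask` and the sample pixel values are illustrative only.

```python
def or_mask(gray_row, mask_row):
    """Bitwise OR per pixel: where the mask is 255 the result is white (255),
    where the mask is 0 the original gray level is kept."""
    return [g | m for g, m in zip(gray_row, mask_row)]


gray = [40, 200, 35, 210]              # dark values are ink, bright values are paper
thresh = [0, 255, 0, 255]              # binary mask: ink -> 0, paper -> 255
inverted = [255 - t for t in thresh]   # the same mask after bitwise_not

print(or_mask(gray, thresh))    # ink kept, paper whitened
print(or_mask(gray, inverted))  # ink whitened, paper kept
```

Only the first call matches the stated goal of keeping the ink while whitening the background.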
diff --git a/requirements.txt b/requirements.txt
index 28d994e22f8dd432b51df193562052e315ad95f7..93fc7a7f72945c6c1dc5ea4cbf82fab5995d8b6b 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,13 @@
-altair
-pandas
-streamlit
\ No newline at end of file
+streamlit
+easyocr
+Pillow
+numpy
+opencv-python-headless
+streamlit-cropper
+torch
+torchvision
+pyspellchecker
+pillow-heif
+transformers
+googletrans==4.0.0rc1
+gTTS
diff --git a/training/generate_dataset.py b/training/generate_dataset.py
new file mode 100644
index 0000000000000000000000000000000000000000..1d63829f17d16ed03fbf8f0ec6f4e543ab74a473
--- /dev/null
+++ b/training/generate_dataset.py
@@ -0,0 +1,68 @@
+import os
+import cv2
+import numpy as np
+
+def generate_synthetic_data(num_samples=100, output_dir="dataset"):
+ """
+ Generates a small synthetic dataset for training the LightCNN denoiser.
+ Creates pairs of (Clean Image, Noisy Image) with a simulated shadow, salt-and-pepper noise, and optional blur.
+ """
+ clean_dir = os.path.join(output_dir, "clean")
+ noisy_dir = os.path.join(output_dir, "noisy")
+
+ os.makedirs(clean_dir, exist_ok=True)
+ os.makedirs(noisy_dir, exist_ok=True)
+
+ print(f"Generating {num_samples} synthetic training pairs...")
+
+ for i in range(num_samples):
+ # 1. Create a clean digital "handwritten" image
+ # White background
+ img = np.ones((128, 512, 3), dtype=np.uint8) * 255
+
+ # Draw some random text to simulate handwriting
+ text = f"Sample Text {np.random.randint(1000, 9999)}"
+ font = cv2.FONT_HERSHEY_SIMPLEX
+ thickness = np.random.randint(2, 5)
+ # Random position
+ x, y = np.random.randint(10, 50), np.random.randint(50, 90)
+ cv2.putText(img, text, (x, y), font, 1.5, (0, 0, 0), thickness, cv2.LINE_AA)
+
+ # Save the clean Ground Truth (y)
+ clean_path = os.path.join(clean_dir, f"{i:03d}.jpg")
+ cv2.imwrite(clean_path, img)
+
+ # 2. Add realistic noise to simulate a bad photo (x)
+ noisy = img.copy()
+
+ # Add a shadow of random intensity (the rectangle fills the whole frame, so it darkens uniformly)
+ h, w = noisy.shape[:2]
+ gradient = np.zeros((h, w, 3), dtype=np.float32)
+ cv2.rectangle(gradient, (0, 0), (w, h), (np.random.randint(50, 150),)*3, -1)
+ gradient = cv2.GaussianBlur(gradient, (101, 101), 0)
+ noisy = cv2.addWeighted(noisy, 0.7, gradient.astype(np.uint8), 0.3, 0)
+
+ # Add salt and pepper noise
+ s_vs_p = 0.5
+ amount = 0.04
+ noisy_pixels = np.random.rand(h, w)
+ # Salt
+ noisy[noisy_pixels < amount * s_vs_p] = 255
+ # Pepper
+ noisy[noisy_pixels > 1 - amount * (1 - s_vs_p)] = 0
+
+ # Add a slight blur to simulate bad focus
+ if np.random.rand() > 0.5:
+ noisy = cv2.GaussianBlur(noisy, (5, 5), 0)
+
+ # Save the dirty input (x)
+ noisy_path = os.path.join(noisy_dir, f"{i:03d}.jpg")
+ cv2.imwrite(noisy_path, noisy)
+
+ print(f"Dataset generated in '{output_dir}'.")
+ print(f" Clean labels (y): {clean_dir}")
+ print(f" Noisy inputs (x): {noisy_dir}")
+
+if __name__ == "__main__":
+ # Create 150 samples for a quick toy training run
+ generate_synthetic_data(num_samples=150)
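The salt-and-pepper step splits one uniform random draw per pixel into three bands. Below is a pure-Python sketch of that thresholding with the same `amount=0.04` and `s_vs_p=0.5`; the function name, the flat pixel list, and the seeded `random.Random` are assumptions made to keep the example small and repeatable.

```python
import random


def salt_and_pepper(pixels, amount=0.04, s_vs_p=0.5, seed=0):
    """Corrupt roughly `amount` of the pixels: the lowest band of the random draw
    becomes salt (255), the highest band becomes pepper (0), the rest is unchanged."""
    rng = random.Random(seed)
    salt_cutoff = amount * s_vs_p                # draws below 0.02 -> salt
    pepper_cutoff = 1 - amount * (1 - s_vs_p)    # draws above 0.98 -> pepper
    noisy = []
    for value in pixels:
        draw = rng.random()
        if draw < salt_cutoff:
            noisy.append(255)
        elif draw > pepper_cutoff:
            noisy.append(0)
        else:
            noisy.append(value)
    return noisy


flat_gray = [128] * 10_000
noisy = salt_and_pepper(flat_gray)
changed = sum(1 for value in noisy if value != 128)
print(f"{changed / len(noisy):.3f} of pixels were corrupted")
```

With these defaults about 2% of pixels turn white and about 2% turn black, which is a fairly heavy noise level for a 128x512 text image.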
diff --git a/training/generate_nlp_data.py b/training/generate_nlp_data.py
new file mode 100644
index 0000000000000000000000000000000000000000..49ec3e5fdcc5c1c820d631a1b5f0aac6dfdcca7f
--- /dev/null
+++ b/training/generate_nlp_data.py
@@ -0,0 +1,34 @@
+import os
+import json
+
+def generate_nlp_dataset(output_file="training/nlp_data.json"):
+ """
+ Creates a pairing of (Common OCR Mistakes -> Correct Word).
+ This dataset can be used to 'train' the spellchecker's dictionary
+ or fine-tune a specialized NLP model.
+ """
+ data = [
+ {"input": "th3", "target": "the"},
+ {"input": "p3ople", "target": "people"},
+ {"input": "v0ice", "target": "voice"},
+ {"input": "no4hij", "target": "nothing"},
+ {"input": "Ia", "target": "in"},
+ {"input": "0f", "target": "of"},
+ {"input": "joshu4", "target": "joshua"},
+ {"input": "he11o", "target": "hello"},
+ {"input": "w0r1d", "target": "world"},
+ {"input": "re-ling", "target": "feeling"},
+ {"input": "odia", "target": "who"},
+ {"input": "wheo", "target": "who"},
+ {"input": "4!", "target": "voice"},
+ {"input": "314", "target": "a"}
+ ]
+
+ os.makedirs(os.path.dirname(output_file), exist_ok=True)
+ with open(output_file, 'w') as f:
+ json.dump(data, f, indent=4)
+
+ print(f"NLP Dataset created: {output_file}")
+
+if __name__ == "__main__":
+ generate_nlp_dataset()
diff --git a/training/nlp_data.json b/training/nlp_data.json
new file mode 100644
index 0000000000000000000000000000000000000000..736dafa6a6d6d54215b4098bb81b6fa4eaea3e10
--- /dev/null
+++ b/training/nlp_data.json
@@ -0,0 +1,58 @@
+[
+ {
+ "input": "th3",
+ "target": "the"
+ },
+ {
+ "input": "p3ople",
+ "target": "people"
+ },
+ {
+ "input": "v0ice",
+ "target": "voice"
+ },
+ {
+ "input": "no4hij",
+ "target": "nothing"
+ },
+ {
+ "input": "Ia",
+ "target": "in"
+ },
+ {
+ "input": "0f",
+ "target": "of"
+ },
+ {
+ "input": "joshu4",
+ "target": "joshua"
+ },
+ {
+ "input": "he11o",
+ "target": "hello"
+ },
+ {
+ "input": "w0r1d",
+ "target": "world"
+ },
+ {
+ "input": "re-ling",
+ "target": "feeling"
+ },
+ {
+ "input": "odia",
+ "target": "who"
+ },
+ {
+ "input": "wheo",
+ "target": "who"
+ },
+ {
+ "input": "4!",
+ "target": "voice"
+ },
+ {
+ "input": "314",
+ "target": "a"
+ }
+]
\ No newline at end of file
diff --git a/training/train_denoiser.py b/training/train_denoiser.py
new file mode 100644
index 0000000000000000000000000000000000000000..5518041b61f879d62ad8455367d04f14130c52e9
--- /dev/null
+++ b/training/train_denoiser.py
@@ -0,0 +1,105 @@
+import os
+import sys
+import torch
+import torch.nn as nn
+import torch.optim as optim
+from torch.utils.data import Dataset, DataLoader
+from torchvision import transforms
+from PIL import Image
+
+# Import the architecture we defined in the pipeline
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from pipeline.preprocessor import LightCNN_Denoiser
+
+class DenoiserDataset(Dataset):
+ """
+ Loads pairs of (Noisy Input -> Clean Output) from the synthetic dataset.
+ """
+ def __init__(self, dataset_dir="dataset"):
+ self.clean_dir = os.path.join(dataset_dir, "clean")
+ self.noisy_dir = os.path.join(dataset_dir, "noisy")
+ self.image_files = os.listdir(self.clean_dir)
+
+ self.transform = transforms.Compose([
+ # Resize for consistent CNN batching
+ transforms.Resize((64, 256)),
+ transforms.ToTensor()
+ ])
+
+ def __len__(self):
+ return len(self.image_files)
+
+ def __getitem__(self, idx):
+ filename = self.image_files[idx]
+
+ clean_img = Image.open(os.path.join(self.clean_dir, filename)).convert("RGB")
+ noisy_img = Image.open(os.path.join(self.noisy_dir, filename)).convert("RGB")
+
+ clean_tensor = self.transform(clean_img)
+ noisy_tensor = self.transform(noisy_img)
+
+ return noisy_tensor, clean_tensor
+
+def train_model():
+ print("==================================================")
+ print("Initializing LightCNN Denoising Training Showcase")
+ print("==================================================")
+
+ # Check for GPU
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ print(f"Using device: {device}")
+
+ # 1. Load Data
+ print("Loading synthetic dataset...")
+ try:
+ dataset = DenoiserDataset()
+ dataloader = DataLoader(dataset, batch_size=16, shuffle=True)
+ except FileNotFoundError:
+ print("ERROR: Dataset not found. Please run generate_dataset.py first!")
+ return
+
+ # 2. Initialize Model
+ model = LightCNN_Denoiser().to(device)
+ criterion = nn.MSELoss() # Measure the difference between pixels
+ optimizer = optim.Adam(model.parameters(), lr=0.001)
+
+ epochs = 5
+
+ # 3. Training Loop
+ print(f"Starting training for {epochs} epochs...")
+ model.train()
+
+ for epoch in range(epochs):
+ running_loss = 0.0
+ for i, (noisy_inputs, clean_targets) in enumerate(dataloader):
+ noisy_inputs = noisy_inputs.to(device)
+ clean_targets = clean_targets.to(device)
+
+ # Zero gradients
+ optimizer.zero_grad()
+
+ # Forward pass
+ outputs = model(noisy_inputs)
+
+ # Calculate pixel error
+ loss = criterion(outputs, clean_targets)
+
+ # Backward pass and optimize
+ loss.backward()
+ optimizer.step()
+
+ running_loss += loss.item()
+
+ print(f"Epoch [{epoch+1}/{epochs}] - Loss: {running_loss/len(dataloader):.4f}")
+
+ # 4. Save Weights
+ os.makedirs("weights", exist_ok=True)
+ save_path = "weights/lightcnn_weights.pth"
+ torch.save(model.state_dict(), save_path)
+
+ print("==================================================")
+ print(f"Training Complete. Showcase weights saved to: {save_path}")
+ print("To use these weights, enable the commented-out load_state_dict call in pipeline/preprocessor.py and set use_dl_cnn=True")
+
+if __name__ == "__main__":
+ train_model()
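The dataset resizes every image to 64x256, which keeps the autoencoder's output the same size as its target. The arithmetic can be sketched as follows; `autoencoder_output_size` is a hypothetical helper that mirrors the layers of `LightCNN_Denoiser` (a padded 3x3 convolution, `MaxPool2d(2)`, then `ConvTranspose2d` with `kernel_size=2, stride=2`).

```python
def autoencoder_output_size(size):
    """Spatial size after the encoder and decoder along one dimension."""
    after_conv = size                  # 3x3 conv with padding=1 preserves the size
    pooled = after_conv // 2           # MaxPool2d(2) floors odd sizes
    return (pooled - 1) * 2 + 2        # transposed conv: (in - 1) * stride + kernel


print(autoencoder_output_size(64), autoencoder_output_size(256))
print(autoencoder_output_size(65))
```

Even sizes round-trip exactly, while an odd size comes back one pixel smaller, so images fed to the model outside the training script may need the same even-size resize before an MSE comparison or an overlay with the original.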
diff --git a/training/train_nlp.py b/training/train_nlp.py
new file mode 100644
index 0000000000000000000000000000000000000000..309734a31a72e885f171e3d667a25c57936d50b2
--- /dev/null
+++ b/training/train_nlp.py
@@ -0,0 +1,41 @@
+import json
+import os
+import sys
+
+# Add project root to path
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from pipeline.postprocessor import NLPCorrector
+
+def train_nlp_logic():
+ print("==================================================")
+ print("Initializing NLP Weights Update (OCR Error Tuning)")
+ print("==================================================")
+
+ nlp = NLPCorrector()
+
+ # Load the datasets
+ data_file = "training/nlp_data.json"
+ if not os.path.exists(data_file):
+ print("Dataset not found. Run generate_nlp_data.py first.")
+ return
+
+ with open(data_file, 'r') as f:
+ dataset = json.load(f)
+
+ print(f"Learning from {len(dataset)} OCR error patterns...")
+
+ # We 'train' the dictionary-based model by adding each target word
+ # to the spellchecker's frequency table so it is a known candidate
+ # when a mistake like 'no4hij' is found.
+ for pair in dataset:
+ target = pair['target']
+ # add() registers the word with a count of 1; the update lives only
+ # in this in-memory NLPCorrector instance and is not saved to disk.
+ nlp.corrector.word_frequency.add(target)
+
+ print("Successfully tuned NLP probabilities for your handwriting!")
+ print("Showcase: The model has now 'learned' that '0f' is likely 'of'.")
+ print("==================================================")
+
+if __name__ == "__main__":
+ train_nlp_logic()
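How much a frequency boost matters depends on the counts already in the table. The sketch below uses a `collections.Counter` as a stand-in for the spellchecker's frequency table; the counts, the candidate set, and `most_likely` are hypothetical and only illustrate frequency-based ranking.

```python
from collections import Counter


def most_likely(candidates, freq):
    """Pick the candidate with the highest count in the frequency table."""
    return max(candidates, key=lambda word: freq[word])


freq = Counter({"off": 50, "of": 40})   # hypothetical starting counts
candidates = {"of", "off"}              # hypothetical candidates for the misread '0f'

before = most_likely(candidates, freq)
freq["of"] += 1                         # a single add barely moves the ranking
after_one = most_likely(candidates, freq)
freq["of"] += 20                        # a larger boost changes the winner
after_boost = most_likely(candidates, freq)

print(before, after_one, after_boost)
```

If the goal is to bias corrections toward the target words, the boost has to be large relative to the competing candidates' counts, and it would also need to be persisted for the running app to see it.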
diff --git a/vector.jpeg b/vector.jpeg
new file mode 100644
index 0000000000000000000000000000000000000000..d529cc660ccc455132c2ef9294e322c1d212a77a
Binary files /dev/null and b/vector.jpeg differ