import streamlit as st
import tensorflow as tf
from button_click_alt import find_order_id_2
from button_click_alt import find_order_id_similarity
from flann import generate_images, flann_matching, flann_matching_3, flann_matching_alt, flann_matching_4
import cv2
import numpy as np
import pandas as pd
import plotly.express as px
import os
import altair as alt
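

# --- Illustrative sketch (hypothetical helper, not called by the app) ---
# The Intro tab describes Otsu's thresholding: pick the gray level that
# maximizes the between-class variance of foreground and background.
# This is a minimal pure-Python sketch of that idea, assuming `pixels` is an
# iterable of 8-bit gray levels (0-255). The name `otsu_threshold_sketch` is
# an assumption for illustration; in practice OpenCV's
# cv2.threshold(..., cv2.THRESH_BINARY + cv2.THRESH_OTSU) does this for you.
def otsu_threshold_sketch(pixels):
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with level <= t (background class)
    sum_bg = 0  # sum of gray levels in the background class
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is background; later thresholds add nothing
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t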

FONTS_FOLDER = "fonts"
NUM_IMAGES_PER_FONT = 5
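

# --- Illustrative sketch (hypothetical helper, not called by the app) ---
# The Intro tab describes Lowe's ratio test: a match is kept only when the
# distance to the best candidate is clearly smaller than the distance to the
# second-best one. This sketch assumes `distance_pairs` is a list of
# (best_distance, second_best_distance) tuples, e.g. taken from the two
# nearest neighbours returned by a k=2 KNN match. The function name and
# input shape are assumptions for illustration, not the code in flann.py.
def lowe_ratio_filter_sketch(distance_pairs, ratio=0.7):
    kept = []
    for idx, (best, second) in enumerate(distance_pairs):
        # Keep the match only if it is unambiguous under the chosen ratio.
        if best < ratio * second:
            kept.append(idx)
    return kept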

def main():
    st.set_page_config(page_title='Order ID Finder', layout='wide')
    st.title('OCR + Font type demo')

    tabs = st.tabs(["Intro", "Find Order", "Try FLANN Matching", "Results"])

    with tabs[0]:
        col1, col2 = st.columns([1, 2])
        with col1:
            st.image("image.jpg", use_column_width=True)
        with col2:
            st.markdown(
                """
                <h3 style='text-align: center;'>Jewellery Font Type Detection - Proposed Solution Demo</h3>
            

                <h4>Proposed Solution</h4>
                <p>One approach pairs an OCR engine with a custom-trained CNN for font type classification. In this demo, I trained two custom CNNs to classify 21 font types using a synthetic dataset of 4,000 images per font, generated with NumPy, PIL, and OpenCV. The dataset consists of text rendered in each font, with font size and position varied to add diversity. However, because the font types used in custom jewellery look very similar, training an accurate CNN for this problem requires thousands of images.</p>

                <p>There are two potential solutions to overcome this challenge:</p>

                <h5>Solution 1</h5>
                <p>Pre-process the image to a level where we can generate a similar synthetic dataset.</p>

                <h5>Solution 2</h5>
                <p>Use Photoshop batch actions to create thousands of realistic images.</p>

                <p>Alternatively, use a feature-matching algorithm, as implemented in the FLANN Matching tab.</p>
                """, unsafe_allow_html=True
            )
        
        st.subheader("Otsu's Thresholding")
        col3, col4 = st.columns([2, 1])
        with col4:
            st.image('otsu.PNG', use_column_width=True)

        with col3:
            st.markdown("""<p>Otsu thresholding can pre-process real images so that they resemble the synthetic dataset (see the image).</p>
            <p>Otsu's method assumes that the image contains two distinct intensity distributions, corresponding to the foreground and background regions. 
            It calculates the threshold that minimizes the intra-class variance, which is equivalent to maximizing the inter-class variance. 
            Choosing this threshold separates the two classes and produces a binary image.</p> """, unsafe_allow_html=True)

            st.subheader("FLANN Matching")
            st.markdown("""<p>FLANN (Fast Library for Approximate Nearest Neighbors) is a popular library for fast nearest-neighbor searches in high-dimensional spaces. 
            It is often used in computer vision tasks such as feature matching, where the goal is to find corresponding features between two images.
            One commonly used feature-extraction algorithm is SIFT (Scale-Invariant Feature Transform), which extracts keypoint descriptors from an image. 
            Because SIFT produces a large number of keypoints, matching them exhaustively between images can be computationally expensive; FLANN's approximate search reduces that cost.
            Lowe's ratio test is often used together with SIFT and FLANN to reject ambiguous matches. 
            The ratio test compares the distances from a keypoint descriptor to its two closest matches. 
            If the ratio of these distances is below a certain threshold (typically 0.7), the match is considered valid. 
            This helps to filter out false matches and improve the accuracy of the feature matching process.</p>
            <p>The ORB descriptor provides information about the local orientation and intensity of image features, which can be used to identify reliable matching points between two images. 
            By using RANSAC to estimate the homography between the matched keypoints, we can eliminate outliers and improve the accuracy of the registration process.</p>
            """, unsafe_allow_html=True)

        col5, col6 = st.columns(2)
        with col5:
            st.image('real_image.jpg', caption='A sample image with Otsu thresholding applied', width=100)

        with col6:
            st.image('generated_image.jpg', caption='A generated image', width=100)

        colab_link = '[<img src="https://colab.research.google.com/assets/colab-badge.svg">](https://colab.research.google.com/drive/1cJy6ny9AGvhe_OdCK_5MxRhUDwuj1NF3?usp=sharing)'
        st.markdown(colab_link, unsafe_allow_html=True)



    with tabs[1]:
        st.write('## Find Order')
        st.markdown("""<p>The pretrained model was trained on 84,000 synthetically generated images. The goal is to detect the font type from a given list of 21 font types.
        Each image contains a capital letter followed by 3-10 random lowercase letters. Images are created with random horizontal resolutions and resized to 64x64 at the end. 
        A random amount of noise and rotation is also added.</p>
        """, unsafe_allow_html=True)
        
        with st.sidebar:
            st.write('## Input')
            uploaded_file = st.file_uploader('Upload the image file (PNG or JPG)', type=['png', 'jpg'], help='help')
            input_file = st.file_uploader('Upload the input file (TXT)', type=['txt'], help='Text file containing order ID, text, and font type, in that order')
            with st.expander('OCR Settings'):
                ocre = st.selectbox('OCR Engine', ['Hive', 'Tesseract'])
                img_processing = st.selectbox('Image preprocessing', ['Gray Scaling', 'Thresholding, Denoising, Binarization, Skew Correction', 'Adaptive Thresholding, Morphological Operations, CCA'])
                
            with st.expander('Other Settings'):
                cnn_model = st.selectbox('Font Classification Model', ['CNN-MaxPool-Dense-Dropout', 'BatchNorm-CNN-MaxPool-Dense-Dropout'])
                similarity_method = st.selectbox('Similarity Check', ['jaccard_similarity', 'exact_match'])
        
        col1, col2 = st.columns([1, 2])
        with col1:
            if st.button('Find Order ID by OCR + font type') and uploaded_file and input_file:
                st.write('## Output')
                model = tf.keras.models.load_model('model.h5')
                result = find_order_id_2(uploaded_file, input_file, model, ocre)
                if result['status'] == 'success':
                    st.success(result['message'])
                elif result['status'] == 'warning':
                    st.warning(result['message'])
        with col2:
            if st.button('Find Order ID by OCR + similarity check') and uploaded_file and input_file:
                st.write('## Output')
                result = find_order_id_similarity(uploaded_file, input_file, similarity_method, ocre)
                if result['status'] == 'success':
                    st.success(result['message'])
                elif result['status'] == 'warning':
                    st.warning(result['message'])

    with tabs[2]:
        st.write('## Try FLANN Matching')
        st.markdown("""<p>Multiple images are generated for a given text, or detected text, with slight variations for each font type. Specifically, five images are created for each font, across all 21 font types. 
        SIFT descriptors are extracted as features and matched using FLANN. 
        Depending on the selected option, Lowe's ratio test, KNN matching, or the ORB descriptor is then applied. 
        Matching percentages are averaged per font type, and the font type with the highest average is returned as the most likely one.</p>""", unsafe_allow_html=True)
        text_input = st.text_input("Enter your text:")
        upload_image = st.file_uploader("Choose an image:", type=["jpg", "jpeg", "png"])
        col1, col2, col3 = st.columns(3)
        with col1:
            num_trees = st.slider("Number of trees:", 1, 20, 5)
        with col2:
            num_checks = st.slider("Number of checks:", 1, 200, 50)
        with col3:
            matching_methods = ["FLANN with SIFT descriptor and ratio test", "FLANN with SIFT descriptor and KNN matching", "FLANN with SIFT descriptor, RANSAC homography estimation, and ORB descriptor", "Basic FLANN"]
            selected_method = st.selectbox("Select FLANN matching method:", matching_methods)
        if st.button("Generate Images"):
            if text_input:
                generated_images = generate_images(text_input)
                st.write(f"{len(generated_images)} images generated ({NUM_IMAGES_PER_FONT} per font) for {len(os.listdir(FONTS_FOLDER))} font types.")
                with st.expander("Generated Images"):
                    for img, font_file in generated_images:
                        st.image(img, caption=font_file)
            else:
                st.warning("Please enter some text before generating images.")
        if upload_image:
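            # np.fromstring is deprecated for binary data; decode from a byte buffer instead.
            # getvalue() returns the full upload regardless of the current read position.
            query_image = cv2.imdecode(np.frombuffer(upload_image.getvalue(), np.uint8), cv2.IMREAD_UNCHANGED)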
            st.image(query_image, caption="Uploaded Image")
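        match_clicked = st.button("Match")
        # query_image only exists once an image is uploaded, so guard against a NameError.
        if match_clicked and not (upload_image and text_input):
            st.warning("Please enter some text and upload an image before matching.")
        if match_clicked and upload_image and text_input: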
            generated_images = generate_images(text_input)
            if selected_method == "FLANN with SIFT descriptor and ratio test":
                matching_results = flann_matching_alt(generated_images, query_image, num_trees, num_checks)
            elif selected_method == "FLANN with SIFT descriptor and KNN matching":
                matching_results = flann_matching(generated_images, query_image, num_trees, num_checks)
            elif selected_method == "FLANN with SIFT descriptor, RANSAC homography estimation, and ORB descriptor":
                matching_results = flann_matching_3(generated_images, query_image, num_trees, num_checks)
            else:
                matching_results = flann_matching_4(generated_images, query_image, num_trees, num_checks)
            matching_percentages = []
            with col1:
                with st.expander("Matching Images"):
                    for r, f, p in matching_results:
                        st.image(r, caption=f"Matching result for {f}, Matches: {p:.2f}%")
            for r, font_file, p in matching_results:
                matching_percentages.append((font_file, p))
            df = pd.DataFrame(matching_percentages, columns=['Font Type', 'Match Percent'])
            avg_df = df.groupby('Font Type').mean()
            with col2:
                with st.expander("All Results"):
                    st.write("Overall matching percentages for each font type:")
                    st.table(df)
                st.write("Average matching percentage for each font type:")
                st.table(avg_df)
                fig = px.bar(avg_df.reset_index(), x='Font Type', y='Match Percent')
                fig.update_layout(title='Average Matching Percentages by Font Type')
                st.plotly_chart(fig)
            max_match_font = avg_df['Match Percent'].idxmax()
            st.success(f"The most likely font type is: {max_match_font}")
    
    with tabs[3]:
        st.title('Results')
        df = pd.read_csv('re.csv')
        st.dataframe(df)

        def calculate_accuracy(df, method):
            correct = df['correct font']
            predicted = df[method]
            accuracy = np.mean(correct == predicted)
            return round(accuracy, 3)

        col1, col2 = st.columns(2)
        with col1:
            data = pd.DataFrame({
                'Method': ['first cnn', 'second cnn', 'FLANN + SIFT + LOWE'],
                'Accuracy': [calculate_accuracy(df, 'first cnn'),
                            calculate_accuracy(df, 'second cnn'),
                            calculate_accuracy(df, 'FLANN + SIFT + LOWE')]
            })

            bar_chart = alt.Chart(data).mark_bar().encode(
                x='Method',
                y='Accuracy',
                color=alt.condition(
                    alt.datum.Accuracy >= 0.8,
                    alt.value('green'),
                    alt.value('red')
                )
            ).properties(title='Accuracy by Method')

            st.altair_chart(bar_chart)

        with col2:
            font_counts = df.groupby(['correct font']).size().reset_index(name='counts')
            font_counts_chart = alt.Chart(font_counts).mark_bar().encode(
                x=alt.X('correct font', sort='-y'),
                y='counts'
            ).properties(title='Number of Test Images by Correct Font')

            st.altair_chart(font_counts_chart)       

        # create a stacked bar chart showing the distribution of predicted fonts for each image
        melted = df.melt(id_vars=['Images', 'correct font'], var_name='Method', value_name='Predicted Font')
        melted_counts = melted.groupby(['Images', 'correct font', 'Predicted Font']).size().reset_index(name='counts')
        stacked_bar_chart = alt.Chart(melted_counts).mark_bar().encode(
            x=alt.X('Images', sort='-y'),
            y='counts',
            color='Predicted Font',
            order=alt.Order(
                'Predicted Font',
                sort='ascending'
            )
        ).properties(title='Distribution of Predicted Fonts by Image')

        st.altair_chart(stacked_bar_chart)

if __name__ == '__main__':
    main()