import streamlit as st
import tensorflow as tf
from button_click_alt import find_order_id_2
from button_click_alt import find_order_id_similarity
from flann import generate_images, flann_matching, flann_matching_3, flann_matching_alt, flann_matching_4
import cv2
import numpy as np
import pandas as pd
import plotly.express as px
import os
import altair as alt
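
# Illustrative sketch (not called by the app): the "Find Order" tab says each
# synthetic training image shows one capital letter followed by 3-10 random
# lowercase letters. The helper below shows one way such label text could be
# drawn. The name `random_label_sketch` and the optional `rng` parameter are
# assumptions for illustration; the actual dataset generator is not part of
# this file.
import random
import string


def random_label_sketch(rng=None):
    """Return one capital letter followed by 3-10 random lowercase letters."""
    if rng is None:
        rng = random.Random()
    tail_len = rng.randint(3, 10)  # randint is inclusive on both ends
    head = rng.choice(string.ascii_uppercase)
    tail = "".join(rng.choice(string.ascii_lowercase) for _ in range(tail_len))
    return head + tail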

FONTS_FOLDER = "fonts"
NUM_IMAGES_PER_FONT = 5
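

# Illustrative sketch (not called by the app): the Intro tab describes Otsu's
# method as choosing the threshold that maximizes the inter-class variance.
# The pure-Python helper below is a minimal version of that search, assuming
# `pixels` is a flat sequence of 8-bit grayscale integers (0-255). The name
# `otsu_threshold_sketch` is hypothetical; in practice OpenCV's cv2.threshold
# with the cv2.THRESH_OTSU flag would normally be used instead.
def otsu_threshold_sketch(pixels):
    """Return the threshold t (pixels <= t vs. pixels > t) with maximal inter-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class so far
    sum0 = 0  # sum of intensities in the lower class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break  # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Inter-class variance up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t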

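# Illustrative sketch (not called by the app): the Intro tab describes Lowe's
# ratio test, which keeps a match only when the closest descriptor is clearly
# nearer than the second closest. The helpers below apply that rule to
# precomputed distance pairs. The names and the "percentage of surviving
# descriptors" metric are assumptions for illustration; the actual scoring
# lives in the `flann` module, which is not shown in this file.
def ratio_test_sketch(knn_distances, ratio=0.7):
    """Return indices of (best, second_best) distance pairs that pass the ratio test."""
    return [
        i
        for i, (best, second) in enumerate(knn_distances)
        if best < ratio * second
    ]


def match_percent_sketch(knn_distances, ratio=0.7):
    """Percentage of query descriptors whose best match survives the ratio test."""
    if not knn_distances:
        return 0.0
    kept = ratio_test_sketch(knn_distances, ratio)
    return 100.0 * len(kept) / len(knn_distances)
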
def main():
    st.set_page_config(page_title='Order ID Finder', layout='wide')
    st.title('OCR + Font type demo')

    tabs = st.tabs(["Intro", "Find Order", "Try FLANN Matching", "Results"])

    with tabs[0]:
        col1, col2 = st.columns([1, 2])
        with col1:
            st.image("image.jpg", use_column_width=True)
        with col2:
            st.markdown(
                """
                <h3 style='text-align: center;'>Jewellery Font Type Detection - Proposed Solution Demo</h3>
            

                <h4>Proposed Solution</h4>
                <p>One solution involves bolstering an OCR engine with a custom-trained CNN for font type classification. In this demo, I have trained two custom CNNs to classify 21 font types using a synthetic dataset of 4000 images for each font, generated using NumPy, PIL, and OpenCV. The dataset consists of text images rendered with different fonts, utilizing variations in font size and positioning to create diversity. However, training an accurate custom CNN for the given problem requires thousands of images due to the similar nature of the font types used in custom jewelry.</p>

                <p>There are two potential solutions to overcome this challenge:</p>

                <h5>Solution 1</h5>
                <p>Pre-process the image to a level where we can generate a similar synthetic dataset.</p>

                <h5>Solution 2</h5>
                <p>Use Photoshop batch actions to create thousands of realistic images.</p>

                <p>Alternatively, use a feature-matching algorithm, as implemented in the "Try FLANN Matching" tab.</p>
                """, unsafe_allow_html=True
            )
        
        st.subheader("Otsu's Thresholding")
        col3, col4 = st.columns([2, 1])
        with col4:
            st.image('otsu.PNG', use_column_width=True)

        with col3:
            st.markdown("""<p>Otsu thresholding can pre-process real images so that they resemble the images in the synthetic dataset (see the image).</p>
            <p>Otsu's method assumes that the image contains two distinct intensity distributions, corresponding to the foreground and background regions. 
            It selects the threshold that minimizes the intra-class variance, which is equivalent to maximizing the inter-class variance. 
            Pixels on either side of that threshold are assigned to the two classes, resulting in a binary image.</p> """, unsafe_allow_html=True)

            st.subheader("FLANN Matching")
            st.markdown("""<p>FLANN (Fast Library for Approximate Nearest Neighbors) is a popular library for performing fast and efficient nearest neighbor searches in high-dimensional spaces. 
            It is often used in computer vision tasks such as feature matching, where the goal is to find corresponding features between two images.
            In feature matching, one commonly used algorithm is SIFT (Scale-Invariant Feature Transform), which extracts keypoint descriptors from an image. 
            However, because SIFT produces a large number of keypoints, matching them exhaustively between images can be computationally expensive, which is why FLANN's approximate nearest-neighbor search is used. 
            To filter out ambiguous matches, Lowe's ratio test is often used in conjunction with SIFT and FLANN. 
            The ratio test compares the distances from a given keypoint descriptor to its two closest matches. 
            If the ratio of the closest distance to the second-closest distance is below a certain threshold (typically 0.7), then the match is considered to be valid. 
            This helps to filter out false matches and improve the accuracy of the feature matching process.</p>
            <p> The ORB descriptor provides information about the local orientation and intensity of image features, which can be used to identify reliable matching points between two images. 
            By using RANSAC to estimate the homography between the matched keypoints, we can eliminate outliers and improve the accuracy of the registration process.</p>
            """, unsafe_allow_html=True)

        col5, col6 = st.columns(2)
        with col5:
            st.image('real_image.jpg', caption = 'A sample image with Otsu thresholding applied', width = 100)

        with col6:
            st.image('generated_image.jpg', caption = 'A generated image', width = 100)

        colab_link = '[<img src="https://colab.research.google.com/assets/colab-badge.svg">](https://colab.research.google.com/drive/1cJy6ny9AGvhe_OdCK_5MxRhUDwuj1NF3?usp=sharing)'
        st.markdown(colab_link, unsafe_allow_html=True)



    with tabs[1]:
        st.write('## Find Order')
        st.markdown("""<p>The pretrained model was trained on 84,000 synthetically generated images. The goal is to detect the font type from a given list of 21 font types.
        Each image contains a capital letter followed by 3-10 random lowercase letters. Images are created with random horizontal resolutions and resized to 64x64 at the end. 
        A random amount of noise and rotation is also added.</p>
        """, unsafe_allow_html=True)
        
        with st.sidebar:
            st.write('## Input')
            uploaded_file = st.file_uploader('Upload the image file (PNG or JPG)', type=['png', 'jpg'], help='help')
            input_file = st.file_uploader('Upload the input file (TXT)', type=['txt'], help='Text file containing the order ID, text, and font type, in that order')
            with st.expander('OCR Settings'):
                ocre = st.selectbox('OCR Engine', ['Hive', 'Tesseract'])
                img_processing = st.selectbox('Image preprocessing', ['Gray Scaling', 'Thresholding, Denoising, Binarization, Skew Correction', 'Adaptive Thresholding, Morphological Operations, CCA'])
                
            with st.expander('Other Settings'):
                cnn_model = st.selectbox('Font Classification Model', ['CNN-MaxPool-Dense-Dropout', 'BatchNorm-CNN-MaxPool-Dense-Dropout'])
                similarity_method = st.selectbox('Similarity Check', ['jaccard_similarity', 'exact_match'])
        
        col1, col2 = st.columns([1, 2])
        with col1:
            if st.button('Find Order ID by OCR + font type') and uploaded_file and input_file:
                st.write('## Output')
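                # NOTE: 'model.h5' is always loaded; the 'Font Classification Model' selection (cnn_model) is not used here.
                model = tf.keras.models.load_model('model.h5')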
                result = find_order_id_2(uploaded_file, input_file, model, ocre)
                if result['status'] == 'success':
                    st.success(result['message'])
                elif result['status'] == 'warning':
                    st.warning(result['message'])
        with col2:
            if st.button('Find Order ID by OCR + similarity check') and uploaded_file and input_file:
                st.write('## Output')
                result = find_order_id_similarity(uploaded_file, input_file, similarity_method, ocre)
                if result['status'] == 'success':
                    st.success(result['message'])
                elif result['status'] == 'warning':
                    st.warning(result['message'])

    with tabs[2]:
        st.write('## Try FLANN Matching')
        st.markdown("""<p>Multiple images are generated for a given text, or detected text, with slight variations for each font type. Specifically, five images are created for each font, across all 21 font types. 
        To detect features, SIFT descriptors are extracted and matched using FLANN. 
        Depending on the selected options, Lowe's ratio test, KNN matching, or ORB descriptor is then employed. 
        Average matching percentages are calculated for each font type, and the font type with the highest percentage is returned as the most likely one.</p>""", unsafe_allow_html=True)
        text_input = st.text_input("Enter your text:")
        upload_image = st.file_uploader("Choose an image:", type=["jpg", "jpeg", "png"])
        col1, col2, col3 = st.columns(3)
        with col1:
            num_trees = st.slider("Number of trees:", 1, 20, 5)
        with col2:
            num_checks = st.slider("Number of checks:", 1, 200, 50)
        with col3:
            matching_methods = ["FLANN with SIFT descriptor and ratio test", "FLANN with SIFT descriptor and KNN matching", "FLANN with SIFT descriptor, RANSAC homography estimation, and ORB descriptor", "Basic FLANN"]
            selected_method = st.selectbox("Select FLANN matching method:", matching_methods)
        if st.button("Generate Images"):
            if text_input:
                generated_images = generate_images(text_input)
                st.write(f"{len(generated_images)} images generated ({NUM_IMAGES_PER_FONT} per font) for {len(os.listdir(FONTS_FOLDER))} font types.")
                with st.expander("Generated Images"):
                    for img, font_file in generated_images:
                        st.image(img, caption=font_file)
            else:
                st.warning("Please enter some text before generating images.")
        if upload_image:
            # np.fromstring is deprecated for binary data; np.frombuffer is the supported equivalent.
            query_image = cv2.imdecode(np.frombuffer(upload_image.read(), np.uint8), cv2.IMREAD_UNCHANGED)
            st.image(query_image, caption="Uploaded Image")
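        # Require both inputs: query_image is only defined once an image has been uploaded.
        if st.button("Match") and text_input and upload_image: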
            generated_images = generate_images(text_input)
            if selected_method == "FLANN with SIFT descriptor and ratio test":
                matching_results = flann_matching_alt(generated_images, query_image, num_trees, num_checks)
            elif selected_method == "FLANN with SIFT descriptor and KNN matching":
                matching_results = flann_matching(generated_images, query_image, num_trees, num_checks)
            elif selected_method == "FLANN with SIFT descriptor, RANSAC homography estimation, and ORB descriptor":
                matching_results = flann_matching_3(generated_images, query_image, num_trees, num_checks)
            else:
                matching_results = flann_matching_4(generated_images, query_image, num_trees, num_checks)
            matching_percentages = []
            with col1:
                with st.expander("Matching Images"):
                    for r, f, p in matching_results:
                        st.image(r, caption=f"Matching result for {f}, Matches: {p:.2f}%")
            for r, font_file, p in matching_results:
                matching_percentages.append((font_file, p))
            df = pd.DataFrame(matching_percentages, columns=['Font Type', 'Match Percent'])
            avg_df = df.groupby('Font Type').mean()
            with col2:
                with st.expander("All Results"):
                    st.write("Overall matching percentages for each font type:")
                    st.table(df)
                st.write("Average matching percentage for each font type:")
                st.table(avg_df)
                fig = px.bar(avg_df.reset_index(), x='Font Type', y='Match Percent')
                fig.update_layout(title='Average Matching Percentages by Font Type')
                st.plotly_chart(fig)
            max_match_font = avg_df['Match Percent'].idxmax()
            st.success(f"The most likely font type is: {max_match_font}")
    
    with tabs[3]:
        st.title('Results')
        df = pd.read_csv('re.csv')
        st.dataframe(df)

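        def calculate_accuracy(df, method):
            """Return the fraction of rows where the `method` column equals 'correct font', rounded to 3 decimals."""
            correct = df['correct font']
            predicted = df[method]
            accuracy = np.mean(correct == predicted)
            return round(accuracy, 3)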

        col1, col2 = st.columns(2)
        with col1:
            data = pd.DataFrame({
                'Method': ['first cnn', 'second cnn', 'FLANN + SIFT + LOWE'],
                'Accuracy': [calculate_accuracy(df, 'first cnn'),
                            calculate_accuracy(df, 'second cnn'),
                            calculate_accuracy(df, 'FLANN + SIFT + LOWE')]
            })

            bar_chart = alt.Chart(data).mark_bar().encode(
                x='Method',
                y='Accuracy',
                color=alt.condition(
                    alt.datum.Accuracy >= 0.8,
                    alt.value('green'),
                    alt.value('red')
                )
            ).properties(title='Accuracy by Method')

            st.altair_chart(bar_chart)

        with col2:
            font_counts = df.groupby(['correct font']).size().reset_index(name='counts')
            font_counts_chart = alt.Chart(font_counts).mark_bar().encode(
                x=alt.X('correct font', sort='-y'),
                y='counts'
            ).properties(title='Number of Images by Correct Font')

            st.altair_chart(font_counts_chart)       

        # create a stacked bar chart showing the distribution of predicted fonts for each image
        melted = df.melt(id_vars=['Images', 'correct font'], var_name='Method', value_name='Predicted Font')
        melted_counts = melted.groupby(['Images', 'correct font', 'Predicted Font']).size().reset_index(name='counts')
        stacked_bar_chart = alt.Chart(melted_counts).mark_bar().encode(
            x=alt.X('Images', sort='-y'),
            y='counts',
            color='Predicted Font',
            order=alt.Order(
                'Predicted Font',
                sort='ascending'
            )
        ).properties(title='Distribution of Predicted Fonts by Image')

        st.altair_chart(stacked_bar_chart)

if __name__ == '__main__':
    main()