Space: sunilsarolkar/ISL-SignLanguageTranslation

SunilS committed on May 5

Commit d0afa2c (1 parent: 492e9c4)

UI: Improve About App page with Streamlit tabs and training details

Files changed (1)

app.py +66 -71

@@ -284,16 +284,17 @@ if app_mode =='About App':
     )
     # st.video('https://www.youtube.com/watch?v=FMaNNXgB_5c&ab_channel=AugmentedStartups')
-    st.markdown('''
           # Dataset Used \n
             This model is trained using [INCLUDE](https://zenodo.org/records/4010759) dataset. \n
             ### Key Statistics for the dataset is as follows-
                 +-----------------------+-----------------+
-                |    Charasteristics    | INCLUDE-DATASET |
                 +-----------------------+-----------------+
                 | Categories            | 15              |
                 | Words                 | 263             |
@@ -305,8 +306,8 @@ if app_mode =='About App':
                 | Frame Rate            | 25fps           |
                 | Resolution            | 1920x1080       |
                 +-----------------------+-----------------+
-            #### Size of each category
                 +--------------------+-------------------+------------------+
                 |      Category      | Number of Classes | Number of Videos |
@@ -328,76 +329,70 @@ if app_mode =='About App':
                 | Society            |                23 |              324 |
                 |                    |   Categories# 263 | Total Videos-4287|
                 +--------------------+-------------------+------------------+
-            Below are count of videos we were able to process (1986 of 4287). We processed limited set of records due to time/compute constraints.
-            ''')
-    try:
-        image = np.array(Image.open('categories_processed.png'))
-        st.image(image)
-    except FileNotFoundError:
-        st.warning('Image categories_processed.png is missing.')
-    st.markdown('''
-    #### Below are the count of Videos per Label for each Dataframe
-                ''')
-    try:
-        image = np.array(Image.open('eda/distribution_of_data.png'))
-        st.image(image)
-    except FileNotFoundError:
-        st.warning('Image eda/distribution_of_data.png is missing.')
-    st.markdown('''
-                ### Date Pipeline
-            ''')
-    try:
-        image = np.array(Image.open('DataPipeline.png'))
-        st.image(image)
-    except FileNotFoundError:
-        st.warning('Image DataPipeline.png is missing.')
-    st.markdown('''
-        ### Model structure
-            ```
-                translation_model = Sequential()
-                translation_model.add(Input(shape=((20, 156))))
-                translation_model.add(keras.layers.Masking(mask_value=0.))
-                translation_model.add(BatchNormalization())
-                translation_model.add(Bidirectional(LSTM(32, recurrent_dropout=0.2, return_sequences=True)))
-                translation_model.add(Dropout(0.2))
-                translation_model.add(Bidirectional(LSTM(32, recurrent_dropout=0.2)))
-                translation_model.add(keras.layers.Activation('elu'))
-                translation_model.add(Dense(32, use_bias=False, kernel_initializer='he_normal'))
-                translation_model.add(BatchNormalization())
-                translation_model.add(Dropout(0.2))
-                translation_model.add(keras.layers.Activation('elu'))
-                translation_model.add(Dense(32, kernel_initializer='he_normal',use_bias=False))
-                translation_model.add(BatchNormalization())
-                translation_model.add(keras.layers.Activation('elu'))
-                translation_model.add(Dropout(0.2))
-                translation_model.add(Dense(len(list(expression_mapping.keys())), activation='softmax'))
-                isl_translator=ISLSignPosTranslator(bodypose_25_model(),handpose_model(), translation_model)
-            ```
-            Total params: 82,679 (322.96 KB)
-            Trainable params: 82,239 (321.25 KB)
-            Non-trainable params: 440 (1.72 KB)
         ''')
-    try:
-        image = np.array(Image.open('model-graph.png'))
-        st.image(image)
-    except FileNotFoundError:
-        st.warning('Image model-graph.png is missing.')
-    st.markdown('''
-            # Training
-              [Tensorboard](https://huggingface.co/cdsteameight/ISL-SignLanguageTranslation/tensorboard)
         ''')
 elif app_mode =='Run on Test Videos':

     )
     # st.video('https://www.youtube.com/watch?v=FMaNNXgB_5c&ab_channel=AugmentedStartups')
+    tab1, tab2, tab3, tab4 = st.tabs(["Dataset Overview", "Data Pipeline", "Model Architecture", "Training Details"])
+    with tab1:
+        st.markdown('''
           # Dataset Used \n
             This model is trained using [INCLUDE](https://zenodo.org/records/4010759) dataset. \n
             ### Key Statistics for the dataset is as follows-
                 +-----------------------+-----------------+
+                |    Characteristics    | INCLUDE-DATASET |
                 +-----------------------+-----------------+
                 | Categories            | 15              |
                 | Words                 | 263             |
                 | Frame Rate            | 25fps           |
                 | Resolution            | 1920x1080       |
                 +-----------------------+-----------------+
+            #### Size of each category
                 +--------------------+-------------------+------------------+
                 |      Category      | Number of Classes | Number of Videos |
                 | Society            |                23 |              324 |
                 |                    |   Categories# 263 | Total Videos-4287|
                 +--------------------+-------------------+------------------+
+        ''')
+        st.info("💡 **Note:** The dataset used for training contains **1986 processed videos out of 4287**. We processed a limited set of records due to time and compute constraints.")
+    with tab2:
+        st.markdown('''
+            ### Data Pipeline
+            The pipeline processes video frames to extract pose and hand landmarks using an OpenPose-like approach.
+            For each frame, the feature extraction process produces a vector of **156 features**, consisting of:
+            - **Body Pose**: X and Y coordinates for 15 body keypoints, along with edge lengths and angles.
+            - **Hand Pose**: X and Y coordinates for 21 keypoints on each hand (left and right).
+            This structured tabular data is then grouped sequentially into sliding windows to capture the temporal motion of the signs.
+        ''')
+    with tab3:
+        st.markdown('''
+        ### Model Structure
+        The translation model is a sequence classifier built on stacked Bidirectional LSTMs: each window of frames is mapped to a single softmax distribution over the sign labels.
+        ```python
+            translation_model = Sequential()
+            translation_model.add(Input(shape=((20, 156))))
+            translation_model.add(keras.layers.Masking(mask_value=0.))
+            translation_model.add(BatchNormalization())
+            translation_model.add(Bidirectional(LSTM(32, recurrent_dropout=0.2, return_sequences=True)))
+            translation_model.add(Dropout(0.2))
+            translation_model.add(Bidirectional(LSTM(32, recurrent_dropout=0.2)))
+            translation_model.add(keras.layers.Activation('elu'))
+            translation_model.add(Dense(32, use_bias=False, kernel_initializer='he_normal'))
+            translation_model.add(BatchNormalization())
+            translation_model.add(Dropout(0.2))
+            translation_model.add(keras.layers.Activation('elu'))
+            translation_model.add(Dense(32, kernel_initializer='he_normal',use_bias=False))
+            translation_model.add(BatchNormalization())
+            translation_model.add(keras.layers.Activation('elu'))
+            translation_model.add(Dropout(0.2))
+            translation_model.add(Dense(len(list(expression_mapping.keys())), activation='softmax'))
+            isl_translator=ISLSignPosTranslator(bodypose_25_model(),handpose_model(), translation_model)
+        ```
+        **Parameters:**
+        - Total params: 82,679 (322.96 KB)
+        - Trainable params: 82,239 (321.25 KB)
+        - Non-trainable params: 440 (1.72 KB)
         ''')
+    with tab4:
+        st.markdown('''
+            ### Training Details
+            The model was trained using the **Keras 3 API with a PyTorch backend**.
+            **Dataset Size:**
+            - **123,743 individual frames** were extracted from the processed videos and used as the training set.
+            **Sequence Windowing:**
+            - The temporal data is structured into windows of **20 frames** per sequence (`shape=(20, 156)`), allowing the Bidirectional LSTMs to learn the motion context of the signs.
+            **Metrics & Logs:**
+            - You can view the detailed training progression on [Tensorboard](https://huggingface.co/cdsteameight/ISL-SignLanguageTranslation/tensorboard).
         ''')
 elif app_mode =='Run on Test Videos':
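
Review note on the parameter totals in the Model Architecture tab: they can be cross-checked by hand from the layer list. The sketch below is not part of the commit. It assumes the standard Keras counting rules (an LSTM has four gates, each with an input kernel, a recurrent kernel, and one bias vector; BatchNormalization has two trainable and two non-trainable values per feature) and solves for the width of the final softmax layer, which the diff only gives as `len(list(expression_mapping.keys()))`. All helper names are hypothetical, and the class count it arrives at is inferred from the reported totals, not stated anywhere in the source.

```python
# Hypothetical helpers (not from app.py) that recompute the Keras parameter
# counts for the layer stack shown in the diff.

def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each with an input kernel, a recurrent kernel, and a bias.
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim: int, units: int) -> int:
    # A Bidirectional wrapper holds two independent LSTMs.
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim: int, units: int, use_bias: bool = True) -> int:
    return input_dim * units + (units if use_bias else 0)


def batchnorm_params(features: int) -> tuple[int, int]:
    # (trainable gamma/beta, non-trainable moving mean/variance)
    return 2 * features, 2 * features


FEATURES = 156           # per-frame feature vector, from Input(shape=(20, 156))
REPORTED_TOTAL = 82_679  # totals quoted in the diff
REPORTED_TRAINABLE = 82_239
REPORTED_NON_TRAINABLE = 440

# The three BatchNormalization layers see 156, 32, and 32 features.
bn = [batchnorm_params(FEATURES), batchnorm_params(32), batchnorm_params(32)]
bn_trainable = sum(t for t, _ in bn)
non_trainable = sum(n for _, n in bn)

# Everything except the softmax layer. Masking, Dropout, and Activation
# layers contribute no parameters.
fixed = (
    bn_trainable
    + non_trainable
    + bilstm_params(FEATURES, 32)            # first BiLSTM, input 156
    + bilstm_params(64, 32)                  # second BiLSTM, input 2 * 32
    + dense_params(64, 32, use_bias=False)   # Dense(32) after the BiLSTM
    + dense_params(32, 32, use_bias=False)   # second Dense(32)
)

# The softmax Dense(n) on 32 inputs with a bias has 33 * n parameters.
num_classes, remainder = divmod(REPORTED_TOTAL - fixed, 32 + 1)

print(f"non-trainable: {non_trainable}")
print(f"inferred output classes: {num_classes} (remainder {remainder})")
```

Under these assumptions the non-trainable count matches the 440 quoted in the diff, and the reported total divides evenly, which suggests the model was built with 167 output labels. That is fewer than the 263 words in the full INCLUDE dataset, which is consistent with the note that only a subset of the videos was processed.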