Abs6187 committed on
Commit e2cffd9 · verified · 1 Parent(s): b575737

Upload 16 files
.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ DataPipeline.png filter=lfs diff=lfs merge=lfs -text
+ eda/distribution_of_data.png filter=lfs diff=lfs merge=lfs -text
+ eda/train_test_validation_split-1.png filter=lfs diff=lfs merge=lfs -text
+ eda/train_test_validation_split-2.png filter=lfs diff=lfs merge=lfs -text
+ model-graph.png filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,86 @@
+ # TechMatrix Solvers ISL Translation Project
+ # Generated files and dependencies
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyTorch
+ *.pth
+ *.pt
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # Environment variables
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # IDEs
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS generated files
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+
+ # Temporary files
+ *.tmp
+ *.temp
+ /tmp/
+ temp/
+
+ # Model files and data
+ *.keras
+ *.h5
+ *.pkl
+ *.csv
+ *.json
+ data/
+ models/
+ checkpoints/
+
+ # Video files
+ *.mp4
+ *.avi
+ *.mov
+ *.mkv
+
+ # Logs
+ logs/
+ *.log
+
+ # Original project reference (keep for development)
+ original_project/
DataPipeline.png ADDED

Git LFS Details

  • SHA256: ac5e3ed87f8911e09cd69dfe6bbea7d52eea34663e2001cc54026020bc7251ea
  • Pointer size: 131 Bytes
  • Size of remote file: 631 kB
LICENSE ADDED
@@ -0,0 +1,40 @@
+ MIT License
+
+ Copyright (c) 2024 TechMatrix Solvers
+ Shri Ram Group of Institutions
+
+ Team Members:
+ - Abhay Gupta (Team Lead)
+ - Kripanshu Gupta (Backend Developer)
+ - Dipanshu Patel (UI/UX Designer)
+ - Bhumika Patel (Deployment & Female Presenter)
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+ ## Acknowledgments
+
+ This project is based on Indian Sign Language (ISL) translation using deep learning
+ techniques including OpenPose body/hand detection and LSTM networks. The project
+ uses the INCLUDE dataset for training and evaluation.
+
+ ## Attribution
+
+ While this is an original implementation by TechMatrix Solvers, the underlying
+ concepts and methodologies are based on established computer vision and machine
+ learning research in sign language recognition.
app.py ADDED
@@ -0,0 +1,824 @@
+ """
+ ISL Sign Language Translation - TechMatrix Solvers Initiative
+ Main Streamlit Application
+
+ Developed by: TechMatrix Solvers Team
+ - Abhay Gupta (Team Lead)
+ - Kripanshu Gupta (Backend Developer)
+ - Dipanshu Patel (UI/UX Designer)
+ - Bhumika Patel (Deployment & Female Presenter)
+
+ Institution: Shri Ram Group of Institutions
+ """
+
+ import streamlit as st
+ st.write("🚀 TechMatrix Solvers ISL Translator Loading...")
+
+ import os
+ os.environ["KERAS_BACKEND"] = "torch"
+ import keras
+
+ import cv2
+ import numpy as np
+ import tempfile
+ import time
+ from PIL import Image
+ from keras.models import Sequential
+ import pickle
+ from keras.layers import LSTM, Dense, Bidirectional, Dropout, Input, BatchNormalization
+ from pose_models import create_bodypose_model, create_handpose_model
+ from expression_mapping import expression_mapping
+ from isl_processor import ISLTranslationModel
+ import pandas as pd
+ import ffmpeg
+ import subprocess
+ from typing import NamedTuple
+ import json
+ import pose_utils as utils
+ from huggingface_hub import hf_hub_download
+ import shutil, platform
+ import uuid
+
+ # System information display
+ st.write("🔧 **System Information:**")
+ st.write(f"Python Version: {platform.python_version()}")
+ st.write(f"FFmpeg: {shutil.which('ffmpeg')}, FFprobe: {shutil.which('ffprobe')}")
+
+ try:
+     import cv2
+     st.write(f"OpenCV Version: {cv2.__version__}")
+ except Exception as e:
+     st.error(f"OpenCV import failed: {e}")
+
+ try:
+     import torch
+     st.write(f"PyTorch: {torch.__version__}, Keras: {keras.__version__}")
+ except Exception as e:
+     st.error(f"PyTorch/Keras import failed: {e}")
+
+
+ class VideoProbeResult(NamedTuple):
+     """Structure for video probe results"""
+     return_code: int
+     json: str
+     error: str
+
+
+ def probe_video_info(file_path) -> VideoProbeResult:
+     """
+     Probe video file for metadata using FFprobe
+
+     Args:
+         file_path: Path to video file
+
+     Returns:
+         VideoProbeResult containing metadata
+     """
+     command_array = [
+         "ffprobe",
+         "-v", "quiet",
+         "-print_format", "json",
+         "-show_format",
+         "-show_streams",
+         file_path
+     ]
+     result = subprocess.run(
+         command_array,
+         stdout=subprocess.PIPE,
+         stderr=subprocess.PIPE,
+         universal_newlines=True
+     )
+     return VideoProbeResult(
+         return_code=result.returncode,
+         json=result.stdout,
+         error=result.stderr
+     )
+
+
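A minimal usage sketch for `probe_video_info` (not part of the commit; `sample.mp4` is a hypothetical path): the raw ffprobe JSON is parsed with `json.loads`, exactly as the translation loop further down does.

```python
probe = probe_video_info("sample.mp4")  # hypothetical local file
if probe.return_code == 0:
    info = json.loads(probe.json)
    # pick the first video stream, as the app does
    video = next(s for s in info["streams"] if s["codec_type"] == "video")
    print(video["codec_name"], video["avg_frame_rate"], video["pix_fmt"])
```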
+ # Define feature columns for time series processing
+ body_features = [f'bodypeaks_x_{i}' for i in range(15)] + [f'bodypeaks_y_{i}' for i in range(15)]
+ hand0_features = [f'hand0peaks_x_{i}' for i in range(21)] + [f'hand0peaks_y_{i}' for i in range(21)] + [f'hand0peaks_peaktxt{i}' for i in range(21)]
+ hand1_features = [f'hand1peaks_x_{i}' for i in range(21)] + [f'hand1peaks_y_{i}' for i in range(21)] + [f'hand1peaks_peaktxt{i}' for i in range(21)]
+
+ feature_columns_processed = body_features + hand0_features + hand1_features
+ label_columns = ['Expression_encoded']
+
+
+ @st.cache_resource
+ def create_time_series_sequences(isl_data, feature_columns, label_columns, window_size=20):
+     """
+     Creates time series sequences from a DataFrame with the specified window size
+
+     Args:
+         isl_data: Input DataFrame with ISL data
+         feature_columns: List of feature column names
+         label_columns: List of label column names
+         window_size: Size of temporal window for sequence creation
+
+     Returns:
+         tuple: (X_sequences, y_sequences) for training/inference
+     """
+     if isl_data.empty:
+         return [], []
+
+     X_sequences = []
+     y_sequences = []
+
+     for group, file_df in isl_data.groupby(['Type', 'Expression_encoded', 'FileName']):
+         expr_type, expression, filename = group
+
+         # Create blank frame for padding
+         blank_frame = np.zeros((1, 156))
+
+         for idx, window_data in enumerate([file_df[i:i+window_size] for i in range(0, file_df.shape[0], 1)]):
+             if window_data.shape[0] < window_size:
+                 # Pad sequence with blank frames at the beginning
+                 padding_needed = window_size - window_data.shape[0]
+                 padded_sequence = np.concatenate(
+                     (np.repeat(blank_frame, padding_needed, axis=0),
+                      window_data[feature_columns].values),
+                     axis=0
+                 )
+                 X_sequences.append(padded_sequence)
+                 y_sequences.append(expression)
+                 continue
+
+             X_sequences.append(window_data[feature_columns].values)
+             y_sequences.append(expression)
+
+     return X_sequences, y_sequences
+
+
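A toy sketch of the windowing behaviour (illustrative only; `df` is a hypothetical 5-frame clip with the expected columns): every start offset yields one window, and windows shorter than `window_size` are front-padded with blank 156-dim frames.

```python
X, y = create_time_series_sequences(df, feature_columns_processed,
                                    label_columns, window_size=20)
# len(X) == 5 (one window per start offset); each X[i].shape == (20, 156)
```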
+ # Global translation model variable
+ translation_model = None
+
+
+ @st.cache_resource
+ def load_translation_model():
+     """
+     Load and configure the LSTM translation model
+
+     Returns:
+         Configured Keras Sequential model for ISL translation
+     """
+     model = Sequential()
+     model.add(Input(shape=(20, 156)))
+     model.add(keras.layers.Masking(mask_value=0.))
+     model.add(BatchNormalization())
+     model.add(Bidirectional(LSTM(32, recurrent_dropout=0.2, return_sequences=True)))
+
+     model.add(Dropout(0.2))
+     model.add(Bidirectional(LSTM(32, recurrent_dropout=0.2)))
+
+     model.add(keras.layers.Activation('elu'))
+     model.add(Dense(32, use_bias=False, kernel_initializer='he_normal'))
+
+     model.add(BatchNormalization())
+     model.add(Dropout(0.2))
+     model.add(keras.layers.Activation('elu'))
+     model.add(Dense(32, kernel_initializer='he_normal', use_bias=False))
+
+     model.add(BatchNormalization())
+     model.add(keras.layers.Activation('elu'))
+     model.add(Dropout(0.2))
+     model.add(Dense(len(expression_mapping), activation='softmax'))
+
+     # Download pre-trained model weights
+     model_file = hf_hub_download(
+         repo_id="sunilsarolkar/isl-translation-model",
+         filename="isl_model_final.keras"
+     )
+     model.load_weights(model_file)
+
+     return model
+
+
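A quick shape sanity check (illustrative sketch, not from the commit): the loaded network maps one 20-frame window of 156-dim features to a probability vector over the expression vocabulary.

```python
model = load_translation_model()
probs = model(np.zeros((1, 20, 156)))  # shape: (1, len(expression_mapping))
```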
+ # Load test data
+ @st.cache_data
+ def load_test_data():
+     """Load test dataset and file information"""
+     testing_cleaned_path = hf_hub_download(
+         repo_id="sunilsarolkar/isl-test-data",
+         filename="testing_cleaned.csv",
+         repo_type="dataset"
+     )
+
+     test_files_path = hf_hub_download(
+         repo_id="sunilsarolkar/isl-test-data",
+         filename="test_files.csv",
+         repo_type="dataset"
+     )
+
+     testing_df = pd.read_csv(testing_cleaned_path)
+     test_files_df = pd.read_csv(test_files_path)
+
+     return testing_df, test_files_df
+
+
+ # Load test data
+ testing_df, test_files_df = load_test_data()
+
+
+ class VideoWriter:
+     """Custom video writer using FFmpeg for better compatibility"""
+
+     def __init__(self, output_file, input_fps, input_framesize, input_pix_fmt, input_vcodec):
+         self.ff_process = (
+             ffmpeg
+             .input('pipe:',
+                    format='rawvideo',
+                    pix_fmt="bgr24",
+                    s=f'{input_framesize[1]}x{input_framesize[0]}',
+                    r=input_fps)
+             .output(output_file, pix_fmt=input_pix_fmt, vcodec=input_vcodec)
+             .overwrite_output()
+             .run_async(pipe_stdin=True)
+         )
+
+     def write_frame(self, frame):
+         """Write a single frame to the video"""
+         self.ff_process.stdin.write(frame.tobytes())
+
+     def close(self):
+         """Close the video writer"""
+         self.ff_process.stdin.close()
+         self.ff_process.wait()
+
+
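A minimal usage sketch for `VideoWriter` (illustrative; the fps, pixel-format, and codec values here are hypothetical, in the app they come from the ffprobe metadata). Note that `input_framesize` is `(height, width)`, which the constructor flips into ffmpeg's `WxH` form.

```python
writer = VideoWriter("/tmp/out.mp4", "25/1", (480, 640), "yuv420p", "h264")
writer.write_frame(np.zeros((480, 640, 3), dtype=np.uint8))  # one raw BGR frame
writer.close()
```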
+ def calculate_weighted_average(numbers, weights):
+     """
+     Calculate weighted average of numbers
+
+     Args:
+         numbers: List of numbers
+         weights: List of weights
+
+     Returns:
+         float: Weighted average
+     """
+     if sum(weights) == 0:
+         return 0
+     return sum(x * y for x, y in zip(numbers, weights)) / sum(weights)
+
+
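Worked example: `calculate_weighted_average([0.2, 0.8], [1, 3])` = (0.2*1 + 0.8*3) / (1 + 3) = 2.6 / 4 = 0.65. In the frame loop below, every weight is `len(sign_predictions)`, so the weights are uniform and the result reduces to the plain mean of a sign's per-window probabilities.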
+ @st.cache_data
+ def resize_image(image, width=None, height=None, interpolation=cv2.INTER_AREA):
+     """
+     Resize image maintaining aspect ratio
+
+     Args:
+         image: Input image
+         width: Target width
+         height: Target height
+         interpolation: OpenCV interpolation method
+
+     Returns:
+         Resized image
+     """
+     dimensions = None
+     (h, w) = image.shape[:2]
+
+     if width is None and height is None:
+         return image
+
+     if width is None:
+         ratio = height / float(h)
+         dimensions = (int(w * ratio), height)
+     else:
+         ratio = width / float(w)
+         dimensions = (width, int(h * ratio))
+
+     resized = cv2.resize(image, dimensions, interpolation=interpolation)
+     return resized
+
+
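Worked example: for a 480x640 (h x w) image, `resize_image(img, width=320)` computes ratio = 320/640 = 0.5 and output dimensions (320, 240), preserving the 4:3 aspect ratio.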
+ # Configure Streamlit page
+ st.set_page_config(
+     page_title="ISL Translation - TechMatrix Solvers",
+     page_icon="🤟",
+     layout="wide"
+ )
+
+ st.title('🤟 ISL Sign Language Translation - TechMatrix Solvers Initiative')
+
+ # Add custom CSS for sidebar styling
+ st.markdown(
+     """
+     <style>
+     [data-testid="stSidebar"][aria-expanded="true"] > div:first-child {
+         width: 350px;
+     }
+     [data-testid="stSidebar"][aria-expanded="false"] > div:first-child {
+         width: 350px;
+         margin-left: -350px;
+     }
+
+     .team-info {
+         background-color: #f0f2f6;
+         padding: 1rem;
+         border-radius: 0.5rem;
+         margin: 1rem 0;
+     }
+
+     .tech-matrix-header {
+         background: linear-gradient(90deg, #1e3a8a, #7c3aed);
+         color: white;
+         padding: 1rem;
+         border-radius: 0.5rem;
+         text-align: center;
+         margin-bottom: 1rem;
+     }
+     </style>
+     """,
+     unsafe_allow_html=True,
+ )
+
+ # Add team branding header
+ st.markdown(
+     """
+     <div class="tech-matrix-header">
+         <h2>🚀 TechMatrix Solvers</h2>
+         <p>Innovating Accessible Technology Solutions</p>
+     </div>
+     """,
+     unsafe_allow_html=True
+ )
+
+ # Sidebar configuration
+ st.sidebar.title('🤟 ISL Translation System')
+ st.sidebar.subheader('Configuration')
+
+ # Team information in sidebar
+ st.sidebar.markdown(
+     """
+     <div class="team-info">
+         <h3>👨‍💻 Development Team</h3>
+         <ul>
+             <li><strong>Abhay Gupta</strong> - Team Lead</li>
+             <li><strong>Kripanshu Gupta</strong> - Backend Dev</li>
+             <li><strong>Dipanshu Patel</strong> - UI/UX Designer</li>
+             <li><strong>Bhumika Patel</strong> - Deployment</li>
+         </ul>
+         <p><em>Shri Ram Group of Institutions</em></p>
+     </div>
+     """,
+     unsafe_allow_html=True
+ )
+
+ # Initialize frame-wise outputs storage
+ frame_predictions = {}
+
+ # Application mode selection
+ app_mode = st.sidebar.selectbox(
+     'Choose Application Mode',
+     ['About Project', 'Test Video Translation']
+ )
+
+ if app_mode == 'About Project':
+     st.markdown(
+         """
+         ## 🎯 Project Overview
+
+         Welcome to the **ISL Sign Language Translation System** developed by **TechMatrix Solvers**.
+         This application demonstrates real-time Indian Sign Language recognition and
+         translation using deep learning techniques.
+
+         ### 🏗️ Technical Architecture
+
+         Our system combines multiple state-of-the-art technologies:
+
+         1. **Body Pose Estimation**: 25-point skeletal tracking using OpenPose
+         2. **Hand Landmark Detection**: 21-point hand keypoint identification
+         3. **Temporal Modeling**: Bidirectional LSTM networks for sequence analysis
+         4. **Real-time Processing**: Optimized inference pipeline for live translation
+         """
+     )
+
+     st.markdown(
+         """
+         ### 📊 Dataset Information
+
+         Our model is trained on the comprehensive [INCLUDE dataset](https://zenodo.org/records/4010759):
+         """
+     )
+
+     # Dataset statistics table
+     dataset_stats = {
+         "Metric": [
+             "Categories", "Total Words", "Training Videos",
+             "Avg Videos/Class", "Avg Video Length", "Resolution", "Frame Rate"
+         ],
+         "Value": [
+             "15", "263", "4,257", "16.3", "2.57s", "1920x1080", "25fps"
+         ]
+     }
+     st.table(pd.DataFrame(dataset_stats))
+
+     # Display dataset processing visualization
+     try:
+         categories_image = np.array(Image.open('original_project/categories_processed.png'))
+         st.image(categories_image, caption="📈 Processed Categories Distribution")
+     except Exception:
+         st.info("📊 Dataset visualization images will be displayed when available")
+
+     # Model architecture information
+     st.markdown(
+         """
+         ### 🧠 Neural Network Architecture
+
+         ```python
+         # TechMatrix Solvers LSTM Translation Model
+         model = Sequential([
+             Input(shape=(20, 156)),  # 20-frame temporal window
+             Masking(mask_value=0.),
+             BatchNormalization(),
+             Bidirectional(LSTM(32, recurrent_dropout=0.2, return_sequences=True)),
+             Dropout(0.2),
+             Bidirectional(LSTM(32, recurrent_dropout=0.2)),
+             Dense(32, activation='elu'),
+             BatchNormalization(),
+             Dropout(0.2),
+             Dense(len(expression_mapping), activation='softmax')
+         ])
+         ```
+
+         **Model Statistics:**
+         - Total Parameters: 82,679 (322.96 KB)
+         - Trainable Parameters: 82,239 (321.25 KB)
+         - Input Features: 156-dimensional vectors
+         - Temporal Window: 20 frames
+         """
+     )
+
+     # Technology stack
+     col1, col2 = st.columns(2)
+
+     with col1:
+         st.markdown(
+             """
+             ### 🛠️ Technology Stack
+
+             **Frontend & UI:**
+             - Streamlit (Interactive Web App)
+             - Custom CSS Styling
+             - Responsive Design
+
+             **Deep Learning:**
+             - Keras/TensorFlow Backend
+             - PyTorch Integration
+             - LSTM Networks
+             - OpenPose Models
+             """
+         )
+
+     with col2:
+         st.markdown(
+             """
+             ### 📱 Key Features
+
+             **Real-time Processing:**
+             - Live video analysis
+             - Pose keypoint extraction
+             - Temporal sequence modeling
+             - Confidence scoring
+
+             **User Experience:**
+             - Intuitive interface
+             - Visual feedback
+             - Progress tracking
+             - Result visualization
+             """
+         )
+
+     # Team contact information
+     st.markdown(
+         """
+         ### 📞 Contact Information
+
+         **TechMatrix Solvers Team:**
+
+         | Name | Role | Email | Phone |
+         |------|------|-------|-------|
+         | **Abhay Gupta** | Team Lead | contact2abhaygupta6187@gmail.com | 8115814535 |
+         | **Kripanshu Gupta** | Backend Developer | guptakripanshu83@gmail.com | 7067058400 |
+         | **Dipanshu Patel** | UI/UX Designer | dipanshupatel43@gmail.com | 9294526404 |
+         | **Bhumika Patel** | Deployment & Presenter | bp7249951@gmail.com | 9302271422 |
+
+         **Institution:** Shri Ram Group of Institutions
+
+         ### 📚 Documentation
+
+         For detailed technical documentation and implementation details, please refer to our
+         [comprehensive documentation](https://docs.google.com/document/d/1mzr2KGHRJT5heUjFF20NQ3Gb89urpjZJ/edit?usp=sharing).
+
+         ---
+
+         **© 2024 TechMatrix Solvers - Innovating Accessible Technology Solutions**
+         """
+     )
+
+ elif app_mode == 'Test Video Translation':
+     # Video selection interface
+     st.markdown("## 🎥 Test Video Translation")
+
+     category = st.sidebar.selectbox(
+         'Choose Category',
+         np.sort(test_files_df['Category'].unique(), axis=-1, kind='mergesort')
+     )
+
+     # Filter by category
+     category_mask = (test_files_df['Category'] == category)
+     test_files_category = test_files_df[category_mask]
+
+     class_name = st.sidebar.selectbox(
+         'Choose Class',
+         np.sort(test_files_category['Class'].unique(), axis=-1, kind='mergesort')
+     )
+
+     # Filter by class
+     class_mask = (test_files_df['Class'] == class_name)
+     filename = st.sidebar.selectbox(
+         'Choose File',
+         np.sort(test_files_category[class_mask]['Filename'].unique(), axis=-1, kind='mergesort')
+     )
+
+     # Display selection info
+     st.info(f"📂 Selected: {category} → {class_name} → {filename}")
+
+     if st.sidebar.button("🚀 Start Translation", type="primary"):
+         # Filter test data for selected video
+         data_mask = ((testing_df['FileName'] == filename) &
+                      (testing_df['Type'] == category) &
+                      (testing_df['Expression'] == class_name))
+
+         window_size = 20
+         current_test_data = testing_df[data_mask]
+
+         if current_test_data.empty:
+             st.error(f"⚠️ No matching data found for: {filename} | {category} | {class_name}")
+             st.stop()
+         else:
+             st.success(f"✅ Loaded {current_test_data.shape[0]} frames for processing")
+
+         # Create time series data
+         X_test_processed, y_test_processed = create_time_series_sequences(
+             current_test_data, feature_columns_processed, label_columns, window_size=window_size
+         )
+         X_test_processed = np.array(X_test_processed)
+
+         # Configure Streamlit display options
+         st.set_option('deprecation.showfileUploaderEncoding', False)
+
+         st.sidebar.markdown('---')
+         st.markdown(
+             """
+             <style>
+             [data-testid="stSidebar"][aria-expanded="true"] > div:first-child {
+                 width: 400px;
+             }
+             [data-testid="stSidebar"][aria-expanded="false"] > div:first-child {
+                 width: 400px;
+                 margin-left: -400px;
+             }
+             </style>
+             """,
+             unsafe_allow_html=True,
+         )
+
+         st.sidebar.markdown('---')
+         st.markdown('## 📊 Translation Results')
+
+         # Progress tracking container
+         progress_container = st.empty()
+
+         with progress_container.container():
+             progress_df = pd.DataFrame([['--', '--']],
+                                        columns=['Frames Processed', 'Detected Sign'])
+             progress_table = st.table(progress_df)
+
+         # Video display container
+         video_display = st.empty()
+         st.markdown("<hr/>", unsafe_allow_html=True)
+         frame_display = st.empty()
+
+         # Download test video
+         video_file_path = hf_hub_download(
+             repo_id="sunilsarolkar/isl-test-data",
+             filename=f'test/{category}/{class_name}/{filename}',
+             repo_type="dataset"
+         )
+
+         if not os.path.exists(video_file_path):
+             st.error(f"⚠️ Video file not found: {video_file_path}")
+             st.stop()
+
+         # Initialize video capture
+         video_capture = cv2.VideoCapture(video_file_path)
+
+         # Get video metadata
+         probe_result = probe_video_info(video_file_path)
+         video_info = json.loads(probe_result.json)
+         video_stream = [stream for stream in video_info["streams"] if stream["codec_type"] == "video"][0]
+
+         input_fps = video_stream["avg_frame_rate"]
+         input_pix_fmt = video_stream["pix_fmt"]
+         input_vcodec = video_stream["codec_name"]
+         format_name = video_info["format"]["format_name"].split(",")[0]
+
+         # Video properties
+         width = int(video_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
+         height = int(video_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
+         fps_input = int(video_capture.get(cv2.CAP_PROP_FPS))
+
+         # Processing variables
+         total_frames = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))
+         frame_buffer = []
+
+         # Output video configuration
+         output_file = f"/tmp/techmatrix_output_{uuid.uuid4().hex}.{format_name}"
+         video_writer = None
+         weighted_predictions = {}
+
+         frame_idx = 0
+
+         try:
+             # Process each frame
+             for _, frame_data in current_test_data.iterrows():
+                 if not video_capture.isOpened():
+                     st.error(f"❌ Could not open video: {video_file_path}")
+                     break
+
+                 if video_capture.isOpened():
+                     ret, frame = video_capture.read()
+
+                     if len(frame_buffer) < window_size:
+                         # Initial frames - build up buffer
+                         visualization_canvas = utils.render_stick_model(
+                             frame,
+                             eval(frame_data['bodypose_circles']),
+                             eval(frame_data['bodypose_sticks']),
+                             eval(frame_data['handpose_edges']),
+                             eval(frame_data['handpose_peaks'])
+                         )
+
+                         # Add prediction plots
+                         canvas_with_predictions = utils.create_bar_plot_visualization(
+                             visualization_canvas, {},
+                             f'Building Buffer - Frame {frame_idx + 1} [No Predictions Yet]',
+                             visualization_canvas
+                         )
+                         canvas_with_predictions = utils.create_bar_plot_visualization(
+                             canvas_with_predictions, weighted_predictions,
+                             f'Weighted Average - Frame {frame_idx + 1} [No Predictions Yet]',
+                             visualization_canvas
+                         )
+                         canvas_with_predictions = utils.add_bottom_padding(
+                             canvas_with_predictions, (255, 255, 255), 100
+                         )
+
+                         # Initialize video writer
+                         if video_writer is None:
+                             input_framesize = canvas_with_predictions.shape[:2]
+                             video_writer = VideoWriter(output_file, input_fps, input_framesize,
+                                                        input_pix_fmt, input_vcodec)
+
+                         video_writer.write_frame(canvas_with_predictions)
+
+                         # Update progress display
+                         with progress_container.container():
+                             progress_df = pd.DataFrame(
+                                 [[f'{frame_idx + 1}/{current_test_data.shape[0]}',
+                                   '<Building 20-frame buffer>']],
+                                 columns=['Frames Processed', 'Detected Sign']
+                             )
+                             progress_table = st.table(progress_df)
+
+                         frame_buffer.append(frame)
+
+                         # Display current frame
+                         with video_display.container():
+                             st.image(canvas_with_predictions, channels='BGR', use_column_width=True)
+                     else:
+                         # Process with full buffer - make predictions
+                         frame_buffer[:-1] = frame_buffer[1:]
+                         frame_buffer[-1] = frame
+
+                         # Load translation model
+                         translation_model = load_translation_model()
+
+                         # Make prediction on current window
+                         sequence_idx = frame_idx - 20
+                         prediction_output = translation_model(
+                             X_test_processed[sequence_idx].reshape(
+                                 1, X_test_processed[sequence_idx].shape[0],
+                                 X_test_processed[sequence_idx].shape[1]
+                             )
+                         )
+                         prediction_output = prediction_output[0].cpu().detach().numpy()
+
+                         # Get top predictions
+                         top_prediction_idx = np.argmax(prediction_output)
+                         top_3_indices = prediction_output.argsort()[-3:][::-1]
+                         top_3_signs = [expression_mapping[i] for i in top_3_indices]
+                         top_3_probabilities = prediction_output[top_3_indices]
+
+                         # Update frame-wise predictions for weighted average
+                         for sign, prob in zip(top_3_signs, top_3_probabilities):
+                             if sign not in frame_predictions:
+                                 frame_predictions[sign] = []
+                             frame_predictions[sign].append(prob)
+
+                         # Current frame predictions
+                         current_predictions = {}
+                         for sign, prob in zip(top_3_signs, top_3_probabilities):
+                             current_predictions[sign] = prob
+
+                         # Calculate weighted averages
+                         for sign in frame_predictions:
+                             sign_predictions = frame_predictions[sign]
+                             sign_weights = [len(sign_predictions) for _ in range(len(sign_predictions))]
+                             weighted_predictions[sign] = calculate_weighted_average(
+                                 sign_predictions, sign_weights
+                             )
+
+                         # Sort predictions by confidence
+                         sorted_predictions = dict(
+                             sorted(weighted_predictions.items(), key=lambda item: item[1], reverse=True)
+                         )
+
+                         # Create visualization
+                         visualization_canvas = utils.render_stick_model(
+                             frame,
+                             eval(frame_data['bodypose_circles']),
+                             eval(frame_data['bodypose_sticks']),
+                             eval(frame_data['handpose_edges']),
+                             eval(frame_data['handpose_peaks'])
+                         )
+
+                         # Add prediction visualizations
+                         canvas_with_predictions = utils.create_bar_plot_visualization(
+                             visualization_canvas, current_predictions,
+                             f'Current Window Prediction (Frames {sequence_idx + 1}-{frame_idx + 1})',
+                             visualization_canvas
+                         )
+                         canvas_with_predictions = utils.create_bar_plot_visualization(
+                             canvas_with_predictions, weighted_predictions,
+                             f'Cumulative Weighted Average - Frame {frame_idx + 1}',
+                             visualization_canvas
+                         )
+                         canvas_with_predictions = utils.add_bottom_padding(
+                             canvas_with_predictions, (255, 255, 255), 100
+                         )
+
+                         video_writer.write_frame(canvas_with_predictions)
+
+                         # Get best prediction for display
+                         best_sign = max(weighted_predictions, key=weighted_predictions.get)
+                         best_confidence = weighted_predictions[best_sign]
+
+                         # Update progress display
+                         with progress_container.container():
+                             progress_df = pd.DataFrame(
+                                 [[f'{frame_idx + 1}/{current_test_data.shape[0]}',
+                                   f'{best_sign} ({best_confidence * 100:.2f}%)']],
+                                 columns=['Frames Processed', 'Detected Sign']
+                             )
+                             progress_table = st.table(progress_df)
+
+                         # Display current frame
+                         with video_display.container():
+                             st.image(canvas_with_predictions, channels='BGR', use_column_width=True)
+
+                 frame_idx += 1
+
+             # Finalize video processing
+             st.success("✅ Video processing completed!")
+
+             with video_display.container():
+                 if video_writer is not None:
+                     video_writer.close()
+                     with open(output_file, 'rb') as video_file:
+                         output_video_bytes = video_file.read()
+                     st.video(output_video_bytes)
+                     st.info(f"💾 Processed video saved: {output_file}")
+                 else:
+                     st.warning("⚠️ No video output generated")
+
+         finally:
+             # Clean up resources
+             video_capture.release()
+             if video_writer is not None:
+                 video_writer.close()
+             cv2.destroyAllWindows()
+
+ # Footer
+ st.markdown(
+     """
+     ---
+     <div style="text-align: center; color: #666;">
+         <p><strong>TechMatrix Solvers</strong> | Shri Ram Group of Institutions</p>
+         <p>Innovating Accessible Technology Solutions for Everyone 🚀</p>
+     </div>
+     """,
+     unsafe_allow_html=True
+ )
categories_processed.png ADDED
eda/distribution_of_data.png ADDED

Git LFS Details

  • SHA256: 9d967d19e6cd0962ed58d7ed6f13db2c504fdf8e4374ee4745568ca7b6caf393
  • Pointer size: 131 Bytes
  • Size of remote file: 462 kB
eda/train_test_validation_split-1.png ADDED

Git LFS Details

  • SHA256: e7b1fa109930615195557a188bacb77560c5dec94f52a82693d772a56af97643
  • Pointer size: 131 Bytes
  • Size of remote file: 115 kB
eda/train_test_validation_split-2.png ADDED

Git LFS Details

  • SHA256: 9ec1608762a566112bdcd49e82cba4a64405e6beefd9d2cd3fe8913fc6efe871
  • Pointer size: 131 Bytes
  • Size of remote file: 171 kB
expression_mapping.py ADDED
@@ -0,0 +1,168 @@
+ expression_mapping = {
+     107: "alive",
+     58: "Nice",
+     8: "Beautiful",
+     115: "dead",
+     120: "famous",
+     122: "female",
+     51: "Mean",
+     21: "Deaf",
+     111: "clean",
+     117: "dirty",
+     123: "flat",
+     110: "cheap",
+     119: "expensive",
+     116: "deep",
+     99: "Ugly",
+     114: "curved",
+     12: "Blind",
+     142: "poor",
+     138: "male",
+     126: "hard",
+     133: "light",
+     137: "low",
+     113: "cool",
+     144: "rich",
+     109: "big large",
+     108: "bad",
+     112: "cold",
+     135: "loose",
+     121: "fast",
+     141: "old",
+     130: "high",
+     118: "dry",
+     145: "sad",
+     131: "hot",
+     125: "happy",
+     129: "heavy",
+     128: "healthy",
+     124: "good",
+     146: "shallow",
+     153: "strong",
+     161: "weak",
+     157: "thin",
+     158: "tight",
+     136: "loud",
+     139: "narrow",
+     134: "long",
+     156: "thick",
+     148: "short",
+     152: "soft",
+     150: "slow",
+     151: "small little",
+     149: "sick",
+     154: "tall",
+     140: "new",
+     143: "quiet",
+     95: "Today",
+     163: "wide",
+     159: "warm",
+     96: "Tomorrow",
+     162: "wet",
+     1: "Afternoon",
+     27: "Evening",
+     56: "Morning",
+     59: "Night",
+     166: "young",
+     53: "Minute",
+     38: "Hour",
+     88: "Sunday",
+     55: "Month",
+     94: "Time",
+     70: "Pleased",
+     63: "Paper",
+     105: "Year",
+     80: "Second",
+     32: "Gift",
+     102: "Week",
+     43: "Key",
+     48: "Lock",
+     4: "Bag",
+     106: "Yesterday",
+     7: "Bathroom",
+     15: "Card",
+     66: "Pen",
+     45: "Letter",
+     9: "Bed",
+     2: "Alright",
+     67: "Pencil",
+     24: "Dream",
+     13: "Book",
+     44: "Kitchen",
+     92: "Telephone",
+     23: "Door",
+     36: "Hello",
+     61: "Page",
+     40: "How are you",
+     16: "Chair",
+     89: "Table",
+     97: "Tool",
+     68: "Photograph",
+     10: "Bedroom",
+     103: "Window",
+     62: "Paint",
+     14: "Box",
+     76: "Ring",
+     82: "Soap",
+     20: "Crowd",
+     75: "Restaurant",
+     98: "Train Station",
+     31: "Friend",
+     17: "Child",
+     0: "Adult",
+     46: "Library",
+     39: "House",
+     42: "India",
+     86: "Street or Road",
+     72: "Queen",
+     85: "Store or Shop",
+     64: "Park",
+     77: "School",
+     18: "City",
+     49: "Market",
+     60: "Office",
+     132: "it",
+     41: "I",
+     6: "Bank",
+     69: "Player",
+     147: "she",
+     19: "Court",
+     155: "they",
+     104: "Winter",
+     93: "Temple",
+     33: "God",
+     50: "Marriage",
+     29: "Exercise",
+     37: "Hospital",
+     34: "Ground",
+     25: "Election",
+     73: "Race (ethnicity)",
+     11: "Bill",
+     87: "Summer",
+     160: "we",
+     127: "he",
+     22: "Death",
+     84: "Spring",
+     47: "Location",
+     26: "Energy",
+     54: "Money",
+     28: "Ex. Monsoon",
+     165: "you (plural)",
+     65: "Peace",
+     5: "Ball",
+     71: "Price",
+     35: "Gun",
+     30: "Fall",
+     164: "you",
+     81: "Sign",
+     100: "University",
+     83: "Sport",
+     74: "Religion",
+     101: "War",
+     57: "Newspaper",
+     3: "Attack",
+     90: "Team",
+     78: "Science",
+     79: "Season",
+     52: "Medicine",
+     91: "Technology",
+ }
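A decoding sketch (illustrative only): `app.py` uses this mapping to turn the model's class indices back into glosses, e.g. for a probability vector `probs` over the mapped classes:

```python
top_3 = probs.argsort()[-3:][::-1]             # indices of the 3 highest scores
print([expression_mapping[i] for i in top_3])  # e.g. ['Hello', 'good', 'happy']
```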
isl_processor.py ADDED
@@ -0,0 +1,478 @@
+ """
+ ISL Sign Language Translation - TechMatrix Solvers Initiative
+ Core ISL Processing and Translation Models
+
+ Developed by: TechMatrix Solvers Team
+ - Abhay Gupta (Team Lead)
+ - Kripanshu Gupta (Backend Developer)
+ - Dipanshu Patel (UI/UX Designer)
+ - Bhumika Patel (Deployment & Female Presenter)
+
+ Institution: Shri Ram Group of Institutions
+ """
+
+ import keras
+ from keras.layers import TorchModuleWrapper
+ import numpy as np
+ import cv2
+ import torch
+ from scipy.ndimage import gaussian_filter
+ import math
+ import os
+ from skimage.measure import label
+ import pose_utils as utils
+
+
+ class ISLPoseEstimator(keras.Model):
+     """
+     ISL Pose Estimation Model combining body and hand pose detection
+     Developed by TechMatrix Solvers for accurate sign language recognition
+     """
+
+     def __init__(self, pytorch_body_model, pytorch_hand_model):
+         super().__init__()
+         self.pytorch_body_wrapper = TorchModuleWrapper(pytorch_body_model)
+         self.pytorch_body_wrapper.trainable = False
+         self.pytorch_hand_wrapper = TorchModuleWrapper(pytorch_hand_model)
+         self.pytorch_hand_wrapper.trainable = False
+         self.num_body_joints = 26
+         self.num_body_pafs = 52
+
+     def call(self, input_image):
+         """
+         Process input image and extract pose information
+
+         Args:
+             input_image: Input image tensor
+
+         Returns:
+             tuple: (body_candidates, body_subset, hand_peaks)
+         """
+         candidate, subset = self.extract_body_pose(input_image.cpu().numpy())
+         hand_regions = utils.detect_hand_regions(candidate, subset, input_image.cpu().numpy())
+
+         all_hand_keypoints = []
+         for x, y, w, is_left in hand_regions:
+             hand_peaks = self.extract_hand_pose(input_image.cpu().numpy()[y:y+w, x:x+w, :])
+             hand_peaks[:, 0] = np.where(hand_peaks[:, 0] == 0, hand_peaks[:, 0], hand_peaks[:, 0] + x)
+             hand_peaks[:, 1] = np.where(hand_peaks[:, 1] == 0, hand_peaks[:, 1], hand_peaks[:, 1] + y)
+             all_hand_keypoints.append(hand_peaks)
+
+         return candidate, subset, all_hand_keypoints
+
+     def extract_body_pose(self, input_image):
+         """
+         Extract body pose keypoints from input image
+
+         Args:
+             input_image: Input image array
+
+         Returns:
+             tuple: (candidates, subset) containing pose information
+         """
+         model_type = 'body25'
+         scale_factors = [0.5]
+         box_size = 368
+         stride = 8
+         padding_value = 128
+         threshold_1 = 0.1
+         threshold_2 = 0.05
+
+         # Calculate scale multipliers
+         multiplier = [x * box_size / input_image.shape[0] for x in scale_factors]
+         heatmap_average = np.zeros((input_image.shape[0], input_image.shape[1], self.num_body_joints))
+         paf_average = np.zeros((input_image.shape[0], input_image.shape[1], self.num_body_pafs))
+
+         for m in range(len(multiplier)):
+             scale = multiplier[m]
+             test_image = cv2.resize(input_image, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
+             padded_image, pad = utils.pad_image_corner(test_image, stride, padding_value)
+
+             # Prepare image tensor
+             image_tensor = np.transpose(np.float32(padded_image[:, :, :, np.newaxis]), (3, 2, 0, 1)) / 256 - 0.5
+             image_tensor = np.ascontiguousarray(image_tensor)
+
+             # Convert to PyTorch tensor
+             data = torch.from_numpy(image_tensor).float()
+             if torch.cuda.is_available():
+                 data = data.cuda()
+
+             with torch.no_grad():
+                 stage6_L1, stage6_L2 = self.pytorch_body_wrapper(data)
+
+             stage6_L1 = stage6_L1.cpu().numpy()
+             stage6_L2 = stage6_L2.cpu().numpy()
+
+             # Process heatmaps
+             heatmap = np.transpose(np.squeeze(stage6_L2), (1, 2, 0))
+             heatmap = cv2.resize(heatmap, (0, 0), fx=stride, fy=stride, interpolation=cv2.INTER_CUBIC)
+             heatmap = heatmap[:padded_image.shape[0] - pad[2], :padded_image.shape[1] - pad[3], :]
+             heatmap = cv2.resize(heatmap, (input_image.shape[1], input_image.shape[0]), interpolation=cv2.INTER_CUBIC)
+
+             # Process PAFs (Part Affinity Fields)
+             paf = np.transpose(np.squeeze(stage6_L1), (1, 2, 0))
+             paf = cv2.resize(paf, (0, 0), fx=stride, fy=stride, interpolation=cv2.INTER_CUBIC)
+             paf = paf[:padded_image.shape[0] - pad[2], :padded_image.shape[1] - pad[3], :]
+             paf = cv2.resize(paf, (input_image.shape[1], input_image.shape[0]), interpolation=cv2.INTER_CUBIC)
+
+             heatmap_average += heatmap / len(multiplier)
+             paf_average += paf / len(multiplier)
+
+         # Extract peaks from heatmaps
+         all_peaks = []
+         peak_counter = 0
+
+         for part in range(self.num_body_joints - 1):
+             original_map = heatmap_average[:, :, part]
+             smoothed_heatmap = gaussian_filter(original_map, sigma=3)
+
+             # Find local maxima
+             left_map = np.zeros(smoothed_heatmap.shape)
+             left_map[1:, :] = smoothed_heatmap[:-1, :]
+             right_map = np.zeros(smoothed_heatmap.shape)
+             right_map[:-1, :] = smoothed_heatmap[1:, :]
+             up_map = np.zeros(smoothed_heatmap.shape)
+             up_map[:, 1:] = smoothed_heatmap[:, :-1]
+             down_map = np.zeros(smoothed_heatmap.shape)
+             down_map[:, :-1] = smoothed_heatmap[:, 1:]
+
+             peaks_binary = np.logical_and.reduce(
+                 (smoothed_heatmap >= left_map, smoothed_heatmap >= right_map,
+                  smoothed_heatmap >= up_map, smoothed_heatmap >= down_map,
+                  smoothed_heatmap > threshold_1)
+             )
+
+             peaks = list(zip(np.nonzero(peaks_binary)[1], np.nonzero(peaks_binary)[0]))
+             peaks_with_score = [x + (original_map[x[1], x[0]],) for x in peaks]
+             peak_id = range(peak_counter, peak_counter + len(peaks))
+             peaks_with_score_and_id = [peaks_with_score[i] + (peak_id[i],) for i in range(len(peak_id))]
+
+             all_peaks.append(peaks_with_score_and_id)
+             peak_counter += len(peaks)
+
+         # Define limb connections for body25 model
+         if model_type == 'body25':
+             limb_sequence = [
+                 [1,0],[1,2],[2,3],[3,4],[1,5],[5,6],[6,7],[1,8],[8,9],[9,10],
+                 [10,11],[8,12],[12,13],[13,14],[0,15],[0,16],[15,17],[16,18],
+                 [11,24],[11,22],[14,21],[14,19],[22,23],[19,20]
+             ]
+             map_index = [
+                 [30,31],[14,15],[16,17],[18,19],[22,23],[24,25],[26,27],[0,1],[6,7],
+                 [2,3],[4,5],[8,9],[10,11],[12,13],[32,33],[34,35],[36,37],[38,39],
+                 [50,51],[46,47],[44,45],[40,41],[48,49],[42,43]
+             ]
+
+         # Find connections between body parts
+         connection_all = []
+         special_k = []
+         mid_num = 10
+
+         for k in range(len(map_index)):
+             score_mid = paf_average[:, :, map_index[k]]
+             candA = all_peaks[limb_sequence[k][0]]
+             candB = all_peaks[limb_sequence[k][1]]
+
+             nA = len(candA)
+             nB = len(candB)
+             indexA, indexB = limb_sequence[k]
+
+             if nA != 0 and nB != 0:
+                 connection_candidate = []
+                 for i in range(nA):
+                     for j in range(nB):
+                         vec = np.subtract(candB[j][:2], candA[i][:2])
+                         norm = math.sqrt(vec[0] * vec[0] + vec[1] * vec[1])
+                         norm = max(0.001, norm)
+                         vec = np.divide(vec, norm)
+
+                         startend = list(zip(
+                             np.linspace(candA[i][0], candB[j][0], num=mid_num),
+                             np.linspace(candA[i][1], candB[j][1], num=mid_num)
+                         ))
+
+                         vec_x = np.array([
+                             score_mid[int(round(startend[I][1])), int(round(startend[I][0])), 0]
+                             for I in range(len(startend))
+                         ])
+                         vec_y = np.array([
+                             score_mid[int(round(startend[I][1])), int(round(startend[I][0])), 1]
+                             for I in range(len(startend))
+                         ])
+
+                         score_midpts = np.multiply(vec_x, vec[0]) + np.multiply(vec_y, vec[1])
+                         score_with_dist_prior = (sum(score_midpts) / len(score_midpts) +
+                                                  min(0.5 * input_image.shape[0] / norm - 1, 0))
+
+                         criterion1 = len(np.nonzero(score_midpts > threshold_2)[0]) > 0.8 * len(score_midpts)
+                         criterion2 = score_with_dist_prior > 0
+
+                         if criterion1 and criterion2:
+                             connection_candidate.append([
+                                 i, j, score_with_dist_prior,
+                                 score_with_dist_prior + candA[i][2] + candB[j][2]
+                             ])
+
+                 connection_candidate = sorted(connection_candidate, key=lambda x: x[2], reverse=True)
+                 connection = np.zeros((0, 5))
+
+                 for c in range(len(connection_candidate)):
+                     i, j, s = connection_candidate[c][0:3]
+                     if i not in connection[:, 3] and j not in connection[:, 4]:
+                         connection = np.vstack([connection, [candA[i][3], candB[j][3], s, i, j]])
+                         if len(connection) >= min(nA, nB):
+                             break
+
+                 connection_all.append(connection)
+             else:
+                 special_k.append(k)
+                 connection_all.append([])
+
+         # Create human pose subsets
+         subset = -1 * np.ones((0, self.num_body_joints + 1))
+         candidate = np.array([item for sublist in all_peaks for item in sublist])
+
+         for k in range(len(map_index)):
+             if k not in special_k:
+                 partAs = connection_all[k][:, 0]
+                 partBs = connection_all[k][:, 1]
+                 indexA, indexB = np.array(limb_sequence[k])
+
+                 for i in range(len(connection_all[k])):
+                     found = 0
+                     subset_idx = [-1, -1]
+
+                     for j in range(len(subset)):
+                         if subset[j][indexA] == partAs[i] or subset[j][indexB] == partBs[i]:
+                             subset_idx[found] = j
+                             found += 1
+
+                     if found == 1:
+                         j = subset_idx[0]
+                         if subset[j][indexB] != partBs[i]:
+                             subset[j][indexB] = partBs[i]
+                             subset[j][-1] += 1
+                             subset[j][-2] += candidate[partBs[i].astype(int), 2] + connection_all[k][i][2]
+                     elif found == 2:
+                         j1, j2 = subset_idx
+                         membership = ((subset[j1] >= 0).astype(int) + (subset[j2] >= 0).astype(int))[:-2]
+                         if len(np.nonzero(membership == 2)[0]) == 0:
+                             subset[j1][:-2] += (subset[j2][:-2] + 1)
+                             subset[j1][-2:] += subset[j2][-2:]
+                             subset[j1][-2] += connection_all[k][i][2]
+                             subset = np.delete(subset, j2, 0)
+                         else:
+                             subset[j1][indexB] = partBs[i]
+                             subset[j1][-1] += 1
+                             subset[j1][-2] += candidate[partBs[i].astype(int), 2] + connection_all[k][i][2]
+                     elif not found and k < self.num_body_joints - 2:
+                         row = -1 * np.ones(self.num_body_joints + 1)
+                         row[indexA] = partAs[i]
+                         row[indexB] = partBs[i]
+                         row[-1] = 2
+                         row[-2] = sum(candidate[connection_all[k][i, :2].astype(int), 2]) + connection_all[k][i][2]
+                         subset = np.vstack([subset, row])
+
+         # Filter out low-quality detections
+         deleteIdx = []
+         for i in range(len(subset)):
+             if subset[i][-1] < 4 or subset[i][-2] / subset[i][-1] < 0.4:
+                 deleteIdx.append(i)
+         subset = np.delete(subset, deleteIdx, axis=0)
+
+         return candidate, subset
+
+     def extract_hand_pose(self, input_image):
+         """
+         Extract hand pose keypoints from input image region
+
+         Args:
+             input_image: Cropped hand region image
+
+         Returns:
+             numpy.ndarray: Hand keypoint coordinates
+         """
+         scale_factors = [0.5, 1.0, 1.5, 2.0]
+         box_size = 368
+         stride = 8
+         padding_value = 128
+         threshold = 0.05
+
+         multiplier = [x * box_size / input_image.shape[0] for x in scale_factors]
+         heatmap_average = np.zeros((input_image.shape[0], input_image.shape[1], 22))
+
+         for m in range(len(multiplier)):
+             scale = multiplier[m]
+             test_image = cv2.resize(input_image, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
+             padded_image, pad = utils.pad_image_corner(test_image, stride, padding_value)
+
+             # Prepare image tensor
+             image_tensor = np.transpose(np.float32(padded_image[:, :, :, np.newaxis]), (3, 2, 0, 1)) / 256 - 0.5
+             image_tensor = np.ascontiguousarray(image_tensor)
+
+             data = torch.from_numpy(image_tensor).float()
+             if torch.cuda.is_available():
+                 data = data.cuda()
+
+             with torch.no_grad():
+                 output = self.pytorch_hand_wrapper(data).cpu().numpy()
+
+             # Process heatmaps
+             heatmap = np.transpose(np.squeeze(output), (1, 2, 0))
+             heatmap = cv2.resize(heatmap, (0, 0), fx=stride, fy=stride, interpolation=cv2.INTER_CUBIC)
+             heatmap = heatmap[:padded_image.shape[0] - pad[2], :padded_image.shape[1] - pad[3], :]
+             heatmap = cv2.resize(heatmap, (input_image.shape[1], input_image.shape[0]), interpolation=cv2.INTER_CUBIC)
+
+             heatmap_average += heatmap / len(multiplier)
+
+         # Extract hand keypoints
+         all_peaks = []
+         for part in range(21):
+             original_map = heatmap_average[:, :, part]
+             smoothed_heatmap = gaussian_filter(original_map, sigma=3)
+             binary = np.ascontiguousarray(smoothed_heatmap > threshold, dtype=np.uint8)
+
+             if np.sum(binary) == 0:
+                 all_peaks.append([0, 0])
+                 continue
+
+             label_img, label_numbers = label(binary, return_num=True, connectivity=binary.ndim)
+             max_index = np.argmax([np.sum(original_map[label_img == i]) for i in range(1, label_numbers + 1)]) + 1
+             label_img[label_img != max_index] = 0
+             original_map[label_img == 0] = 0
+
+             y, x = utils.find_array_maximum(original_map)
+             all_peaks.append([x, y])
+
+         return np.array(all_peaks)
+
+
+ class ISLTranslationModel(keras.Model):
+     """
+     Complete ISL Translation Model combining pose estimation and LSTM translation
+     Developed by TechMatrix Solvers for end-to-end sign language translation
+     """
+
+     def __init__(self, body_model, hand_model, translation_model):
+         super().__init__()
+         self.pytorch_body_wrapper = TorchModuleWrapper(body_model)
+         self.pytorch_body_wrapper.trainable = False
+         self.pytorch_hand_wrapper = TorchModuleWrapper(hand_model)
+         self.pytorch_hand_wrapper.trainable = False
+
+         self.num_body_joints = 26
+         self.num_body_pafs = 52
+         self.model_type = 'body25'
+         self.translation_network = translation_model
+
+     def call(self, frame_sequence):
+         """
+         Process a sequence of frames and return translation prediction
+
+         Args:
+             frame_sequence: Sequence of video frames
+
+         Returns:
+             Translation prediction probabilities
+         """
+         window_size = 20
+         feature_sequence = []
+         blank_frame = np.zeros((1, 156))
+
+         for idx, frame in enumerate(frame_sequence.cpu()):
+             # Extract pose features from current frame
+             candidate, subset = self.extract_body_pose(frame.cpu().numpy())
+             hand_regions = utils.detect_hand_regions(candidate, subset, frame.cpu().numpy())
+
+             all_hand_keypoints = []
+             for x, y, w, is_left in hand_regions:
+                 peaks = self.extract_hand_pose(frame.cpu().numpy()[y:y+w, x:x+w, :])
+                 peaks[:, 0] = np.where(peaks[:, 0] == 0, peaks[:, 0], peaks[:, 0] + x)
+                 peaks[:, 1] = np.where(peaks[:, 1] == 0, peaks[:, 1], peaks[:, 1] + y)
+                 all_hand_keypoints.append(peaks)
+
+             # Extract structured pose data
+             body_circles, body_sticks = utils.extract_body_pose_data(candidate, subset, self.model_type)
+             hand_edges, hand_peaks = utils.extract_hand_pose_data(all_hand_keypoints)
+
+             # Convert to feature vector
+             feature_vector = self.create_feature_vector(body_circles, hand_peaks)
+             feature_sequence.append(feature_vector)
+
+         # Pad sequence if needed
+         if len(feature_sequence) < window_size:
+             for _ in range(window_size - len(feature_sequence)):
+                 feature_sequence.append(blank_frame)
+
+         # Run translation model
+         return self.translation_network(np.array(feature_sequence).reshape(1, 20, 156))
+
+     def create_feature_vector(self, body_circles, hand_peaks):
+         """
+         Create feature vector from pose data
+
+         Args:
+             body_circles: Body keypoint coordinates
+             hand_peaks: Hand keypoint data
+
+         Returns:
+             numpy.ndarray: 156-dimensional feature vector
+         """
+         features = []
+
+         # Body keypoint x-coordinates (15 points)
+         for idx in range(15):
+             if idx < len(body_circles):
+                 features.append(body_circles[idx][0])
+             else:
+                 features.append(0)
+
+         # Body keypoint y-coordinates (15 points)
+         for idx in range(15):
+             if idx < len(body_circles):
+                 features.append(body_circles[idx][1])
+             else:
+                 features.append(0)
+
+         # Hand features for both hands
+         for hand_idx in range(2):
+             # Hand x-coordinates (21 points)
+             for idx in range(21):
+                 if idx < len(hand_peaks[hand_idx]):
+                     features.append(float(hand_peaks[hand_idx][idx][0]))
+                 else:
+                     features.append(0)
+
+             # Hand y-coordinates (21 points)
+             for idx in range(21):
+                 if idx < len(hand_peaks[hand_idx]):
+                     features.append(float(hand_peaks[hand_idx][idx][1]))
+                 else:
+                     features.append(0)
+
+             # Hand peak text/confidence (21 points)
+             for idx in range(21):
+                 if idx < len(hand_peaks[hand_idx]):
+                     features.append(float(hand_peaks[hand_idx][idx][2]))
+                 else:
+                     features.append(0)
+
+         return np.array(features)
+
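Feature layout check (arithmetic, not from the commit): 15 body x-coordinates + 15 body y-coordinates = 30 values; each hand contributes 21 x + 21 y + 21 confidence values = 63, so two hands add 126. 30 + 126 = 156, matching the LSTM's (20, 156) input window.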
+     def extract_body_pose(self, input_image):
+         """Extract body pose - same implementation as ISLPoseEstimator"""
+         # This method would contain the same implementation as in ISLPoseEstimator
+         # For brevity, using a placeholder that calls the same logic
+         pose_estimator = ISLPoseEstimator(None, None)
+         pose_estimator.pytorch_body_wrapper = self.pytorch_body_wrapper
+         pose_estimator.num_body_joints = self.num_body_joints
+         pose_estimator.num_body_pafs = self.num_body_pafs
+         return pose_estimator.extract_body_pose(input_image)
+
+     def extract_hand_pose(self, input_image):
+         """Extract hand pose - same implementation as ISLPoseEstimator"""
+         # This method would contain the same implementation as in ISLPoseEstimator
+         # For brevity, using a placeholder that calls the same logic
+         pose_estimator = ISLPoseEstimator(None, None)
+         pose_estimator.pytorch_hand_wrapper = self.pytorch_hand_wrapper
+         return pose_estimator.extract_hand_pose(input_image)
model-graph.png ADDED

Git LFS Details

  • SHA256: b6851b6c85a8f927ba048ee910edc4011e880a6844432cb0930400d8de5cc0d4
  • Pointer size: 131 Bytes
  • Size of remote file: 761 kB
packages.txt ADDED
@@ -0,0 +1,6 @@
+ ffmpeg
+ libgl1
+ libglib2.0-0
+ libsm6
+ libxrender1
+ libxext6
pose_models.py ADDED
@@ -0,0 +1,360 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ """
+ ISL Sign Language Translation - TechMatrix Solvers Initiative
+ Model definitions for body pose and hand pose estimation
+ Developed by: TechMatrix Solvers Team
+ """
+
+ import torch
+ from collections import OrderedDict
+ import torch.nn as nn
+
+
+ def construct_layers(layer_config, no_relu_layers, prelu_layers=[]):
+     """
+     Constructs neural network layers based on configuration
+
+     Args:
+         layer_config: Dictionary defining layer parameters
+         no_relu_layers: List of layers that shouldn't have ReLU activation
+         prelu_layers: List of layers that should use PReLU instead of ReLU
+     """
+     layers = []
+
+     for layer_name, params in layer_config.items():
+         if 'pool' in layer_name:
+             layer = nn.MaxPool2d(kernel_size=params[0], stride=params[1], padding=params[2])
+             layers.append((layer_name, layer))
+         else:
+             conv2d = nn.Conv2d(
+                 in_channels=params[0],
+                 out_channels=params[1],
+                 kernel_size=params[2],
+                 stride=params[3],
+                 padding=params[4]
+             )
+             layers.append((layer_name, conv2d))
+
+             if layer_name not in no_relu_layers:
+                 if layer_name not in prelu_layers:
+                     layers.append(('relu_' + layer_name, nn.ReLU(inplace=True)))
+                 else:
+                     layers.append(('prelu' + layer_name[4:], nn.PReLU(params[1])))
+
+     return nn.Sequential(OrderedDict(layers))
+
+
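The layer_config format is positional: conv entries are [in_channels,
out_channels, kernel_size, stride, padding], while any key containing 'pool'
is read as [kernel_size, stride, padding]. A tiny illustrative config (the
layer names here are made up for the example):

    from collections import OrderedDict
    from pose_models import construct_layers

    demo_cfg = OrderedDict([
        ('conv_a', [3, 8, 3, 1, 1]),    # conv: in, out, kernel, stride, pad
        ('pool_a', [2, 2, 0]),          # pool: kernel, stride, pad
        ('conv_b', [8, 16, 3, 1, 1]),
    ])
    demo_net = construct_layers(demo_cfg, no_relu_layers=['conv_b'])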
+ def construct_multi_conv_layers(layer_config, no_relu_layers):
+     """
+     Constructs multiple convolution layers for complex architectures
+     """
+     modules = []
+     for layer_name, params in layer_config.items():
+         layers = []
+         if 'pool' in layer_name:
+             layer = nn.MaxPool2d(kernel_size=params[0], stride=params[1], padding=params[2])
+             layers.append((layer_name, layer))
+         else:
+             conv2d = nn.Conv2d(
+                 in_channels=params[0],
+                 out_channels=params[1],
+                 kernel_size=params[2],
+                 stride=params[3],
+                 padding=params[4]
+             )
+             layers.append((layer_name, conv2d))
+             if layer_name not in no_relu_layers:
+                 layers.append(('Mprelu' + layer_name[5:], nn.PReLU(params[1])))
+         modules.append(nn.Sequential(OrderedDict(layers)))
+     return nn.ModuleList(modules)
+
+
+ class BodyPose25Model(nn.Module):
+     """
+     Body pose estimation model using 25-point skeleton
+     Developed by TechMatrix Solvers for ISL translation
+     """
+
+     def __init__(self):
+         super(BodyPose25Model, self).__init__()
+
+         # Define layers without ReLU activation
+         no_relu_layers = [
+             'Mconv7_stage0_L1', 'Mconv7_stage0_L2',
+             'Mconv7_stage1_L1', 'Mconv7_stage1_L2',
+             'Mconv7_stage2_L2', 'Mconv7_stage3_L2'
+         ]
+         prelu_layers = ['conv4_2', 'conv4_3_CPM', 'conv4_4_CPM']
+
+         # Initial feature extraction layers
+         base_layers = OrderedDict([
+             ('conv1_1', [3, 64, 3, 1, 1]),
+             ('conv1_2', [64, 64, 3, 1, 1]),
+             ('pool1_stage1', [2, 2, 0]),
+             ('conv2_1', [64, 128, 3, 1, 1]),
+             ('conv2_2', [128, 128, 3, 1, 1]),
+             ('pool2_stage1', [2, 2, 0]),
+             ('conv3_1', [128, 256, 3, 1, 1]),
+             ('conv3_2', [256, 256, 3, 1, 1]),
+             ('conv3_3', [256, 256, 3, 1, 1]),
+             ('conv3_4', [256, 256, 3, 1, 1]),
+             ('pool3_stage1', [2, 2, 0]),
+             ('conv4_1', [256, 512, 3, 1, 1]),
+             ('conv4_2', [512, 512, 3, 1, 1]),
+             ('conv4_3_CPM', [512, 256, 3, 1, 1]),
+             ('conv4_4_CPM', [256, 128, 3, 1, 1])
+         ])
+         self.base_model = construct_layers(base_layers, no_relu_layers, prelu_layers)
+
+         # Multi-stage refinement blocks
+         stage_blocks = {}
+
+         # L2 branch - Stage 0
+         stage_blocks['Mconv1_stage0_L2'] = OrderedDict([
+             ('Mconv1_stage0_L2_0', [128, 96, 3, 1, 1]),
+             ('Mconv1_stage0_L2_1', [96, 96, 3, 1, 1]),
+             ('Mconv1_stage0_L2_2', [96, 96, 3, 1, 1])
+         ])
+
+         for i in range(2, 6):
+             stage_blocks[f'Mconv{i}_stage0_L2'] = OrderedDict([
+                 (f'Mconv{i}_stage0_L2_0', [288, 96, 3, 1, 1]),
+                 (f'Mconv{i}_stage0_L2_1', [96, 96, 3, 1, 1]),
+                 (f'Mconv{i}_stage0_L2_2', [96, 96, 3, 1, 1])
+             ])
+
+         stage_blocks['Mconv6_7_stage0_L2'] = OrderedDict([
+             ('Mconv6_stage0_L2', [288, 256, 1, 1, 0]),
+             ('Mconv7_stage0_L2', [256, 52, 1, 1, 0])
+         ])
+
+         # L2 branch - Stages 1-3
+         for stage in range(1, 4):
+             stage_blocks[f'Mconv1_stage{stage}_L2'] = OrderedDict([
+                 (f'Mconv1_stage{stage}_L2_0', [180, 128, 3, 1, 1]),
+                 (f'Mconv1_stage{stage}_L2_1', [128, 128, 3, 1, 1]),
+                 (f'Mconv1_stage{stage}_L2_2', [128, 128, 3, 1, 1])
+             ])
+             for i in range(2, 6):
+                 stage_blocks[f'Mconv{i}_stage{stage}_L2'] = OrderedDict([
+                     (f'Mconv{i}_stage{stage}_L2_0', [384, 128, 3, 1, 1]),
+                     (f'Mconv{i}_stage{stage}_L2_1', [128, 128, 3, 1, 1]),
+                     (f'Mconv{i}_stage{stage}_L2_2', [128, 128, 3, 1, 1])
+                 ])
+             stage_blocks[f'Mconv6_7_stage{stage}_L2'] = OrderedDict([
+                 (f'Mconv6_stage{stage}_L2', [384, 512, 1, 1, 0]),
+                 (f'Mconv7_stage{stage}_L2', [512, 52, 1, 1, 0])
+             ])
+
+         # L1 branch configurations
+         stage_blocks['Mconv1_stage0_L1'] = OrderedDict([
+             ('Mconv1_stage0_L1_0', [180, 96, 3, 1, 1]),
+             ('Mconv1_stage0_L1_1', [96, 96, 3, 1, 1]),
+             ('Mconv1_stage0_L1_2', [96, 96, 3, 1, 1])
+         ])
+
+         for i in range(2, 6):
+             stage_blocks[f'Mconv{i}_stage0_L1'] = OrderedDict([
+                 (f'Mconv{i}_stage0_L1_0', [288, 96, 3, 1, 1]),
+                 (f'Mconv{i}_stage0_L1_1', [96, 96, 3, 1, 1]),
+                 (f'Mconv{i}_stage0_L1_2', [96, 96, 3, 1, 1])
+             ])
+
+         stage_blocks['Mconv6_7_stage0_L1'] = OrderedDict([
+             ('Mconv6_stage0_L1', [288, 256, 1, 1, 0]),
+             ('Mconv7_stage0_L1', [256, 26, 1, 1, 0])
+         ])
+
+         stage_blocks['Mconv1_stage1_L1'] = OrderedDict([
+             ('Mconv1_stage1_L1_0', [206, 128, 3, 1, 1]),
+             ('Mconv1_stage1_L1_1', [128, 128, 3, 1, 1]),
+             ('Mconv1_stage1_L1_2', [128, 128, 3, 1, 1])
+         ])
+
+         for i in range(2, 6):
+             stage_blocks[f'Mconv{i}_stage1_L1'] = OrderedDict([
+                 (f'Mconv{i}_stage1_L1_0', [384, 128, 3, 1, 1]),
+                 (f'Mconv{i}_stage1_L1_1', [128, 128, 3, 1, 1]),
+                 (f'Mconv{i}_stage1_L1_2', [128, 128, 3, 1, 1])
+             ])
+
+         stage_blocks['Mconv6_7_stage1_L1'] = OrderedDict([
+             ('Mconv6_stage1_L1', [384, 512, 1, 1, 0]),
+             ('Mconv7_stage1_L1', [512, 26, 1, 1, 0])
+         ])
+
+         # Build multi-conv modules
+         for block_name in stage_blocks.keys():
+             stage_blocks[block_name] = construct_multi_conv_layers(stage_blocks[block_name], no_relu_layers)
+
+         self.stage_models = nn.ModuleDict(stage_blocks)
+
+         # Freeze parameters for efficiency
+         for param in self.parameters():
+             param.requires_grad = False
+
+     def _multi_conv_forward(self, x, models):
+         """Forward pass through multi-convolution blocks"""
+         outputs = []
+         current_output = x
+         for model in models:
+             current_output = model(current_output)
+             outputs.append(current_output)
+         return torch.cat(outputs, 1)
+
+     def forward(self, x):
+         """Forward pass through the body pose model"""
+         base_features = self.base_model(x)
+
+         # L2 branch processing
+         current_features = base_features
+         for stage in range(4):
+             current_features = self._multi_conv_forward(
+                 current_features, self.stage_models[f'Mconv1_stage{stage}_L2']
+             )
+             for layer in range(2, 6):
+                 current_features = self._multi_conv_forward(
+                     current_features, self.stage_models[f'Mconv{layer}_stage{stage}_L2']
+                 )
+             current_features = self.stage_models[f'Mconv6_7_stage{stage}_L2'][0](current_features)
+             current_features = self.stage_models[f'Mconv6_7_stage{stage}_L2'][1](current_features)
+             l2_output = current_features
+             current_features = torch.cat([base_features, current_features], 1)
+
+         # L1 branch - Stage 0
+         current_features = self._multi_conv_forward(
+             current_features, self.stage_models['Mconv1_stage0_L1']
+         )
+         for layer in range(2, 6):
+             current_features = self._multi_conv_forward(
+                 current_features, self.stage_models[f'Mconv{layer}_stage0_L1']
+             )
+         current_features = self.stage_models['Mconv6_7_stage0_L1'][0](current_features)
+         current_features = self.stage_models['Mconv6_7_stage0_L1'][1](current_features)
+         stage0_l1_output = current_features
+         current_features = torch.cat([base_features, stage0_l1_output, l2_output], 1)
+
+         # L1 branch - Stage 1
+         current_features = self._multi_conv_forward(
+             current_features, self.stage_models['Mconv1_stage1_L1']
+         )
+         for layer in range(2, 6):
+             current_features = self._multi_conv_forward(
+                 current_features, self.stage_models[f'Mconv{layer}_stage1_L1']
+             )
+         current_features = self.stage_models['Mconv6_7_stage1_L1'][0](current_features)
+         stage1_l1_output = self.stage_models['Mconv6_7_stage1_L1'][1](current_features)
+
+         return l2_output, stage1_l1_output
+
+
+ class HandPoseModel(nn.Module):
+     """
+     Hand pose estimation model using 21-point hand landmarks
+     Developed by TechMatrix Solvers for ISL translation
+     """
+
+     def __init__(self):
+         super(HandPoseModel, self).__init__()
+
+         # Layers without ReLU activation
+         no_relu_layers = [
+             'conv6_2_CPM', 'Mconv7_stage2', 'Mconv7_stage3',
+             'Mconv7_stage4', 'Mconv7_stage5', 'Mconv7_stage6'
+         ]
+
+         # Stage 1 - Feature extraction
+         stage1_base = OrderedDict([
+             ('conv1_1', [3, 64, 3, 1, 1]),
+             ('conv1_2', [64, 64, 3, 1, 1]),
+             ('pool1_stage1', [2, 2, 0]),
+             ('conv2_1', [64, 128, 3, 1, 1]),
+             ('conv2_2', [128, 128, 3, 1, 1]),
+             ('pool2_stage1', [2, 2, 0]),
+             ('conv3_1', [128, 256, 3, 1, 1]),
+             ('conv3_2', [256, 256, 3, 1, 1]),
+             ('conv3_3', [256, 256, 3, 1, 1]),
+             ('conv3_4', [256, 256, 3, 1, 1]),
+             ('pool3_stage1', [2, 2, 0]),
+             ('conv4_1', [256, 512, 3, 1, 1]),
+             ('conv4_2', [512, 512, 3, 1, 1]),
+             ('conv4_3', [512, 512, 3, 1, 1]),
+             ('conv4_4', [512, 512, 3, 1, 1]),
+             ('conv5_1', [512, 512, 3, 1, 1]),
+             ('conv5_2', [512, 512, 3, 1, 1]),
+             ('conv5_3_CPM', [512, 128, 3, 1, 1])
+         ])
+
+         stage1_prediction = OrderedDict([
+             ('conv6_1_CPM', [128, 512, 1, 1, 0]),
+             ('conv6_2_CPM', [512, 22, 1, 1, 0])
+         ])
+
+         stage_blocks = {}
+         stage_blocks['stage1_base'] = stage1_base
+         stage_blocks['stage1_prediction'] = stage1_prediction
+
+         # Stages 2-6 refinement
+         for i in range(2, 7):
+             stage_blocks[f'stage{i}'] = OrderedDict([
+                 (f'Mconv1_stage{i}', [150, 128, 7, 1, 3]),
+                 (f'Mconv2_stage{i}', [128, 128, 7, 1, 3]),
+                 (f'Mconv3_stage{i}', [128, 128, 7, 1, 3]),
+                 (f'Mconv4_stage{i}', [128, 128, 7, 1, 3]),
+                 (f'Mconv5_stage{i}', [128, 128, 7, 1, 3]),
+                 (f'Mconv6_stage{i}', [128, 128, 1, 1, 0]),
+                 (f'Mconv7_stage{i}', [128, 22, 1, 1, 0])
+             ])
+
+         # Build all stage models
+         for block_name in stage_blocks.keys():
+             stage_blocks[block_name] = construct_layers(stage_blocks[block_name], no_relu_layers)
+
+         self.stage1_base_model = stage_blocks['stage1_base']
+         self.stage1_prediction_model = stage_blocks['stage1_prediction']
+         self.stage2_model = stage_blocks['stage2']
+         self.stage3_model = stage_blocks['stage3']
+         self.stage4_model = stage_blocks['stage4']
+         self.stage5_model = stage_blocks['stage5']
+         self.stage6_model = stage_blocks['stage6']
+
+         # Freeze parameters for efficiency
+         for param in self.parameters():
+             param.requires_grad = False
+
+     def forward(self, x):
+         """Forward pass through the hand pose model"""
+         base_features = self.stage1_base_model(x)
+         stage1_output = self.stage1_prediction_model(base_features)
+
+         # Stages 2-6: each stage refines the previous prediction,
+         # concatenated with the shared base features
+         stage2_input = torch.cat([stage1_output, base_features], 1)
+         stage2_output = self.stage2_model(stage2_input)
+
+         stage3_input = torch.cat([stage2_output, base_features], 1)
+         stage3_output = self.stage3_model(stage3_input)
+
+         stage4_input = torch.cat([stage3_output, base_features], 1)
+         stage4_output = self.stage4_model(stage4_input)
+
+         stage5_input = torch.cat([stage4_output, base_features], 1)
+         stage5_output = self.stage5_model(stage5_input)
+
+         stage6_input = torch.cat([stage5_output, base_features], 1)
+         stage6_output = self.stage6_model(stage6_input)
+
+         return stage6_output
+
+
+ # Factory functions for easy model instantiation
+ def create_bodypose_model():
+     """Create and return body pose detection model"""
+     return BodyPose25Model()
+
+
+ def create_handpose_model():
+     """Create and return hand pose detection model"""
+     return HandPoseModel()
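Both networks are fully convolutional, so the three stride-2 pools reduce a
368x368 input to 46x46 output maps. A minimal shape smoke test (a sketch; it
assumes this file is importable as pose_models and runs on random, untrained
weights):

    import torch
    from pose_models import create_bodypose_model, create_handpose_model

    body_model = create_bodypose_model().eval()
    hand_model = create_handpose_model().eval()

    with torch.no_grad():
        dummy = torch.zeros(1, 3, 368, 368)    # NCHW RGB input
        # 52-channel L2 head and 26-channel L1 head (conventionally
        # PAFs and keypoint heatmaps): (1, 52, 46, 46), (1, 26, 46, 46)
        pafs, heatmaps = body_model(dummy)
        hand_maps = hand_model(dummy)          # (1, 22, 46, 46)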
pose_utils.py ADDED
@@ -0,0 +1,468 @@
+ """
+ ISL Sign Language Translation - TechMatrix Solvers Initiative
+ Utility functions for pose processing and visualization
+ Developed by: TechMatrix Solvers Team
+ """
+
+ import copy
+ import math
+
+ import cv2
+ import matplotlib
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import seaborn as sns
+ from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
+ from matplotlib.figure import Figure
+
+
+ def pad_image_corner(img, stride, pad_value):
19
+ """
20
+ Pad image to ensure dimensions are divisible by stride
21
+
22
+ Args:
23
+ img: Input image array
24
+ stride: Stride value for padding calculation
25
+ pad_value: Value to use for padding
26
+ """
27
+ h, w = img.shape[:2]
28
+
29
+ pad = [0, 0, 0, 0] # [up, left, down, right]
30
+ pad[2] = 0 if (h % stride == 0) else stride - (h % stride) # down
31
+ pad[3] = 0 if (w % stride == 0) else stride - (w % stride) # right
32
+
33
+ img_padded = img
34
+
35
+ # Add padding
36
+ if pad[0] > 0: # up
37
+ pad_up = np.tile(img_padded[0:1, :, :] * 0 + pad_value, (pad[0], 1, 1))
38
+ img_padded = np.concatenate((pad_up, img_padded), axis=0)
39
+
40
+ if pad[1] > 0: # left
41
+ pad_left = np.tile(img_padded[:, 0:1, :] * 0 + pad_value, (1, pad[1], 1))
42
+ img_padded = np.concatenate((pad_left, img_padded), axis=1)
43
+
44
+ if pad[2] > 0: # down
45
+ pad_down = np.tile(img_padded[-2:-1, :, :] * 0 + pad_value, (pad[2], 1, 1))
46
+ img_padded = np.concatenate((img_padded, pad_down), axis=0)
47
+
48
+ if pad[3] > 0: # right
49
+ pad_right = np.tile(img_padded[:, -2:-1, :] * 0 + pad_value, (1, pad[3], 1))
50
+ img_padded = np.concatenate((img_padded, pad_right), axis=1)
51
+
52
+ return img_padded, pad
53
+
54
+
55
+ def transfer_model_weights(model, model_weights):
56
+ """
57
+ Transfer weights from caffe model to pytorch model format
58
+
59
+ Args:
60
+ model: PyTorch model
61
+ model_weights: Dictionary of weights from caffe model
62
+ """
63
+ transferred_weights = {}
64
+ for weights_name in model.state_dict().keys():
65
+ if len(weights_name.split('.')) > 4: # body25 format
66
+ transferred_weights[weights_name] = model_weights['.'.join(
67
+ weights_name.split('.')[3:])]
68
+ else:
69
+ transferred_weights[weights_name] = model_weights['.'.join(
70
+ weights_name.split('.')[1:])]
71
+ return transferred_weights
72
+
73
+
74
+ def draw_body_pose_visualization(canvas, candidate, subset, model_type='body25'):
75
+ """
76
+ Draw body pose keypoints and connections on image
77
+
78
+ Args:
79
+ canvas: Image to draw on
80
+ candidate: Detected keypoint candidates
81
+ subset: Valid keypoint connections
82
+ model_type: Type of pose model ('body25' or 'coco')
83
+ """
84
+ stick_width = 4
85
+
86
+ if model_type == 'body25':
87
+ limb_sequence = [
88
+ [1,0],[1,2],[2,3],[3,4],[1,5],[5,6],[6,7],[1,8],[8,9],[9,10],
89
+ [10,11],[8,12],[12,13],[13,14],[0,15],[0,16],[15,17],[16,18],
90
+ [11,24],[11,22],[14,21],[14,19],[22,23],[19,20]
91
+ ]
92
+ num_joints = 25
93
+ else:
94
+ limb_sequence = [
95
+ [1, 2], [1, 5], [2, 3], [3, 4], [5, 6], [6, 7], [1, 8], [8, 9],
96
+ [9, 10], [1, 11], [11, 12], [12, 13], [1, 0], [0, 14], [14, 16],
97
+ [0, 15], [15, 17], [2, 16], [5, 17]
98
+ ]
99
+ num_joints = 18
100
+
101
+ # Color scheme for different joints
102
+ colors = [
103
+ [255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0],
104
+ [85, 255, 0], [0, 255, 0], [0, 255, 85], [0, 255, 170], [0, 255, 255],
105
+ [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], [170, 0, 255],
106
+ [255, 0, 255], [255, 0, 170], [255, 0, 85], [255,255,0], [255,255,85],
107
+ [255,255,170], [255,255,255], [170,255,255], [85,255,255], [0,255,255]
108
+ ]
109
+
110
+ # Draw keypoints
111
+ for i in range(num_joints):
112
+ for n in range(len(subset)):
113
+ index = int(subset[n][i])
114
+ if index == -1:
115
+ continue
116
+ x, y = candidate[index][0:2]
117
+ cv2.circle(canvas, (int(x), int(y)), 4, colors[i], thickness=-1)
118
+
119
+ # Draw limbs
120
+ for i in range(num_joints - 1):
121
+ for n in range(len(subset)):
122
+ index = subset[n][np.array(limb_sequence[i])]
123
+ if -1 in index:
124
+ continue
125
+ current_canvas = canvas.copy()
126
+ Y = candidate[index.astype(int), 0]
127
+ X = candidate[index.astype(int), 1]
128
+ mean_x = np.mean(X)
129
+ mean_y = np.mean(Y)
130
+ length = ((X[0] - X[1]) ** 2 + (Y[0] - Y[1]) ** 2) ** 0.5
131
+ angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
132
+ polygon = cv2.ellipse2Poly((int(mean_y), int(mean_x)),
133
+ (int(length / 2), stick_width),
134
+ int(angle), 0, 360, 1)
135
+ cv2.fillConvexPoly(current_canvas, polygon, colors[i])
136
+ canvas = cv2.addWeighted(canvas, 0.4, current_canvas, 0.6, 0)
137
+
138
+ return canvas
139
+
140
+
141
+ def extract_body_pose_data(candidate, subset, model_type='body25'):
142
+ """
143
+ Extract body pose data without drawing
144
+
145
+ Returns:
146
+ tuple: (keypoint_circles, limb_sticks) data for further processing
147
+ """
148
+ stick_width = 4
149
+
150
+ if model_type == 'body25':
151
+ limb_sequence = [
152
+ [1,0],[1,2],[2,3],[3,4],[1,5],[5,6],[6,7],[1,8],[8,9],[9,10],
153
+ [10,11],[8,12],[12,13],[13,14],[0,15],[0,16],[15,17],[16,18],
154
+ [11,24],[11,22],[14,21],[14,19],[22,23],[19,20]
155
+ ]
156
+ num_joints = 25
157
+ else:
158
+ limb_sequence = [
159
+ [1, 2], [1, 5], [2, 3], [3, 4], [5, 6], [6, 7], [1, 8], [8, 9],
160
+ [9, 10], [1, 11], [11, 12], [12, 13], [1, 0], [0, 14], [14, 16],
161
+ [0, 15], [15, 17], [2, 16], [5, 17]
162
+ ]
163
+ num_joints = 18
164
+
165
+ # Extract keypoint coordinates
166
+ keypoint_circles = []
167
+ for i in range(num_joints):
168
+ for n in range(len(subset)):
169
+ index = int(subset[n][i])
170
+ if index == -1:
171
+ continue
172
+ x, y = candidate[index][0:2]
173
+ keypoint_circles.append((x, y))
174
+
175
+ # Extract limb stick data
176
+ limb_sticks = []
177
+ for i in range(num_joints - 1):
178
+ for n in range(len(subset)):
179
+ index = subset[n][np.array(limb_sequence[i])]
180
+ if -1 in index:
181
+ continue
182
+ Y = candidate[index.astype(int), 0]
183
+ X = candidate[index.astype(int), 1]
184
+ mean_x = np.mean(X)
185
+ mean_y = np.mean(Y)
186
+ length = ((X[0] - X[1]) ** 2 + (Y[0] - Y[1]) ** 2) ** 0.5
187
+ angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
188
+ limb_sticks.append((mean_y, mean_x, angle, length))
189
+
190
+ return keypoint_circles, limb_sticks
191
+
192
+
+ def draw_hand_pose_visualization(canvas, all_hand_peaks, show_numbers=False):
+     """
+     Draw hand pose keypoints and connections
+
+     Args:
+         canvas: Image to draw on
+         all_hand_peaks: Detected hand keypoints for both hands
+         show_numbers: Whether to show keypoint numbers
+     """
+     edges = [
+         [0, 1], [1, 2], [2, 3], [3, 4], [0, 5], [5, 6], [6, 7], [7, 8], [0, 9], [9, 10],
+         [10, 11], [11, 12], [0, 13], [13, 14], [14, 15], [15, 16], [0, 17], [17, 18], [18, 19], [19, 20]
+     ]
+
+     fig = Figure(figsize=plt.figaspect(canvas))
+     fig.subplots_adjust(0, 0, 1, 1)
+     bg = FigureCanvas(fig)
+     ax = fig.subplots()
+     ax.axis('off')
+     ax.imshow(canvas)
+
+     width, height = ax.figure.get_size_inches() * ax.figure.get_dpi()
+
+     for peaks in all_hand_peaks:
+         for ie, e in enumerate(edges):
+             # Skip an edge if either endpoint is the (0, 0) "not detected" marker
+             if np.sum(np.all(peaks[e], axis=1) == 0) == 0:
+                 x1, y1 = peaks[e[0]]
+                 x2, y2 = peaks[e[1]]
+                 ax.plot([x1, x2], [y1, y2],
+                         color=matplotlib.colors.hsv_to_rgb([ie / float(len(edges)), 1.0, 1.0]))
+
+         for i, keypoint in enumerate(peaks):
+             x, y = keypoint
+             ax.plot(x, y, 'r.')
+             if show_numbers:
+                 ax.text(x, y, str(i))
+
+     bg.draw()
+     # np.fromstring/tostring_rgb are deprecated; read the RGBA buffer instead
+     canvas = np.frombuffer(bg.buffer_rgba(), dtype=np.uint8).reshape(
+         int(height), int(width), 4)[:, :, :3]
+     return canvas
+
+
+ def extract_hand_pose_data(all_hand_peaks, show_numbers=False):
236
+ """
237
+ Extract hand pose data without drawing
238
+
239
+ Returns:
240
+ tuple: (hand_edges, hand_peaks) data for further processing
241
+ """
242
+ edges = [
243
+ [0, 1], [1, 2], [2, 3], [3, 4], [0, 5], [5, 6], [6, 7], [7, 8], [0, 9], [9, 10],
244
+ [10, 11], [11, 12], [0, 13], [13, 14], [14, 15], [15, 16], [0, 17], [17, 18], [18, 19], [19, 20]
245
+ ]
246
+
247
+ export_edges = [[], []]
248
+ export_peaks = [[], []]
249
+
250
+ for idx, peaks in enumerate(all_hand_peaks):
251
+ for ie, e in enumerate(edges):
252
+ if np.sum(np.all(peaks[e], axis=1) == 0) == 0:
253
+ x1, y1 = peaks[e[0]]
254
+ x2, y2 = peaks[e[1]]
255
+ export_edges[idx].append((ie, (x1, y1), (x2, y2)))
256
+
257
+ for i, keypoint in enumerate(peaks):
258
+ x, y = keypoint
259
+ export_peaks[idx].append((x, y, str(i)))
260
+
261
+ return export_edges, export_peaks
262
+
263
+
264
+ def detect_hand_regions(candidate, subset, original_image):
265
+ """
266
+ Detect hand regions based on body pose keypoints
267
+
268
+ Args:
269
+ candidate: Body pose candidates
270
+ subset: Valid body pose connections
271
+ original_image: Original input image
272
+
273
+ Returns:
274
+ List of detected hand regions [x, y, width, is_left_hand]
275
+ """
276
+ ratio_wrist_elbow = 0.33
277
+ detection_results = []
278
+
279
+ image_height, image_width = original_image.shape[0:2]
280
+
281
+ for person in subset.astype(int):
282
+ # Check if left hand keypoints exist (shoulder, elbow, wrist)
283
+ has_left_hand = np.sum(person[[5, 6, 7]] == -1) == 0
284
+ has_right_hand = np.sum(person[[2, 3, 4]] == -1) == 0
285
+
286
+ if not (has_left_hand or has_right_hand):
287
+ continue
288
+
289
+ hands = []
290
+
291
+ # Process left hand
292
+ if has_left_hand:
293
+ left_shoulder_idx, left_elbow_idx, left_wrist_idx = person[[5, 6, 7]]
294
+ x1, y1 = candidate[left_shoulder_idx][:2]
295
+ x2, y2 = candidate[left_elbow_idx][:2]
296
+ x3, y3 = candidate[left_wrist_idx][:2]
297
+ hands.append([x1, y1, x2, y2, x3, y3, True])
298
+
299
+ # Process right hand
300
+ if has_right_hand:
301
+ right_shoulder_idx, right_elbow_idx, right_wrist_idx = person[[2, 3, 4]]
302
+ x1, y1 = candidate[right_shoulder_idx][:2]
303
+ x2, y2 = candidate[right_elbow_idx][:2]
304
+ x3, y3 = candidate[right_wrist_idx][:2]
305
+ hands.append([x1, y1, x2, y2, x3, y3, False])
306
+
307
+ for x1, y1, x2, y2, x3, y3, is_left in hands:
308
+ # Calculate hand region based on wrist and elbow positions
309
+ x = x3 + ratio_wrist_elbow * (x3 - x2)
310
+ y = y3 + ratio_wrist_elbow * (y3 - y2)
311
+
312
+ distance_wrist_elbow = math.sqrt((x3 - x2) ** 2 + (y3 - y2) ** 2)
313
+ distance_elbow_shoulder = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
314
+ width = 1.5 * max(distance_wrist_elbow, 0.9 * distance_elbow_shoulder)
315
+
316
+ # Adjust to top-left corner
317
+ x -= width / 2
318
+ y -= width / 2
319
+
320
+ # Ensure bounds are within image
321
+ x = max(0, x)
322
+ y = max(0, y)
323
+
324
+ width1 = width if x + width <= image_width else image_width - x
325
+ width2 = width if y + width <= image_height else image_height - y
326
+ width = min(width1, width2)
327
+
328
+ # Only include if region is large enough
329
+ if width >= 20:
330
+ detection_results.append([int(x), int(y), int(width), is_left])
331
+
332
+ return detection_results
333
+
334
+
335
+ def render_stick_model(original_img, keypoint_circles, limb_sticks, hand_edges, hand_peaks):
336
+ """
337
+ Render complete stick model with body and hand poses
338
+
339
+ Args:
340
+ original_img: Original image
341
+ keypoint_circles: Body keypoint coordinates
342
+ limb_sticks: Body limb stick data
343
+ hand_edges: Hand connection data
344
+ hand_peaks: Hand keypoint data
345
+ """
346
+ canvas = copy.deepcopy(original_img)
347
+
348
+ colors = [
349
+ [255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0],
350
+ [85, 255, 0], [0, 255, 0], [0, 255, 85], [0, 255, 170], [0, 255, 255],
351
+ [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], [170, 0, 255],
352
+ [255, 0, 255], [255, 0, 170], [255, 0, 85], [255,255,0], [255,255,85],
353
+ [255,255,170], [255,255,255], [170,255,255], [85,255,255], [0,255,255]
354
+ ]
355
+ stick_width = 4
356
+
357
+ # Draw body limbs
358
+ for idx, (mean_x, mean_y, angle, length) in enumerate(limb_sticks):
359
+ current_canvas = canvas.copy()
360
+ polygon = cv2.ellipse2Poly(
361
+ (int(mean_x), int(mean_y)),
362
+ (int(length / 2), stick_width),
363
+ int(angle), 0, 360, 1
364
+ )
365
+ cv2.fillConvexPoly(current_canvas, polygon, colors[idx])
366
+ canvas = cv2.addWeighted(canvas, 0.4, current_canvas, 0.6, 0)
367
+
368
+ # Draw body keypoints
369
+ for idx, (x, y) in enumerate(keypoint_circles):
370
+ cv2.circle(canvas, (int(x), int(y)), 4, colors[idx], thickness=-1)
371
+
372
+ # Draw hand poses using matplotlib
373
+ fig = Figure(figsize=plt.figaspect(canvas))
374
+ fig.subplots_adjust(0, 0, 1, 1)
375
+ ax = fig.subplots()
376
+ ax.axis('off')
377
+ ax.imshow(canvas)
378
+
379
+ edges = [
380
+ [0, 1], [1, 2], [2, 3], [3, 4], [0, 5], [5, 6], [6, 7], [7, 8], [0, 9],
381
+ [9, 10], [10, 11], [11, 12], [0, 13], [13, 14], [14, 15], [15, 16],
382
+ [0, 17], [17, 18], [18, 19], [19, 20]
383
+ ]
384
+
385
+ for hand_edge_set in hand_edges:
386
+ for (ie, (x1, y1), (x2, y2)) in hand_edge_set:
387
+ ax.plot([x1, x2], [y1, y2],
388
+ color=matplotlib.colors.hsv_to_rgb([ie/float(len(edges)), 1.0, 1.0]))
389
+
390
+ for hand_peak_set in hand_peaks:
391
+ for (x, y, text) in hand_peak_set:
392
+ ax.plot(x, y, 'r.')
393
+
394
+ # Convert figure to numpy array
395
+ bg = FigureCanvas(fig)
396
+ bg.draw()
397
+
398
+ width, height = fig.get_size_inches() * fig.get_dpi()
399
+ buf = bg.buffer_rgba()
400
+ canvas = np.frombuffer(buf, dtype=np.uint8).reshape(int(height), int(width), 4)
401
+ canvas = canvas[:, :, :3] # Keep only RGB channels
402
+
403
+ plt.close(fig) # Clean up
404
+ return cv2.resize(canvas, (math.ceil(width), math.ceil(height)))
405
+
406
+
407
+ def create_bar_plot_visualization(image, predictions, title, orig_img):
408
+ """
409
+ Create bar plot visualization below the image
410
+
411
+ Args:
412
+ image: Input image
413
+ predictions: Dictionary of prediction probabilities
414
+ title: Plot title
415
+ orig_img: Original image for sizing
416
+ """
417
+ fig, ax = plt.subplots(figsize=(orig_img.shape[1]/100, orig_img.shape[0]/200), dpi=100)
418
+ plt.title(title)
419
+
420
+ # Create bar plot data
421
+ labels = list(predictions.keys())
422
+ probabilities = list(predictions.values())
423
+
424
+ # Create seaborn bar plot
425
+ sns.barplot(x=labels, y=probabilities, ax=ax)
426
+ plt.close(fig) # Close to avoid memory leaks
427
+ fig.canvas.draw()
428
+
429
+ # Convert plot to numpy array
430
+ plot_image = np.array(fig.canvas.renderer.buffer_rgba())[:, :, :3] # Remove alpha
431
+
432
+ # Combine image and plot vertically
433
+ combined_image = np.vstack((image, cv2.resize(plot_image, (image.shape[1], plot_image.shape[0]))))
434
+
435
+ return combined_image
436
+
437
+
438
+ def add_bottom_padding(image, pad_value, pad_height):
439
+ """
440
+ Add padding to the bottom of an image
441
+
442
+ Args:
443
+ image: Input image
444
+ pad_value: Color value for padding (tuple or int)
445
+ pad_height: Height of padding to add
446
+ """
447
+ height, width, channels = image.shape
448
+ padding = np.zeros((pad_height, width, channels), dtype=image.dtype)
449
+ padding[:, :, :] = pad_value
450
+
451
+ return np.vstack((image, padding))
452
+
453
+
+ def find_array_maximum(array):
+     """
+     Get the index of the maximum value in a 2D array
+
+     Args:
+         array: 2D numpy array
+
+     Returns:
+         tuple: (row_index, col_index) of maximum value
+     """
+     array_index = array.argmax(1)
+     array_value = array.max(1)
+     i = array_value.argmax()
+     j = array_index[i]
+     return i, j
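A quick demonstration on a toy array (assuming the module name pose_utils, as
in this diff):

    import numpy as np
    from pose_utils import find_array_maximum

    demo = np.array([[1, 9, 2],
                     [3, 4, 5]])
    row, col = find_array_maximum(demo)   # -> (0, 1), the location of 9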
requirements.txt CHANGED
@@ -1,3 +1,22 @@
- altair
- pandas
- streamlit
+ opencv_python_headless
+ streamlit
+ numpy
+ Pillow
+ matplotlib==3.5.3
+ opencv-python
+ scipy
+ scikit-image
+ tqdm
+ pandas
+ torch
+ torchaudio
+ torchvision
+ torchtext
+ torchdata
+ av
+ keras
+ ffmpeg
+ ffmpeg-python
+ seaborn[stats]
+ huggingface_hub
+ uuid
verify_deployment.py ADDED
@@ -0,0 +1,140 @@
+ #!/usr/bin/env python3
+ """
+ TechMatrix Solvers ISL Translation System
+ Deployment Verification Script
+
+ This script verifies that all required files are present for deployment
+ """
+
+ import os
+ import sys
+
+
+ def verify_files():
+     """Verify all required files are present"""
+     required_files = [
+         'README.md',
+         'requirements.txt',
+         'packages.txt',
+         'app.py',
+         'pose_models.py',
+         'pose_utils.py',
+         'isl_processor.py',
+         'expression_mapping.py',
+         'LICENSE',
+         '.gitignore',
+         'categories_processed.png',
+         'DataPipeline.png',
+         'model-graph.png'
+     ]
+
+     required_dirs = [
+         'eda'
+     ]
+
+     missing_files = []
+     missing_dirs = []
+
+     print("🔍 TechMatrix Solvers ISL Translation System")
+     print("📋 Deployment Verification")
+     print("=" * 50)
+
+     # Check files
+     print("\n📄 Checking required files:")
+     for file in required_files:
+         if os.path.exists(file):
+             print(f"✅ {file}")
+         else:
+             print(f"❌ {file}")
+             missing_files.append(file)
+
+     # Check directories
+     print("\n📁 Checking required directories:")
+     for directory in required_dirs:  # avoid shadowing the built-in dir()
+         if os.path.isdir(directory):
+             print(f"✅ {directory}/")
+         else:
+             print(f"❌ {directory}/")
+             missing_dirs.append(directory)
+
+     # Check README content for team branding
+     print("\n🏷️ Checking TechMatrix Solvers branding:")
+     if os.path.exists('README.md'):
+         with open('README.md', 'r') as f:
+             readme_content = f.read()
+         if 'TechMatrix Solvers' in readme_content:
+             print("✅ Team branding present in README")
+         else:
+             print("❌ Team branding missing in README")
+
+         if 'Abhay Gupta' in readme_content:
+             print("✅ Team member info present")
+         else:
+             print("❌ Team member info missing")
+
+     # Check app.py for proper imports
+     print("\n🔧 Checking main application structure:")
+     if os.path.exists('app.py'):
+         with open('app.py', 'r') as f:
+             app_content = f.read()
+         if 'streamlit' in app_content:
+             print("✅ Streamlit framework detected")
+         if 'TechMatrix Solvers' in app_content:
+             print("✅ Team branding in application")
+         if 'pose_models' in app_content and 'pose_utils' in app_content:
+             print("✅ Core modules imported")
+
+     print("\n" + "=" * 50)
+
+     if missing_files or missing_dirs:
+         print("❌ Deployment verification FAILED")
+         if missing_files:
+             print(f"Missing files: {', '.join(missing_files)}")
+         if missing_dirs:
+             print(f"Missing directories: {', '.join(missing_dirs)}")
+         return False
+     else:
+         print("✅ Deployment verification PASSED")
+         print("🚀 Project is ready for deployment!")
+         print("\n📋 Deployment Instructions:")
+         print("1. Upload project to HuggingFace Spaces")
+         print("2. Select Streamlit SDK")
+         print("3. Set app_file: app.py")
+         print("4. The system will automatically install dependencies")
+         print("\n👥 TechMatrix Solvers Team:")
+         print("- Abhay Gupta (Team Lead)")
+         print("- Kripanshu Gupta (Backend Developer)")
+         print("- Dipanshu Patel (UI/UX Designer)")
+         print("- Bhumika Patel (Deployment & Female Presenter)")
+         print("\n🏫 Shri Ram Group of Institutions")
+         return True
+
+
+ def check_requirements():
+     """Check requirements.txt format"""
+     print("\n📦 Checking dependencies:")
+     try:
+         with open('requirements.txt', 'r') as f:
+             requirements = f.read().strip().split('\n')
+         print(f"✅ Found {len(requirements)} dependencies")
+
+         # Check for key dependencies
+         key_deps = ['streamlit', 'torch', 'keras', 'opencv-python', 'numpy']
+         for dep in key_deps:
+             if any(dep in req for req in requirements):
+                 print(f"✅ {dep} dependency found")
+             else:
+                 print(f"⚠️ {dep} dependency not explicitly found")
+
+     except Exception as e:
+         print(f"❌ Error reading requirements.txt: {e}")
+
+
+ if __name__ == "__main__":
+     print("TechMatrix Solvers ISL Translation System")
+     print("Deployment Verification Tool\n")
+
+     success = verify_files()
+     check_requirements()
+
+     sys.exit(0 if success else 1)
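Run the check locally with "python verify_deployment.py"; the process exits
with status 0 when every required file and directory is present and 1
otherwise, so it can also gate a CI or pre-push deployment step.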