LiteRT
Commit 593a0d0 by TanmayNanda (verified) · 1 parent: 3f7b07a

Update README.md

  1. README.md +153 -3
README.md CHANGED
@@ -1,3 +1,153 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+ # README for Hugging Face Model Card: Ishara - ASL Fingerspelling Recognition
5
+
6
+ ## Ishara: ASL Fingerspelling Recognition
7
+
8
+ Ishara is a deep learning model designed for accurate recognition of American Sign Language (ASL) fingerspelling. It is based on a hybrid architecture that combines **Squeezeformer** and **Conformer** blocks with **Conv1D layers** for efficient feature extraction from hand, face, and pose landmark data.
9
+
10
+ This model was developed as a submission to the Google ASLFR Competition and predicts fingerspelled text at the character level.
11
+
12
+ ---
13
+
14
+ ## Model Description
15
+
16
+ Ishara processes sequences of normalized hand, face, and pose landmarks to predict fingerspelled words at the character level. The architecture is designed to handle temporal variability and missing data using a combination of:
17
+
18
+ - **Squeezeformer blocks**: For efficient sequence modeling.
19
+ - **Conformer blocks**: For combining self-attention with convolution to capture both global and local context.
20
+ - **Conv1D layers**: For initial temporal feature extraction.
21
+
22
+ The output predictions are character-level sequences optimized using **Connectionist Temporal Classification (CTC)** loss.
23
+
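As a minimal sketch of how CTC output can be turned into text, greedy decoding picks the most likely class for each frame, collapses consecutive repeats, and drops the blank symbol. The vocabulary `CHARS`, the blank index, and the per-frame score layout below are assumptions for illustration, not the model's actual configuration.

```python
# Hypothetical vocabulary; the real character set and blank index
# depend on how the model was trained.
CHARS = "abcdefghijklmnopqrstuvwxyz "
BLANK = len(CHARS)  # assumption: the blank symbol is the last class


def ctc_greedy_decode(scores, blank=BLANK):
    """Decode a list of per-frame class scores with greedy CTC rules."""
    decoded = []
    prev = None
    for frame in scores:
        # Index of the highest-scoring class in this frame.
        idx = max(range(len(frame)), key=frame.__getitem__)
        # Keep a class only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != prev and idx != blank:
            decoded.append(CHARS[idx])
        prev = idx
    return "".join(decoded)


def one_hot(index, size=len(CHARS) + 1):
    """Build a toy score vector with all mass on one class."""
    return [1.0 if j == index else 0.0 for j in range(size)]


# Toy input: frames predict h, h, blank, i, i.
scores = [one_hot(i) for i in (7, 7, BLANK, 8, 8)]
print(ctc_greedy_decode(scores))
```

The blank symbol is what lets CTC emit doubled letters: two identical characters separated by a blank frame are kept as two characters, while identical characters in adjacent frames are merged into one.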
24
+ ---
25
+
26
+ ## Dataset
27
+
28
+ The model was trained and evaluated on the dataset provided by the [Google ASLFR Competition](https://www.kaggle.com/competitions/asl-fingerspelling), which consists of:
29
+
30
+ - **Hand landmarks**: 21 points each for left and right hands.
31
+ - **Face landmarks**: 40 key points.
32
+ - **Pose landmarks**: 10 key points.
33
+ - **Labels**: Text sequences representing fingerspelled words.
34
+
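The landmark counts above add up to 21 + 21 + 40 + 10 = 92 landmarks per frame. The sketch below flattens one frame into a feature vector and zero-fills landmarks that were not detected. The number of coordinates per landmark, the zero-fill strategy, and the names `LANDMARK_COUNTS` and `flatten_frame` are assumptions for illustration; the actual preprocessing used for Ishara is not described here.

```python
import math

# Landmark counts taken from the list above.
LANDMARK_COUNTS = {"left_hand": 21, "right_hand": 21, "face": 40, "pose": 10}
NUM_LANDMARKS = sum(LANDMARK_COUNTS.values())
COORDS = 2  # assumption: (x, y) per landmark; the model may also use z


def flatten_frame(frame):
    """Flatten {group: [(x, y), ...]} into one list, zero-filling missing values."""
    features = []
    for group, count in LANDMARK_COUNTS.items():
        # A group that was not detected is treated as all-NaN.
        points = frame.get(group) or [(math.nan,) * COORDS] * count
        if len(points) != count:
            raise ValueError(f"{group}: expected {count} points, got {len(points)}")
        for point in points:
            features.extend(0.0 if math.isnan(v) else float(v) for v in point)
    return features


# A frame in which the left hand was not detected at all.
frame = {
    "right_hand": [(0.5, 0.5)] * 21,
    "face": [(0.4, 0.2)] * 40,
    "pose": [(0.5, 0.8)] * 10,
}
vector = flatten_frame(frame)
print(NUM_LANDMARKS, len(vector))
```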
35
+ ---
36
+
37
+ ## Usage
38
+
39
+ ### Inference with TFLite
40
+
41
+ The model is available in TensorFlow Lite format for real-time inference. To use the model:
42
+
43
+ ```python
+ import tensorflow as tf
+
+ # Load the TFLite model and allocate its tensors
+ interpreter = tf.lite.Interpreter(model_path="model.tflite")
+ interpreter.allocate_tensors()
+
+ # Look up the input and output tensor metadata (index, shape, dtype)
+ input_details = interpreter.get_input_details()
+ output_details = interpreter.get_output_details()
+
+ # Input a sequence of landmarks; it must match the shape and dtype
+ # reported in input_details[0]
+ input_data = ...  # Preprocessed input sequence
+ interpreter.set_tensor(input_details[0]['index'], input_data)
+ interpreter.invoke()
+
+ # Get the raw model output for the sequence
+ output_data = interpreter.get_tensor(output_details[0]['index'])
+ print("Predicted Sequence:", output_data)
+ ```
63
+
64
+ ---
65
+
66
+ ### Training Workflow
67
+
68
+ You can replicate the training process using TensorFlow. The training loop is as follows:
69
+
70
+ ```python
+ from model import get_model
+
+ # Define the model
+ model = get_model(
+     dim=256,
+     num_conv_squeeze_blocks=2,
+     num_conv_conform_blocks=2,
+     kernel_sizes=[11, 5, 3],
+     num_conv_per_block=3,
+     dropout_rate=0.2,
+ )
+
+ # Train the model. train_dataset, val_dataset, N_EPOCHS, and the three
+ # callbacks are not defined in this snippet and must be provided by the
+ # surrounding training script.
+ history = model.fit(
+     train_dataset,
+     validation_data=val_dataset,
+     epochs=N_EPOCHS,
+     callbacks=[validation_callback, lr_callback, WeightDecayCallback()],
+ )
+ ```
91
+
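The training snippet above uses an `lr_callback` without showing its definition. One common choice, given here as an assumption and not as the authors' setup, is linear warmup followed by cosine decay. The hyperparameter values and the name `lr_schedule` are hypothetical.

```python
import math

# Hypothetical hyperparameters; the values used for Ishara are not stated.
N_EPOCHS = 100
N_WARMUP_EPOCHS = 10
LR_MAX = 1e-3


def lr_schedule(epoch, lr=None):
    """Linear warmup to LR_MAX, then cosine decay toward zero.

    The current learning rate `lr` is accepted but ignored, so the function
    works whether or not the caller passes it.
    """
    if epoch < N_WARMUP_EPOCHS:
        return LR_MAX * (epoch + 1) / N_WARMUP_EPOCHS
    progress = (epoch - N_WARMUP_EPOCHS) / max(1, N_EPOCHS - N_WARMUP_EPOCHS)
    return LR_MAX * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the schedule at a few epochs to inspect its shape.
for epoch in (0, 9, 10, 55, 99):
    print(epoch, f"{lr_schedule(epoch):.6f}")
```

With TensorFlow installed, a function like this can be wrapped as `tf.keras.callbacks.LearningRateScheduler(lr_schedule)` and passed in the `callbacks` list.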
92
+ ---
93
+
94
+ ## Model Evaluation
95
+
96
+ The model's performance is evaluated using:
97
+
98
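+ - **Levenshtein Distance**: Counts the character insertions, deletions, and substitutions needed to turn a prediction into its label.
99
+ - **Normalized Character Error Rate (CER)**: Levenshtein distance divided by the label length, so that scores are comparable across sequences of different lengths.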
100
+ - **Real-Time Inference Speed**: Assessed on 1080p video inputs.
101
+
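The first two metrics can be sketched in plain Python as follows. This is an illustrative implementation under the usual definitions (edit distance by dynamic programming, CER as total edit distance over total reference length); the function names are hypothetical and the project's own evaluation code may differ.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def normalized_cer(predictions, references):
    """Total edit distance divided by total reference length."""
    total_dist = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total_len = sum(len(r) for r in references)
    return total_dist / total_len if total_len else 0.0


preds = ["helo world", "asl"]
refs = ["hello world", "asl"]
print(levenshtein(preds[0], refs[0]), normalized_cer(preds, refs))
```

Summing distances and lengths over the whole evaluation set, instead of averaging per-sample ratios, keeps very short labels from dominating the score.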
102
+ ---
103
+
104
+ ## Results
105
+
106
+ - **Validation Accuracy**: [To be updated]
107
+ - **Inference Speed**: [To be updated]
108
+ - **Model Size**: [To be updated]
109
+
110
+ ---
111
+
112
+ ## Deployment
113
+
114
+ The model is optimized for deployment in real-time systems using TensorFlow Lite. This makes it suitable for integration into mobile and embedded systems for ASL recognition tasks.
115
+
116
+ ---
117
+
118
+ ## License
119
+
120
+ This model is released under the [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).
121
+
122
+ ---
123
+
124
+ ## Acknowledgments
125
+
126
+ - **Google ASLFR Competition**: For providing the dataset.
127
+ - **TensorFlow Team**: For the deep learning framework.
128
+ - **Paper Authors**: For inspiring the architecture.
129
+   - [Squeezeformer](https://arxiv.org/abs/2206.00888)
130
+   - [Conformer](https://arxiv.org/abs/2005.08100)
131
+
132
+ ---
133
+
134
+ ## Citation
135
+
136
+ If you use this model, please consider citing:
137
+
138
+ ```
+ @misc{ishara_asl,
+   title={Ishara: ASL Fingerspelling Recognition},
+   author={Niharika Gupta and Tanay Srinivasa and Tanmay Nanda and Zoya Ghoshal},
+   year={2024},
+   howpublished={\url{https://huggingface.co/ishara-asl}}
+ }
+ ```
146
+
147
+ ---
148
+
149
+ ## Contact
150
+
151
+ For questions or collaboration, feel free to reach out:
152
+
153
+ - **Tanmay Nanda**: tanmaynanda360@gmail.com