TanmayNanda
/

ishara

LiteRT

Model card Files Files and versions

xet

Community

TanmayNanda commited on Jan 10, 2025

Commit

593a0d0

verified ·

1 Parent(s): 3f7b07a

Update README.md

Browse files

Files changed (1) hide show

README.md +153 -3

README.md CHANGED Viewed

@@ -1,3 +1,153 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# README for Hugging Face Model Card: Ishara - ASL Fingerspelling Recognition
+## Ishara: ASL Fingerspelling Recognition
+Ishara is a deep learning model designed for accurate recognition of American Sign Language (ASL) fingerspelling. It is based on a hybrid architecture that combines **Squeezeformer** and **Conformer** blocks with **Conv1D layers** for efficient feature extraction from hand, face, and pose landmark data.
+This model is a submission to the Google ASLFR Competition and achieves robust performance on character-level prediction tasks.
+---
+## Model Description
+Ishara processes sequences of normalized hand, face, and pose landmarks to predict fingerspelled words at the character level. The architecture is designed to handle temporal variability and missing data using a combination of:
+- **Squeezeformer blocks**: For efficient sequence modeling.
+- **Conformer blocks**: For enhanced feature extraction.
+- **Conv1D layers**: For initial temporal feature extraction.
+The output predictions are character-level sequences optimized using **Connectionist Temporal Classification (CTC)** loss.
+---
+## Dataset
+The model was trained and evaluated on the dataset provided by the [Google ASLFR Competition](https://www.kaggle.com/competitions/asl-fingerspelling), which consists of:
+- **Hand landmarks**: 21 points each for left and right hands.
+- **Face landmarks**: 40 key points.
+- **Pose landmarks**: 10 key points.
+- **Labels**: Text sequences representing fingerspelled words.
+---
+## Usage
+### Inference with TFLite
+The model is available in TensorFlow Lite format for real-time inference. To use the model:
+```python
+import tensorflow as tf
+# Load the TFLite model
+interpreter = tf.lite.Interpreter("model.tflite")
+interpreter.allocate_tensors()
+# Define input-output
+input_details = interpreter.get_input_details()
+output_details = interpreter.get_output_details()
+# Input a sequence of landmarks
+input_data = ... # Preprocessed input sequence
+interpreter.set_tensor(input_details[0]['index'], input_data)
+interpreter.invoke()
+# Get the prediction
+output_data = interpreter.get_tensor(output_details[0]['index'])
+print("Predicted Sequence:", output_data)
+```
+---
+### Training Workflow
+You can replicate the training process using TensorFlow. The training loop is as follows:
+```python
+from model import get_model
+# Define the model
+model = get_model(
+    dim=256,
+    num_conv_squeeze_blocks=2,
+    num_conv_conform_blocks=2,
+    kernel_sizes=[11, 5, 3],
+    num_conv_per_block=3,
+    dropout_rate=0.2
+)
+# Train the model
+history = model.fit(
+    train_dataset,
+    validation_data=val_dataset,
+    epochs=N_EPOCHS,
+    callbacks=[validation_callback, lr_callback, WeightDecayCallback()]
+)
+```
+---
+## Model Evaluation
+The model's performance is evaluated using:
+- **Levenshtein Distance**: Measures character-level accuracy.
+- **Normalized Character Error Rate (CER)**: Quantifies the model's robustness.
+- **Real-Time Inference Speed**: Assessed on 1080p video inputs.
+---
+## Results
+- **Validation Accuracy**: [To be updated]
+- **Inference Speed**: [To be updated]
+- **Model Size**: [To be updated]
+---
+## Deployment
+The model is optimized for deployment in real-time systems using TensorFlow Lite. This makes it suitable for integration into mobile and embedded systems for ASL recognition tasks.
+---
+## License
+This model is released under the [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).
+---
+## Acknowledgments
+- **Google ASLFR Competition**: For providing the dataset.
+- **TensorFlow Team**: For the deep learning framework.
+- **Paper Authors**: For inspiring the architecture.
+  - [Squeezeformer](https://arxiv.org/abs/2206.00888)
+  - [Conformer](https://arxiv.org/abs/2005.08100)
+---
+## Citation
+If you use this model, please consider citing:
+```
+@misc{ishara_asl,
+  title={Ishara: ASL Fingerspelling Recognition},
+  author={Niharika Gupta, Tanay Srinivasa, Tanmay Nanda, Zoya Ghoshal},
+  year={2024},
+  howpublished={\url{https://huggingface.co/ishara-asl}}
+}
+```
+---
+## Contact
+For questions or collaboration, feel free to reach out:
+- **Tanmay Nanda**: tanmaynanda360@gmail.com