ishara / README.md

Update README.md

83dd46f verified about 1 year ago

4.12 kB

	---
	license: apache-2.0
	---

	## Ishara: ASL Fingerspelling Recognition

	Ishara is a deep learning model designed for accurate recognition of American Sign Language (ASL) fingerspelling. It is based on a hybrid architecture that combines Squeezeformer and Conformer blocks with Conv1D layers for efficient feature extraction from hand, face, and pose landmark data.

	This model is a submission to the Google ASLFR Competition and achieves robust performance on character-level prediction tasks.

	---

	## Model Description

	Ishara processes sequences of normalized hand, face, and pose landmarks to predict fingerspelled words at the character level. The architecture is designed to handle temporal variability and missing data using a combination of:

	- Squeezeformer blocks: For efficient sequence modeling.
	- Conformer blocks: For enhanced feature extraction.
	- Conv1D layers: For initial temporal feature extraction.

	The output predictions are character-level sequences optimized using Connectionist Temporal Classification (CTC) loss.

	---

	## Dataset

	The model was trained and evaluated on the dataset provided by the [Google ASLFR Competition](https://www.kaggle.com/competitions/asl-fingerspelling), which consists of:

	- Hand landmarks: 21 points each for left and right hands.
	- Face landmarks: 40 key points.
	- Pose landmarks: 10 key points.
	- Labels: Text sequences representing fingerspelled words.

	---

	## Usage

	### Inference with TFLite

	The model is available in TensorFlow Lite format for real-time inference. To use the model:

	```python
	import tensorflow as tf

	# Load the TFLite model
	interpreter = tf.lite.Interpreter("model.tflite")
	interpreter.allocate_tensors()

	# Define input-output
	input_details = interpreter.get_input_details()
	output_details = interpreter.get_output_details()

	# Input a sequence of landmarks
	input_data = ... # Preprocessed input sequence
	interpreter.set_tensor(input_details[0]['index'], input_data)
	interpreter.invoke()

	# Get the prediction
	output_data = interpreter.get_tensor(output_details[0]['index'])
	print("Predicted Sequence:", output_data)
	```

	---

	### Training Workflow

	You can replicate the training process using TensorFlow. The training loop is as follows:

	```python
	from model import get_model

	# Define the model
	model = get_model(
	dim=256,
	num_conv_squeeze_blocks=2,
	num_conv_conform_blocks=2,
	kernel_sizes=[11, 5, 3],
	num_conv_per_block=3,
	dropout_rate=0.2
	)

	# Train the model
	history = model.fit(
	train_dataset,
	validation_data=val_dataset,
	epochs=N_EPOCHS,
	callbacks=[validation_callback, lr_callback, WeightDecayCallback()]
	)
	```

	---

	## Model Evaluation

	The model's performance is evaluated using:

	- Levenshtein Distance: Measures character-level accuracy.
	- Normalized Character Error Rate (CER): Quantifies the model's robustness.
	- Real-Time Inference Speed: Assessed on 1080p video inputs.

	---

	## Results

	- Normalised Levenshtein Distance: [0.728]
	- Inference Speed: [200ms]
	- Model Size: [17.9 Mb]

	---

	## Deployment

	The model is optimized for deployment in real-time systems using TensorFlow Lite. This makes it suitable for integration into mobile and embedded systems for ASL recognition tasks.

	---

	## License

	This model is released under the [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).

	---

	## Acknowledgments

	- Google ASLFR Competition: For providing the dataset.
	- TensorFlow Team: For the deep learning framework.
	- Paper Authors: For inspiring the architecture.
	- [Squeezeformer](https://arxiv.org/abs/2206.00888)
	- [Conformer](https://arxiv.org/abs/2005.08100)

	---

	## Citation

	If you use this model, please consider citing:

	```
	@misc{ishara_asl,
	title={Ishara: ASL Fingerspelling Recognition},
	author={Niharika Gupta, Tanay Srinivasa, Tanmay Nanda, Zoya Ghoshal},
	year={2024},
	howpublished={\url{https://huggingface.co/ishara-asl}}
	}
	```

	---

	## Contact

	For questions or collaboration, feel free to reach out:

	- Tanmay Nanda: tanmaynanda360@gmail.com