wissemkarous committed on
Commit 6b8d8cb · verified · 1 Parent(s): e50818f
Files changed (1):
  1. README.md +9 -14
README.md CHANGED
@@ -6,7 +6,7 @@ license: mit
 
 ## Introduction
 
- LipCoordNet is an advanced neural network model designed for accurate lip reading by incorporating lip landmark coordinates as a supplementary input to the traditional image sequence input. This enhancement to the original LipNet architecture aims to improve the precision of sentence predictions by providing additional geometric context to the model.
 
 ## Features
 
@@ -51,7 +51,7 @@ LipCoordNet is an advanced neural network model designed for accurate lip readin
 
 ### Usage
 
- To train the LipCoordNet model with your dataset, first update the options.py file with the appropriate paths to your dataset and pretrained weights (comment out the weights if you want to start from scratch). Then, run the following command:
 
 ```bash
 python train.py
@@ -67,19 +67,19 @@ note: ffmpeg is required to convert video to image sequence and run the inferenc
 
 ## Model Architecture
 
- ![LipCoordNet model architecture](./assets/LipCoordNet_model_architecture.png)
 
 ## Training
 
- This model is built on top of the [LipNet-Pytorch](https://github.com/VIPL-Audio-Visual-Speech-Understanding/LipNet-PyTorch) project on GitHub. The training process is similar to the original LipNet model, with the addition of landmark coordinates as a supplementary input. We used the pretrained weights from the original LipNet model as a starting point for training our model, froze the weights for the original LipNet layers, and trained the new layers for the landmark coordinates.
 
- The dataset used to train this model is the [EGCLLC dataset](https://huggingface.co/datasets/SilentSpeak/EGCLLC). The dataset is not included in this repository, but can be downloaded from the link above.
 
- Total training time: 2 days
 Total epochs: 51
- Training hardware: NVIDIA GeForce RTX 3080 12GB
 
- ![LipCoordNet training curves](./assets/training_graphs.png)
 
 For an interactive view of the training curves, please refer to the tensorboard logs in the `runs` directory.
 Use this command to view the logs:
@@ -98,12 +98,7 @@ This project is licensed under the MIT License.
 
 ## Acknowledgments
 
- This model, LipCoordNet, has been developed with reference to the LipNet-PyTorch implementation available at [VIPL-Audio-Visual-Speech-Understanding](https://github.com/VIPL-Audio-Visual-Speech-Understanding/LipNet-PyTorch). We extend our gratitude to the contributors of this repository for providing a solid foundation and insightful examples that greatly facilitated the development of our enhanced lip reading model. Their work has been instrumental in advancing the field of audio-visual speech understanding and has provided the community with valuable resources to build upon.
-
- Alvarez Casado, C., & Bordallo Lopez, M. (2021). Real-time face alignment: evaluation methods, training strategies and implementation optimization. Springer Journal of Real-Time Image Processing.
-
- Assael, Y., Shillingford, B., Whiteson, S., & de Freitas, N. (2017). LipNet: End-to-End Sentence-level Lipreading. GPU Technology Conference.
-
 ## Contact
 
 Project Link: https://github.com/ffeew/LipCoordNet
 
 
 ## Introduction
 
+ LipReading is an advanced neural network model designed for accurate lip reading by incorporating lip landmark coordinates as a supplementary input to the traditional image sequence input. This enhancement to the original LipNet architecture aims to improve the precision of sentence predictions by providing additional geometric context to the model.
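As a concept sketch only, the supplementary input can be pictured as per-frame concatenation of visual features with flattened landmark coordinates. The function below is illustrative; the model's actual fusion mechanism lives in the network code, not in this README:

```python
def fuse_inputs(frame_features, landmark_coords):
    """Illustrative fusion: append flattened lip-landmark (x, y) coordinates
    to each frame's visual feature vector, giving the network geometric
    context alongside the image sequence. Hypothetical, for intuition only."""
    assert len(frame_features) == len(landmark_coords), "one landmark set per frame"
    fused = []
    for feats, coords in zip(frame_features, landmark_coords):
        flat = [v for point in coords for v in point]  # flatten [(x, y), ...]
        fused.append(list(feats) + flat)
    return fused

# Two frames, 3 visual features each, 2 landmark points each:
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
coords = [[(10, 20), (11, 21)], [(12, 22), (13, 23)]]
fused = fuse_inputs(frames, coords)  # each fused vector has 3 + 4 = 7 values
```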
 
 ## Features
 
 
 
 ### Usage
 
+ To train the LipReading model with your dataset, first update the options.py file with the appropriate paths to your dataset and pretrained weights (comment out the weights if you want to start from scratch). Then, run the following command:
 
 ```bash
 python train.py
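The paths section of options.py might look roughly like this; the field names and directory layout below are assumptions for illustration, not the repository's actual contents:

```python
# Hypothetical sketch of the paths in options.py -- the real field
# names and directory layout in this repository may differ.
video_path = "data/lip/"         # extracted image sequences
anno_path = "data/align/"        # sentence alignment/label files
coords_path = "data/landmarks/"  # lip landmark coordinate files

# Comment out the next line to train from scratch instead of
# fine-tuning from pretrained weights (filename is illustrative).
weights = "pretrain/pretrained_weights.pt"
```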
 
 
 ## Model Architecture
 
+ ![LipReading model architecture](./assets/LipCoordNet_model_architecture.png)
 
 ## Training
 
+ This model is built on top of the [LipReading](https://github.com/wissemkarous/Lip-reading-Final-Year-Project) project on GitHub. The training process is similar to the original LipNet model, with the addition of landmark coordinates as a supplementary input. We used the pretrained weights from the original LipNet model as a starting point for training our model, froze the weights for the original LipNet layers, and trained the new layers for the landmark coordinates.
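The freeze-then-train recipe described above can be sketched as a selection over parameter names; the `coord_` prefix for the new landmark layers is an assumption, not the repository's actual naming:

```python
def split_parameters(param_names, new_layer_prefixes=("coord_",)):
    """Partition model parameter names into frozen (pretrained LipNet)
    and trainable (newly added landmark-coordinate layers).

    In PyTorch this selection would drive `param.requires_grad = False`
    for every frozen parameter before building the optimizer.
    """
    frozen, trainable = [], []
    for name in param_names:
        if name.startswith(new_layer_prefixes):
            trainable.append(name)
        else:
            frozen.append(name)
    return frozen, trainable

# Example with made-up parameter names:
names = ["conv1.weight", "gru1.weight_ih", "coord_fc.weight", "coord_fc.bias"]
frozen, trainable = split_parameters(names)
```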
 
+ The dataset used to train this model is the [Lipreading dataset](https://huggingface.co/datasets/wissemkarous/lipreading). The dataset is not included in this repository, but can be downloaded from the link above.
 
+ Total training time: 12 h
 Total epochs: 51
+ Training hardware: NVIDIA GeForce RTX 3050 6GB
 
+ ![LipReading training curves](./assets/training_graphs.png)
 
 For an interactive view of the training curves, please refer to the tensorboard logs in the `runs` directory.
 Use this command to view the logs:
 
 
 ## Acknowledgments
 
+ This model, LipReading, has been developed for academic purposes as a final-year project. Special thanks to everyone who provided assistance, and to all referenced works.
 
 ## Contact
 
 Project Link: https://github.com/ffeew/LipCoordNet