ASL-MoViNet-T5-translator

deanna-emery committed on Dec 12, 2023

Commit

31d4007

1 Parent(s): d2377c0

updates

Files changed (1)

app.py CHANGED

@@ -55,7 +55,7 @@ def preprocess(filename, max_frames=0, resize=(224,224)):
     video = np.expand_dims(video, axis=0)
     return video
-def translate(video_file, text):
+def translate(video_file, text=None):
     video = preprocess(video_file, max_frames=0, resize=(224,224))
@@ -82,7 +82,7 @@ This application surfaces a model for translation of American Sign Language (ASL
 which comprises of a fine-tuned MoViNets CNN model and a T5 encoder-decoder model
 to generate translations from the video embeddings. This model architecture achieves a BLEU score of 1.98
 and an average cosine similarity score of 0.21 when trained and evaluated on the YouTube-ASL dataset.
-More information about the models can be found in our GitHub repository <a href=https://github.com/deanna-emery/ASL-Translator>here</a>.
+More information about the model training and instructions to download the models can be found in our GitHub repository <a href=https://github.com/deanna-emery/ASL-Translator>here</a>.
 A limitation of this architecture is the size of the MoViNets model, making it especially slow during inference on a CPU.
 We do not recommend uploading videos longer than 4 seconds as the video embedding generation may take some time.
@@ -108,7 +108,7 @@ article =  """The captions for the example videos are as follows in order: \n
 # Gradio App interface
 gr.Interface(fn=translate,
-              inputs=gr.Video(label='Video', show_label=True, max_length=10),
+              inputs=[gr.Video(label='Video', show_label=True, max_length=10, sources='upload'), 'text'],
               outputs="text",
               allow_flagging="never",
               title=title,
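
Note on the signature change: the interface passes one positional argument per input component, so giving `text` a default of `None` lets `translate` be called with or without the second input. The following is a minimal sketch of that behavior only. The `translate` body here is a hypothetical stand-in (the real function runs `preprocess` and the MoViNet and T5 models), and `call_like_interface` and the file name `clip.mp4` are illustrative names that are not part of `app.py`.

```python
# Hypothetical stand-in for the app's translate(); the real function runs
# preprocess() and the MoViNet + T5 models, which are omitted here.
def translate(video_file, text=None):
    # `text` is optional after this commit, so it is ignored when absent.
    note = "" if text is None else f" (text: {text})"
    return f"translation for {video_file}{note}"


def call_like_interface(fn, *component_values):
    """Mimic an interface passing one positional argument per input component."""
    return fn(*component_values)


# Two input components (video + text box): both values are passed through.
two_inputs = call_like_interface(translate, "clip.mp4", "hello")

# A single video input still works because `text` defaults to None.
one_input = call_like_interface(translate, "clip.mp4")
```

With the old signature `translate(video_file, text)`, the single-argument call would raise a `TypeError` for the missing `text` argument.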