Generate multimodal embeddings and retrieve similar content
(Using BERT + the OCEMOTION dataset to determine the other party