Generate multimodal embeddings and find similar content
(Using BERT + OCEMOTION dataset to determine the other party
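
The "find similar content" step is often done by ranking stored embedding vectors by cosine similarity to a query vector. The sketch below assumes that approach; the source does not say which similarity measure it uses. The function names `cosine_similarity` and `most_similar` and the toy 3-dimensional vectors are hypothetical, standing in for vectors that would come from an embedding model such as BERT.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, corpus, top_k=3):
    """Return up to top_k (item_id, score) pairs, best match first."""
    scored = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in corpus.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors made up for illustration; real embeddings would have
# hundreds of dimensions and come from a model, not be typed by hand.
corpus = {
    "text_a": [1.0, 0.0, 0.0],
    "image_b": [0.9, 0.1, 0.0],
    "text_c": [0.0, 1.0, 0.0],
}
results = most_similar([1.0, 0.0, 0.0], corpus, top_k=2)
```

Because cosine similarity ignores vector length, items from different modalities can be compared as long as their embeddings share one vector space. For large collections, an approximate nearest-neighbor index would likely replace this linear scan.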