---
library_name: transformers
tags: []
---

# Model Card for Modern-EgyBert-Embedding

## Model Details

### Model Description

- **Developed by:** Mohammad Essam ([metga97](https://huggingface.co/metga97))
- **Model type:** BERT-style encoder
- **Language(s):** Arabic (MSA + Egyptian dialect)
- **License:** MIT
- **Finetuned from model:** [`metga97/Modern-EgyBert-Base`](https://huggingface.co/metga97/Modern-EgyBert-Base)

## Uses

This model is intended for generating sentence embeddings for downstream tasks such as:

- Sentence similarity
- Semantic retrieval
- Clustering of Arabic sentences
- Intent classification
- Duplicate detection

## How to Get Started with the Model

Use the code below to load the model and compute a mean-pooled sentence embedding.

```python
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("metga97/Modern-EgyBert-Embedding")
model = AutoModel.from_pretrained("metga97/Modern-EgyBert-Embedding")

text = ["الجو النهارده جميل"]  # Egyptian Arabic: "The weather is nice today"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    last_hidden = outputs.last_hidden_state  # (batch, seq_len, hidden_size)

# Mean pooling: average the token embeddings, ignoring padding positions
attention_mask = inputs["attention_mask"]
input_mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
sum_embeddings = torch.sum(last_hidden * input_mask_expanded, dim=1)
sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)  # avoid division by zero
sentence_embedding = sum_embeddings / sum_mask

print(sentence_embedding.shape)  # torch.Size([1, 768])
```
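For the similarity, retrieval, and duplicate-detection uses listed under Uses, one common approach is to L2-normalize the mean-pooled embeddings and rank candidates by cosine similarity. The sketch below assumes that approach; the card does not state which similarity measure the model was trained for, and the helper names `mean_pool` and `rank_by_cosine` are hypothetical. Random tensors stand in for real model outputs so the example runs without downloading the model.

```python
import torch
import torch.nn.functional as F


def mean_pool(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padding (same pooling as the card's example)."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                   # (batch, 1)
    return summed / counts


def rank_by_cosine(query: torch.Tensor, corpus: torch.Tensor, top_k: int = 3):
    """Return (scores, indices) of the top_k corpus rows most similar to query.

    Hypothetical helper; assumes cosine similarity is an appropriate metric.
    query: (hidden,) embedding; corpus: (n, hidden) embeddings.
    """
    q = F.normalize(query.unsqueeze(0), dim=-1)  # (1, hidden), unit length
    c = F.normalize(corpus, dim=-1)              # (n, hidden), unit length
    scores = (q @ c.T).squeeze(0)                # (n,) cosine similarities
    k = min(top_k, c.size(0))
    return torch.topk(scores, k)


# Stand-in tensors shaped like model outputs: 4 sentences, 5 tokens, 768 dims.
torch.manual_seed(0)
last_hidden = torch.randn(4, 5, 768)
attention_mask = torch.tensor([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
])
embeddings = mean_pool(last_hidden, attention_mask)  # (4, 768)

# Use sentence 0 as the query and rank all four sentences against it.
scores, indices = rank_by_cosine(embeddings[0], embeddings, top_k=2)
print(indices.tolist())  # the query itself ranks first
```

With the real model, `outputs.last_hidden_state` and `inputs["attention_mask"]` from the getting-started code can be passed to `mean_pool`. For duplicate detection, one option is to flag pairs whose cosine score exceeds a threshold tuned on labeled data; the card does not suggest a value.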