---
library_name: transformers
tags: []
---

# Model Card for Modern-EgyBert-Embedding

Modern-EgyBert-Embedding is a BERT-style Arabic sentence-embedding model covering Modern Standard Arabic (MSA) and Egyptian dialect. It is fine-tuned from [`metga97/Modern-EgyBert-Base`](https://huggingface.co/metga97/Modern-EgyBert-Base) and produces 768-dimensional sentence vectors (via mean pooling) for similarity, retrieval, and clustering tasks.

|
## Model Details

### Model Description

- **Developed by:** Mohammad Essam ([metga97](https://huggingface.co/metga97))
- **Model type:** BERT-style encoder
- **Language(s):** Arabic (MSA + Egyptian dialect)
- **License:** MIT
- **Finetuned from model:** [`metga97/Modern-EgyBert-Base`](https://huggingface.co/metga97/Modern-EgyBert-Base)

|
## Uses

This model is intended for generating sentence embeddings for downstream tasks such as:

- Sentence similarity
- Semantic retrieval
- Clustering of Arabic sentences
- Intent classification
- Duplicate detection

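For semantic retrieval, pooled sentence embeddings can be ranked by cosine similarity against a query embedding. A minimal sketch on toy 768-dimensional vectors (in a real pipeline these would be mean-pooled model outputs, produced as shown in the next section):

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for pooled sentence embeddings (hidden size 768).
corpus = F.normalize(torch.randn(4, 768), dim=-1)   # 4 indexed sentences
query = F.normalize(torch.randn(1, 768), dim=-1)    # 1 query sentence

# With L2-normalized vectors, a dot product is the cosine similarity.
scores = query @ corpus.T                            # shape (1, 4)
ranking = scores.argsort(dim=-1, descending=True)    # best match first
print(scores.shape)  # torch.Size([1, 4])
```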
|
## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("metga97/Modern-EgyBert-Embedding")
model = AutoModel.from_pretrained("metga97/Modern-EgyBert-Embedding")

text = ["الجو النهارده جميل"]  # "The weather is beautiful today" (Egyptian Arabic)
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    last_hidden = outputs.last_hidden_state

# Mask-aware mean pooling over token embeddings
attention_mask = inputs["attention_mask"]
input_mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
sum_embeddings = torch.sum(last_hidden * input_mask_expanded, dim=1)
sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
sentence_embedding = sum_embeddings / sum_mask

print(sentence_embedding.shape)  # torch.Size([1, 768])
```
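For sentence similarity, the pooling step can be factored into a reusable helper and the resulting vectors compared with cosine similarity. A minimal sketch on toy tensors: `mean_pool` mirrors the inline pooling above, and in practice its inputs would be the model's `last_hidden_state` and `attention_mask` rather than random data.

```python
import torch
import torch.nn.functional as F

def mean_pool(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mask-aware mean pooling, equivalent to the inline snippet above."""
    mask = attention_mask.unsqueeze(-1).expand(last_hidden.size()).float()
    summed = (last_hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts

# Toy batch: 2 sequences of 5 tokens, hidden size 768; the second is padded.
hidden = torch.randn(2, 5, 768)
mask = torch.tensor([[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]])
embeddings = mean_pool(hidden, mask)                       # shape (2, 768)

similarity = F.cosine_similarity(embeddings[0:1], embeddings[1:2])
print(embeddings.shape, similarity.shape)  # torch.Size([2, 768]) torch.Size([1])
```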