---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
base_model: sentence-transformers/all-MiniLM-L6-v2
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:77
- loss:CosineSimilarityLoss
widget:
- source_sentence: plot size please
  sentences:
  - land area
  - is this available
  - hello
- source_sentence: area size
  sentences:
  - it is cheap
  - location kahan hay
  - total square footage
- source_sentence: parking available
  sentences:
  - property price
  - location kia hay
  - hi
- source_sentence: available hay kia
  sentences:
  - kia yeh khali hay
  - how many bedrooms
  - thanks
- source_sentence: location kia hay
  sentences:
  - thanks a lot
  - how many square feet
  - yeh ghar kis area main hay
---

🇵🇰 Bilingual Roman Urdu + English Sentence Embedder

A lightweight Sentence Transformer model designed for Roman Urdu + English semantic understanding, optimized for:

πŸ” Semantic Search πŸ’¬ Chatbots / FAQ Retrieval 🏠 Real Estate Query Matching 🌐 Cross-lingual Similarity (Roman Urdu ↔ English)

This model maps sentences into a 384-dimensional dense vector space.
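To illustrate how similarity works in this vector space, here is a minimal sketch of cosine similarity; the short vectors below are hypothetical stand-ins for real 384-dimensional embeddings:

```python
import numpy as np

# Toy stand-ins for real 384-dimensional sentence embeddings (illustration only).
a = np.array([0.6, 0.8, 0.0, 0.0])
b = np.array([0.8, 0.6, 0.0, 0.0])

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of L2 norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(round(cosine(a, b), 4))  # 0.96 -> nearby directions score close to 1
```

Because the model L2-normalizes its outputs (see the Normalize module in the architecture below), cosine similarity on its embeddings reduces to a plain dot product.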

πŸš€ Quick Usage
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("embedingHF/bilingual-roman-urdu-embedder")

# Roman Urdu
emb1 = model.encode(["yeh ghar kitne ka hai"])

# English
emb2 = model.encode(["what is the price"])
```
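To see how such embeddings support cross-lingual matching, here is a toy retrieval sketch. The vectors below are hypothetical stand-ins for `model.encode(...)` output, so the example runs without downloading the model:

```python
import numpy as np

# Hypothetical, stand-in embeddings for one query and two candidates.
query = np.array([0.90, 0.10, 0.42])      # "yeh ghar kitne ka hai"
candidates = np.array([
    [0.88, 0.15, 0.45],                   # "what is the price"
    [0.10, 0.95, 0.30],                   # "hello"
])

# Normalize so that dot product equals cosine similarity.
query /= np.linalg.norm(query)
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

scores = candidates @ query               # cosine similarity per candidate
best = int(np.argmax(scores))
print(best)  # 0 -> the English price question matches the Roman Urdu query
```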



SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text


Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
```
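The Pooling and Normalize stages above can be sketched in a few lines. The token embeddings below are hypothetical (the real model produces 384-dimensional vectors and takes the attention mask from its tokenizer):

```python
import numpy as np

# Hypothetical token embeddings for one sentence: (seq_len=4, dim=3).
token_embeddings = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],   # padding token
    [0.0, 0.0, 0.0],   # padding token
])
attention_mask = np.array([1, 1, 0, 0])

# (1) Mean pooling over non-padding tokens only.
mask = attention_mask[:, None]
pooled = (token_embeddings * mask).sum(axis=0) / mask.sum()
print(pooled)  # [2. 2. 2.]

# (2) L2 normalization, so cosine similarity reduces to a dot product.
sentence_embedding = pooled / np.linalg.norm(pooled)
print(round(np.linalg.norm(sentence_embedding), 6))  # 1.0
```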

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```shell
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("embedingHF/bilingual-roman-urdu-embedder")
# Run inference
sentences = [
    'location kia hay',
    'yeh ghar kis area main hay',
    'how many square feet',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8117, 0.5783],
#         [0.8117, 1.0000, 0.7344],
#         [0.5783, 0.7344, 1.0000]])
```
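A similarity matrix like the one printed above turns into nearest-neighbour retrieval with a single argmax. A small sketch using the reported scores:

```python
import numpy as np

# Similarity matrix reported above for the three example sentences.
similarities = np.array([
    [1.0000, 0.8117, 0.5783],
    [0.8117, 1.0000, 0.7344],
    [0.5783, 0.7344, 1.0000],
])

# Best match for 'location kia hay' (row 0), excluding the sentence itself.
row = similarities[0].copy()
row[0] = -np.inf
print(int(np.argmax(row)))  # 1 -> 'yeh ghar kis area main hay'
```

Note that the top match for the Roman Urdu query is the other Roman Urdu paraphrase (0.8117), with the English variant close behind (0.5783).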

Training Details

Training Dataset

Unnamed Dataset

  • Size: 77 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 77 samples:
    |         | sentence_0 | sentence_1 | label |
    |:--------|:-----------|:-----------|:------|
    | type    | string | string | float |
    | details | min: 3 tokens, mean: 6.03 tokens, max: 11 tokens | min: 3 tokens, mean: 5.75 tokens, max: 12 tokens | min: 0.08, mean: 0.84, max: 0.95 |
  • Samples:
    | sentence_0 | sentence_1 | label |
    |:-----------|:-----------|:------|
    | kitne bedrooms hain | bedrooms ki tadaad kia hay | 0.87 |
    | what is the total area | how many square feet | 0.92 |
    | yeh property kahan hay | address batao | 0.85 |
  • Loss: CosineSimilarityLoss with these parameters:
    ```json
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss",
        "cos_score_transformation": "torch.nn.modules.linear.Identity"
    }
    ```
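Those parameters mean the training objective is the mean squared error between the cosine similarity of a sentence pair and its gold label. A minimal sketch with hypothetical embeddings and a label taken from the samples table:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of L2 norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings for ("kitne bedrooms hain", "bedrooms ki tadaad kia hay").
u = np.array([0.7, 0.7, 0.1])
v = np.array([0.6, 0.8, 0.0])
label = 0.87  # gold similarity from the samples table

# CosineSimilarityLoss with MSELoss: squared error between cos(u, v) and the label.
loss = (cosine(u, v) - label) ** 2
print(round(loss, 4))  # 0.0132
```

Minimizing this loss over the 77 pairs pulls each pair's cosine similarity toward its annotated score.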
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 50
  • multi_dataset_batch_sampler: round_robin

Training Time

  • Training: 2.1 minutes

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.4.1
  • Transformers: 5.5.4
  • PyTorch: 2.11.0+cpu
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
