---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
base_model: sentence-transformers/all-MiniLM-L6-v2
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:77
- loss:CosineSimilarityLoss
widget:
- source_sentence: plot size please
  sentences:
  - land area
  - is this available
  - hello
- source_sentence: area size
  sentences:
  - it is cheap
  - location kahan hay
  - total square footage
- source_sentence: parking available
  sentences:
  - property price
  - location kia hay
  - hi
- source_sentence: available hay kia
  sentences:
  - kia yeh khali hay
  - how many bedrooms
  - thanks
- source_sentence: location kia hay
  sentences:
  - thanks a lot
  - how many square feet
  - yeh ghar kis area main hay
---

🇵🇰 Bilingual Roman Urdu + English Sentence Embedder

A lightweight Sentence Transformer model designed for Roman Urdu + English semantic understanding, optimized for:

πŸ” Semantic Search πŸ’¬ Chatbots / FAQ Retrieval 🏠 Real Estate Query Matching 🌐 Cross-lingual Similarity (Roman Urdu ↔ English)

This model maps sentences into a 384-dimensional dense vector space.
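To illustrate how similarity works in this vector space, here is a minimal sketch of cosine similarity; the short vectors below are hypothetical stand-ins for real 384-dimensional embeddings:

```python
import numpy as np

# Toy stand-ins for real 384-dimensional sentence embeddings (illustration only).
a = np.array([0.6, 0.8, 0.0, 0.0])
b = np.array([0.8, 0.6, 0.0, 0.0])

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of L2 norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(round(cosine(a, b), 4))  # 0.96 -> nearby directions score close to 1
```

Because the model L2-normalizes its outputs (see the Normalize module in the architecture below), cosine similarity on its embeddings reduces to a plain dot product.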

πŸš€ Quick Usage
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("embedingHF/bilingual-roman-urdu-embedder")

# Roman Urdu
emb1 = model.encode(["yeh ghar kitne ka hai"])

# English
emb2 = model.encode(["what is the price"])
```
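To see how such embeddings support cross-lingual matching, here is a toy retrieval sketch. The vectors below are hypothetical stand-ins for `model.encode(...)` output, so the example runs without downloading the model:

```python
import numpy as np

# Hypothetical, stand-in embeddings for one query and two candidates.
query = np.array([0.90, 0.10, 0.42])      # "yeh ghar kitne ka hai"
candidates = np.array([
    [0.88, 0.15, 0.45],                   # "what is the price"
    [0.10, 0.95, 0.30],                   # "hello"
])

# Normalize so that dot product equals cosine similarity.
query /= np.linalg.norm(query)
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

scores = candidates @ query               # cosine similarity per candidate
best = int(np.argmax(scores))
print(best)  # 0 -> the English price question matches the Roman Urdu query
```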



SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text


Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
```
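The Pooling and Normalize stages above can be sketched in a few lines. The token embeddings below are hypothetical (the real model produces 384-dimensional vectors and takes the attention mask from its tokenizer):

```python
import numpy as np

# Hypothetical token embeddings for one sentence: (seq_len=4, dim=3).
token_embeddings = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],   # padding token
    [0.0, 0.0, 0.0],   # padding token
])
attention_mask = np.array([1, 1, 0, 0])

# (1) Mean pooling over non-padding tokens only.
mask = attention_mask[:, None]
pooled = (token_embeddings * mask).sum(axis=0) / mask.sum()
print(pooled)  # [2. 2. 2.]

# (2) L2 normalization, so cosine similarity reduces to a dot product.
sentence_embedding = pooled / np.linalg.norm(pooled)
print(round(np.linalg.norm(sentence_embedding), 6))  # 1.0
```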

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```shell
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("embedingHF/bilingual-roman-urdu-embedder")
# Run inference
sentences = [
    'location kia hay',
    'yeh ghar kis area main hay',
    'how many square feet',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8117, 0.5783],
#         [0.8117, 1.0000, 0.7344],
#         [0.5783, 0.7344, 1.0000]])
```
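A similarity matrix like the one printed above turns into nearest-neighbour retrieval with a single argmax. A small sketch using the reported scores:

```python
import numpy as np

# Similarity matrix reported above for the three example sentences.
similarities = np.array([
    [1.0000, 0.8117, 0.5783],
    [0.8117, 1.0000, 0.7344],
    [0.5783, 0.7344, 1.0000],
])

# Best match for 'location kia hay' (row 0), excluding the sentence itself.
row = similarities[0].copy()
row[0] = -np.inf
print(int(np.argmax(row)))  # 1 -> 'yeh ghar kis area main hay'
```

Note that the top match for the Roman Urdu query is the other Roman Urdu paraphrase (0.8117), with the English variant close behind (0.5783).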

Training Details

Training Dataset

Unnamed Dataset

  • Size: 77 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 77 samples:
    |         | sentence_0 | sentence_1 | label |
    |:--------|:-----------|:-----------|:------|
    | type    | string | string | float |
    | details | min: 3 tokens, mean: 6.03 tokens, max: 11 tokens | min: 3 tokens, mean: 5.75 tokens, max: 12 tokens | min: 0.08, mean: 0.84, max: 0.95 |
  • Samples:
    | sentence_0 | sentence_1 | label |
    |:-----------|:-----------|:------|
    | kitne bedrooms hain | bedrooms ki tadaad kia hay | 0.87 |
    | what is the total area | how many square feet | 0.92 |
    | yeh property kahan hay | address batao | 0.85 |
  • Loss: CosineSimilarityLoss with these parameters:
    ```json
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss",
        "cos_score_transformation": "torch.nn.modules.linear.Identity"
    }
    ```
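Those parameters mean the training objective is the mean squared error between the cosine similarity of a sentence pair and its gold label. A minimal sketch with hypothetical embeddings and a label taken from the samples table:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of L2 norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings for ("kitne bedrooms hain", "bedrooms ki tadaad kia hay").
u = np.array([0.7, 0.7, 0.1])
v = np.array([0.6, 0.8, 0.0])
label = 0.87  # gold similarity from the samples table

# CosineSimilarityLoss with MSELoss: squared error between cos(u, v) and the label.
loss = (cosine(u, v) - label) ** 2
print(round(loss, 4))  # 0.0132
```

Minimizing this loss over the 77 pairs pulls each pair's cosine similarity toward its annotated score.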
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 50
  • multi_dataset_batch_sampler: round_robin

Training Time

  • Training: 2.1 minutes

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.4.1
  • Transformers: 5.5.4
  • PyTorch: 2.11.0+cpu
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
