---
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:77
- loss:CosineSimilarityLoss
widget:
- source_sentence: plot size please
  sentences:
  - land area
  - is this available
  - hello
- source_sentence: area size
  sentences:
  - it is cheap
  - location kahan hay
  - total square footage
- source_sentence: parking available
  sentences:
  - property price
  - location kia hay
  - hi
- source_sentence: available hay kia
  sentences:
  - kia yeh khali hay
  - how many bedrooms
  - thanks
- source_sentence: location kia hay
  sentences:
  - thanks a lot
  - how many square feet
  - yeh ghar kis area main hay
---

# Bilingual Roman Urdu + English Sentence Embedder

A lightweight Sentence Transformer model designed for Roman Urdu + English semantic understanding, optimized for:

- Semantic Search
- Chatbots / FAQ Retrieval
- Real Estate Query Matching
- Cross-lingual Similarity (Roman Urdu ↔ English)

This model maps sentences into a 384-dimensional dense vector space.

## Quick Usage

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("embedingHG/bilingual-roman-urdu-embedder")

# Roman Urdu
emb1 = model.encode(["yeh ghar kitne ka hai"])

# English
emb2 = model.encode(["what is the price"])
```
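The two embeddings above can then be compared with cosine similarity to score a cross-lingual match. A minimal sketch in NumPy, where the short vectors stand in for the 384-dimensional outputs of `model.encode`:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative stand-ins for emb1[0] and emb2[0]; real vectors have 384 dims.
emb_roman_urdu = np.array([0.20, 0.70, 0.10])
emb_english = np.array([0.25, 0.65, 0.05])

score = cosine_similarity(emb_roman_urdu, emb_english)
print(round(score, 3))  # close to 1.0 for semantically similar sentences
```

Scores near 1.0 indicate that the Roman Urdu query and the English query mean roughly the same thing.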
## SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Supported Modality: Text
### Model Sources
- Documentation: [Sentence Transformers Documentation](https://www.sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
```
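The pooling and normalization stages after the transformer can be illustrated on their own. A hedged NumPy sketch, where the random token embeddings stand in for the `BertModel` output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the Transformer output: 5 tokens x 384 dimensions.
token_embeddings = rng.normal(size=(5, 384))
# Attention mask: 1 for real tokens, 0 for padding.
attention_mask = np.array([1, 1, 1, 1, 0])

# (1) Pooling: mean over non-padding tokens only.
mask = attention_mask[:, None]
sentence_embedding = (token_embeddings * mask).sum(axis=0) / mask.sum()

# (2) Normalize: scale to unit L2 norm, so a plain dot product
#     between two embeddings equals their cosine similarity.
sentence_embedding = sentence_embedding / np.linalg.norm(sentence_embedding)

print(sentence_embedding.shape)  # (384,)
```

Because of the final `Normalize` module, all embeddings produced by this model lie on the unit sphere.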
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference:
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("embedingHG/bilingual-roman-urdu-embedder")

# Run inference
sentences = [
    'location kia hay',
    'yeh ghar kis area main hay',
    'how many square feet',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8117, 0.5783],
#         [0.8117, 1.0000, 0.7344],
#         [0.5783, 0.7344, 1.0000]])
```
## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 77 training samples
- Columns: `sentence_0`, `sentence_1`, and `label`
- Approximate statistics based on the first 77 samples:
  |         | sentence_0 | sentence_1 | label |
  |---------|------------|------------|-------|
  | type    | string     | string     | float |
  | details | min: 3 tokens, mean: 6.03 tokens, max: 11 tokens | min: 3 tokens, mean: 5.75 tokens, max: 12 tokens | min: 0.08, mean: 0.84, max: 0.95 |
- Samples:

  | sentence_0 | sentence_1 | label |
  |------------|------------|-------|
  | kitne bedrooms hain | bedrooms ki tadaad kia hay | 0.87 |
  | what is the total area | how many square feet | 0.92 |
  | yeh property kahan hay | address batao | 0.85 |

- Loss: `CosineSimilarityLoss` with these parameters:

  ```json
  {
      "loss_fct": "torch.nn.modules.loss.MSELoss",
      "cos_score_transformation": "torch.nn.modules.linear.Identity"
  }
  ```
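With these parameters, the loss computes the cosine similarity of each sentence pair, passes it through the identity transformation, and takes the mean squared error against the gold label. A NumPy sketch of that computation, using illustrative low-dimensional embeddings and labels in the same range as the samples above:

```python
import numpy as np

def cosine_similarity_loss(emb_a, emb_b, labels):
    """MSE between pairwise cosine similarities and gold labels."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos_scores = (a * b).sum(axis=1)                    # cosine per pair
    return float(np.mean((cos_scores - labels) ** 2))   # MSELoss

# Illustrative 4-dimensional embeddings for two sentence pairs.
emb_a = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]])
emb_b = np.array([[1.0, 0.1, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
labels = np.array([0.95, 0.70])  # gold similarity scores

loss = cosine_similarity_loss(emb_a, emb_b, labels)
print(loss)  # small when cosine scores track the labels
```

Training therefore pushes the embeddings of each pair so that their cosine similarity matches the annotated score.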
### Training Hyperparameters

Non-default hyperparameters:

- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 50
- multi_dataset_batch_sampler: round_robin

### Training Time

- Training: 2.1 minutes

### Framework Versions

- Python: 3.12.3
- Sentence Transformers: 5.4.1
- Transformers: 5.5.4
- PyTorch: 2.11.0+cpu
- Accelerate: 1.13.0
- Datasets: 4.8.4
- Tokenizers: 0.22.2

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```