---
license:
- cc-by-sa-4.0
base_model:
- BAAI/bge-m3
library_name: transformers
tags:
- transformers
- embedding
- indic
---

# Parrotlet-e: Indic Medical Embedding Model

Parrotlet-e is a state-of-the-art multilingual medical embedding model designed for understanding and linking medical terms across Indian languages. It is optimised for entity-level representation of clinical concepts such as symptoms, diagnoses, and anatomical structures, enabling accurate medical coding, semantic search, and cross-lingual retrieval in healthcare applications.

The model is fine-tuned from bge-m3 using weakly supervised contrastive learning with Multi-Similarity Loss on over 18 million multilingual medical term pairs aligned with SNOMED CT and UMLS. It supports both native and romanized scripts across 12 Indic languages and English, and is robust to abbreviations, spelling variations, and colloquial expressions commonly found in clinical documentation.
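
For intuition, the sketch below shows the core of Multi-Similarity Loss (Wang et al., 2019) in PyTorch. This is an illustrative reimplementation, not the actual training code: it omits the pair-mining step of the full method, and the hyperparameter values are placeholders rather than those used to train Parrotlet-e.

```python
import torch

def multi_similarity_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Core Multi-Similarity loss over L2-normalized embeddings with
    integer concept labels (pairs sharing a label are positives)."""
    sims = embeddings @ embeddings.T  # pairwise cosine similarities
    losses = []
    for i in range(sims.size(0)):
        pos_mask = labels == labels[i]
        pos_mask[i] = False  # exclude the self-pair
        neg_mask = labels != labels[i]
        pos, neg = sims[i][pos_mask], sims[i][neg_mask]
        if pos.numel() == 0 or neg.numel() == 0:
            continue  # anchor needs at least one positive and one negative
        # Pull positives above the margin lam, push negatives below it
        pos_term = torch.log1p(torch.exp(-alpha * (pos - lam)).sum()) / alpha
        neg_term = torch.log1p(torch.exp(beta * (neg - lam)).sum()) / beta
        losses.append(pos_term + neg_term)
    return torch.stack(losses).mean()
```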

Supported Indic languages:

- Hindi
- Kannada
- Marathi
- Malayalam
- Tamil
- Telugu
- Odia
- Assamese
- Bengali
- Urdu
- Gujarati
- Punjabi

## Loading the model from Hugging Face Hub

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load model and tokenizer
model_name = "ekacare/parrotlet-e"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Sample medical terms (can be in any supported language)
texts = [
    "diabetes mellitus",
    "मधुमेह",
    "sugar problem"
]

# Tokenize input
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get token-level model outputs
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state

# Mean pooling over non-padding tokens
attention_mask = inputs["attention_mask"]
embeddings = (embeddings * attention_mask.unsqueeze(-1)).sum(1) / attention_mask.sum(1).unsqueeze(-1)

# L2-normalize embeddings so cosine similarity reduces to a dot product
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
```
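
Because the embeddings are L2-normalized, cosine similarity reduces to a matrix product. Continuing from the snippet above:

```python
# Pairwise cosine similarities between the three terms
similarity = embeddings @ embeddings.T
print(similarity)
# Higher scores indicate closer medical concepts; "diabetes mellitus" and
# "मधुमेह" (Hindi for diabetes) should score close to each other.
```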

## Evaluation Results on Eka-IndicMTEB

We evaluated Parrotlet-e on the [**Eka-IndicMTEB**](https://huggingface.co/datasets/ekacare/Eka-IndicMTEB) benchmark using [**KARMA**](https://karma.eka.care/), reporting Recall@1, Recall@3, and Recall@5.

| Model | Recall@1 | Recall@3 | Recall@5 |
|:------|:--------:|:--------:|:--------:|
| **Parrotlet-e** | **0.7206** | **0.8320** | **0.8512** |
| cambridgeltl/SapBERT-from-PubMedBERT-fulltext | 0.3574 | 0.4427 | 0.4684 |
| BAAI/bge-m3 | 0.3146 | 0.4060 | 0.4444 |
| google/embeddinggemma-300m | 0.1031 | 0.1408 | 0.1525 |
| ai4bharat/IndicBERTv2-MLM-only | 0.0311 | 0.0573 | 0.0724 |
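
Here Recall@k is the fraction of queries whose gold concept appears among the top-k retrieved candidates. A minimal sketch of the computation, assuming precomputed, normalized `query_emb` and `corpus_emb` tensors and gold corpus indices `gold_idx` (illustrative only, not the KARMA evaluation code):

```python
import torch

def recall_at_k(query_emb, corpus_emb, gold_idx, k):
    """Fraction of queries whose gold corpus entry is in the top-k results."""
    scores = query_emb @ corpus_emb.T        # (num_queries, corpus_size)
    topk = scores.topk(k, dim=1).indices     # indices of the k best matches
    hits = (topk == gold_idx.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()
```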

**EkaCare Parrotlet-e** and the **Eka-IndicMTEB** benchmark together provide a foundation for building robust, cross-lingual medical AI systems, enabling better coding, documentation, and understanding across India’s diverse clinical landscape.

## Authentication (if required)

If the model repository requires authentication, log in to your Hugging Face account and generate an access token in your Hugging Face settings, then set the token in your environment:

```
export HF_TOKEN="your-access-token"
```

Alternatively, use the Hugging Face CLI to log in:

```
huggingface-cli login
```
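
If your environment does not pick up `HF_TOKEN` automatically, you can also pass the token explicitly via the `token` argument of `from_pretrained`:

```python
import os
from transformers import AutoModel, AutoTokenizer

model_name = "ekacare/parrotlet-e"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=os.environ.get("HF_TOKEN"))
model = AutoModel.from_pretrained(model_name, token=os.environ.get("HF_TOKEN"))
```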

## License

This model is released under the CC BY-SA 4.0 license.