BharatGPT-Embedding
BharatGPT-Embedding is a state-of-the-art multilingual text embedding model developed by the BharatGPT team, designed for Indian languages and English.
Model Highlights
- Optimized for multilingual semantic search
- Supports English, Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati
- Ideal for RAG pipelines, semantic similarity, and document retrieval
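For semantic search and retrieval, a common pattern is to embed the query and every document, then rank the documents by cosine similarity to the query. The card does not say which similarity metric the model was tuned for, so cosine similarity is an assumption here. The helper names `cosine_similarity` and `rank_documents` are hypothetical. The 3-dimensional vectors are made-up stand-ins for real embeddings, which in practice would come from the code under "How to Use".

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: List[Sequence[float]], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs sorted by descending cosine similarity."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (values are made up for illustration).
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]
print(rank_documents(query, docs, top_k=2))  # document 0 first, then document 1
```

In a RAG pipeline, the top-ranked documents would then be passed to a generator as context. For large collections, a vector index would usually replace this linear scan.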
How to Use
```python
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("harry121/BharatGPT-Embedding")
model = AutoModel.from_pretrained("harry121/BharatGPT-Embedding")

inputs = tokenizer("your text here", return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Mean pooling: average the token vectors, using the attention mask so padding is skipped
mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print("Embedding shape:", embeddings.shape)
```
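Mean pooling averages the token vectors of a text into one embedding. When several texts of different lengths are batched with `padding=True`, the shorter ones are padded. A plain `.mean(dim=1)` then also averages the padding positions, which carry no content from the text. The attention mask marks real tokens with 1 and padding with 0, so it can be used to exclude them. The toy sketch below uses made-up 2-dimensional hidden states to show the difference. The padding vectors are set to zero here for simplicity; in a real model they are generally not zero. The function names are hypothetical.

```python
from typing import List


def naive_mean(token_vecs: List[List[float]]) -> List[float]:
    """Average over every position, padding included (what .mean(dim=1) computes)."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]


def masked_mean(token_vecs: List[List[float]], mask: List[int]) -> List[float]:
    """Average only the positions whose attention-mask value is 1."""
    kept = [vec for vec, m in zip(token_vecs, mask) if m == 1]
    if not kept:
        raise ValueError("attention mask contains no real tokens")
    return [sum(col) / len(kept) for col in zip(*kept)]


# One sequence with 2 real tokens, padded to length 4 (made-up hidden states).
hidden = [[2.0, 4.0], [4.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
mask = [1, 1, 0, 0]
print(naive_mean(hidden))         # [1.5, 1.0]: diluted by the two padding positions
print(masked_mean(hidden, mask))  # [3.0, 2.0]: average of the real tokens only
```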
License
Released under the Apache License 2.0, which permits both commercial and personal use.
Contact
BharatGPT Team