BharatGPT-Embedding

BharatGPT-Embedding is a state-of-the-art multilingual text embedding model developed by the BharatGPT team, designed for Indian languages and English.

Model Highlights

Optimized for multilingual semantic search
Supports English, Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati
Ideal for RAG pipelines, semantic similarity, and document retrieval
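
As a rough illustration of the semantic search use case, the sketch below ranks documents by cosine similarity to a query. The vectors are toy three-dimensional values standing in for real embeddings, and `cosine_similarity` and `rank_documents` are hypothetical helper names, not part of this model's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted by descending similarity."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors for illustration only; real embeddings would come from the model.
query = [1.0, 0.0, 1.0]
docs = [
    [0.9, 0.1, 1.1],     # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],     # orthogonal to the query
    [-1.0, 0.0, -1.0],   # points in the opposite direction
]
ranking = rank_documents(query, docs)
print(ranking)
```

In a retrieval or RAG pipeline, the document vectors would typically be computed once and stored, with only the query embedded at search time.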

How to Use

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("harry121/BharatGPT-Embedding")
model = AutoModel.from_pretrained("harry121/BharatGPT-Embedding")

inputs = tokenizer("your text here", return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

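# Mean pooling over real tokens only: padding positions are masked out,
# which matters once several texts of different lengths are batched together
hidden = outputs.last_hidden_state
mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print("Embedding shape:", embeddings.shape)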
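
When several texts of different lengths are embedded in one batch, the shorter ones are padded, and the padded positions also produce hidden states. A mean over the sequence dimension that ignores the attention mask averages those positions in. The framework-free sketch below shows the arithmetic of mask-aware mean pooling on made-up numbers; `masked_mean_pool` is a hypothetical helper, and whether the model was trained with exactly this pooling is an assumption.

```python
def masked_mean_pool(hidden_states, attention_mask):
    """Average token vectors per sequence, skipping positions where mask == 0.

    hidden_states: list of sequences, each a non-empty list of token vectors.
    attention_mask: list of sequences of 0/1 flags with matching lengths.
    """
    pooled = []
    for tokens, mask in zip(hidden_states, attention_mask):
        dim = len(tokens[0])
        total = [0.0] * dim
        count = 0
        for vec, keep in zip(tokens, mask):
            if keep:
                for j in range(dim):
                    total[j] += vec[j]
                count += 1
        count = max(count, 1)  # avoid division by zero for an all-padding row
        pooled.append([t / count for t in total])
    return pooled

# Made-up hidden states: two sequences, three positions, two dimensions.
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third position is padding
    [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],  # no padding
]
mask = [[1, 1, 0], [1, 1, 1]]
pooled = masked_mean_pool(hidden, mask)
print(pooled)
```

For the first sequence, an unmasked mean would also include the padding vector `[9.0, 9.0]` and shift the result away from the two real tokens.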

License

Apache 2.0, free for commercial and personal use.

Contact

BharatGPT Team

Model size: 4B params (Safetensors, tensor type BF16)