BharatGPT-Embedding

BharatGPT-Embedding is a multilingual text embedding model developed by the BharatGPT team for Indian languages and English.

Model Highlights

  • Optimized for multilingual semantic search
  • Supports English, Hindi, Bengali, Telugu, Tamil, Marathi, and Gujarati
  • Ideal for RAG pipelines, semantic similarity, and document retrieval
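As a rough illustration of the retrieval use case, the sketch below ranks documents by cosine similarity to a query. It is not taken from the model's own tooling: the helper names `cosine_similarity` and `rank_documents` are hypothetical, and the 3-dimensional vectors are toy values standing in for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (hypothetical values).
query = [1.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_documents(query, docs)
print(ranking)
```

In a real pipeline the query and document vectors would come from the model, and a vector index would normally replace the linear scan.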

How to Use

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("harry121/BharatGPT-Embedding")
model = AutoModel.from_pretrained("harry121/BharatGPT-Embedding")

inputs = tokenizer("your text here", return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)

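# Mean pooling over real tokens only; the attention mask excludes padding
mask = inputs["attention_mask"].unsqueeze(-1).float()
hidden = outputs.last_hidden_state.float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)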
print("Embedding shape:", embeddings.shape)
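When several texts of different lengths are embedded in one batch, the shorter ones are padded, and averaging over every position would mix padding vectors into the result. The sketch below shows one common way to handle this, using dummy tensors so it runs without downloading the model. The helper name `masked_mean_pool` is hypothetical, and the L2 normalization step is an assumption about typical downstream use, not something the model card specifies.

```python
import torch
import torch.nn.functional as F

def masked_mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).float()             # (batch, seq, 1)
    summed = (last_hidden_state.float() * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                 # avoid divide-by-zero
    return summed / counts

# Dummy batch: 2 sequences, 3 token positions, hidden size 2.
hidden = torch.tensor([
    [[1.0, 0.0], [3.0, 0.0], [100.0, 100.0]],  # last position is padding
    [[0.0, 2.0], [0.0, 4.0], [0.0, 6.0]],      # no padding
])
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

pooled = masked_mean_pool(hidden, mask)

# L2-normalize so that a dot product equals cosine similarity.
normalized = F.normalize(pooled, p=2, dim=1)
similarity = normalized @ normalized.T

print(pooled)
print(similarity)
```

With the real model, `outputs.last_hidden_state` and `inputs["attention_mask"]` would take the place of the dummy `hidden` and `mask` tensors.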

License

Apache 2.0: free for commercial and personal use.

Contact

BharatGPT Team

Model size: 4B params (Safetensors), tensor type BF16