---
title: Urdu Emoji Predictor
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
---

# 🎯 Urdu Emoji Predictor

A tool that predicts relevant emojis for Urdu text by ranking emojis on the semantic similarity between multilingual sentence embeddings of the text and of each emoji.

## 🚀 Try It Out!

Enter Urdu text to get the top-ranked emojis for it.

## 🎯 Examples

- `میں بہت خوش ہوں` ("I am very happy") → 🎉 🎊 👌
- `دل ٹوٹ گیا ہے` ("My heart is broken") → 🌚 😞 💔
- `نیند آ رہی ہے` ("I am feeling sleepy") → 😴 😞 🌚
- `دوستوں کے ساتھ پارٹی` ("Party with friends") → 🎉 😋 🎊

## 🔧 How It Works

1. **Text Encoding**: Converts Urdu text to semantic embeddings using multilingual sentence transformers
2. **Similarity Search**: Compares text embeddings with pre-computed emoji embeddings
3. **Ranking**: Returns top emojis based on cosine similarity scores
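
The similarity search and ranking steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual implementation: `cosine_similarity`, `rank_emojis`, and the three-dimensional toy vectors are hypothetical stand-ins for the real sentence-transformer embeddings, which have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_emojis(text_embedding, emoji_embeddings, top_k=3):
    """Return the top_k (emoji, score) pairs, highest similarity first."""
    scored = [
        (emoji, cosine_similarity(text_embedding, vector))
        for emoji, vector in emoji_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy values (assumption): stand-ins for pre-computed emoji embeddings
toy_emoji_embeddings = {
    "🎉": [0.9, 0.1, 0.0],
    "💔": [0.0, 0.2, 0.9],
    "😴": [0.1, 0.9, 0.1],
}

# Toy value (assumption): stand-in for the embedding of an input sentence
toy_text_embedding = [0.8, 0.2, 0.1]

top = rank_emojis(toy_text_embedding, toy_emoji_embeddings, top_k=2)
print(top)  # 🎉 ranks first, 😴 second
```

In the real pipeline the text vector would come from the multilingual model and the emoji vectors would be loaded from a pre-computed table; only the ranking logic is shown here.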

## 🏗️ Technical Details

- **Model**: `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`
- **Emojis**: 80 most common emojis from Urdu social media
- **Method**: Cosine similarity between text and emoji embeddings
- **Framework**: Gradio + FastAPI

## 📊 Model Performance

- **Top-1 Accuracy**: ~16%
- **Top-3 Accuracy**: ~30%
- **Trained on**: 800K+ Urdu text-emoji pairs
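
For reference, top-k accuracy counts a prediction as correct when the true emoji appears anywhere in the first k ranked results. The sketch below shows one common way to compute it; `top_k_accuracy` and the toy data are hypothetical and are not taken from the project's evaluation code or dataset.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of examples whose true label is among the first k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy data (assumption): ranked predictions per example, best first
toy_predictions = [
    ["🎉", "🎊", "👌"],
    ["😞", "💔", "🌚"],
    ["😴", "😞", "🌚"],
    ["😋", "🎉", "🎊"],
]
# Toy data (assumption): the emoji actually used with each example
toy_truth = ["🎉", "💔", "😋", "🎉"]

top1 = top_k_accuracy(toy_predictions, toy_truth, k=1)
top3 = top_k_accuracy(toy_predictions, toy_truth, k=3)
print(top1, top3)  # 0.25 0.75
```

With 80 candidate emojis, a uniformly random guess would score about 1.25% top-1 and 3.75% top-3, which gives a baseline for reading the figures above.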

## 🎮 Usage

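```python
# UrduOptimizedPredictor is imported from the project's urdu_specific_embedding module
from urdu_specific_embedding import UrduOptimizedPredictor

# Load the saved model from its directory
predictor = UrduOptimizedPredictor("models/urdu_optimized_model")

# Request the three highest-scoring emojis for an Urdu sentence
predictions = predictor.predict_smart("میں بہت خوش ہوں", top_k=3)
# Returns: [('🎉', 0.555), ('🎊', 0.537), ('👌', 0.439)]
```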