---
license: mit
tags:
- token-classification
- ner
- hinglish
- financial
- bert
language:
- hi
- en
datasets:
- armour-ai-hinglish-ner
model-index:
- name: Armour AI NER
  results:
  - task:
      name: Token Classification
      type: token-classification
    metrics:
    - name: F1
      type: f1
      value: 0.88
---

# Armour AI - Hinglish Financial NER Model

A multilingual Named Entity Recognition (NER) model fine-tuned specifically for **financial conversations in Hinglish** (a mix of Hindi and English).

## 🎯 Model Summary

- **Framework**: Transformers (Hugging Face)
- **Base Model**: `bert-base-multilingual-cased`
- **Task**: Named Entity Recognition (Token Classification)
- **Language**: Hinglish (Hindi-English mix)
- **Domain**: Financial Services & Insurance
- **Training Data**: Armour AI financial conversation dataset
- **Performance**: F1 score ~0.88

## 📦 Installation

```bash
pip install transformers torch
```

## 🚀 Quick Start

### Using the Pipeline API (Easiest)

```python
from transformers import pipeline

# Load the model
ner = pipeline(
    "token-classification",
    model="rohin30n/armour-ai-ner",
    aggregation_strategy="simple"
)

# Inference
text = "kya aap 20 lakh ka term insurance lena chahiye?"
results = ner(text)

# Print results (aggregated results use the "entity_group" key)
for result in results:
    print(f"{result['word']:20} | {result['entity_group']:10} | {result['score']:.4f}")
```

**Output:**
```
20                   | AMOUNT     | 0.9985
lakh                 | AMOUNT     | 0.9992
term insurance       | INSTRUMENT | 0.9981
```

### Using Raw Model & Tokenizer

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForTokenClassification.from_pretrained("rohin30n/armour-ai-ner")
tokenizer = AutoTokenizer.from_pretrained("rohin30n/armour-ai-ner")

# Prepare input
text = "kya aap 20 lakh ka term insurance lena chahiye?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Inference
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)

# Decode predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = predictions[0].cpu().numpy()

for token, label_id in zip(tokens, labels):
    label = model.config.id2label.get(int(label_id), "O")
    print(f"{token:15} | {label}")
```

## 🏷️ Entity Types

This model recognizes **5 entity types**:

| Entity | Description | Example |
|--------|-------------|---------|
| **AMOUNT** | Financial amounts and values | "20 lakh", "₹50,000", "10 percent" |
| **INSTRUMENT** | Financial products/instruments | "term insurance", "mutual fund", "savings account" |
| **DURATION** | Time periods | "1 saal", "2 years", "3 mahine" |
| **DECISION** | Business decisions/actions | "approved", "rejected", "pending" |
| **PERSON** | Person names | "Raj Kumar", "Priya Singh" |

## 📊 Training Details

### Dataset
- **Corpus**: Hinglish financial conversations
- **Domain**: Insurance, investments, banking advice
- **Annotation**: BIO (Begin-Inside-Outside) tagging scheme
- **Split**: 80% training, 20% evaluation

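The BIO scheme can be illustrated on a sample query; the tokens and tags below are hand-written for illustration, not taken from the actual training data:

```python
# Hand-labelled example of the BIO (Begin-Inside-Outside) scheme on a
# sample Hinglish query; these tags are illustrative, not real training data.
tokens = ["mujhe", "25", "lakh", "ka", "term", "insurance", "chahiye"]
tags = ["O", "B-AMOUNT", "I-AMOUNT", "O", "B-INSTRUMENT", "I-INSTRUMENT", "O"]

for token, tag in zip(tokens, tags):
    print(f"{token:12} {tag}")
```

`B-` marks the first token of an entity span and `I-` its continuation, so multi-word entities like "25 lakh" stay grouped.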
### Training Configuration
```python
{
    "num_epochs": 3,
    "train_batch_size": 16,
    "eval_batch_size": 16,
    "learning_rate": 2e-5,
    "max_seq_length": 512,
    "optimizer": "adam"
}
```

### Performance Metrics
- **Precision**: ~0.89
- **Recall**: ~0.87
- **F1 Score**: ~0.88
- **Training Time**: ~45 minutes (GPU)

## 💡 Use Cases

1. **Financial Chatbot**: Extract entities from customer queries
   ```
   Input: "Mujhe 25 lakh ka jeevan bima chahiye"
   Entities: AMOUNT=25 lakh, INSTRUMENT=jeevan bima
   ```

2. **Intent Recognition**: Route conversations based on extracted entities
   ```
   If AMOUNT + INSTRUMENT → Product recommendation
   ```

3. **Information Extraction**: Build structured databases from conversations
   ```
   {
     "customer_intent": "insurance_inquiry",
     "amount_interested": "20 lakh",
     "product": "term insurance"
   }
   ```

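The extraction step in use case 3 can be sketched as follows. The `results` list imitates the aggregated pipeline output format; the values and the resulting field names are illustrative, not real model output:

```python
# Group aggregated NER results into a flat record. The sample `results`
# list imitates pipeline output with aggregation_strategy="simple";
# the spans and scores are illustrative, not real model output.
results = [
    {"entity_group": "AMOUNT", "word": "20 lakh", "score": 0.99},
    {"entity_group": "INSTRUMENT", "word": "term insurance", "score": 0.99},
]

record = {}
for r in results:
    # Keep the first span per entity type (a sketch; a production system
    # might instead keep the highest-scoring span or all spans)
    record.setdefault(r["entity_group"], r["word"])

print(record)  # → {'AMOUNT': '20 lakh', 'INSTRUMENT': 'term insurance'}
```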
## ⚙️ Model Architecture

```
Input Text (Hinglish)
        ↓
[Tokenizer: bert-base-multilingual-cased]
        ↓
[BERT Encoder Layers]
        ↓
[Token Classification Head]
        ↓
[BIO Entity Labels]
        ↓
Output: Named Entities with Scores
```

## 🔧 Advanced Usage

### Batch Processing

```python
from transformers import pipeline

ner = pipeline("token-classification", model="rohin30n/armour-ai-ner")

texts = [
    "kya aap 20 lakh ka term insurance lena chahiye?",
    "Mujhe 50 lakh ka investment plan chahiye"
]

results = ner(texts)
```

### Fine-tuning on Custom Data

```python
from transformers import Trainer, TrainingArguments

# Your custom dataset (tokenized, with aligned BIO label ids)
train_dataset = ...
eval_dataset = ...

training_args = TrainingArguments(
    output_dir="./fine_tuned_ner",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_steps=100,
)

# `model` is the AutoModelForTokenClassification loaded earlier
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
```

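One step the sketch above elides is aligning word-level BIO labels to subword tokens before training. A minimal pure-Python sketch of that alignment follows; the `word_ids` list imitates what `tokenizer(..., is_split_into_words=True).word_ids()` returns, and the label set is hypothetical:

```python
# Align word-level BIO label ids to subword tokens. Special tokens
# (word_id None) get -100 so the loss ignores them; continuation
# subwords of a B- word get the matching I- label.
def align_labels(word_ids, word_labels, label2id):
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)          # [CLS], [SEP], padding
        elif word_id != previous_word:
            aligned.append(label2id[word_labels[word_id]])
        else:
            label = word_labels[word_id]
            if label.startswith("B-"):
                label = "I-" + label[2:]  # later subword of a split word
            aligned.append(label2id[label])
        previous_word = word_id
    return aligned

label2id = {"O": 0, "B-AMOUNT": 1, "I-AMOUNT": 2}
# "25 lakh" where "lakh" splits into two subwords: word id 2 repeats
word_ids = [None, 0, 1, 2, 2, None]
word_labels = ["O", "B-AMOUNT", "I-AMOUNT"]
print(align_labels(word_ids, word_labels, label2id))
# → [-100, 0, 1, 2, 2, -100]
```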
## 📝 Limitations

- **Language**: Optimized for Hinglish; may not work well with pure Hindi or pure English
- **Domain**: Fine-tuned on financial conversations; performance may vary on other domains
- **Out-of-vocabulary**: May struggle with very new financial products/terms
- **Code-mixing**: Works best with natural Hindi-English mixing patterns

## ⚡ Performance Notes

- **Inference Speed**: ~100-200 ms per sentence (CPU), ~20-50 ms (GPU)
- **Memory**: ~500 MB RAM minimum, ~2 GB with batch processing
- **GPU**: Optional but recommended for production use

## 📚 Related Resources

- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [Token Classification Documentation](https://huggingface.co/docs/transformers/tasks/token_classification)
- [BERT Documentation](https://huggingface.co/docs/transformers/model_doc/bert)

## 👨‍💼 Project: Armour AI

This model is part of **Armour AI**, an intelligent financial advisory platform designed for mobile-first interactions with voice, text, and multilingual support.

**Features:**
- 🎤 Voice-based financial queries
- 🔀 Text-based conversations
- 📱 Mobile-optimized API
- 🌍 Multilingual support (Hinglish)
- 💬 Real-time entity extraction
- 🧠 Intelligent routing & recommendations

## 📄 Citation

If you find this model helpful, please cite it:

```bibtex
@misc{rohin30n_armour_ai_ner_2026,
  author = {Armour AI Team},
  title  = {Armour AI - Hinglish Financial NER Model},
  year   = {2026},
  url    = {https://huggingface.co/rohin30n/armour-ai-ner},
  note   = {Based on bert-base-multilingual-cased}
}
```

## 📞 Support & Questions

For issues, questions, or suggestions:
- Open an issue on the model repository
- Check existing discussions in the Community tab

---

**Status**: ✅ Production Ready | **Last Updated**: April 2026 | **Version**: 1.0