# Armour AI - Hinglish Financial NER Model
A multilingual Named Entity Recognition (NER) model fine-tuned specifically for financial conversations in Hinglish (mixture of Hindi and English).
## 🎯 Model Summary

- Framework: Transformers (Hugging Face)
- Base Model: `bert-base-multilingual-cased`
- Task: Named Entity Recognition (Token Classification)
- Language: Hinglish (Hindi-English mix)
- Domain: Financial Services & Insurance
- Training Data: Armour AI financial conversation dataset
- Performance: F1 score ~0.88
## 📦 Installation

```bash
pip install transformers torch
```
## 🚀 Quick Start

### Using the Pipeline API (Easiest)

```python
from transformers import pipeline

# Load the model
ner = pipeline(
    "token-classification",
    model="rohin30n/armour-ai-ner",
    aggregation_strategy="simple"
)

# Inference
text = "kya aap 20 lakh ka term insurance lena chahiye?"
results = ner(text)

# With aggregation_strategy="simple", each result dict uses the
# "entity_group" key (not "entity")
for result in results:
    print(f"{result['word']:20} | {result['entity_group']:10} | {result['score']:.4f}")
```
Output:

```text
20             | AMOUNT     | 0.9985
lakh           | AMOUNT     | 0.9992
term insurance | INSTRUMENT | 0.9981
```
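For downstream use it often helps to collapse the pipeline output into a simple type-to-values mapping. A minimal sketch, assuming the `entity_group` key produced by `aggregation_strategy="simple"`; the `sample` results below are a hand-made example, not real model output.

```python
# Convert pipeline-style results into {entity_type: [values]},
# dropping low-confidence predictions.
def group_entities(results, min_score=0.5):
    grouped = {}
    for r in results:
        if r["score"] >= min_score:
            grouped.setdefault(r["entity_group"], []).append(r["word"])
    return grouped

# Hypothetical sample in the pipeline's output format
sample = [
    {"word": "20 lakh", "entity_group": "AMOUNT", "score": 0.9985},
    {"word": "term insurance", "entity_group": "INSTRUMENT", "score": 0.9981},
]
print(group_entities(sample))
# {'AMOUNT': ['20 lakh'], 'INSTRUMENT': ['term insurance']}
```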
### Using the Raw Model & Tokenizer

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForTokenClassification.from_pretrained("rohin30n/armour-ai-ner")
tokenizer = AutoTokenizer.from_pretrained("rohin30n/armour-ai-ner")

# Prepare input
text = "kya aap 20 lakh ka term insurance lena chahiye?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Inference
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)

# Decode predictions
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = predictions[0].cpu().numpy()

for token, label_id in zip(tokens, labels):
    label = model.config.id2label.get(int(label_id), "O")
    print(f"{token:15} | {label}")
```
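The raw loop above prints one label per subword token. To recover whole entities you still need to merge WordPiece continuations (`##...`) and fold B-/I- tags into spans. A minimal sketch of that post-processing; the `tokens`/`labels` lists are a hand-made example of what the raw loop might emit.

```python
# Merge WordPiece tokens and BIO labels into (entity_type, text) spans.
def merge_bio(tokens, labels):
    spans = []
    current = None  # (entity_type, [words]) being built
    for token, label in zip(tokens, labels):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if token.startswith("##"):
            # WordPiece continuation: glue onto the previous word
            if current:
                current[1][-1] += token[2:]
            continue
        if label.startswith("B-"):
            if current:
                spans.append((current[0], " ".join(current[1])))
            current = (label[2:], [token])
        elif label.startswith("I-") and current and current[0] == label[2:]:
            current[1].append(token)
        else:  # "O" or a tag that doesn't continue the open span
            if current:
                spans.append((current[0], " ".join(current[1])))
                current = None
    if current:
        spans.append((current[0], " ".join(current[1])))
    return spans

# Hypothetical tokenization of the Quick Start sentence
tokens = ["[CLS]", "20", "la", "##kh", "ka", "term", "insurance", "[SEP]"]
labels = ["O", "B-AMOUNT", "I-AMOUNT", "I-AMOUNT", "O", "B-INSTRUMENT", "I-INSTRUMENT", "O"]
print(merge_bio(tokens, labels))
# [('AMOUNT', '20 lakh'), ('INSTRUMENT', 'term insurance')]
```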
## 🏷️ Entity Types

This model recognizes 5 entity types:
| Entity | Description | Example |
|---|---|---|
| AMOUNT | Financial amounts and values | "20 lakh", "βΉ50,000", "10 percent" |
| INSTRUMENT | Financial products/instruments | "term insurance", "mutual fund", "savings account" |
| DURATION | Time periods | "1 saal", "2 years", "3 mahine" |
| DECISION | Business decisions/actions | "approved", "rejected", "pending" |
| PERSON | Person names | "Raj Kumar", "Priya Singh" |
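Under the BIO tagging scheme used in training, these 5 entity types expand into 11 token-level labels. A sketch of the expected label inventory; the actual label ordering is defined by the model's config, so treat this `id2label` as illustrative.

```python
# "O" plus a B-/I- pair per entity type -> 11 labels total
ENTITY_TYPES = ["AMOUNT", "INSTRUMENT", "DURATION", "DECISION", "PERSON"]
labels = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(labels))
print(len(labels))  # 11
```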
## 📊 Training Details

### Dataset

- Corpus: Hinglish financial conversations
- Domain: Insurance, investments, banking advice
- Annotation: BIO (Begin-Inside-Outside) tagging scheme
- Split: 80% training, 20% evaluation
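A hypothetical BIO-annotated training example, in the token/tag format commonly used for token classification (the real dataset schema is not published with this card):

```python
# One annotated utterance: tokens paired with BIO tags, one tag per token
example = {
    "tokens": ["Mujhe", "25", "lakh", "ka", "jeevan", "bima", "chahiye"],
    "ner_tags": ["O", "B-AMOUNT", "I-AMOUNT", "O", "B-INSTRUMENT", "I-INSTRUMENT", "O"],
}
# The two lists must stay aligned one-to-one
assert len(example["tokens"]) == len(example["ner_tags"])
```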
### Training Configuration

```json
{
  "num_epochs": 3,
  "train_batch_size": 16,
  "eval_batch_size": 16,
  "learning_rate": 2e-5,
  "max_seq_length": 512,
  "optimizer": "adam"
}
```
### Performance Metrics

- Precision: ~0.89
- Recall: ~0.87
- F1 Score: ~0.88
- Training Time: ~45 minutes (GPU)
## 💡 Use Cases

1. **Financial Chatbot**: extract entities from customer queries.
   - Input: "Mujhe 25 lakh ka jeevan bima chahiye"
   - Entities: AMOUNT = "25 lakh", INSTRUMENT = "jeevan bima"
2. **Intent Recognition**: route conversations based on extracted entities.
   - If AMOUNT + INSTRUMENT → product recommendation
3. **Information Extraction**: build structured records from conversations.

   ```json
   {
     "customer_intent": "insurance_inquiry",
     "amount_interested": "20 lakh",
     "product": "term insurance"
   }
   ```
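The intent-routing rule mentioned above can be sketched as a small function over extracted entities. `route()` and the route names are illustrative, not part of the model or the Armour AI platform API.

```python
# Route a conversation based on which entity types were extracted.
def route(entities):
    types = {e["entity_group"] for e in entities}
    if {"AMOUNT", "INSTRUMENT"} <= types:
        return "product_recommendation"   # AMOUNT + INSTRUMENT present
    if "INSTRUMENT" in types:
        return "product_information"
    return "general_query"

# Hypothetical extraction result for the Quick Start sentence
entities = [
    {"entity_group": "AMOUNT", "word": "20 lakh"},
    {"entity_group": "INSTRUMENT", "word": "term insurance"},
]
print(route(entities))  # product_recommendation
```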
## ⚙️ Model Architecture

```text
Input Text (Hinglish)
        ↓
[Tokenizer: bert-base-multilingual-cased]
        ↓
[BERT Encoder Layers]
        ↓
[Token Classification Head]
        ↓
[BIO Entity Labels]
        ↓
Output: Named Entities with Scores
```
## 🔧 Advanced Usage

### Batch Processing

```python
from transformers import pipeline

ner = pipeline("token-classification", model="rohin30n/armour-ai-ner")

texts = [
    "kya aap 20 lakh ka term insurance lena chahiye?",
    "Mujhe 50 lakh ka investment plan chahiye"
]
results = ner(texts)
```
### Fine-tuning on Custom Data

```python
from transformers import (
    AutoModelForTokenClassification,
    Trainer,
    TrainingArguments,
)

# Start from the released checkpoint
model = AutoModelForTokenClassification.from_pretrained("rohin30n/armour-ai-ner")

# Your custom dataset
train_dataset = ...
eval_dataset = ...

training_args = TrainingArguments(
    output_dir="./fine_tuned_ner",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
## Limitations

- Language: Optimized for Hinglish; may not work well with pure Hindi or pure English
- Domain: Fine-tuned on financial conversations; performance may vary on other domains
- Out-of-vocabulary: May struggle with very new financial products/terms
- Code-mixing: Works best with natural Hindi-English mixing patterns
## ⚡ Performance Notes

- Inference Speed: ~100-200ms per sentence (CPU), ~20-50ms (GPU)
- Memory: ~500MB RAM minimum, ~2GB with batch processing
- GPU: Optional but recommended for production use
## 👨‍💼 Project: Armour AI

This model is part of Armour AI, an intelligent financial advisory platform designed for mobile-first interactions with voice, text, and multilingual support.

Features:

- 🎤 Voice-based financial queries
- 🤖 Text-based conversations
- 📱 Mobile-optimized API
- 🌐 Multilingual support (Hinglish)
- 💬 Real-time entity extraction
- 🧠 Intelligent routing & recommendations
## 📖 Citation

If you find this model helpful, please cite it:

```bibtex
@misc{rohin30n_armour_ai_ner_2026,
  author = {Armour AI Team},
  title  = {Armour AI - Hinglish Financial NER Model},
  year   = {2026},
  url    = {https://huggingface.co/rohin30n/armour-ai-ner},
  note   = {Based on bert-base-multilingual-cased}
}
```
## 🙋 Support & Questions

For issues, questions, or suggestions:
- Open an issue on the model repository
- Check existing discussions in the Community tab
Status: ✅ Production Ready | Last Updated: April 2026 | Version: 1.0