---
language:
- en
- ru
- uz
- multilingual
license: apache-2.0
tags:
- multi-task-learning
- token-classification
- text-classification
- ner
- named-entity-recognition
- intent-classification
- language-detection
- banking
- transactions
- financial
- multilingual
- bert
- pytorch
datasets:
- custom
metrics:
- precision
- recall
- f1
- accuracy
- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
  example_title: "English Transaction"
- text: "Отправить 150тыс рублей на счет ООО Ромашка 40817810099910004312 ИНН 987654321 за услуги"
  example_title: "Russian Transaction"
- text: "44380583609046995897 ҳисобга 170190.66 UZS ўтказиш Голден Стар ИНН 485232484"
  example_title: "Uzbek Cyrillic Transaction"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
  example_title: "Query Request"
library_name: transformers
pipeline_tag: token-classification
---

# Intentity AIBA - Multi-Task Banking Model 🏦🤖

## Model Description

**Intentity AIBA** is a multi-task model that performs three tasks at once on the same input:

1. 🌐 **Language Detection** - Identifies the language of the input text
2. 🎯 **Intent Classification** - Determines the user's intent
3. 📋 **Named Entity Recognition** - Extracts key entities from banking transactions

Built on `google-bert/bert-base-multilingual-cased` with a shared encoder and three task-specific output heads, the model handles banking and financial transaction texts in English, Russian, and Uzbek (Cyrillic and Latin scripts).

## 🎯 Capabilities

### Language Detection

Predicts one of 5 language labels:

- `en`
- `mixed`
- `ru`
- `uz_cyrl`
- `uz_latn`

### Intent Classification

Recognizes 4 intent types:

- `create_transaction`
- `help`
- `list_transaction`
- `unknown`

### Named Entity Recognition

Extracts 6 entity types:

- `amount`
- `currency`
- `description`
- `receiver_hr`
- `receiver_inn`
- `receiver_name`

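
For token classification, each entity type is usually expanded into per-token tags. The card does not state the tagging scheme, so the sketch below assumes a BIO scheme (consistent with the `seqeval` metric listed in the metadata). The helper `build_bio_tags` and the resulting tag order are hypothetical; the real mapping should be read from the model files.

```python
# Entity types as listed in this card.
ENTITY_TYPES = [
    "amount", "currency", "description",
    "receiver_hr", "receiver_inn", "receiver_name",
]


def build_bio_tags(entity_types):
    """Return ["O", "B-x", "I-x", ...] for the given types (assumed BIO scheme)."""
    tags = ["O"]
    for name in entity_types:
        tags += [f"B-{name}", f"I-{name}"]
    return tags


TAGS = build_bio_tags(ENTITY_TYPES)
tag2id = {tag: i for i, tag in enumerate(TAGS)}  # hypothetical id order
id2tag = {i: tag for tag, i in tag2id.items()}

print(len(TAGS))  # 13: one "O" tag plus B-/I- for each of the 6 types
```
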
## 📊 Model Performance

| Task | Metric | Score |
|------|--------|-------|
| **NER** | F1 Score | 0.9891 |
| **NER** | Precision | 0.9891 |
| **Intent** | F1 Score | 0.9999 |
| **Intent** | Accuracy | 0.9999 |
| **Language** | Accuracy | 0.9648 |
| **Overall** | Average F1 | 0.9945 |

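
The card does not say how the overall score is computed. The reported value matches the unweighted mean of the NER and Intent F1 scores, so that is the assumption in this short check:

```python
# Scores copied from the table above.
ner_f1 = 0.9891
intent_f1 = 0.9999

# Assumption: "Average F1" is the plain mean of the two reported F1 scores.
average_f1 = (ner_f1 + intent_f1) / 2
print(f"{average_f1:.4f}")  # 0.9945
```
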
## 🚀 Quick Start

### Installation

```bash
pip install transformers torch
```

### Basic Usage

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load model and tokenizer
model_name = "primel/intentity-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Note: This is a custom multi-task model.
# Use the inference code in the next section for predictions.
```

### Complete Inference Code

```python
import torch
from transformers import AutoTokenizer, AutoModel


class IntentityAIBA:
    def __init__(self, model_name="primel/intentity-aiba"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)

        # Load NER label mappings from the model config, if present
        self.id2tag = getattr(self.model.config, "id2label", {})
        # Note: Intent and language mappings should be loaded from model files

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
        self.model.eval()

    def predict(self, text):
        """Predict language, intent, and entities for input text."""
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Extract predictions from the custom model heads.
        # (Implementation depends on your model architecture; the values
        # returned here are placeholders, not real predictions.)
        return {
            "language": "detected_language",
            "intent": "detected_intent",
            "entities": {},
        }


# Initialize
model = IntentityAIBA()

# Predict
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
result = model.predict(text)
print(result)
```

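
The `entities` field is left empty in the code above because decoding depends on the NER head. As a rough illustration of that last step, the sketch below groups word-level tags into an entity dictionary. It assumes BIO-style tags already aligned to whitespace-separated words; the function `decode_entities` and the hand-written tags are hypothetical, not output from the model.

```python
def decode_entities(words, tags):
    """Group word-level BIO tags into {entity_type: text} (first span per type wins)."""
    spans = []          # list of (entity_type, [words])
    prev_label = None   # entity type of the previous word, None after an "O" tag
    for word, tag in zip(words, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != prev_label):
            # Start a new span; a stray I- tag is treated leniently as a span start.
            spans.append((label, [word]))
            prev_label = label
        elif prefix == "I":
            spans[-1][1].append(word)  # continue the current span
        else:
            prev_label = None          # "O" closes any open span

    entities = {}
    for label, span_words in spans:
        entities.setdefault(label, " ".join(span_words))
    return entities


# Hand-written tags for illustration only.
words = ["Transfer", "12.5mln", "USD", "to", "Apex", "Industries"]
tags = ["O", "B-amount", "B-currency", "O", "B-receiver_name", "I-receiver_name"]
print(decode_entities(words, tags))
```

In practice the tokenizer splits words into subword pieces, so predictions would first need to be mapped back to words (for example by keeping the tag of each word's first piece) before a step like this.
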
## 📝 Example Outputs

### Example 1: English Transaction

**Input**: `"Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"`

**Output**:

```python
{
    "language": "en",
    "intent": "create_transaction",
    "entities": {
        "amount": "12.5mln",
        "currency": "USD",
        "receiver_name": "Apex Industries",
        "receiver_hr": "27109477752047116719",
        "receiver_inn": "123456789",
        "bank_code": "01234",
        "description": "consulting"
    }
}
```

### Example 2: Russian Transaction

**Input**: `"Отправить 150тыс рублей на счет ООО Ромашка 40817810099910004312 ИНН 987654321"`

**Output**:

```python
{
    "language": "ru",
    "intent": "create_transaction",
    "entities": {
        "amount": "150тыс",
        "currency": "рублей",
        "receiver_name": "ООО Ромашка",
        "receiver_hr": "40817810099910004312",
        "receiver_inn": "987654321"
    }
}
```

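
The `amount` values in these examples are returned as raw text spans, including magnitude suffixes such as `mln` and `тыс`. Downstream code will likely need to turn them into numbers. The sketch below is one possible post-processing step, not part of the model: the function `normalize_amount` and the suffix table are hypothetical and cover only the suffixes that appear in this card.

```python
import re
from decimal import Decimal

# Hypothetical, incomplete suffix table; extend it for real data.
SUFFIX_MULTIPLIERS = {
    "mln": Decimal(1_000_000),
    "тыс": Decimal(1_000),
}


def normalize_amount(raw):
    """Convert a span like "12.5mln" or "150тыс" into a Decimal value."""
    match = re.fullmatch(r"(\d+(?:[.,]\d+)?)\s*(\D*)", raw.strip())
    if match is None:
        raise ValueError(f"Unrecognized amount format: {raw!r}")
    number, suffix = match.groups()
    value = Decimal(number.replace(",", "."))
    suffix = suffix.strip().lower()
    if not suffix:
        return value
    if suffix not in SUFFIX_MULTIPLIERS:
        raise ValueError(f"Unknown magnitude suffix: {suffix!r}")
    return value * SUFFIX_MULTIPLIERS[suffix]


print(normalize_amount("12.5mln"))  # 12500000.0
print(normalize_amount("150тыс"))   # 150000
```

`Decimal` is used instead of `float` so that monetary values are not subject to binary rounding.
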
### Example 3: Query Request

**Input**: `"Show completed transactions from 01.12.2024 to 15.12.2024"`

**Output**:

```python
{
    "language": "en",
    "intent": "list_transaction",
    "entities": {
        "start_date": "01.12.2024",
        "end_date": "15.12.2024"
    }
}
```

## 🏗️ Model Architecture

- **Base Model**: `google-bert/bert-base-multilingual-cased`
- **Architecture**: Multi-task learning with shared encoder
  - Shared BERT encoder (accounts for nearly all of the parameters; the multilingual vocabulary of about 119K tokens makes it larger than the 110M of English BERT-base)
  - NER head: Token-level classifier
  - Intent head: Sequence-level classifier
  - Language head: Sequence-level classifier
- **Total Parameters**: ~178M
- **Loss Function**: Weighted combination (0.4 × NER + 0.3 × Intent + 0.3 × Language)

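
The weighted loss can be written as a small helper. This is a minimal sketch using the weights stated above; the function name `combine_losses` and the dictionary keys are assumptions. It works on plain floats and, since it only uses `*` and `+`, on PyTorch scalar tensors as well.

```python
# Task weights as stated in this card.
LOSS_WEIGHTS = {"ner": 0.4, "intent": 0.3, "language": 0.3}


def combine_losses(losses, weights=LOSS_WEIGHTS):
    """Return the weighted sum of per-task losses."""
    return sum(weights[task] * losses[task] for task in weights)


# Illustrative per-task loss values, not real training numbers.
total = combine_losses({"ner": 1.0, "intent": 2.0, "language": 3.0})
print(round(total, 4))  # 1.9
```
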
## 🎓 Training Details

- **Training Samples**: 340,986
- **Validation Samples**: 60,175
- **Epochs**: 6
- **Batch Size**: 16 (per device)
- **Learning Rate**: 3e-5
- **Warmup Ratio**: 0.15
- **Optimizer**: AdamW with weight decay
- **LR Scheduler**: Linear with warmup
- **Framework**: Transformers + PyTorch
- **Hardware**: Trained on Tesla T4 GPU

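
To make the schedule concrete, the sketch below derives the step counts from the hyperparameters above and reproduces the shape of a linear warmup and decay schedule in plain Python. It assumes a single device and no gradient accumulation, neither of which is stated in this card, and it rounds the warmup step count up; the actual training run may differ.

```python
import math

# Hyperparameters copied from the list above.
TRAIN_SAMPLES = 340_986
BATCH_SIZE = 16
EPOCHS = 6
BASE_LR = 3e-5
WARMUP_RATIO = 0.15

# Assumption: one device, no gradient accumulation.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step):
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)  # 21312 127872 19181
```

Under these assumptions the learning rate peaks at 3e-5 after roughly the first 15% of optimizer steps and reaches zero at the end of the sixth epoch.
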
## 💡 Use Cases

- **Banking Applications**: Transaction processing and validation
- **Chatbots**: Intent-aware financial assistants
- **Document Processing**: Automated extraction from transaction documents
- **Compliance**: KYC/AML data extraction
- **Analytics**: Transaction categorization and analysis
- **Multi-language Support**: Cross-border banking operations

## ⚠️ Limitations

- Designed for the banking/financial domain; may not generalize to other domains
- Performance may vary on formats that differ significantly from the training data
- Mixed-language texts may have lower accuracy
- Works best on transaction-style texts similar to the training distribution
- May require fine-tuning for specific banking systems or regional variations

## 📚 Citation

```bibtex
@misc{intentity-aiba-2025,
  author = {Primel},
  title = {Intentity AIBA: Multi-Task Banking Language Model},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/primel/intentity-aiba}}
}
```

## 📄 License

Apache 2.0

## 🤝 Contact

For questions, issues, or collaboration opportunities, please open an issue on the model repository.

---

**Model Card Authors**: Primel

**Last Updated**: 2025

**Model Version**: 1.0