	---
	language:
	- en
	- ru
	- uz
	- multilingual
	license: apache-2.0
	tags:
	- multi-task-learning
	- token-classification
	- text-classification
	- ner
	- named-entity-recognition
	- intent-classification
	- language-detection
	- banking
	- transactions
	- financial
	- multilingual
	- bert
	- pytorch
	datasets:
	- custom
	metrics:
	- precision
	- recall
	- f1
	- accuracy
	- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
  example_title: "English Transaction"
- text: "Отправить 150тыс рублей на счет ООО Ромашка 40817810099910004312 ИНН 987654321 за услуги"
  example_title: "Russian Transaction"
- text: "44380583609046995897 ҳисобга 170190.66 UZS ўтказиш Голден Стар ИНН 485232484"
  example_title: "Uzbek Cyrillic Transaction"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
  example_title: "Query Request"
	library_name: transformers
	pipeline_tag: token-classification
	---

	# Intentity AIBA - Multi-Task Banking Model 🏦🤖

	## Model Description

Intentity AIBA is a multi-task model that performs three tasks in a single forward pass:
1. 🌐 Language Detection - Identifies the language of the input text
2. 🎯 Intent Classification - Determines the user's intent
3. 📋 Named Entity Recognition - Extracts key entities from banking transactions

Built on `google-bert/bert-base-multilingual-cased` with a shared encoder and three task-specific output heads, the model processes banking and financial transaction texts in English, Russian, and Uzbek (Cyrillic and Latin scripts).

	## 🎯 Capabilities

	### Language Detection
Predicts one of 5 language labels (Uzbek is split by script, and `mixed` covers mixed-language text):
	- `en`
	- `mixed`
	- `ru`
	- `uz_cyrl`
	- `uz_latn`

	### Intent Classification
	Recognizes 4 intent types:
	- `create_transaction`
	- `help`
	- `list_transaction`
	- `unknown`
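
Both sequence-level heads reduce to picking the highest-scoring class and mapping its index to a label. The sketch below illustrates that step in plain Python. The label order (alphabetical, as listed above), the `decode` helper, and the example logits are assumptions for illustration; the real index-to-label mapping should be read from the model files.

```python
import math

# Assumed label order: alphabetical, as listed in this card.
LANGUAGE_LABELS = ["en", "mixed", "ru", "uz_cyrl", "uz_latn"]
INTENT_LABELS = ["create_transaction", "help", "list_transaction", "unknown"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits standing in for the outputs of the two heads.
language, lang_conf = decode([2.1, 0.3, -1.0, -0.5, 0.0], LANGUAGE_LABELS)
intent, intent_conf = decode([3.0, -1.2, 0.4, -0.7], INTENT_LABELS)
print(language, intent)
```

The returned probability can serve as a rough confidence score, for example to fall back to `unknown` when the top intent is weak.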

	### Named Entity Recognition
	Extracts 6 entity types:
	- `amount`
	- `currency`
	- `description`
	- `receiver_hr`
	- `receiver_inn`
	- `receiver_name`
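
Token-level predictions have to be grouped into entity strings before they are useful. The sketch below assumes a BIO tagging scheme (`B-amount`, `I-receiver_name`, `O`) and word-level tokens; neither is confirmed by this card, and `group_entities` is a hypothetical helper. Merging of subword pieces is omitted.

```python
def group_entities(tokens, tags):
    """Collect BIO-tagged tokens into {entity_type: text}.

    If an entity type occurs more than once, the last occurrence wins.
    """
    entities = {}
    current_type, current_tokens = None, []
    # The trailing ("", "O") sentinel flushes a span that ends the sentence.
    for token, tag in list(zip(tokens, tags)) + [("", "O")]:
        starts_new = tag.startswith("B-")
        continues = tag.startswith("I-") and tag[2:] == current_type
        if current_type is not None and not continues:
            entities[current_type] = " ".join(current_tokens)
            current_type, current_tokens = None, []
        if starts_new or (tag.startswith("I-") and current_type is None):
            # A stray I- tag without a matching B- is treated as a new span.
            current_type, current_tokens = tag[2:], [token]
        elif continues:
            current_tokens.append(token)
    return entities


tokens = ["Transfer", "12.5mln", "USD", "to", "Apex", "Industries"]
tags = ["O", "B-amount", "B-currency", "O", "B-receiver_name", "I-receiver_name"]
entities = group_entities(tokens, tags)
print(entities)
```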

	## 📊 Model Performance

| Task | Metric | Score |
|------|--------|-------|
| NER | F1 Score | 0.9891 |
| NER | Precision | 0.9891 |
| Intent | F1 Score | 0.9999 |
| Intent | Accuracy | 0.9999 |
| Language | Accuracy | 0.9648 |
| Overall | Average F1 | 0.9945 |
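
The metadata lists `seqeval`, which scores NER at the entity level: a predicted span counts only if its type and boundaries both match the gold span. The sketch below reimplements that idea in plain Python for illustration (it is not the seqeval library itself, and `entity_prf` is a hypothetical helper). It also checks that the overall figure in the table equals the mean of the NER and Intent F1 scores, which appears to be how it was derived.

```python
def entity_prf(gold, pred):
    """Entity-level precision, recall, and F1 over (type, start, end) spans."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # exact matches only
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the receiver_name span is one token short, so it does not count.
gold = [("amount", 1, 2), ("currency", 2, 3), ("receiver_name", 4, 6)]
pred = [("amount", 1, 2), ("currency", 2, 3), ("receiver_name", 4, 5)]
p, r, f = entity_prf(gold, pred)

# Mean of the reported NER and Intent F1 scores from the table.
overall = (0.9891 + 0.9999) / 2
print(round(p, 4), round(r, 4), round(f, 4), round(overall, 4))
```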

	## 🚀 Quick Start

	### Installation

	```bash
	pip install transformers torch
	```

	### Basic Usage

	```python
	import torch
	from transformers import AutoTokenizer, AutoModel

	# Load model and tokenizer
	model_name = "primel/intentity-aiba"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModel.from_pretrained(model_name)

	# Note: This is a custom multi-task model
	# Use the inference code below for predictions
	```

	### Complete Inference Code

```python
import torch
from transformers import AutoTokenizer, AutoModel


class IntentityAIBA:
    def __init__(self, model_name="primel/intentity-aiba"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)

        # Load NER label mappings from the model config (empty dict if absent)
        self.id2tag = getattr(self.model.config, "id2label", None) or {}
        # Note: Intent and language mappings should be loaded from model files

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
        self.model.eval()

    def predict(self, text):
        """Predict language, intent, and entities for input text."""
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Extract predictions from the custom model heads here.
        # (Implementation depends on your model architecture.)
        # The values returned below are placeholders, not real predictions.
        return {
            "language": "detected_language",
            "intent": "detected_intent",
            "entities": {},
        }


# Initialize
model = IntentityAIBA()

# Predict
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
result = model.predict(text)
print(result)
```

	## 📝 Example Outputs

	### Example 1: English Transaction

	Input: `"Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"`

	Output:
```python
{
    "language": "en",
    "intent": "create_transaction",
    "entities": {
        "amount": "12.5mln",
        "currency": "USD",
        "receiver_name": "Apex Industries",
        "receiver_hr": "27109477752047116719",
        "receiver_inn": "123456789",
        "bank_code": "01234",
        "description": "consulting"
    }
}
```

	### Example 2: Russian Transaction

	Input: `"Отправить 150тыс рублей на счет ООО Ромашка 40817810099910004312 ИНН 987654321"`

	Output:
```python
{
    "language": "ru",
    "intent": "create_transaction",
    "entities": {
        "amount": "150тыс",
        "currency": "рублей",
        "receiver_name": "ООО Ромашка",
        "receiver_hr": "40817810099910004312",
        "receiver_inn": "987654321"
    }
}
```

	### Example 3: Query Request

	Input: `"Show completed transactions from 01.12.2024 to 15.12.2024"`

	Output:
```python
{
    "language": "en",
    "intent": "list_transaction",
    "entities": {
        "start_date": "01.12.2024",
        "end_date": "15.12.2024"
    }
}
```

	## 🏗️ Model Architecture

- Base Model: `google-bert/bert-base-multilingual-cased`
- Architecture: Multi-task learning with shared encoder
  - Shared BERT encoder (holds nearly all of the model's parameters)
  - NER head: Token-level classifier
  - Intent head: Sequence-level classifier
  - Language head: Sequence-level classifier
- Total Parameters: ~178M
- Loss Function: Weighted combination (0.4 × NER + 0.3 × Intent + 0.3 × Language)
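
The weighted loss above can be sketched as a plain weighted sum of the three per-task losses. The function name `combined_loss` and the example loss values are illustrative assumptions. Because it only uses `*` and `+`, the same function would also work on PyTorch scalar tensors during training.

```python
def combined_loss(ner_loss, intent_loss, language_loss, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the per-task losses: 0.4 NER + 0.3 intent + 0.3 language."""
    w_ner, w_intent, w_lang = weights
    return w_ner * ner_loss + w_intent * intent_loss + w_lang * language_loss


# Made-up per-task loss values for one batch.
total = combined_loss(0.50, 0.20, 0.10)
print(round(total, 4))
```

Giving NER the largest weight puts slightly more of the gradient signal on the token-level task than on either sequence-level task.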

	## 🎓 Training Details

	- Training Samples: 340,986
	- Validation Samples: 60,175
	- Epochs: 6
	- Batch Size: 16 (per device)
	- Learning Rate: 3e-5
	- Warmup Ratio: 0.15
	- Optimizer: AdamW with weight decay
	- LR Scheduler: Linear with warmup
	- Framework: Transformers + PyTorch
	- Hardware: Trained on Tesla T4 GPU
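
To show what the learning rate, warmup ratio, and linear schedule listed above imply, the sketch below computes the learning rate at a given optimizer step. It assumes a single device and no gradient accumulation (so one step per batch of 16) and rounds the warmup step count up; `linear_warmup_lr` is a hypothetical helper that follows the usual linear-warmup, linear-decay shape, not code from the training run.

```python
import math


def linear_warmup_lr(step, total_steps, warmup_steps, peak_lr=3e-5):
    """Linear ramp from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


# Step counts derived from the figures in this card (assumptions noted above).
steps_per_epoch = math.ceil(340_986 / 16)     # batches per epoch
total_steps = steps_per_epoch * 6             # 6 epochs
warmup_steps = math.ceil(0.15 * total_steps)  # warmup ratio 0.15

for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_warmup_lr(step, total_steps, warmup_steps))
```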

	## 💡 Use Cases

	- Banking Applications: Transaction processing and validation
	- Chatbots: Intent-aware financial assistants
	- Document Processing: Automated extraction from transaction documents
	- Compliance: KYC/AML data extraction
	- Analytics: Transaction categorization and analysis
	- Multi-language Support: Cross-border banking operations

	## ⚠️ Limitations

	- Designed for banking/financial domain - may not generalize to other domains
	- Performance may vary on formats significantly different from training data
	- Mixed language texts may have lower accuracy
	- Best results with transaction-style texts similar to training distribution
- May require fine-tuning for specific banking systems or regional variations

	## 📚 Citation

```bibtex
@misc{intentity-aiba-2025,
  author = {Primel},
  title = {Intentity AIBA: Multi-Task Banking Language Model},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/primel/intentity-aiba}}
}
```

	## 📄 License

	Apache 2.0

	## 🤝 Contact

	For questions, issues, or collaboration opportunities, please open an issue on the model repository.

	---

	Model Card Authors: Primel
	Last Updated: 2025
	Model Version: 1.0