---
language:
- en
- ru
- multilingual
license: apache-2.0
tags:
- token-classification
- ner
- named-entity-recognition
- banking
- transactions
- financial
- multilingual
- bert
datasets:
- custom
metrics:
- precision
- recall
- f1
- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
- text: "Send 150k RUB to ООО Ромашка счет 40817810099910004312 ИНН 987654321 за услуги"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
pipeline_tag: token-classification
---
# Transactor AIBA - Banking Transaction NER Model
## Model Description
**Transactor AIBA** is a multilingual Named Entity Recognition (NER) model, fine-tuned from `google-bert/bert-base-multilingual-cased`, for extracting entities from banking and financial transaction texts. The model supports both English and Russian.
## Intended Use
This model is designed to extract key entities from banking transaction requests, including:
- Transaction amounts and currencies
- Account numbers and bank codes
- Tax identification numbers (INN)
- Recipient/sender information
- Transaction purposes
- Dates and time periods
## Entity Types
The model recognizes the following entity types:
- `amount`
- `bank_code`
- `currency`
- `date`
- `description`
- `end_date`
- `receiver_hr`
- `receiver_inn`
- `receiver_name`
- `start_date`
- `status`
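Token-classification models conventionally encode these types with the BIO scheme (`B-` opens an entity, `I-` continues it, `O` marks everything else). Assuming this model follows that convention, the 11 entity types above expand to a 23-label inventory:

```python
# Build the BIO label set implied by the 11 entity types listed above.
# The BIO scheme itself is an assumption here -- a common convention for
# token-classification models, not something the card states explicitly.
ENTITY_TYPES = [
    "amount", "bank_code", "currency", "date", "description",
    "end_date", "receiver_hr", "receiver_inn", "receiver_name",
    "start_date", "status",
]

labels = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]
print(len(labels))  # 23: "O" plus a B-/I- pair per entity type
```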
## Training Data
- **Base Model**: `google-bert/bert-base-multilingual-cased`
- **Training Samples**: 200,015
- **Validation Samples**: 35,297
- **Dataset**: Custom banking transaction dataset with multilingual support
## Training Details
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **LR Scheduler**: Linear with warmup
- **Framework**: Transformers + PyTorch
## Performance
- **Validation F1 Score**: 0.9999
## Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

def extract_entities(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=2)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

    entities = {}
    current_entity = None
    current_tokens = []
    for token, label in zip(tokens, predicted_labels):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if label.startswith("B-"):
            # A new entity starts; flush the one being built, if any.
            if current_entity and current_tokens:
                entities[current_entity] = tokenizer.convert_tokens_to_string(current_tokens).strip()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith("I-") and current_entity == label[2:]:
            current_tokens.append(token)
        else:
            # An "O" label or a mismatched I- tag ends the current entity.
            if current_entity and current_tokens:
                entities[current_entity] = tokenizer.convert_tokens_to_string(current_tokens).strip()
            current_entity = None
            current_tokens = []
    # Flush a trailing entity at the end of the sequence.
    if current_entity and current_tokens:
        entities[current_entity] = tokenizer.convert_tokens_to_string(current_tokens).strip()
    return entities

# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))
```
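The span-grouping logic in `extract_entities` can be exercised without downloading the model. Below is a self-contained sketch over mock tokens and labels; the simple WordPiece joiner stands in for `tokenizer.convert_tokens_to_string`:

```python
def decode_bio(tokens, labels):
    """Group (token, BIO-label) pairs into {entity_type: text} using the same
    flush-on-boundary logic as extract_entities above."""
    def join(toks):
        # Minimal WordPiece joiner: "##" continues the previous token.
        out = ""
        for t in toks:
            out += t[2:] if t.startswith("##") else ((" " + t) if out else t)
        return out

    entities, cur_type, cur_toks = {}, None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if cur_type:
                entities[cur_type] = join(cur_toks)
            cur_type, cur_toks = lab[2:], [tok]
        elif lab.startswith("I-") and cur_type == lab[2:]:
            cur_toks.append(tok)
        else:
            if cur_type:
                entities[cur_type] = join(cur_toks)
            cur_type, cur_toks = None, []
    if cur_type:
        entities[cur_type] = join(cur_toks)
    return entities

# Mock tokenization of "Send 150k RUB to Acme Corp" (labels are illustrative).
tokens = ["Send", "150", "##k", "RUB", "to", "Acme", "Corp"]
labels = ["O", "B-amount", "I-amount", "B-currency", "O",
          "B-receiver_name", "I-receiver_name"]
print(decode_bio(tokens, labels))
# {'amount': '150k', 'currency': 'RUB', 'receiver_name': 'Acme Corp'}
```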
## Example Outputs
**Input**: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
**Output**:
```python
{
    "amount": "12.5mln",
    "currency": "USD",
    "receiver_name": "Apex Industries",
    "receiver_hr": "27109477752047116719",
    "receiver_inn": "123456789",
    "bank_code": "01234",
    "description": "consulting"
}
```
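Extracted amounts keep their surface form (e.g. `"12.5mln"`); downstream consumers usually want a numeric value. A minimal normalizer, with a hypothetical suffix table (`k`/`mln`/`bn` are assumptions about the data, not something the model emits):

```python
import re

# Hypothetical post-processing: convert surface amounts like "12.5mln" or
# "150k" into numbers. The suffix multipliers below are an assumption.
SUFFIXES = {"k": 1_000, "m": 1_000_000, "mln": 1_000_000, "bn": 1_000_000_000}

def normalize_amount(text):
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-z]*)", text.strip().lower())
    if not m:
        raise ValueError(f"unrecognized amount: {text!r}")
    value, suffix = float(m.group(1)), m.group(2)
    return value * SUFFIXES.get(suffix, 1)

print(normalize_amount("12.5mln"))  # 12500000.0
print(normalize_amount("150k"))     # 150000.0
```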
## Limitations
- The model is trained on synthetic and curated banking transaction data
- Performance may vary on real-world data with different formatting
- Best results are achieved on transaction texts similar to the training distribution
- Fine-tuning may be required for specific banking systems or regional variations
## License
Apache 2.0
## Citation
```bibtex
@misc{transactor-aiba,
  author       = {Primel},
  title        = {Transactor AIBA: Multilingual Banking Transaction NER},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}
```