---
language:
- en
- ru
- multilingual
license: apache-2.0
tags:
- token-classification
- ner
- named-entity-recognition
- banking
- transactions
- financial
- multilingual
- bert
datasets:
- custom
metrics:
- precision
- recall
- f1
- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
- text: "Send 150k RUB to ООО Ромашка счет 40817810099910004312 ИНН 987654321 за услуги"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
pipeline_tag: token-classification
---
# Transactor AIBA - Banking Transaction NER Model
## Model Description
**Transactor AIBA** is a multilingual Named Entity Recognition (NER) model fine-tuned from `google-bert/bert-base-multilingual-cased` to extract entities from banking and financial transaction texts. It supports English and Russian.
## Intended Use
This model is designed to extract key entities from banking transaction requests, including:
- Transaction amounts and currencies
- Account numbers and bank codes
- Tax identification numbers (INN)
- Recipient/sender information
- Transaction purposes
- Dates and time periods
## Entity Types
The model recognizes the following entity types:
- `amount`
- `bank_code`
- `currency`
- `date`
- `description`
- `end_date`
- `receiver_hr`
- `receiver_inn`
- `receiver_name`
- `start_date`
- `status`
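
The usage code in this card checks for `B-` and `I-` prefixes, which implies a BIO tagging scheme over these entity types. As a minimal sketch, assuming one `O` label plus a `B-`/`I-` pair per type, the label set could be enumerated as follows. The id order here is illustrative only; the authoritative mapping is the model's `config.id2label`.

```python
# Entity types as listed in this card.
ENTITY_TYPES = [
    "amount", "bank_code", "currency", "date", "description", "end_date",
    "receiver_hr", "receiver_inn", "receiver_name", "start_date", "status",
]


def build_bio_labels(entity_types):
    """Build a BIO label list: 'O' plus B-/I- tags for every entity type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


# Assumption: this ordering is hypothetical and may not match the checkpoint.
LABELS = build_bio_labels(ENTITY_TYPES)
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}

print(len(LABELS))  # 23 (11 types x 2 + 'O')
```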
## Training Data
- **Base Model**: `google-bert/bert-base-multilingual-cased`
- **Training Samples**: 200,015
- **Validation Samples**: 35,297
- **Dataset**: Custom banking transaction dataset with multilingual support
## Training Details
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **LR Scheduler**: Linear with warmup
- **Framework**: Transformers + PyTorch
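
To make the "linear with warmup" schedule concrete, here is a small pure-Python sketch of the learning rate over training. The sample count, batch size, epochs, and peak learning rate come from this card. The warmup ratio is an assumption (the card does not state it), and the step count assumes a single device with no gradient accumulation.

```python
import math

TRAIN_SAMPLES = 200_015
BATCH_SIZE = 16
EPOCHS = 5
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1  # assumption: not specified in this card

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)


def linear_warmup_lr(step):
    """Learning rate at a given optimizer step: linear ramp up, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


print(steps_per_epoch, total_steps)  # 12501 62505
```

With these assumptions the rate peaks at `2e-5` once warmup ends and reaches zero at the final step.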
## Performance
- **Validation F1 Score**: 0.9999
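
The card lists `seqeval` among its metrics, which scores whole entity spans rather than individual tokens. The sketch below is a simplified, pure-Python approximation of entity-level precision, recall, and F1 over BIO sequences. It is not the evaluation script used for this model, and unlike `seqeval` it ignores a stray `I-` tag that has no matching `B-` before it.

```python
def bio_spans(labels):
    """Return the set of (entity_type, start, end) spans in one BIO sequence."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sequence.
    for i, label in enumerate(list(labels) + ["O"]):
        if label.startswith("I-") and label[2:] == etype:
            continue  # current span continues
        if etype is not None:
            spans.add((etype, start, i))
            start, etype = None, None
        if label.startswith("B-"):
            start, etype = i, label[2:]
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged precision, recall and F1 over exact span matches."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hand-written toy labels, not real model output.
gold = [["B-amount", "B-currency", "O", "B-receiver_name", "I-receiver_name"]]
pred = [["B-amount", "B-currency", "O", "B-receiver_name", "O"]]
precision, recall, f1 = entity_f1(gold, pred)
```

In the toy example the truncated `receiver_name` span counts as a miss, so two of three spans match and all three scores come out to 2/3.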
## Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)


# Example prediction
def extract_entities(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the highest-scoring label for every token
    predictions = torch.argmax(outputs.logits, dim=2)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

    # Group B-/I- tagged tokens into entities.
    # Note: only the last span of each entity type is kept.
    entities = {}
    current_entity = None
    current_tokens = []
    for token, label in zip(tokens, predicted_labels):
        if token in ['[CLS]', '[SEP]', '[PAD]']:
            continue
        if label.startswith('B-'):
            # A new entity starts: store the previous one first
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            current_tokens.append(token)
        else:
            # 'O' or a mismatched I- tag closes the current entity
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = None
            current_tokens = []

    # Store an entity that runs to the end of the sequence
    if current_entity and current_tokens:
        entity_text = tokenizer.convert_tokens_to_string(current_tokens)
        entities[current_entity] = entity_text.strip()
    return entities


# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))
```
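
The `extract_entities` function above stores one string per entity type, so a second span of the same type overwrites the first. If a text can mention several entities of one type, a variant that collects a list per type may be more useful. The sketch below is one way to do that on plain token/label lists. `join_wordpieces` is a hypothetical, simplified stand-in for `tokenizer.convert_tokens_to_string`, and the sample tokens and labels are hand-written, not real model output.

```python
from collections import defaultdict

SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def join_wordpieces(tokens):
    """Merge WordPiece tokens: '##' pieces are glued to the previous word."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


def group_entities(tokens, labels):
    """Collect every BIO span, keeping a list of strings per entity type."""
    grouped = defaultdict(list)
    current_type, current_tokens = None, []

    def flush():
        nonlocal current_type, current_tokens
        if current_type and current_tokens:
            grouped[current_type].append(join_wordpieces(current_tokens))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and label[2:] == current_type:
            current_tokens.append(token)
        else:
            flush()
    flush()
    return dict(grouped)


# Hypothetical tokens and labels for illustration only.
example_tokens = ["[CLS]", "Pay", "Apex", "Indus", "##tries", "and", "Acme", "[SEP]"]
example_labels = ["O", "O", "B-receiver_name", "I-receiver_name",
                  "I-receiver_name", "O", "B-receiver_name", "O"]
result = group_entities(example_tokens, example_labels)
print(result)  # {'receiver_name': ['Apex Industries', 'Acme']}
```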
## Example Outputs
**Input**: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
**Output**:
```python
{
    "amount": "12.5mln",
    "currency": "USD",
    "receiver_name": "Apex Industries",
    "receiver_hr": "27109477752047116719",
    "receiver_inn": "123456789",
    "bank_code": "01234",
    "description": "consulting"
}
```
## Limitations
- The model is trained on synthetic and curated banking transaction data
- Performance may vary on real-world data with different formatting
- Best results are achieved with transaction texts similar to training distribution
- May require fine-tuning for specific banking systems or regional variations
## License
Apache 2.0
## Citation
```bibtex
@misc{transactor-aiba,
author = {Primel},
title = {Transactor AIBA: Multilingual Banking Transaction NER},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}
```