---
language:
- en
- ru
- multilingual
license: apache-2.0
tags:
- token-classification
- ner
- named-entity-recognition
- banking
- transactions
- financial
- multilingual
- bert
datasets:
- custom
metrics:
- precision
- recall
- f1
- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
- text: "Send 150k RUB to ООО Ромашка счет 40817810099910004312 ИНН 987654321 за услуги"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
pipeline_tag: token-classification
---

# Transactor AIBA - Banking Transaction NER Model

## Model Description

**Transactor AIBA** is a multilingual Named Entity Recognition (NER) model fine-tuned from `google-bert/bert-base-multilingual-cased` to extract entities from banking and financial transaction texts. It supports English and Russian.

## Intended Use

This model is designed to extract key entities from banking transaction requests, including:
- Transaction amounts and currencies
- Account numbers and bank codes
- Tax identification numbers (INN)
- Recipient/sender information
- Transaction purposes
- Dates and time periods

## Entity Types

The model recognizes the following entity types:

- `amount`
- `bank_code`
- `currency`
- `date`
- `description`
- `end_date`
- `receiver_hr`
- `receiver_inn`
- `receiver_name`
- `start_date`
- `status`
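
The usage code in this card expects BIO-style labels (`B-amount`, `I-amount`, `O`) in `model.config.id2label`. The following is a minimal sketch of how the entity types can be recovered from such a label map. `sample_id2label` is a hypothetical excerpt for illustration, not the model's actual mapping.

```python
def entity_types(id2label):
    """Return the sorted entity types found in a BIO label map."""
    types = set()
    for label in id2label.values():
        # "B-amount" and "I-amount" both map to "amount"; "O" carries no type.
        if label.startswith(("B-", "I-")):
            types.add(label[2:])
    return sorted(types)


# Hypothetical excerpt of a label map, for illustration only.
sample_id2label = {0: "O", 1: "B-amount", 2: "I-amount", 3: "B-currency", 4: "B-receiver_inn"}
print(entity_types(sample_id2label))
```

Passing the loaded model's `model.config.id2label` instead of the sample should list the types the checkpoint actually emits.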

## Training Data

- **Base Model**: `google-bert/bert-base-multilingual-cased`
- **Training Samples**: 200,015
- **Validation Samples**: 35,297
- **Dataset**: Custom banking transaction dataset with multilingual support

## Training Details

- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **LR Scheduler**: Linear with warmup
- **Framework**: Transformers + PyTorch
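
For orientation, the figures above imply roughly the step count and learning-rate curve sketched below. This assumes a single device, no gradient accumulation, and a hypothetical warmup of 10% of the steps; the card does not state the warmup length, so treat `WARMUP_RATIO` as a placeholder.

```python
import math

TRAIN_SAMPLES = 200_015
BATCH_SIZE = 16
EPOCHS = 5
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1  # assumption: not stated in this card

# The last, partial batch of each epoch still counts as one step.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)


def linear_warmup_lr(step):
    """Learning rate at `step`: linear ramp up, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = total_steps - step
    return PEAK_LR * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run takes 12,501 steps per epoch and 62,505 steps in total, with the learning rate peaking at 2e-5 when warmup ends.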

## Performance

- **Validation F1 Score**: 0.9999
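
The card lists `seqeval` among its metrics, and that library scores whole entity spans rather than individual tokens. Assuming the reported F1 is entity-level in that sense, the sketch below shows how such a score is computed from BIO sequences. It is a simplified illustration for a single sequence, not seqeval itself, and the example labels are made up.

```python
def bio_spans(labels):
    """Return the set of (type, start, end) spans in one BIO label sequence."""
    spans, start, current = set(), None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel flushes the last span
        continues = label.startswith("I-") and label[2:] == current
        if current is not None and not continues:
            spans.add((current, start, i))
            current = None
        if label.startswith("B-"):
            start, current = i, label[2:]
    return spans


def entity_f1(gold, pred):
    """Entity-level F1: a predicted span counts only on an exact match."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


gold = ["O", "B-amount", "I-amount", "B-currency", "O"]
pred = ["O", "B-amount", "O", "B-currency", "O"]
print(entity_f1(gold, pred))
```

In this example the prediction cuts the `amount` span one token short, so that span counts as a miss even though most of its tokens are labeled correctly, and the score drops to 0.5.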

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

def extract_entities(text):
    """Return a dict mapping each entity type to its extracted text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)

    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

    entities = {}
    current_entity = None
    current_tokens = []

    def flush():
        # Store the finished span; a later span of the same type overwrites it.
        if current_entity and current_tokens:
            entity_text = tokenizer.convert_tokens_to_string(current_tokens)
            entities[current_entity] = entity_text.strip()

    for token, label in zip(tokens, predicted_labels):
        if token in ['[CLS]', '[SEP]', '[PAD]']:
            continue

        if label.startswith('B-'):
            # A B- tag closes any open span and starts a new one.
            flush()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # An I- tag of the same type extends the open span.
            current_tokens.append(token)
        else:
            # "O" or a mismatched I- tag closes the open span.
            flush()
            current_entity = None
            current_tokens = []

    flush()
    return entities

# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))
```

## Example Outputs

**Input**: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"

**Output**:
```python
{
    "amount": "12.5mln",
    "currency": "USD",
    "receiver_name": "Apex Industries",
    "receiver_hr": "27109477752047116719",
    "receiver_inn": "123456789",
    "bank_code": "01234",
    "description": "consulting"
}
```
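
The model returns amounts as they appear in the text (`12.5mln`, `150k`), while downstream code usually needs a number. The following is a minimal sketch of one way to normalize them. The suffix table is an assumption inferred from the examples in this card, not something the model defines, and `parse_amount` is a hypothetical helper.

```python
import re
from decimal import Decimal

# Assumed multipliers, inferred from the "12.5mln" and "150k" examples.
SUFFIXES = {"": Decimal(1), "k": Decimal(1_000), "mln": Decimal(1_000_000)}


def parse_amount(raw):
    """Convert an extracted amount such as "12.5mln" to a Decimal, or None."""
    # Drop whitespace first: WordPiece detokenization can insert spaces
    # around punctuation, e.g. "12 . 5mln".
    cleaned = re.sub(r"\s+", "", raw).lower()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]*)", cleaned)
    if match is None or match.group(2) not in SUFFIXES:
        return None  # unknown format: leave it for manual handling
    return Decimal(match.group(1)) * SUFFIXES[match.group(2)]


print(parse_amount("12.5mln"))
print(parse_amount("150k"))
```

`Decimal` is used rather than `float` so that monetary values are not subject to binary rounding. Russian-language suffixes or thousands separators would need additional entries and patterns.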

## Limitations

- The model is trained on synthetic and curated banking transaction data
- Performance may vary on real-world data with different formatting
- Best results are achieved with transaction texts similar to training distribution
- May require fine-tuning for specific banking systems or regional variations

## License

Apache 2.0

## Citation

```bibtex
@misc{transactor-aiba,
  author = {Primel},
  title = {Transactor AIBA: Multilingual Banking Transaction NER},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}
```