---
language:
- en
- ru
- multilingual
license: apache-2.0
tags:
- token-classification
- ner
- named-entity-recognition
- banking
- transactions
- financial
- multilingual
- bert
datasets:
- custom
metrics:
- precision
- recall
- f1
- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
- text: "Send 150k RUB to ООО Ромашка счет 40817810099910004312 ИНН 987654321 за услуги"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
pipeline_tag: token-classification
---

# Transactor AIBA - Banking Transaction NER Model

## Model Description

**Transactor AIBA** is a multilingual Named Entity Recognition (NER) model, fine-tuned from `google-bert/bert-base-multilingual-cased`, that extracts entities from banking and financial transaction texts. It supports English and Russian.

## Intended Use

This model is designed to extract key entities from banking transaction requests, including:
- Transaction amounts and currencies
- Account numbers and bank codes
- Tax identification numbers (INN)
- Recipient/sender information
- Transaction purposes
- Dates and time periods

## Entity Types

The model recognizes the following entity types:

- `amount`
- `bank_code`
- `currency`
- `date`
- `description`
- `end_date`
- `receiver_hr`
- `receiver_inn`
- `receiver_name`
- `start_date`
- `status`
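
The usage code in this card reads `B-`/`I-` prefixed labels from `model.config.id2label`, which suggests a BIO tagging scheme over the entity types above. The following is a minimal sketch of how such a label set could be enumerated. The ordering and the position of `O` are assumptions; the authoritative mapping is the one stored in the model's config.

```python
# Entity types as listed in this card
ENTITY_TYPES = [
    "amount", "bank_code", "currency", "date", "description", "end_date",
    "receiver_hr", "receiver_inn", "receiver_name", "start_date", "status",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-x", "I-x", ...] for the given entity types."""
    labels = ["O"]  # assumption: the outside tag comes first
    for entity in entity_types:
        labels.append(f"B-{entity}")  # first token of a span
        labels.append(f"I-{entity}")  # continuation tokens of a span
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}

print(len(LABELS))  # 11 entity types * 2 + 1 outside tag
```

To see the mapping the model actually uses, inspect `model.config.id2label` after loading it.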

## Training Data

- **Base Model**: `google-bert/bert-base-multilingual-cased`
- **Training Samples**: 200,015
- **Validation Samples**: 35,297
- **Dataset**: Custom banking transaction dataset with multilingual support

## Training Details

- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **LR Scheduler**: Linear with warmup
- **Framework**: Transformers + PyTorch
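
To give a feel for what these settings imply, the sketch below computes the number of optimizer steps and the shape of a linear schedule with warmup. It assumes a single device, no gradient accumulation, and a 10% warmup fraction; none of these are stated in this card, so treat the numbers as illustrative only.

```python
import math

TRAIN_SAMPLES = 200_015  # from "Training Data"
BATCH_SIZE = 16
EPOCHS = 5
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1  # assumption: the card does not state the warmup length

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)


def linear_warmup_lr(step):
    """Learning rate at a given optimizer step: linear ramp up, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_warmup_lr(step))
```

The learning rate peaks at `2e-5` when warmup ends and reaches zero at the final step.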

## Performance

- **Validation F1 Score**: 0.9999 (on the 35,297-sample validation split)
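
The card's metadata lists `seqeval`, which scores whole entity spans and not individual tokens. As a rough illustration of that idea, here is a dependency-free sketch of span-level precision, recall, and F1. It is a simplification and not the `seqeval` implementation: for example, it ignores an `I-` tag that does not follow a matching `B-` tag. The function names are hypothetical.

```python
def extract_spans(labels):
    """Return a set of (entity_type, start, end) spans from one BIO label sequence."""
    spans = set()
    start, kind = None, None
    for i, label in enumerate(labels + ["O"]):  # trailing "O" closes any open span
        inside = label.startswith("I-") and label[2:] == kind
        if not inside:
            if kind is not None:
                spans.add((kind, start, i))
            if label.startswith("B-"):
                start, kind = i, label[2:]
            else:
                start, kind = None, None
    return spans


def span_f1(gold_seqs, pred_seqs):
    """Micro-averaged precision, recall, and F1 over exactly matching spans."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hand-written toy labels, not model output
gold = [["B-amount", "B-currency", "O", "B-receiver_name", "I-receiver_name"]]
pred = [["B-amount", "B-currency", "O", "B-receiver_name", "O"]]
print(span_f1(gold, pred))
```

In the toy example the truncated `receiver_name` span counts as a miss, so two of three spans match.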

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Run the model on one text and group BIO-tagged tokens into entities
def extract_entities(text):
    """Map each predicted entity type to its text (last span wins on repeats)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

    with torch.no_grad():
        outputs = model(**inputs)
        # Highest-scoring label id for every token
        predictions = torch.argmax(outputs.logits, dim=2)

    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

    entities = {}
    current_entity = None
    current_tokens = []

    for token, label in zip(tokens, predicted_labels):
        # Skip special tokens added by the tokenizer
        if token in ['[CLS]', '[SEP]', '[PAD]']:
            continue

        if label.startswith('B-'):
            # A new entity begins: store the one in progress, if any
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continuation of the current entity
            current_tokens.append(token)
        else:
            # 'O' or a mismatched 'I-' tag closes the current entity
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = None
            current_tokens = []

    # Store an entity that runs to the end of the sequence
    if current_entity and current_tokens:
        entity_text = tokenizer.convert_tokens_to_string(current_tokens)
        entities[current_entity] = entity_text.strip()

    return entities

# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))
```
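
`extract_entities` stores one string per entity type, so a later span of the same type overwrites an earlier one. For requests that may mention a type more than once, a variant that collects spans into lists might look like the sketch below. `group_spans` and `join_wordpieces` are hypothetical helpers, not part of the model repository. They join WordPiece continuation pieces (`##`) by hand so the example runs without loading the tokenizer.

```python
from collections import defaultdict

SPECIAL_TOKENS = ("[CLS]", "[SEP]", "[PAD]")


def join_wordpieces(tokens):
    """Join WordPiece tokens, gluing '##' continuation pieces to the previous token."""
    text = ""
    for token in tokens:
        if token.startswith("##"):
            text += token[2:]
        elif text:
            text += " " + token
        else:
            text = token
    return text


def group_spans(tokens, labels):
    """Group BIO-labelled tokens into {entity_type: [span_text, ...]}."""
    entities = defaultdict(list)
    current_type, current_tokens = None, []

    def flush():
        # Store the span in progress, if there is one
        if current_type and current_tokens:
            entities[current_type].append(join_wordpieces(current_tokens))

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and label[2:] == current_type:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return dict(entities)


# Hand-written tokens and labels for illustration, not real model output
tokens = ["[CLS]", "Send", "150", "##k", "RUB", "and", "20", "USD", "[SEP]"]
labels = ["O", "O", "B-amount", "I-amount", "B-currency", "O", "B-amount", "B-currency", "O"]
print(group_spans(tokens, labels))
```

With real predictions, `tokens` and `labels` would be the `tokens` and `predicted_labels` lists computed inside `extract_entities`.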

## Example Outputs

**Input**: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"

**Output**:
```python
{
    "amount": "12.5mln",
    "currency": "USD",
    "receiver_name": "Apex Industries",
    "receiver_hr": "27109477752047116719",
    "receiver_inn": "123456789",
    "bank_code": "01234",
    "description": "consulting"
}
```
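
Extracted amounts such as `"12.5mln"` or `"150k"` are plain strings, so downstream code usually needs to turn them into numbers. The sketch below shows one way to do that. `parse_amount` is a hypothetical helper, and the suffix meanings (`k` for thousand, `mln` for million) are assumptions inferred from the examples in this card.

```python
import re
from decimal import Decimal

# Assumed suffix meanings; extend as needed for other notations
SUFFIX_MULTIPLIERS = {
    "": Decimal(1),
    "k": Decimal(1_000),
    "mln": Decimal(1_000_000),
}

AMOUNT_PATTERN = re.compile(r"^(\d+(?:[.,]\d+)?)([a-z]*)$", re.IGNORECASE)


def parse_amount(text):
    """Convert an extracted amount string to a Decimal, or None if unrecognized."""
    # Drop all whitespace, e.g. spacing introduced when tokens are joined
    compact = "".join(text.split())
    match = AMOUNT_PATTERN.match(compact)
    if not match:
        return None
    number, suffix = match.groups()
    multiplier = SUFFIX_MULTIPLIERS.get(suffix.lower())
    if multiplier is None:
        return None  # unknown suffix: leave the decision to the caller
    # Accept a comma as the decimal separator as well
    return Decimal(number.replace(",", ".")) * multiplier


for raw in ("12.5mln", "150k", "2500", "abc"):
    print(raw, parse_amount(raw))
```

`Decimal` is used here to avoid binary floating-point rounding on monetary values.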

## Limitations

- The model is trained on synthetic and curated banking transaction data
- Performance may vary on real-world data with different formatting
- Best results are achieved with transaction texts similar to the training distribution
- May require fine-tuning for specific banking systems or regional variations

## License

Apache 2.0

## Citation

```bibtex
@misc{transactor-aiba,
  author = {Primel},
  title = {Transactor AIBA: Multilingual Banking Transaction NER},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}
```