---
language:
- en
- ru
- multilingual
license: apache-2.0
tags:
- token-classification
- ner
- named-entity-recognition
- banking
- transactions
- financial
- multilingual
- bert
datasets:
- custom
metrics:
- precision
- recall
- f1
- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
- text: "Send 150k RUB to ООО Ромашка счет 40817810099910004312 ИНН 987654321 за услуги"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
pipeline_tag: token-classification
---

# Transactor AIBA - Banking Transaction NER Model

## Model Description

**Transactor AIBA** is a multilingual Named Entity Recognition (NER) model fine-tuned from `google-bert/bert-base-multilingual-cased` to extract entities from banking and financial transaction texts. The model supports both English and Russian.

## Intended Use

This model is designed to extract key entities from banking transaction requests, including:

- Transaction amounts and currencies
- Account numbers and bank codes
- Tax identification numbers (INN)
- Recipient/sender information
- Transaction purposes
- Dates and time periods

## Entity Types

The model recognizes the following entity types:

- `amount`
- `bank_code`
- `currency`
- `date`
- `description`
- `end_date`
- `receiver_hr`
- `receiver_inn`
- `receiver_name`
- `start_date`
- `status`

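Assuming the standard BIO tagging scheme implied by the `B-`/`I-` prefixes in the usage code below, these eleven types expand to 23 token-level labels (an `O` label plus a begin/inside pair per type). A minimal sketch of that mapping:

```python
# Hypothetical reconstruction of the label space, assuming a standard
# BIO scheme; the actual id ordering in the released model may differ.
ENTITY_TYPES = [
    "amount", "bank_code", "currency", "date", "description",
    "end_date", "receiver_hr", "receiver_inn", "receiver_name",
    "start_date", "status",
]

# "O" for tokens outside any entity, then a B-/I- pair per type.
LABELS = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(LABELS))

print(len(LABELS))  # 23 labels in total
print(id2label[1])  # B-amount
```
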
## Training Data

- **Base Model**: `google-bert/bert-base-multilingual-cased`
- **Training Samples**: 200,015
- **Validation Samples**: 35,297
- **Dataset**: Custom banking transaction dataset with multilingual support

## Training Details

- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **LR Scheduler**: Linear with warmup
- **Framework**: Transformers + PyTorch

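These hyperparameters map onto a `transformers.TrainingArguments` configuration roughly as follows. This is a sketch, not the actual training script; `output_dir` and `warmup_ratio` are illustrative assumptions not stated in the card.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir and
# warmup_ratio are assumptions, not taken from the actual run.
args = TrainingArguments(
    output_dir="transactor-aiba",       # assumption
    num_train_epochs=5,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    optim="adamw_torch",                # AdamW
    lr_scheduler_type="linear",         # linear decay with warmup
    warmup_ratio=0.1,                   # assumption
)
```
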
## Performance

- **Validation F1 Score**: 0.9999

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Example prediction
def extract_entities(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

    entities = {}
    current_entity = None
    current_tokens = []

    for token, label in zip(tokens, predicted_labels):
        # Skip special tokens
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue

        if label.startswith("B-"):
            # A new entity starts; flush the previous one first
            if current_entity and current_tokens:
                entities[current_entity] = tokenizer.convert_tokens_to_string(current_tokens).strip()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith("I-") and current_entity == label[2:]:
            current_tokens.append(token)
        else:
            # An "O" label or mismatched I- tag ends the current entity
            if current_entity and current_tokens:
                entities[current_entity] = tokenizer.convert_tokens_to_string(current_tokens).strip()
            current_entity = None
            current_tokens = []

    # Flush the last entity, if any
    if current_entity and current_tokens:
        entities[current_entity] = tokenizer.convert_tokens_to_string(current_tokens).strip()

    return entities

# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))
```

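The `convert_tokens_to_string` calls above merge WordPiece subwords back into readable text. For intuition, this is roughly what happens with BERT-style `##` continuation tokens (a simplified sketch; the real tokenizer also handles punctuation and other special cases):

```python
def merge_wordpieces(tokens):
    """Roughly mirror tokenizer.convert_tokens_to_string for WordPiece."""
    text = ""
    for tok in tokens:
        if tok.startswith("##"):
            text += tok[2:]          # continuation piece: glue onto the previous token
        elif text:
            text += " " + tok        # new word: separate with a space
        else:
            text = tok               # first token
    return text

print(merge_wordpieces(["Ap", "##ex", "Industries"]))  # Apex Industries
```
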
## Example Outputs

**Input**: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"

**Output**:
```python
{
    "amount": "12.5mln",
    "currency": "USD",
    "receiver_name": "Apex Industries",
    "receiver_hr": "27109477752047116719",
    "receiver_inn": "123456789",
    "bank_code": "01234",
    "description": "consulting"
}
```

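Note that `amount` comes back as a raw surface string ("12.5mln", "150k"). A hypothetical post-processing helper, not part of the model, could normalize such strings into numbers:

```python
import re

# Hypothetical post-processing helper (not part of the model): convert
# surface strings like "12.5mln" or "150k" into plain floats.
_MULTIPLIERS = {"k": 1e3, "m": 1e6, "mln": 1e6, "b": 1e9, "bln": 1e9}

def parse_amount(text):
    match = re.fullmatch(r"([\d.,]+)\s*([a-z]*)", text.strip().lower())
    if not match:
        raise ValueError(f"unrecognized amount: {text!r}")
    number = float(match.group(1).replace(",", ""))
    return number * _MULTIPLIERS.get(match.group(2), 1.0)

print(parse_amount("12.5mln"))  # 12500000.0
print(parse_amount("150k"))     # 150000.0
```
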
## Limitations

- The model is trained on synthetic and curated banking transaction data
- Performance may vary on real-world data with different formatting
- Best results are achieved on transaction texts similar to the training distribution
- May require fine-tuning for specific banking systems or regional variations

## License

Apache 2.0

## Citation

```bibtex
@misc{transactor-aiba,
  author = {Primel},
  title = {Transactor AIBA: Multilingual Banking Transaction NER},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}
```