---
language:
- en
- ru
- multilingual
license: apache-2.0
tags:
- token-classification
- ner
- named-entity-recognition
- banking
- transactions
- financial
- multilingual
- bert
datasets:
- custom
metrics:
- precision
- recall
- f1
- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
- text: "Send 150k RUB to ООО Ромашка счет 40817810099910004312 ИНН 987654321 за услуги"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
pipeline_tag: token-classification
---

# Transactor AIBA - Banking Transaction NER Model

## Model Description

**Transactor AIBA** is a multilingual Named Entity Recognition (NER) model fine-tuned from `google-bert/bert-base-multilingual-cased` to extract entities from banking and financial transaction texts. The model supports English and Russian.

## Intended Use

This model is designed to extract key entities from banking transaction requests, including:

- Transaction amounts and currencies
- Account numbers and bank codes
- Tax identification numbers (INN)
- Recipient/sender information
- Transaction purposes
- Dates and time periods

## Entity Types

The model recognizes the following entity types:

- `amount`
- `bank_code`
- `currency`
- `date`
- `description`
- `end_date`
- `receiver_hr`
- `receiver_inn`
- `receiver_name`
- `start_date`
- `status`

## Training Data

- **Base Model**: `google-bert/bert-base-multilingual-cased`
- **Training Samples**: 200,015
- **Validation Samples**: 35,297
- **Dataset**: Custom banking transaction dataset with multilingual support

## Training Details

- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **LR Scheduler**: Linear with warmup
- **Framework**: Transformers + PyTorch

## Performance

- **Validation F1 Score**: 0.9999

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Example prediction
def extract_entities(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the highest-scoring label for every token
    predictions = torch.argmax(outputs.logits, dim=2)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

    # Group B-/I- tagged tokens into entities.
    # Note: a later span of the same type overwrites an earlier one.
    entities = {}
    current_entity = None
    current_tokens = []

    for token, label in zip(tokens, predicted_labels):
        # Skip special tokens
        if token in ['[CLS]', '[SEP]', '[PAD]']:
            continue

        if label.startswith('B-'):
            # A new entity begins: store the previous one first
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continuation of the current entity
            current_tokens.append(token)
        else:
            # Outside any entity (or a mismatched I- tag): close the open span
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = None
            current_tokens = []

    # Store the entity that is still open at the end of the sequence
    if current_entity and current_tokens:
        entity_text = tokenizer.convert_tokens_to_string(current_tokens)
        entities[current_entity] = entity_text.strip()

    return entities

# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))
```

## Example Outputs

**Input**: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"

**Output**:

```python
{
    "amount": "12.5mln",
    "currency": "USD",
    "receiver_name": "Apex Industries",
    "receiver_hr": "27109477752047116719",
    "receiver_inn": "123456789",
    "bank_code": "01234",
    "description": "consulting"
}
```

The keys are the entity types listed above: the account number is returned as `receiver_hr` and the transaction purpose as `description`.

## Limitations

- The model is trained on synthetic and curated banking transaction data
- Performance may vary on real-world data with different formatting
- Best results are achieved with transaction texts similar to the training distribution
- The model may require fine-tuning for specific banking systems or regional variations

## License

Apache 2.0

## Citation

```bibtex
@misc{transactor-aiba,
  author = {Primel},
  title = {Transactor AIBA: Multilingual Banking Transaction NER},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}
```
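
## Keeping Repeated Entities

The decoding loop in the Usage section stores a single string per entity type, so a text that mentions two amounts or two dates would keep only the last one. The following is a minimal sketch of a variant that collects every span into a list. It works on plain `(token, label)` pairs so it can run without downloading the model. The names `merge_wordpieces` and `group_bio_spans` are hypothetical helpers, and the tokens and labels in the example are hand-written for illustration, not actual model output.

```python
from collections import defaultdict

SPECIAL_TOKENS = ("[CLS]", "[SEP]", "[PAD]")


def merge_wordpieces(tokens):
    """Join WordPiece tokens; pieces starting with '##' attach to the previous word."""
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return " ".join(words)


def group_bio_spans(tokens, labels):
    """Group BIO-tagged tokens into {entity_type: [span_text, ...]}."""
    spans = defaultdict(list)
    current_type = None
    current_tokens = []

    def flush():
        # Store the open span, if there is one
        if current_type and current_tokens:
            spans[current_type].append(merge_wordpieces(current_tokens))

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and label[2:] == current_type:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []

    flush()
    return dict(spans)


# Hand-written example (assumed tokenization and labels, for illustration only)
tokens = ["[CLS]", "Send", "100", "USD", "to", "Apex", "Indus", "##tries",
          "and", "200", "EUR", "[SEP]"]
labels = ["O", "O", "B-amount", "B-currency", "O", "B-receiver_name",
          "I-receiver_name", "I-receiver_name", "O", "B-amount", "B-currency", "O"]

result = group_bio_spans(tokens, labels)
print(result)
# {'amount': ['100', '200'], 'currency': ['USD', 'EUR'], 'receiver_name': ['Apex Industries']}
```

To use this with the model, pass the `tokens` and `predicted_labels` lists computed inside `extract_entities` to `group_bio_spans`. Joining pieces with a space is a simplification; `tokenizer.convert_tokens_to_string` may space punctuation differently.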