---
language:
- en
- ru
- multilingual
license: apache-2.0
tags:
- token-classification
- ner
- named-entity-recognition
- banking
- transactions
- financial
- multilingual
- bert
datasets:
- custom
metrics:
- precision
- recall
- f1
- seqeval
widget:
- text: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
- text: "Send 150k RUB to ООО Ромашка счет 40817810099910004312 ИНН 987654321 за услуги"
- text: "Show completed transactions from 01.12.2024 to 15.12.2024"
pipeline_tag: token-classification
---

# Transactor AIBA - Banking Transaction NER Model

## Model Description

**Transactor AIBA** is a multilingual Named Entity Recognition (NER) model fine-tuned from `google-bert/bert-base-multilingual-cased` to extract entities from banking and financial transaction texts. The model supports both English and Russian.

## Intended Use

This model is designed to extract key entities from banking transaction requests, including:

- Transaction amounts and currencies
- Account numbers and bank codes
- Tax identification numbers (INN)
- Recipient/sender information
- Transaction purposes
- Dates and time periods

## Entity Types

The model recognizes the following entity types:

- `amount`
- `bank_code`
- `currency`
- `date`
- `description`
- `end_date`
- `receiver_hr`
- `receiver_inn`
- `receiver_name`
- `start_date`
- `status`

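Assuming the standard BIO tagging scheme implied by the `B-`/`I-` prefixes in the usage code below, these eleven types expand to 23 token-level labels (an `O` label plus a begin/inside pair per type). A minimal sketch of that mapping:

```python
# Hypothetical reconstruction of the label space, assuming a standard
# BIO scheme; the actual id ordering in the released model may differ.
ENTITY_TYPES = [
    "amount", "bank_code", "currency", "date", "description",
    "end_date", "receiver_hr", "receiver_inn", "receiver_name",
    "start_date", "status",
]

# "O" for tokens outside any entity, then a B-/I- pair per type.
LABELS = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(LABELS))

print(len(LABELS))  # 23 labels in total
print(id2label[1])  # B-amount
```
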
## Training Data

- **Base Model**: `google-bert/bert-base-multilingual-cased`
- **Training Samples**: 200,015
- **Validation Samples**: 35,297
- **Dataset**: Custom banking transaction dataset with multilingual support

## Training Details

- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Optimizer**: AdamW
- **LR Scheduler**: Linear with warmup
- **Framework**: Transformers + PyTorch

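These hyperparameters map onto a `transformers.TrainingArguments` configuration roughly as follows. This is a sketch, not the actual training script; `output_dir` and `warmup_ratio` are illustrative assumptions not stated in the card.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir and
# warmup_ratio are assumptions, not taken from the actual run.
args = TrainingArguments(
    output_dir="transactor-aiba",       # assumption
    num_train_epochs=5,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    optim="adamw_torch",                # AdamW
    lr_scheduler_type="linear",         # linear decay with warmup
    warmup_ratio=0.1,                   # assumption
)
```
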
## Performance

- **Validation F1 Score**: 0.9999

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Example prediction
def extract_entities(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

    entities = {}
    current_entity = None
    current_tokens = []

    for token, label in zip(tokens, predicted_labels):
        # Skip special tokens
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue

        if label.startswith("B-"):
            # A new entity starts; flush the previous one first
            if current_entity and current_tokens:
                entities[current_entity] = tokenizer.convert_tokens_to_string(current_tokens).strip()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith("I-") and current_entity == label[2:]:
            current_tokens.append(token)
        else:
            # An "O" label or mismatched I- tag ends the current entity
            if current_entity and current_tokens:
                entities[current_entity] = tokenizer.convert_tokens_to_string(current_tokens).strip()
            current_entity = None
            current_tokens = []

    # Flush the last entity, if any
    if current_entity and current_tokens:
        entities[current_entity] = tokenizer.convert_tokens_to_string(current_tokens).strip()

    return entities

# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))
```

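The `convert_tokens_to_string` calls above merge WordPiece subwords back into readable text. For intuition, this is roughly what happens with BERT-style `##` continuation tokens (a simplified sketch; the real tokenizer also handles punctuation and other special cases):

```python
def merge_wordpieces(tokens):
    """Roughly mirror tokenizer.convert_tokens_to_string for WordPiece."""
    text = ""
    for tok in tokens:
        if tok.startswith("##"):
            text += tok[2:]          # continuation piece: glue onto the previous token
        elif text:
            text += " " + tok        # new word: separate with a space
        else:
            text = tok               # first token
    return text

print(merge_wordpieces(["Ap", "##ex", "Industries"]))  # Apex Industries
```
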
## Example Outputs

**Input**: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"

**Output**:
```python
{
    "amount": "12.5mln",
    "currency": "USD",
    "receiver_name": "Apex Industries",
    "receiver_hr": "27109477752047116719",
    "receiver_inn": "123456789",
    "bank_code": "01234",
    "description": "consulting"
}
```

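Note that `amount` comes back as a raw surface string ("12.5mln", "150k"). A hypothetical post-processing helper, not part of the model, could normalize such strings into numbers:

```python
import re

# Hypothetical post-processing helper (not part of the model): convert
# surface strings like "12.5mln" or "150k" into plain floats.
_MULTIPLIERS = {"k": 1e3, "m": 1e6, "mln": 1e6, "b": 1e9, "bln": 1e9}

def parse_amount(text):
    match = re.fullmatch(r"([\d.,]+)\s*([a-z]*)", text.strip().lower())
    if not match:
        raise ValueError(f"unrecognized amount: {text!r}")
    number = float(match.group(1).replace(",", ""))
    return number * _MULTIPLIERS.get(match.group(2), 1.0)

print(parse_amount("12.5mln"))  # 12500000.0
print(parse_amount("150k"))     # 150000.0
```
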
## Limitations

- The model is trained on synthetic and curated banking transaction data
- Performance may vary on real-world data with different formatting
- Best results are achieved on transaction texts similar to the training distribution
- May require fine-tuning for specific banking systems or regional variations

## License

Apache 2.0

## Citation

```bibtex
@misc{transactor-aiba,
  author = {Primel},
  title = {Transactor AIBA: Multilingual Banking Transaction NER},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}
```