---
library_name: transformers
pipeline_tag: token-classification
tags:
- named-entity-recognition
- merchant-extraction
- finance
- pytorch
language:
- en
---

# Merchant Name Extraction Model

This model extracts merchant names from transaction descriptions using Named Entity Recognition (NER).

## Model Details

- **Model Type**: DistilBERT for Token Classification
- **Task**: Merchant Name Extraction
- **Language**: English
- **Framework**: PyTorch + Transformers

## Usage

```python
from transformers import DistilBertTokenizerFast, DistilBertForTokenClassification
import torch

# Load model and tokenizer
model = DistilBertForTokenClassification.from_pretrained("GalalEwida/SIA-MerchentName")
tokenizer = DistilBertTokenizerFast.from_pretrained("GalalEwida/SIA-MerchentName")

# Prediction function
def extract_merchant(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)

    # Map each predicted class id back to its BIO label
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    id2label = {0: 'O', 1: 'B-MERCHANT', 2: 'I-MERCHANT'}
    predicted_labels = [id2label[pred.item()] for pred in predictions[0]]

    # Keep merchant tokens, gluing WordPiece continuations ("##...") onto the previous token
    merchant_tokens = []
    for token, label in zip(tokens, predicted_labels):
        if label in ['B-MERCHANT', 'I-MERCHANT']:
            if token.startswith('##'):
                if merchant_tokens:
                    merchant_tokens[-1] += token[2:]
            else:
                merchant_tokens.append(token)

    return ' '.join(merchant_tokens)

# Example usage
text = "WALMART SUPERCENTER #1234 ANYTOWN US"
merchant = extract_merchant(text)
print(f"Extracted: {merchant}")
```
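
The subword-merging step inside `extract_merchant` can be tried without downloading the model. The sketch below isolates that logic in a hypothetical helper, `merge_merchant_tokens`, and feeds it hand-written tokens and labels. These inputs are illustrative assumptions (lowercase pieces as an uncased WordPiece tokenizer might produce them), not real model output.

```python
def merge_merchant_tokens(tokens, labels):
    """Join WordPiece tokens tagged as merchant into whole words.

    Hypothetical helper mirroring the loop in extract_merchant.
    """
    words = []
    for token, label in zip(tokens, labels):
        if label not in ("B-MERCHANT", "I-MERCHANT"):
            continue
        if token.startswith("##"):
            # Continuation piece: append to the previous word, if there is one.
            if words:
                words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


# Hand-written example input (assumed tokenization and labels).
tokens = ["[CLS]", "wal", "##mart", "super", "##center", "#", "123", "[SEP]"]
labels = ["O", "B-MERCHANT", "I-MERCHANT", "O", "O", "O", "O", "O"]

print(merge_merchant_tokens(tokens, labels))  # walmart
```

A continuation piece whose preceding word was not tagged as merchant is dropped rather than kept as a fragment, which matches the behavior of the loop in the usage example.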

## Labels

- `O`: Outside (not part of merchant name)
- `B-MERCHANT`: Beginning of merchant name
- `I-MERCHANT`: Inside merchant name
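
As a rough illustration of how this BIO scheme separates entities, the sketch below groups word-level labels into merchant spans: each `B-MERCHANT` opens a new span and each following `I-MERCHANT` extends it. The function name `bio_to_spans` and the sample words and labels are assumptions made for this example; they are not part of the model or its outputs.

```python
def bio_to_spans(words, labels):
    """Group words into merchant spans according to BIO labels.

    Hypothetical helper. An I-MERCHANT with no open span is ignored here,
    which is one possible policy, not necessarily the right one for every use.
    """
    spans = []
    current = []
    for word, label in zip(words, labels):
        if label == "B-MERCHANT":
            # A new entity starts: close any span that is still open.
            if current:
                spans.append(" ".join(current))
            current = [word]
        elif label == "I-MERCHANT" and current:
            current.append(word)
        else:
            # "O" (or a stray I-MERCHANT) ends the open span, if any.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Hand-written labels for illustration only.
words = ["PAYPAL", "*", "UBER", "EATS", "HELP"]
labels = ["B-MERCHANT", "O", "B-MERCHANT", "I-MERCHANT", "O"]

print(bio_to_spans(words, labels))  # ['PAYPAL', 'UBER EATS']
```

Unlike joining every merchant token into one string, this keeps two merchants in the same description apart.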

## Example Predictions

| Input | Extracted Merchant |
|-------|--------------------|
| WALMART SUPERCENTER #1234 ANYTOWN US | WALMART |
| AMAZON.COM AMZN.COM/BILL WA | AMAZON |
| STARBUCKS STORE #0123 NEW YORK NY | STARBUCKS |