---
library_name: transformers
pipeline_tag: token-classification
tags:
- named-entity-recognition
- merchant-extraction
- finance
- pytorch
language:
- en
---
# Merchant Name Extraction Model
This model extracts merchant names from transaction descriptions using Named Entity Recognition (NER).
## Model Details
- **Model Type**: DistilBERT for Token Classification
- **Task**: Merchant Name Extraction
- **Language**: English
- **Framework**: PyTorch + Transformers
## Usage
```python
from transformers import DistilBertTokenizerFast, DistilBertForTokenClassification
import torch

# Load model and tokenizer from the Hugging Face Hub
model = DistilBertForTokenClassification.from_pretrained("GalalEwida/SIA-MerchentName")
tokenizer = DistilBertTokenizerFast.from_pretrained("GalalEwida/SIA-MerchentName")

# Label mapping used by this model (see "Labels" below)
id2label = {0: 'O', 1: 'B-MERCHANT', 2: 'I-MERCHANT'}

def extract_merchant(text):
    """Return the merchant name predicted for a single transaction description."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    # Highest-scoring label id for every token, shape (1, seq_len)
    predictions = torch.argmax(outputs.logits, dim=2)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    predicted_labels = [id2label[pred.item()] for pred in predictions[0]]

    merchant_tokens = []
    for token, label in zip(tokens, predicted_labels):
        # Never let [CLS], [SEP] or other special tokens leak into the result
        if token in tokenizer.all_special_tokens:
            continue
        if label in ('B-MERCHANT', 'I-MERCHANT'):
            if token.startswith('##'):
                # WordPiece continuation: glue it onto the previous piece
                if merchant_tokens:
                    merchant_tokens[-1] += token[2:]
            else:
                merchant_tokens.append(token)
    return ' '.join(merchant_tokens)

# Example usage
text = "WALMART SUPERCENTER #1234 ANYTOWN US"
merchant = extract_merchant(text)
print(f"Extracted: {merchant}")
```
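Joining WordPiece tokens with spaces, as `extract_merchant` does, does not always reproduce the input: an uncased tokenizer returns lowercase pieces, and punctuation becomes separate tokens (so `AMAZON.COM` can come back as `amazon . com`). One possible alternative, sketched here under the assumption that you request `return_offsets_mapping=True` from the fast tokenizer, is to slice the merchant span straight out of the original string. `spans_from_offsets` is a hypothetical helper, not part of this repository or of `transformers`.

```python
from typing import List, Sequence

MERCHANT_LABELS = {"B-MERCHANT", "I-MERCHANT"}

def spans_from_offsets(
    text: str,
    offsets: Sequence[Sequence[int]],
    labels: Sequence[str],
) -> List[str]:
    """Hypothetical helper: merge merchant-labelled tokens into substrings of `text`.

    `offsets` holds one (start, end) character span per token, as returned by a
    fast tokenizer with return_offsets_mapping=True; special tokens such as
    [CLS] and [SEP] have the empty span (0, 0) and are skipped.
    """
    spans: List[List[int]] = []
    prev_merchant = False
    for (start, end), label in zip(offsets, labels):
        if start == end:  # special token: ignore it and close any open span
            prev_merchant = False
            continue
        if label not in MERCHANT_LABELS:
            prev_merchant = False
            continue
        # Continue the open span on an I- tag, or when this token starts exactly
        # where the span ends (a ## continuation piece or attached punctuation).
        if prev_merchant and (label == "I-MERCHANT" or start == spans[-1][1]):
            spans[-1][1] = end
        else:
            spans.append([start, end])
        prev_merchant = True
    # Slicing the original text keeps its casing and inner whitespace
    return [text[s:e] for s, e in spans]
```

To wire it into the function above, add `return_offsets_mapping=True` to the tokenizer call, take the offsets out with `offsets = inputs.pop("offset_mapping")[0].tolist()` before calling `model(**inputs)` so they are not passed to the model, and return `' '.join(spans_from_offsets(text, offsets, predicted_labels))`.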
## Labels
- `O`: Outside (not part of merchant name)
- `B-MERCHANT`: Beginning of merchant name
- `I-MERCHANT`: Inside merchant name
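These labels follow the standard BIO scheme: a `B-` tag opens a merchant name, `I-` tags extend it, and `O` ends it. The sketch below shows that grouping at the word level; `decode_bio` is a hypothetical helper, and the tag sequences used with it are illustrative, not output from this model.

```python
from typing import List, Sequence

def decode_bio(words: Sequence[str], tags: Sequence[str], entity: str = "MERCHANT") -> List[str]:
    """Hypothetical helper: group word-level BIO tags into entity strings."""
    entities: List[List[str]] = []
    inside = False
    for word, tag in zip(words, tags):
        if tag == f"B-{entity}" or (tag == f"I-{entity}" and not inside):
            # Start a new entity; a stray I- tag with nothing open is treated as B-
            entities.append([word])
            inside = True
        elif tag == f"I-{entity}":
            # Continue the entity that is currently open
            entities[-1].append(word)
        else:
            # O (or any other tag) closes the open entity
            inside = False
    return [" ".join(parts) for parts in entities]
```

For instance, `decode_bio(["TRADER", "JOE'S", "#552"], ["B-MERCHANT", "I-MERCHANT", "O"])` returns `["TRADER JOE'S"]`, while two consecutive `B-MERCHANT` tags yield two separate names.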
## Example Predictions
| Input | Extracted Merchant |
|-------|-------------------|
| WALMART SUPERCENTER #1234 ANYTOWN US | WALMART |
| AMAZON.COM AMZN.COM/BILL WA | AMAZON |
| STARBUCKS STORE #0123 NEW YORK NY | STARBUCKS |
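To check an extractor against rows like these, a small exact-match scorer is enough. In this sketch `exact_match_rate` and the `first_word` baseline are hypothetical; the comparison ignores case on the assumption that an uncased tokenizer may return lowercase text. Any callable that maps a description to a string, such as `extract_merchant` from the Usage section, can be passed in.

```python
from typing import Callable, Iterable, Tuple

# Rows copied from the "Example Predictions" table above
EXAMPLES = [
    ("WALMART SUPERCENTER #1234 ANYTOWN US", "WALMART"),
    ("AMAZON.COM AMZN.COM/BILL WA", "AMAZON"),
    ("STARBUCKS STORE #0123 NEW YORK NY", "STARBUCKS"),
]

def exact_match_rate(
    pairs: Iterable[Tuple[str, str]],
    extract: Callable[[str], str],
) -> float:
    """Hypothetical helper: fraction of inputs whose extraction equals the expected name.

    Case and surrounding whitespace are ignored.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (input, expected) pair")
    hits = sum(
        extract(text).strip().casefold() == expected.strip().casefold()
        for text, expected in pairs
    )
    return hits / len(pairs)

def first_word(description: str) -> str:
    """Hypothetical baseline: take the first whitespace-separated word."""
    return description.split()[0]

print(exact_match_rate(EXAMPLES, first_word))
```

The naive `first_word` baseline matches 2 of the 3 rows, since it returns `AMAZON.COM` rather than `AMAZON`.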