---
library_name: transformers
tags:
  - text-classification
  - distilbert
  - sentiment-analysis
  - new-closed-neutral
  - colab
---

# 📌 Model Card: distil-bert-classifier

A fine-tuned DistilBERT model for sequence classification that identifies whether a place (e.g., a restaurant or other business) is **NEW**, **CLOSED**, or **NEUTRAL** based on short text snippets.

---

## 🧠 Model Details

### Model Description

- **Base Model:** `distilbert-base-uncased`  
- **Task:** Sequence Classification  
- **Classes:** `NEW`, `CLOSED`, `NEUTRAL`  
- **Language:** English  
- **License:** MIT *(confirm if needed)*  
- **Developer:** virustechhacks  

This model helps extract signals about business status from textual data such as reviews, posts, or headlines.

---

## 🔗 Model Sources

- **Repository:** https://huggingface.co/virustechhacks/distil-bert-classifier  

---

## 🚀 Uses

### ✅ Direct Use

Classify short text snippets into:
- `NEW` → Newly opened places  
- `CLOSED` → Shut down or no longer operating  
- `NEUTRAL` → No clear status signal  

### 🔄 Downstream Use

Outputs can be aggregated into features like:
- `closed_signal_ratio`
- `new_signal_ratio`
- `mention_count`

These can feed into larger ML pipelines (e.g., XGBoost models).
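A minimal sketch of that aggregation step, assuming each place has a list of predicted labels (one per text mention) and that each ratio is simply the share of mentions carrying that label. These feature definitions and the helper name `aggregate_signals` are illustrative assumptions, not the exact formulas used by the original pipeline.

```python
from collections import Counter


def aggregate_signals(labels):
    """Turn per-mention predictions for one place into pipeline features.

    Hypothetical definition: each ratio is the fraction of mentions
    that the classifier assigned to that label.
    """
    counts = Counter(labels)
    mention_count = len(labels)
    if mention_count == 0:
        # No mentions: report zero ratios instead of dividing by zero.
        return {"closed_signal_ratio": 0.0, "new_signal_ratio": 0.0, "mention_count": 0}
    return {
        "closed_signal_ratio": counts["CLOSED"] / mention_count,
        "new_signal_ratio": counts["NEW"] / mention_count,
        "mention_count": mention_count,
    }


# Example: four mentions of the same place
features = aggregate_signals(["CLOSED", "NEUTRAL", "CLOSED", "NEW"])
print(features)  # {'closed_signal_ratio': 0.5, 'new_signal_ratio': 0.25, 'mention_count': 4}
```

The labels can be taken from the first element of the tuple returned by `predict_status` in the How to Use section, and the resulting dictionaries stacked into one feature row per place.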

### ⚠️ Out-of-Scope

- General sentiment analysis beyond defined labels  
- Non-English text  
- Long documents (>128 tokens)  
- High-stakes decision-making systems  

---

## ⚠️ Bias, Risks, and Limitations

- **Synthetic Data Bias:**  
  Trained on rule-based synthetic data → may not generalize well to real-world language.

- **No Time Awareness:**  
  Cannot distinguish *recent vs outdated* signals.

- **Token Limit:**  
  Inputs longer than 128 tokens are truncated, so any status signal after that point is ignored.

---

## 💡 Recommendations

For production use:
- Fine-tune on real-world datasets  
- Add timestamp-based features  
- Evaluate thoroughly on live data  
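One way to add timestamp-based features, sketched under assumptions: weight each mention by an exponential decay on its age, so recent `CLOSED` or `NEW` signals count more than old ones. The helper name `recency_weighted_ratio`, the `half_life_days` parameter, and its 90-day default are hypothetical choices for illustration; the model itself produces no timestamps.

```python
from datetime import datetime


def recency_weighted_ratio(mentions, label, now, half_life_days=90.0):
    """Weighted share of mentions with `label` (hypothetical feature).

    mentions: list of (label, timestamp) pairs with datetime timestamps.
    Each mention gets weight 0.5 ** (age_in_days / half_life_days).
    """
    total_weight = 0.0
    label_weight = 0.0
    for mention_label, timestamp in mentions:
        # Clamp future timestamps to age 0 so weights never exceed 1.
        age_days = max((now - timestamp).total_seconds() / 86400.0, 0.0)
        weight = 0.5 ** (age_days / half_life_days)
        total_weight += weight
        if mention_label == label:
            label_weight += weight
    return label_weight / total_weight if total_weight > 0 else 0.0


now = datetime(2024, 6, 1)
mentions = [
    ("CLOSED", datetime(2024, 6, 1)),  # age 0 days: weight 1.0
    ("NEW", datetime(2024, 3, 3)),     # age 90 days: weight 0.5
]
print(recency_weighted_ratio(mentions, "CLOSED", now))  # ≈ 0.667
```

An unweighted ratio would score both labels at 0.5 here; the decay tilts the feature toward the more recent `CLOSED` mention.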

---

## 🛠️ How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

repo_name = "virustechhacks/distil-bert-classifier"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForSequenceClassification.from_pretrained(repo_name)

# Class index -> label name; must match the label order used during fine-tuning
id_to_label = {0: "NEW", 1: "CLOSED", 2: "NEUTRAL"}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # inference mode: disables dropout

def predict_status(text):
    """Return (label, confidence) for a single text snippet."""
    # Tokenize to a fixed length of 128; longer inputs are truncated
    inputs = tokenizer(
        text,
        truncation=True,
        padding="max_length",
        max_length=128,
        return_tensors="pt"
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)

    # Convert logits to probabilities and pick the most likely class
    probs = F.softmax(outputs.logits, dim=-1)
    confidence, pred = torch.max(probs, dim=-1)

    return id_to_label[pred.item()], confidence.item()

# Example
print(predict_status("Grand opening this weekend!"))
print(predict_status("The store ceased operations."))