---
library_name: transformers
tags:
- text-classification
- distilbert
- sentiment-analysis
- new-closed-neutral
- colab
---
# Model Card: distil-bert-classifier
This is a DistilBERT model fine-tuned for sequence classification. It labels a place (e.g., a restaurant or other business) as **NEW**, **CLOSED**, or **NEUTRAL** based on short text snippets.
---
## Model Details
### Model Description
- **Base Model:** `distilbert-base-uncased`
- **Task:** Sequence Classification
- **Classes:** `NEW`, `CLOSED`, `NEUTRAL`
- **Language:** English
- **License:** MIT *(confirm if needed)*
- **Developer:** virustechhacks
This model helps extract signals about business status from textual data such as reviews, posts, or headlines.
---
## Model Sources
- **Repository:** https://huggingface.co/virustechhacks/distil-bert-classifier
---
## Uses
### Direct Use
Classify short text snippets into:
- `NEW`: newly opened places
- `CLOSED`: shut down or no longer operating
- `NEUTRAL`: no clear status signal
### Downstream Use
Outputs can be aggregated into features like:
- `closed_signal_ratio`
- `new_signal_ratio`
- `mention_count`
These can feed into larger ML pipelines (e.g., XGBoost models).
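
One way to derive these features is to classify every snippet that mentions a place and then count the predicted labels. The sketch below assumes each ratio is the label count divided by the total number of mentions. The card does not define the features, so that definition and the helper name `aggregate_signals` are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_signals(labels: Iterable[str]) -> Dict[str, float]:
    """Summarize per-snippet predictions for one place.

    Assumption: each ratio is (count of label) / (number of snippets).
    """
    counts = Counter(labels)
    mention_count = sum(counts.values())
    if mention_count == 0:
        # No mentions: report zero ratios instead of dividing by zero.
        return {"closed_signal_ratio": 0.0, "new_signal_ratio": 0.0, "mention_count": 0}
    return {
        "closed_signal_ratio": counts["CLOSED"] / mention_count,
        "new_signal_ratio": counts["NEW"] / mention_count,
        "mention_count": mention_count,
    }


# Example: four snippets about the same place, already classified.
features = aggregate_signals(["CLOSED", "CLOSED", "NEUTRAL", "NEW"])
print(features)
```

The resulting dictionary can be used as one row of a feature table for a downstream model.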
### Out-of-Scope
- General sentiment analysis beyond defined labels
- Non-English text
- Long documents (>128 tokens)
- High-stakes decision-making systems
---
## Bias, Risks, and Limitations
- **Synthetic Data Bias:**
  Trained on rule-based synthetic data, so it may not generalize well to real-world language.
- **No Time Awareness:**
  Cannot distinguish *recent vs outdated* signals.
- **Token Limit:**
  Inputs >128 tokens are truncated.
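
Because truncation happens silently, it can help to flag over-length inputs before classifying them. The following is a minimal sketch: the helper `exceeds_token_limit` and the whitespace tokenizer are hypothetical stand-ins so the example runs without downloading anything.

```python
from typing import Callable, List

MAX_TOKENS = 128  # limit stated in this model card


def exceeds_token_limit(
    text: str,
    tokenize: Callable[[str], List[int]],
    max_tokens: int = MAX_TOKENS,
) -> bool:
    """Return True if `text` encodes to more than `max_tokens` token ids."""
    return len(tokenize(text)) > max_tokens


# Stand-in tokenizer for illustration only: one id per whitespace-separated word.
def whitespace_tokenize(text: str) -> List[int]:
    return list(range(len(text.split())))


short_text = "Grand opening this weekend!"
long_text = " ".join(["closed"] * 200)
print(exceeds_token_limit(short_text, whitespace_tokenize))
print(exceeds_token_limit(long_text, whitespace_tokenize))
```

With the real tokenizer, one option is to pass `lambda t: tokenizer(t)["input_ids"]` as `tokenize`. That count includes the special tokens the tokenizer adds.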
---
## Recommendations
For production use:
- Fine-tune on real-world datasets
- Add timestamp-based features
- Evaluate thoroughly on live data
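
As a starting point for evaluation, per-class precision and recall can be computed from hand-labeled snippets and the model's predictions. This is a minimal sketch in plain Python. The function name `per_class_precision_recall` is hypothetical, and the example labels are hand-made for illustration, not results from this model.

```python
from typing import Dict, List, Tuple

LABELS = ("NEW", "CLOSED", "NEUTRAL")


def per_class_precision_recall(
    gold: List[str], predicted: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {label: (precision, recall)} for the three classes."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    report = {}
    for label in LABELS:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        # Report 0.0 when a class has no predictions or no gold examples.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[label] = (precision, recall)
    return report


# Illustrative labels only.
gold = ["NEW", "CLOSED", "NEUTRAL", "CLOSED"]
predicted = ["NEW", "NEUTRAL", "NEUTRAL", "CLOSED"]
report = per_class_precision_recall(gold, predicted)
for label, (precision, recall) in report.items():
    print(label, precision, recall)
```

Reporting per-class numbers instead of overall accuracy makes it easier to see whether `CLOSED` or `NEW` signals are being absorbed into `NEUTRAL`.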
---
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

repo_name = "virustechhacks/distil-bert-classifier"
tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForSequenceClassification.from_pretrained(repo_name)

# Class index to label name, as given in this model card.
id_to_label = {0: "NEW", 1: "CLOSED", 2: "NEUTRAL"}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_status(text):
    """Return (label, confidence) for a single text snippet."""
    # Inputs longer than 128 tokens are truncated.
    inputs = tokenizer(
        text,
        truncation=True,
        padding="max_length",
        max_length=128,
        return_tensors="pt"
    )
    # Move input tensors to the same device as the model.
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    probs = F.softmax(outputs.logits, dim=-1)
    confidence, pred = torch.max(probs, dim=1)
    return id_to_label[pred.item()], confidence.item()

# Example
print(predict_status("Grand opening this weekend!"))
print(predict_status("The store ceased operations."))
```