Upload folder using huggingface_hub

Files changed (8)

.ruff_cache/.gitignore ADDED

+# Automatically created by ruff.
+*

.ruff_cache/0.14.9/14058212920099261697 ADDED

Binary file (94 Bytes).

.ruff_cache/CACHEDIR.TAG ADDED

@@ -0,0 +1 @@
+Signature: 8a477f597d28d172789f06886806bc55

BiLSTMClassifier.py ADDED

+import torch
+import torch.nn as nn
+class BiLSTMClassifier(nn.Module):
+    def __init__(
+        self,
+        vocab_size,
+        embedding_dim,
+        hidden_size,
+        num_layers=1,
+        dropout=0.2,
+        **kwargs,
+    ):
+        super().__init__()
+        self.embedding = nn.Embedding(vocab_size, embedding_dim)
+        self.lstm = nn.LSTM(
+            input_size=embedding_dim,
+            hidden_size=hidden_size,
+            num_layers=num_layers,
+            batch_first=True,
+            bidirectional=True,
+            dropout=dropout if num_layers > 1 else 0.0,
+        )
+        self.fc = nn.Linear(hidden_size * 2, 1)
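+    def forward(self, x):
+        # x: (batch, seq_len) tensor of token ids
+        x = self.embedding(x)  # (batch, seq_len, embedding_dim)
+        outputs, (h_n, c_n) = self.lstm(x)
+        # h_n: (num_layers * 2, batch, hidden_size); its last two rows are the
+        # top layer's final forward and backward hidden states
+        h_fwd = h_n[-2, :, :]
+        h_bwd = h_n[-1, :, :]
+        h_final = torch.cat((h_fwd, h_bwd), dim=1)  # (batch, hidden_size * 2)
+        logits = self.fc(h_final)  # (batch, 1)
+        return logits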
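
Review note on the `h_n[-2]` / `h_n[-1]` indexing in `forward`: for a bidirectional `nn.LSTM`, `h_n` has `num_layers * 2` rows along its first axis, ordered layer by layer with the forward direction before the backward one. The sketch below uses plain Python and no PyTorch to show which rows those negative indices select. The helper name `final_state_rows` is hypothetical and not part of this commit.

```python
def final_state_rows(num_layers: int, num_directions: int = 2):
    """Return (total_rows, fwd_row, bwd_row) for the top layer of h_n.

    Assumes the row order layer * num_directions + direction,
    with direction 0 = forward and 1 = backward.
    """
    total_rows = num_layers * num_directions
    top_layer = num_layers - 1
    fwd_row = top_layer * num_directions + 0
    bwd_row = top_layer * num_directions + 1
    return total_rows, fwd_row, bwd_row


# num_layers=5 is the value in this commit's config.json
total_rows, fwd_row, bwd_row = final_state_rows(num_layers=5)

# Negative indices -2 and -1 resolve to the same rows
assert fwd_row == total_rows - 2
assert bwd_row == total_rows - 1
print(total_rows, fwd_row, bwd_row)  # 10 8 9
```

So the classifier head sees only the top layer's two final states, whatever `num_layers` is set to.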

BiLSTMClassifier.safetensors ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:2c835a0dfca19f68ba9f319433eff472c9adf9716fd19807cdd7a262d7045aff
+size 17618180

README.md CHANGED

@@ -1,3 +1,38 @@
----
-license: mit
----

+# BiLSTM Text Classifier
+A simple bidirectional LSTM (BiLSTM) model in PyTorch, trained for spam detection on the SMS Spam Collection (Almeida, Tiago and José Hidalgo. 2011. SMS Spam Collection. UCI Machine Learning Repository. https://doi.org/10.24432/C5CC84).
+## Important Notes
+- The model returns raw logits. To get a probability, pass the output through `torch.sigmoid`.
+- The model uses the `bert-base-uncased` tokenizer.
+## Files
+- `BiLSTMClassifier.safetensors`: trained weights
+- `BiLSTMClassifier.py`: model definition
+- `config.json`: hyperparameters
+## Usage
+```python
+import json
+import torch
+from transformers import BertTokenizer
+from safetensors.torch import load_file
+from BiLSTMClassifier import BiLSTMClassifier
+with open("config.json") as f:
+    cfg = json.load(f)
+model = BiLSTMClassifier(**cfg)
+state_dict = load_file("BiLSTMClassifier.safetensors")
+model.load_state_dict(state_dict)
+model.eval()
+sample_text = "URGENT HIRING! Earn $500/day working from home. No experience needed. Apply here: www.somenthing.io/hiring"
+tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+tokens = tokenizer(sample_text, return_tensors="pt")
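+# Inference only, so skip gradient tracking
+with torch.no_grad():
+    logits = model(tokens["input_ids"])  # shape (1, 1)
+p = torch.sigmoid(logits)  # probability in [0, 1]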
+```
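
Review note on the "Important Notes" section of the README: the model emits a single logit per message, so the caller still has to apply the sigmoid and pick a decision threshold. The following is a minimal sketch of that step using only the standard library. The 0.5 threshold and the mapping of the positive class to "spam" are assumptions, since the README states neither, and `label_from_logit` is a hypothetical helper.

```python
import math


def sigmoid(logit: float) -> float:
    """Logistic function, written to avoid overflow for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def label_from_logit(logit: float, threshold: float = 0.5):
    """Map a raw logit to (label, probability).

    Assumption: the positive class is "spam"; check the training labels.
    """
    p = sigmoid(logit)
    return ("spam" if p >= threshold else "ham"), p


label, prob = label_from_logit(2.0)
print(label, round(prob, 3))  # spam 0.881
```

With the README snippet, the input would be `logits.item()` for a single message.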

__pycache__/BiLSTMClassifier.cpython-310.pyc ADDED

Binary file (1.21 kB).

config.json ADDED

+{
+  "model_type": "bilstm",
+  "framework": "pytorch",
+  "task": "text-classification",
+  "vocab_size": 30522,
+  "embedding_dim": 128,
+  "hidden_size": 64,
+  "num_layers": 5,
+  "bidirectional": true,
+  "dropout": 0.2,
+  "num_classes": 2
+}
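
Review note on `config.json`: a quick consistency check is to count the parameters these hyperparameters imply and compare the result with the `size` recorded in the git-lfs pointer for `BiLSTMClassifier.safetensors`. The sketch below does that in plain Python. It assumes the weights are stored as float32 and uses the standard `nn.LSTM` parameter layout (`weight_ih`, `weight_hh`, `bias_ih`, `bias_hh` per direction). The helper names are hypothetical.

```python
# Values copied from config.json in this commit
cfg = {"vocab_size": 30522, "embedding_dim": 128, "hidden_size": 64, "num_layers": 5}


def lstm_direction_params(input_size: int, hidden_size: int) -> int:
    # weight_ih (4H x input) + weight_hh (4H x H) + bias_ih (4H) + bias_hh (4H)
    return 4 * hidden_size * (input_size + hidden_size) + 2 * 4 * hidden_size


def count_params(vocab_size, embedding_dim, hidden_size, num_layers) -> int:
    total = vocab_size * embedding_dim  # nn.Embedding
    for layer in range(num_layers):
        # Layers after the first consume both directions of the layer below
        in_size = embedding_dim if layer == 0 else 2 * hidden_size
        total += 2 * lstm_direction_params(in_size, hidden_size)
    total += 2 * hidden_size * 1 + 1  # nn.Linear(hidden_size * 2, 1)
    return total


n_params = count_params(**cfg)
float32_bytes = n_params * 4  # assumption: float32 weights
lfs_size = 17618180  # "size" from the git-lfs pointer above

print(n_params)                  # 4403585
print(float32_bytes)             # 17614340
print(lfs_size - float32_bytes)  # 3840
```

The 3,840-byte remainder is small enough to be file metadata such as the safetensors header, so the config and the weights file appear to agree. Two things are worth confirming with the author: `num_classes` is 2 while the model ends in a single-logit `nn.Linear`, and the keys the constructor does not name (`model_type`, `framework`, `task`, `bidirectional`, `num_classes`) are silently absorbed by `**kwargs`.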