Upload folder using huggingface_hub
- README.md +63 -0
- config.json +16 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +7 -0
- tokenizer.json +0 -0
- tokenizer_config.json +56 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,63 @@
---
language: en
license: mit
library_name: pytorch
tags:
- shakespeare
- text-classification
- bert
- pytorch
- nlp
datasets:
- lanretto/shakespeare-vs-modern-dialogue
---

# 🎭 Shakespeare Authenticator - PyTorch Implementation

## Model Description

This is a **PyTorch manual implementation** of the Shakespeare Authenticator model, which distinguishes authentic Shakespearean text from modern writing. It was built from scratch in raw PyTorch (without the Hugging Face Trainer) for educational purposes.

## Model Performance

| Metric | Value |
|--------|-------|
| **Accuracy** | 0.9835 |
| **F1-Score** | 0.9685 |
| **Test Samples** | 40,626 |
| **Avg Confidence** | 0.9938 |

### Comparison with Original Implementation

| Model | Accuracy | F1-Score |
|-------|----------|----------|
| Original (HF Trainer) | 0.9820 | 0.9658 |
| **PyTorch Manual** | **0.9835** | **0.9685** |
## Training Details

- **Architecture**: BERT-base + Custom Classification Head
- **Training Approach**: Manual PyTorch training loop (a sketch follows this list)
- **Learning Rates**: BERT (2e-5), Classifier (1e-4)
- **Epochs**: 3
- **Batch Size**: 128
- **Best Epoch**: 3
- **Best Validation Accuracy**: 0.9849
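The two learning rates above map naturally onto optimizer parameter groups. Below is a minimal sketch of the manual loop, assuming a `train_loader` that yields tokenized batches; everything except the learning rates, epoch count, and batch size is an illustrative assumption, not the exact training script.

```python
import torch
import torch.nn as nn
from transformers import BertModel

# Sketch only: `train_loader` is assumed to yield dicts of tensors
# ("input_ids", "attention_mask", "labels") with batch_size=128.
bert = BertModel.from_pretrained("bert-base-uncased")
model = ShakespeareClassifier(bert)  # defined in the next section

# Two parameter groups implement the differential learning rates above
optimizer = torch.optim.AdamW([
    {"params": model.bert.parameters(), "lr": 2e-5},        # pretrained body
    {"params": model.classifier.parameters(), "lr": 1e-4},  # fresh head
])
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    model.train()
    for batch in train_loader:
        optimizer.zero_grad()
        logits = model(batch["input_ids"], batch["attention_mask"])
        loss = criterion(logits, batch["labels"])
        loss.backward()
        optimizer.step()
```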
## Model Architecture

```python
import torch.nn as nn

class ShakespeareClassifier(nn.Module):
    def __init__(self, bert_model, num_classes=2, dropout_rate=0.1):
        super().__init__()
        self.bert = bert_model  # pretrained BERT-base backbone
        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(768, num_classes)  # 768 = BERT hidden size

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output  # pooled [CLS] representation
        x = self.dropout(pooled_output)
        logits = self.classifier(x)
        return logits
```
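For reference, a hedged inference sketch using this class. It assumes `pytorch_model.bin` holds a plain `state_dict` for `ShakespeareClassifier`, which may differ from how the checkpoint was actually saved.

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumption: the checkpoint is a plain state_dict for ShakespeareClassifier
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = ShakespeareClassifier(BertModel.from_pretrained("bert-base-uncased"))
model.load_state_dict(torch.load("pytorch_model.bin", map_location="cpu"))
model.eval()

inputs = tokenizer("Shall I compare thee to a summer's day?",
                   return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(inputs["input_ids"], inputs["attention_mask"])
probs = torch.softmax(logits, dim=-1)[0]
pred = int(probs.argmax())
print(["Modern Creation", "Authentic Shakespeare"][pred], float(probs[pred]))
```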
config.json
ADDED
@@ -0,0 +1,16 @@
{
  "bert_model_name": "bert-base-uncased",
  "hidden_size": 768,
  "num_labels": 2,
  "id2label": {
    "0": "Modern Creation",
    "1": "Authentic Shakespeare"
  },
  "label2id": {
    "Modern Creation": 0,
    "Authentic Shakespeare": 1
  },
  "architecture": "BertForSequenceClassification",
  "pytorch_version": "2.9.1+cu128",
  "transformers_version": "4.57.3"
}
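The label maps above can be read from this file rather than hardcoded; a small sketch:

```python
import json

with open("config.json") as f:
    config = json.load(f)

# JSON object keys are strings, so index id2label with str(...)
pred_id = 1  # e.g. the argmax from the inference sketch above
print(config["id2label"][str(pred_id)])  # -> "Authentic Shakespeare"
```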
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e08d518fc143d0fa1facb505a0b378d7f8c341f568a463d132650ff5cb113b2
size 438018375
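This file is a Git LFS pointer, not the weights themselves. Downloading through `huggingface_hub` resolves the pointer automatically; the `repo_id` below is a placeholder, not the actual repository name.

```python
from huggingface_hub import hf_hub_download

# repo_id is a placeholder; substitute the actual model repository
weights_path = hf_hub_download(
    repo_id="user/shakespeare-authenticator-pytorch",
    filename="pytorch_model.bin",
)
print(weights_path)  # local cache path; file size should match 438018375 bytes
```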
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json
ADDED
The diff for this file is too large to render. See raw diff.
tokenizer_config.json
ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
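These are the stock `bert-base-uncased` tokenizer settings, so the tokenizer files load with the standard call; the path below is a placeholder for wherever this repo is cloned.

```python
from transformers import BertTokenizer

# "." is a placeholder for the directory holding these tokenizer files
tokenizer = BertTokenizer.from_pretrained(".")
enc = tokenizer("Once more unto the breach, dear friends",
                truncation=True, max_length=512)
print(enc["input_ids"][:8])
```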
vocab.txt
ADDED
The diff for this file is too large to render. See raw diff.