lanretto/shakespeare-authenticator

@@ -16,78 +16,176 @@ tags: []
 <!-- Provide a longer summary of what this model is. -->
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [Lanre Moluga]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
 ### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
 ## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
 ### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
 ### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
 ### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
 #### Training Hyperparameters
@@ -102,8 +200,26 @@ Use the code below to get started with the model.
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
 ### Testing Data, Factors & Metrics
 #### Testing Data

 <!-- Provide a longer summary of what this model is. -->
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+---
+language:
+- en
+tags:
+- text-classification
+- shakespeare
+- nlp
+- bert
+- transformers
+- literary-analysis
+pipeline_tag: text-classification
+widget:
+- text: "To be or not to be, that is the question"
+  example_title: "Hamlet"
+- text: "Friends, Romans, countrymen, lend me your ears"
+  example_title: "Julius Caesar"
+- text: "The meeting is scheduled for 2 PM tomorrow"
+  example_title: "Modern Text"
+---
+# Shakespeare Authenticator
+## Model Description
+A BERT-based sequence classifier fine-tuned to distinguish authentic Shakespearean text from modern writing, including modern imitations and synthetic Shakespearean-style text.
+- **Developed by:** Lanre Moluga
+- **Model type:** BERT for Sequence Classification
+- **Language(s):** English (Early Modern English & Contemporary English)
+- **License:** MIT
+- **Finetuned from model:** `bert-base-uncased`
+## Model Sources
+- **Repository:** [More Information Needed]
+- **Demo:** https://huggingface.co/spaces/lanretto/shakespeare-authenticator
+## Uses
 ### Direct Use
+This model is designed for binary text classification to determine whether a given text sample is authentic Shakespearean writing or a modern creation/imitation.
+```python
+from transformers import pipeline
+classifier = pipeline("text-classification", model="lanretto/shakespeare-authenticator")
+result = classifier("To be or not to be, that is the question")
+print(result)
+```
+### Downstream Use
+- Literary analysis and research tools
+- Educational applications for Shakespeare studies
+- Content moderation for Shakespearean text databases
+- Style transfer evaluation
+- Digital humanities research
 ### Out-of-Scope Use
+- Classification of non-English text
+- Professional literary authentication without human verification
+- Legal or academic authentication purposes
+- Texts from other historical periods or authors
 ## Bias, Risks, and Limitations
+- **Temporal Bias:** The model is trained only on Shakespearean versus modern text, not on other historical periods.
+- **Style Limitations:** It may misclassify high-quality modern Shakespearean imitations.
+- **Length Sensitivity:** Performance may vary on very short text fragments.
+- **Genre Limitations:** The model is trained primarily on dramatic dialogue and may perform differently on poetry or prose.
+- **Cultural Context:** Coverage is limited to the English language and Western literary traditions.
 ### Recommendations
+Users should:
+- Verify critical classifications with human experts
+- Use longer text samples for more reliable predictions
+- Consider the model a supplementary tool rather than definitive authentication
+- Be aware of potential false positives with sophisticated modern imitations
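One way to act on these recommendations is to route low-confidence or very short inputs to a human reviewer. The sketch below assumes the classifier output has the shape returned by the `transformers` text-classification pipeline (a dict with `label` and `score`). The function name `needs_human_review`, the 0.90 threshold, the 8-word minimum, and the `LABEL_0`/`LABEL_1` names are illustrative assumptions, not values from this model card.

```python
def needs_human_review(text, prediction, min_confidence=0.90, min_words=8):
    """Return True when a prediction should be checked by a human expert.

    `prediction` is assumed to look like {"label": "...", "score": 0.97}.
    The threshold and minimum length are hypothetical defaults.
    """
    too_short = len(text.split()) < min_words
    low_confidence = prediction["score"] < min_confidence
    return too_short or low_confidence


# Hand-written example predictions (not real model output)
examples = [
    ("To be or not to be, that is the question", {"label": "LABEL_1", "score": 0.97}),
    ("Hark!", {"label": "LABEL_1", "score": 0.99}),
    ("The meeting is scheduled for 2 PM tomorrow afternoon", {"label": "LABEL_0", "score": 0.62}),
]
for text, pred in examples:
    print(needs_human_review(text, pred), text)
```

Both cut-offs would need tuning on held-out data before being relied on.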
 ## How to Get Started with the Model
 Use the code below to get started with the model.
+```python
+# Install required packages first:
+#   pip install transformers torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+# Load model and tokenizer
+model_name = "lanretto/shakespeare-authenticator"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+# Example prediction
+text = "Shall I compare thee to a summer's day?"
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+with torch.no_grad():
+    outputs = model(**inputs)
+    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+    predicted_class = torch.argmax(predictions, dim=-1).item()
+labels = {0: "Modern Creation", 1: "Authentic Shakespeare"}
+print(f"Prediction: {labels[predicted_class]}")
+print(f"Confidence: {predictions[0][predicted_class]:.2%}")
+```
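The softmax and argmax steps in the snippet above can be reproduced without PyTorch to see how two raw logits become a label and a confidence value. This is a sketch with made-up logits; only the label mapping is taken from the example above.

```python
import math

LABELS = {0: "Modern Creation", 1: "Authentic Shakespeare"}  # mapping from the example above


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.2, 2.3]  # made-up values, not real model output
probs = softmax(logits)
# argmax: index of the largest probability
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(f"Prediction: {LABELS[predicted_class]}")
print(f"Confidence: {probs[predicted_class]:.2%}")
```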
 ## Training Details
 ### Training Data
+- **Total Samples:** ~400,000 text samples
+- **Authentic Shakespeare:** ~108,000 lines from Shakespearean plays
+- **Modern Dialogue:** ~300,000 lines from modern movie scripts
+- **Train/Validation/Test Split:** 80%/10%/10%
+- **Class Distribution:** ~26% Shakespeare, ~74% Modern
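The reported class distribution follows from the approximate line counts. The sketch below redoes that arithmetic and derives approximate split sizes. All inputs are the rounded figures above, so the outputs are estimates only.

```python
shakespeare_lines = 108_000   # approximate, from the card
modern_lines = 300_000        # approximate, from the card
total = shakespeare_lines + modern_lines

shakespeare_share = shakespeare_lines / total
modern_share = modern_lines / total
print(f"Total: {total:,}")
print(f"Shakespeare: {shakespeare_share:.1%}, Modern: {modern_share:.1%}")

# 80/10/10 split applied to the approximate total
splits = {"train": 0.8, "validation": 0.1, "test": 0.1}
split_sizes = {name: round(total * frac) for name, frac in splits.items()}
print(split_sizes)
```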
 ### Training Procedure
+#### Preprocessing
+- Text normalization and cleaning
+- Tokenization using the BERT tokenizer (`bert-base-uncased`)
+- Maximum sequence length: 512 tokens
+- Dynamic padding during training
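Dynamic padding means each batch is padded only to the length of its longest sequence instead of to the 512-token maximum. The following is a minimal pure-Python sketch of the idea using made-up token IDs. `pad_batch` is a hypothetical helper, and the pad ID of 0 matches the `[PAD]` token of `bert-base-uncased`.

```python
MAX_LENGTH = 512  # maximum sequence length from the card
PAD_ID = 0        # [PAD] token ID in bert-base-uncased


def pad_batch(sequences, max_length=MAX_LENGTH, pad_id=PAD_ID):
    """Truncate to max_length, then pad to the longest sequence in the batch.

    Truncation here is plain slicing; a real tokenizer also keeps the
    final [SEP] token when it truncates.
    """
    truncated = [seq[:max_length] for seq in sequences]
    batch_len = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (batch_len - len(seq)) for seq in truncated]
    attention_mask = [[1] * len(seq) + [0] * (batch_len - len(seq)) for seq in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token IDs for illustration only
batch = pad_batch([[101, 2000, 102], [101, 2000, 2022, 2030, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

In practice the `transformers` library provides `DataCollatorWithPadding` for this, so a hand-written helper is rarely needed.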
+#### Training Hyperparameters
+- **Training regime:** Mixed precision training
+- **Optimizer:** AdamW
+- **Learning Rate:** 2e-5
+- **Batch Size:** 128 (with gradient accumulation)
+- **Epochs:** 3
+- **Weight Decay:** 0.01
+- **Warmup Ratio:** 0.1
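A warmup ratio of 0.1 means the learning rate ramps up over the first 10% of optimizer steps. The sketch below estimates the step counts from the figures in this card and evaluates a linear warmup followed by linear decay. The linear decay, the ~326,400-sample training set (80% of ~408,000 lines), and treating 128 as the effective batch size are assumptions; the card does not state the scheduler.

```python
import math

TRAIN_SAMPLES = 326_400   # assumption: 80% of ~408,000 lines
BATCH_SIZE = 128          # assumed to be the effective batch size
EPOCHS = 3
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = total_steps - step
    return PEAK_LR * max(0.0, remaining / (total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, f"{lr_at(step):.2e}")
```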
+#### Speeds, Sizes, Times
+- **Model Size:** 438 MB
+- **Training Time:** ~2 hours on 1x Tesla T4 GPU
+- **Inference Speed:** ~100 samples/second on CPU
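The reported model size is consistent with storing the weights as 32-bit floats. A rough check, assuming about 109.5 million parameters for `bert-base-uncased` with a two-class head (an approximate public figure, not a number from this card):

```python
PARAM_COUNT = 109_500_000   # assumption: approximate parameter count
BYTES_PER_PARAM = 4         # float32

size_mb = PARAM_COUNT * BYTES_PER_PARAM / 1_000_000  # decimal megabytes
print(f"Estimated checkpoint size: {size_mb:.0f} MB")

# Rough time to score the ~40,000-sample test set at the reported CPU speed
TEST_SAMPLES = 40_000
SAMPLES_PER_SECOND = 100
cpu_minutes = TEST_SAMPLES / SAMPLES_PER_SECOND / 60
print(f"Estimated CPU inference time: {cpu_minutes:.1f} minutes")
```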
 ## Evaluation
+### Testing Data & Metrics
+#### Testing Data
+- **Test Set Size:** ~40,000 samples
+- **Class Distribution:** Representative of training distribution
+- **Data Source:** Held-out from original dataset
+#### Metrics
+- **Accuracy:** 84.7%
+- **F1 Score:** 0.8928
+- **Precision (Shakespeare):** 0.8619
+- **Recall (Shakespeare):** 0.8300
+- **Precision (Modern):** 0.8321
+- **Recall (Modern):** 0.8642
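Per-class F1 can be derived from the precision and recall values above as the harmonic mean 2PR/(P+R). The sketch below does this for both classes and averages them. The card does not say how the single reported F1 score was averaged, so the macro average here is only one possible reading.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the card
reported = {
    "Shakespeare": (0.8619, 0.8300),
    "Modern": (0.8321, 0.8642),
}
per_class_f1 = {name: f1(p, r) for name, (p, r) in reported.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for name, score in per_class_f1.items():
    print(f"F1 ({name}): {score:.4f}")
print(f"Macro-averaged F1: {macro_f1:.4f}")
```

With these inputs both per-class values come out near 0.85.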