---
library_name: transformers
language:
- en
tags:
- text-classification
- shakespeare
- nlp
- bert
- transformers
- literary-analysis
pipeline_tag: text-classification
widget:
- text: "To be or not to be, that is the question"
  example_title: "Hamlet"
- text: "Friends, Romans, countrymen, lend me your ears"
  example_title: "Julius Caesar"
- text: "The meeting is scheduled for 2 PM tomorrow"
  example_title: "Modern Text"
---

# Shakespeare Authenticator

## Model Description

A BERT-based model fine-tuned to distinguish authentic Shakespearean text from modern imitations and synthetic Shakespearean-style writing.

- **Developed by:** Lanre Moluga
- **Model type:** BERT for Sequence Classification
- **Language(s):** English (Early Modern English & Contemporary English)
- **License:** MIT
- **Finetuned from model:** `bert-base-uncased`

## Model Sources

- **Repository:** [More Information Needed]
- **Demo:** <https://huggingface.co/spaces/lanretto/shakespeare-authenticator>

## Uses

### Direct Use

This model performs binary text classification: it determines whether a given text sample is authentic Shakespearean writing or a modern creation/imitation.
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="lanretto/shakespeare-authenticator")
result = classifier("To be or not to be, that is the question")
print(result)
```

### Downstream Use

- Literary analysis and research tools
- Educational applications for Shakespeare studies
- Content moderation for Shakespearean text databases
- Style transfer evaluation
- Digital humanities research

### Out-of-Scope Use

- Classification of non-English text
- Professional literary authentication without human verification
- Legal or academic authentication purposes
- Texts from other historical periods or authors

## Bias, Risks, and Limitations

- **Temporal bias:** the model is trained specifically on Shakespearean vs. modern text, not on other historical periods.
- **Style limitations:** it may misclassify high-quality modern Shakespearean imitations.
- **Length sensitivity:** performance may vary with very short text fragments.
- **Genre limitations:** primarily trained on dramatic dialogue; may perform differently on poetry or prose.
- **Cultural context:** limited to the English language and Western literary traditions.

### Recommendations

Users should:

- Verify critical classifications with human experts
- Use longer text samples for more reliable predictions
- Treat the model as a supplementary tool rather than definitive authentication
- Be aware of potential false positives with sophisticated modern imitations

## How to Get Started with the Model

Use the code below to get started with the model.

```python
# Install required packages:
# pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "lanretto/shakespeare-authenticator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example prediction
text = "Shall I compare thee to a summer's day?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

labels = {0: "Modern Creation", 1: "Authentic Shakespeare"}
print(f"Prediction: {labels[predicted_class]}")
print(f"Confidence: {predictions[0][predicted_class]:.2%}")
```

## Training Details

### Training Data

- **Total samples:** ~400,000 text samples
- **Authentic Shakespeare:** ~108,000 lines from Shakespearean plays
- **Modern dialogue:** ~300,000 lines from modern movie scripts
- **Train/validation/test split:** 80% / 10% / 10%
- **Class distribution:** ~26% Shakespeare, ~74% Modern

### Training Procedure

#### Preprocessing

- Text normalization and cleaning
- Tokenization using the BERT tokenizer (`bert-base-uncased`)
- Maximum sequence length: 512 tokens
- Dynamic padding during training

#### Training Hyperparameters

- **Training regime:** mixed precision training
- **Optimizer:** AdamW
- **Learning rate:** 2e-5
- **Batch size:** 128 (with gradient accumulation)
- **Epochs:** 3
- **Weight decay:** 0.01
- **Warmup ratio:** 0.1

#### Speeds, Sizes, Times

- **Model size:** 438 MB
- **Training time:** ~2 hours on 1x Tesla T4 GPU
- **Inference speed:** ~100 samples/second on CPU

## Evaluation

### Testing Data

- **Test set size:** ~40,000 samples
- **Class distribution:** representative of the training distribution
- **Data source:** held out from the original dataset

### Metrics

- **Accuracy:** 84.7%
- **F1 score:** 0.8928
- **Precision (Shakespeare):** 0.8619
- **Recall (Shakespeare):** 0.8300
- **Precision (Modern):** 0.8321
- **Recall (Modern):** 0.8642

## Environmental Impact

Carbon
emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** 1x Tesla T4 GPU
- **Hours used:** ~2
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications

### Model Architecture and Objective

`bert-base-uncased` with a sequence classification head, fine-tuned for binary classification (Authentic Shakespeare vs. Modern Creation).

### Compute Infrastructure

#### Hardware

1x Tesla T4 GPU for training; CPU inference at ~100 samples/second.

#### Software

`transformers`, `torch`

## Model Card Contact

[More Information Needed]
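## Reproducing the Reported Metrics

As a supplement to the evaluation figures above, the sketch below shows how per-class precision, recall, and F1 can be computed from raw test-set predictions. This is a minimal illustration in plain Python; the `binary_metrics` helper and the toy label lists are hypothetical stand-ins, not the actual test set or evaluation script.

```python
# Sketch: computing precision, recall, and F1 for one class from
# paired gold labels and predictions.
# Label convention: 1 = Authentic Shakespeare, 0 = Modern Creation.

def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example (illustrative only, not the actual held-out data):
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
p, r, f = binary_metrics(y_true, y_pred)
print(f"Precision: {p:.3f}, Recall: {r:.3f}, F1: {f:.3f}")
# → Precision: 0.667, Recall: 0.667, F1: 0.667
```

Swapping `positive=0` into the call yields the metrics for the "Modern Creation" class, matching the per-class breakdown reported in the evaluation section.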