GPT-2 Small — Screenplay Scriptwriting Model
Study Context: This is the first model in a dual-architecture comparative study on screenplay generation using GPT-2 Small. This model is a full-parameter fine-tune executed on a cloud NVIDIA T4 GPU. The second model is a LoRA adapter trained entirely on consumer edge hardware — an Apple Silicon MacBook Air — using PEFT to operate within the hard constraints of a fanless, unified-memory device.
A fully fine-tuned GPT-2 Small (124M) causal language model trained end-to-end on ~94M tokens of professional screenplay corpora, with stateful MLOps checkpoint recovery from a mid-run hardware preemption event.
Model Description
This model is a full-parameter fine-tune of OpenAI's GPT-2 Small (124M parameters) for causal language modeling, specialized for screenplay and script generation. All 124,439,808 parameters were unfrozen and updated during training; this is not a LoRA, adapter, or PEFT-based model. The released checkpoint is a complete set of weights that replaces, rather than augments, the base GPT-2 checkpoint.
The model has internalized the highly structured formatting conventions of professional screenplays: scene slugs (INT./EXT.), character action lines, dialogue blocks, parentheticals, and production draft metadata — making it capable of generating coherent, industry-formatted script content from open-ended prompts.
| Property | Value |
|---|---|
| Base Model | GPT-2 Small (openai-community/gpt2) |
| Parameter Count | 124,439,808 (100% updated) |
| Architecture | Decoder-only Transformer (GPT-2) |
| Fine-tune Method | Full-Parameter Overwrite (no PEFT/LoRA) |
| Task | Causal Language Modeling / Script Generation |
| Training Block Size | 512 tokens (contiguous; base GPT-2 supports up to 1,024 positions) |
| Language | English |
Training Data
The model was trained on a corpus of approximately 94 million tokens of raw, professionally formatted screenplay text files. The dataset consists of:
- Standard industry-formatted `.fountain` / plain-text screenplay sources
- Scene slugline notation (`INT. LOCATION - DAY/NIGHT`)
- Character cues, action blocks, parentheticals, and dialogue
- Production draft metadata headers and transition markers
No dataset card is available at this time. The corpus was not filtered for content rating or genre — the model reflects the full stylistic and tonal range of the training material.
Training Procedure & Infrastructure
Compute Infrastructure
| Component | Specification |
|---|---|
| Accelerator | NVIDIA T4 Cloud GPU |
| CUDA Backend | Enabled |
| Precision Strategy | FP16 Mixed Precision (torch.cuda.amp via HF Accelerate) |
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 5e-5 (linear decay) |
| per_device_train_batch_size | 4 |
| gradient_accumulation_steps | 4 |
| Effective Global Batch Size | 16 |
| Total Optimization Steps | 9,272 (1 full epoch) |
| Total FLOs | 3.876 × 10¹⁶ |
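The hyperparameters above are enough to reproduce the reported Total FLOs, which doubles as a consistency check on how many tokens the run actually processed. This is a sketch under stated assumptions: a single GPU, every training sequence a full 512-token block, and the figure being the Hugging Face Trainer's `total_flos`, which Transformers estimates as 6 × (non-embedding parameters) × (training tokens). Under those assumptions the run saw about 76M training tokens; how that relates to the ~94M-token corpus (for example, a held-out validation split) is not stated in this card.

```python
# Consistency check: reproduce "Total FLOs" from the hyperparameter table.
# Assumptions (not stated in the card): one GPU, full 512-token blocks, and the
# Trainer-style estimate 6 * non_embedding_params * tokens.

SEQ_LEN = 512
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
NUM_DEVICES = 1            # assumption: a single T4
STEPS = 9_272

TOTAL_PARAMS = 124_439_808
VOCAB, N_POSITIONS, D_MODEL = 50_257, 1_024, 768   # GPT-2 Small configuration

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
tokens_seen = STEPS * effective_batch * SEQ_LEN

# Token (wte) and position (wpe) embedding tables are excluded from the estimate.
embedding_params = VOCAB * D_MODEL + N_POSITIONS * D_MODEL
non_embedding_params = TOTAL_PARAMS - embedding_params

total_flos = 6 * non_embedding_params * tokens_seen

print(f"effective batch:      {effective_batch}")
print(f"training tokens seen: {tokens_seen:,}")
print(f"non-embedding params: {non_embedding_params:,}")
print(f"estimated total FLOs: {total_flos:.4e}")
```

The estimate comes out at 3.8763 × 10¹⁶, matching the reported 3.876 × 10¹⁶, which supports the 9,272 × 16 × 512 token accounting.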
MLOps Resiliency & Checkpoint Recovery
A defining characteristic of this training run is its stateful recovery from a mid-training hardware preemption event. The full timeline is documented below as an engineering reference.
Timeline
```text
[00:00:00] → Training initiated on primary cloud instance (T4 GPU).
             Checkpoints configured to persist every 200 global steps.
[04:43:00] → HARDWARE PREEMPTION at global Step 5,600 (60.4% complete).
             Primary compute container abruptly disconnected.
             Checkpoint preserved: model.safetensors, optimizer.pt, scheduler.pt
[04:43:xx] → Hot-resume initiated on secondary cloud instance from Step 5,601.
             Full optimizer state (momentum buffers, variance estimates),
             learning rate scheduler, and gradient context fully restored.
[07:43:30] → Training complete at global Step 9,272.
             Zero loss discontinuity detected across the resume boundary.
```
Total aggregate compute time: 7 hours, 43 minutes, 30 seconds across both instances.
The pre-crash and post-resume validation losses at Steps 5,600 and 5,800 (see the convergence table below) show no regression across the preemption: loss continued to decrease (1.3305 → 1.3276). This indicates that the Hugging Face Trainer's native checkpoint serialization, which saves full optimizer and scheduler state, was sufficient for lossless mid-run recovery on ephemeral cloud infrastructure in this run.
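The 200-step checkpoint cadence bounds how much work a preemption can cost. The sketch below uses two hypothetical helpers (not part of this model's training code): one picks the newest `checkpoint-<step>` directory, following the Trainer's default naming, and the other counts the steps completed since the last save, which would have to be recomputed. In the Trainer itself, `trainer.train(resume_from_checkpoint=True)` resumes from the latest checkpoint in `output_dir`, or a specific checkpoint path can be passed instead.

```python
import re

SAVE_STEPS = 200          # checkpoint cadence from the training timeline
SECONDS_PER_STEP = 3.0    # approximate T4 throughput from the comparison table

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(dir_names):
    """Return the `checkpoint-<step>` name with the highest step, or None.

    Hypothetical helper; steps are compared numerically, so
    checkpoint-10000 sorts after checkpoint-9800.
    """
    found = []
    for name in dir_names:
        match = CHECKPOINT_RE.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return max(found)[1] if found else None


def steps_to_recompute(preempted_at_step, save_steps=SAVE_STEPS):
    """Optimizer steps completed after the last save (lost on preemption)."""
    return preempted_at_step % save_steps


dirs = ["checkpoint-5200", "checkpoint-5400", "checkpoint-5600", "runs"]
print(latest_checkpoint(dirs))       # checkpoint-5600
print(steps_to_recompute(5_600))     # 0: the preemption landed on a save boundary

worst_case = SAVE_STEPS - 1          # preempted one step before the next save
minutes = worst_case * SECONDS_PER_STEP / 60
print(f"worst case: {worst_case} steps, about {minutes:.0f} minutes")
```

At roughly 3.0 s/step, a 200-step cadence caps the rework after a preemption at about 10 minutes, at the cost of writing a full optimizer-state checkpoint every ~10 minutes.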
Training Metrics & Convergence
Validation loss decreases at every logged checkpoint and flattens toward the end of the 9,272-step run, consistent with convergence on screenplay formatting conventions and domain vocabulary.
| Global Step | Training Phase | Validation Loss | Notes |
|---|---|---|---|
| 200 | Baseline (early) | 1.4586 | Initial domain vocabulary acquisition |
| 2,000 | Formatting alignment | 1.3653 | Scene/dialogue structure stabilizing |
| 5,600 | Pre-crash state | 1.3305 | Checkpoint preserved at preemption |
| 5,800 | Post-resume stability | 1.3276 | Confirmed loss continuity after resume |
| 9,272 | Final (end of epoch 1) | 1.3194 | Convergence plateau reached |
Total loss reduction: −0.1392 across the full run (−9.5% relative improvement from baseline).
The small delta between Steps 5,600 and 5,800 (−0.0029) shows validation loss continuing to fall after the resume, with no spike, which is consistent with the optimizer state having been fully restored.
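The summary figures above can be re-derived directly from the convergence table. The sketch also converts each validation loss to perplexity, assuming the reported loss is the mean per-token cross-entropy in nats (the standard causal-LM loss in Transformers), so that perplexity = exp(loss).

```python
import math

# Validation loss by global step, copied from the convergence table.
val_loss = {200: 1.4586, 2_000: 1.3653, 5_600: 1.3305, 5_800: 1.3276, 9_272: 1.3194}

total_drop = val_loss[200] - val_loss[9_272]
relative_drop = total_drop / val_loss[200]
resume_delta = val_loss[5_800] - val_loss[5_600]

print(f"total reduction:       {total_drop:.4f} ({relative_drop:.1%} of the Step-200 loss)")
print(f"resume-boundary delta: {resume_delta:+.4f}")

# Perplexity = exp(mean per-token cross-entropy), assuming the loss is in nats.
for step, loss in val_loss.items():
    print(f"step {step:>5}: loss {loss:.4f}, perplexity {math.exp(loss):.2f}")
```

Under that assumption, validation perplexity falls from about 4.30 at Step 200 to about 3.74 at the final step.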
Usage & Inference
Loading the Model
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_id = "raghavnimbalkar/gpt2-screenplay-generator"
tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)
model.eval()  # disable dropout for inference
```
Recommended Inference Parameters
The following nucleus sampling configuration is recommended to produce high-fidelity, coherent screenplay output while avoiding repetitive boilerplate:
| Parameter | Recommended Value | Notes |
|---|---|---|
| `max_length` | Up to 512 | Matches the 512-token training block size; counts prompt tokens as well as generated ones |
| `temperature` | 0.75 – 0.85 | Lower = sharper dialogue; higher = more creative variance |
| `top_k` | 40 or 50 | Limits the vocabulary sampling pool |
| `top_p` | 0.92 – 0.95 | Nucleus sampling threshold |
| `repetition_penalty` | 1.12 – 1.15 | Critical: prevents screenplay boilerplate loops |
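To make the sampling parameters in the table concrete, here is a plain-Python sketch of how temperature, top-k, and top-p (nucleus) filtering combine to pick one token from a list of logits. It is illustrative rather than a copy of Transformers' implementation: `sample_next_token` is a hypothetical helper, and the temperature, then top-k, then top-p ordering is an assumption that mirrors the usual sampling pipeline.

```python
import math
import random


def sample_next_token(logits, temperature=0.80, top_k=50, top_p=0.92, rng=random):
    """Sample one token index from raw logits (hypothetical helper, plain lists).

    Steps: temperature scaling -> softmax -> top-k cut -> top-p (nucleus) cut -> draw.
    """
    # Temperature: lower values sharpen the distribution, higher values flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable token ids, then renormalize over them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest high-probability prefix whose mass reaches top_p.
    nucleus, weights, cumulative = [], [], 0.0
    for i in ranked:
        p = probs[i] / mass
        nucleus.append(i)
        weights.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # Draw from the surviving tokens; choices() normalizes the weights itself.
    return rng.choices(nucleus, weights=weights, k=1)[0]


# Toy logits over a 5-token vocabulary; only the most likely few survive the filters.
print(sample_next_token([2.0, 1.5, 0.3, -1.0, -3.0], rng=random.Random(0)))
```

Lowering `temperature` concentrates probability on the top token before either cut is applied, which is why the table pairs lower temperatures with "sharper dialogue".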
Inference Example
```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_id = "raghavnimbalkar/gpt2-screenplay-generator"
tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)
model.eval()

# Seed the model with a scene slug and an action line.
prompt = "INT. POLICE PRECINCT - NIGHT\n\nDetective HARRIS slams a folder on the table."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_length=512,                       # prompt + generated tokens
        temperature=0.80,
        top_k=50,
        top_p=0.92,
        repetition_penalty=1.13,
        do_sample=True,                       # required for temperature/top-k/top-p to apply
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )

generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```
Tip: A `repetition_penalty` in the 1.12–1.15 range is especially important for this model. Screenplay corpora contain many repeated structural tokens (`INT.`, `EXT.`, `CUT TO:`, character cues), and without a penalty the model will loop on them aggressively during unconstrained generation.
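The `repetition_penalty` mentioned in the tip follows a CTRL-style rule in Transformers: for every token already present in the sequence, a positive logit is divided by the penalty and a negative one is multiplied by it, so a repeated token always becomes less likely. A minimal sketch on plain lists (`apply_repetition_penalty` is a hypothetical helper, not a library function):

```python
def apply_repetition_penalty(logits, context_ids, penalty=1.13):
    """Return a copy of `logits` with every token already in `context_ids` penalized.

    Positive logits are divided by the penalty and negative logits multiplied,
    so the token always loses probability. Each distinct token is penalized
    once, however many times it has appeared.
    """
    penalized = list(logits)
    for token_id in set(context_ids):
        score = penalized[token_id]
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


# Token 0 stands in for a structural token such as "INT." seen twice in the context.
print(apply_repetition_penalty([2.26, -1.0, 0.5], context_ids=[0, 1, 0]))
```

Because the context includes the prompt as well as generated text, a slug such as `INT.` in the prompt is already penalized from the first generated token; this is one reason a moderate value like 1.13 is preferable to a much larger one, which would also suppress legitimate scene headings.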
Comparison with LoRA Adapter Model
This model is one half of an ongoing comparative study. The table below contrasts both trained models across architecture, compute, and convergence dimensions.
| Property | Full-Parameter (This Model) | LoRA Adapter (Local) |
|---|---|---|
| Hardware | NVIDIA T4 (Cloud GPU) | Apple Silicon MacBook Air (MPS) |
| Fine-tune Method | Full-parameter overwrite | LoRA / PEFT (c_attn only) |
| Trainable Parameters | 124,439,808 (100%) | 294,912 (0.24%) |
| Epoch Coverage | 1.0 (full corpus) | 0.51 (half corpus) |
| Total Steps | 9,272 | 4,700 |
| Training Time | 7h 43m 30s | 7h 51m 02s |
| Final Eval Loss | 1.3194 | 2.4017 |
| Step Throughput | ~3.0s/step | ~6.01s/step |
| MLOps Event | Hardware preemption + stateful hot-resume | 17× speedup via LoRA over full-param attempt |
Both models trained for approximately the same wall-clock time (~7h 45m). The gap in final evaluation loss is most plausibly explained by full-parameter updates and full-corpus coverage, versus a small adapter trained on half the corpus on constrained hardware, rather than by a difference in wall-clock budget. Note that the two runs still differ in hardware, step count, and therefore total compute. The LoRA adapter represents a deliberate trade-off: edge feasibility over convergence depth.
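The LoRA trainable-parameter count in the table can be checked against GPT-2 Small's layer shapes. The card does not state the adapter rank; the sketch below assumes adapters only on each of the 12 fused `c_attn` projections (768 to 2,304 features) with no trainable biases. Under those assumptions, the reported 294,912 parameters correspond to rank r = 8.

```python
N_LAYERS = 12
D_MODEL = 768
C_ATTN_IN, C_ATTN_OUT = D_MODEL, 3 * D_MODEL   # fused Q/K/V projection: 768 -> 2,304
TOTAL_PARAMS = 124_439_808
REPORTED_LORA_PARAMS = 294_912


def lora_trainable_params(rank, layers=N_LAYERS):
    """Parameters in LoRA A (in x r) and B (r x out) matrices on every c_attn, no biases."""
    return layers * rank * (C_ATTN_IN + C_ATTN_OUT)


rank = REPORTED_LORA_PARAMS // lora_trainable_params(1)
print(rank, lora_trainable_params(rank) == REPORTED_LORA_PARAMS)
print(f"{REPORTED_LORA_PARAMS / TOTAL_PARAMS:.2%} of the base model")
```

Each rank unit adds 12 × 3,072 = 36,864 parameters, so the adapter stays well under 1% of the model even at considerably higher ranks.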
Intended Use
Intended uses:
- Screenplay drafting assistance and creative ideation
- Automated scene/dialogue continuation from a provided slug or action line
- Style transfer and scriptwriting research
- Educational exploration of domain-adaptive fine-tuning on structured text
Out-of-scope uses:
- Factual question answering (this is a generative, not retrieval, model)
- Production-ready script generation without human editorial review
- Any use case requiring truthfulness, citation, or factual accuracy
Bias, Risks, and Limitations
- The model was trained on an unfiltered corpus spanning multiple genres and tones; it may generate content reflecting biases, stereotypes, or mature themes present in its training data.
- As a 124M parameter model, outputs are prone to incoherence over long sequences and may not maintain narrative or character consistency beyond a few exchanges.
- The model has no instruction-following capability; it is a raw next-token predictor conditioned on screenplay-formatted text.
- Users should apply content moderation filters appropriate for their deployment context.
Environmental Impact
Carbon emissions were estimated using the Machine Learning Impact Calculator.
| Property | Value |
|---|---|
| Hardware Type | NVIDIA T4 (Cloud GPU) |
| Hours Used | ~7.72 hours (across 2 instances) |
| Cloud Provider | (Not disclosed) |
| Compute Region | (Not disclosed) |
| Carbon Emitted | 0.31 kg |
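As a rough cross-check, assume the calculator multiplies the T4's 70 W rated power by the run time (7 h 43 min 30 s = 7.725 h) and a regional carbon intensity. Under that assumption, the reported 0.31 kg implies a grid intensity of about 0.57 kg CO₂eq/kWh. The region is not disclosed, so this intensity is an inference, not a reported value.

$$
E \approx 0.070\ \text{kW} \times 7.725\ \text{h} \approx 0.54\ \text{kWh},
\qquad
\frac{0.31\ \text{kg CO}_2\text{eq}}{0.54\ \text{kWh}} \approx 0.57\ \text{kg CO}_2\text{eq/kWh}
$$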
Citation
If you reference this model or its training methodology in research, please cite the base model:
```bibtex
@article{radford2019language,
  title  = {Language Models are Unsupervised Multitask Learners},
  author = {Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year   = {2019}
}
```
Model Card Contact
For questions about this fine-tune's training methodology, dataset, or inference behavior, please open an issue in this repository.