Instructions to use raghavnimbalkar/gpt2-screenplay-generator with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use raghavnimbalkar/gpt2-screenplay-generator with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="raghavnimbalkar/gpt2-screenplay-generator")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("raghavnimbalkar/gpt2-screenplay-generator")
model = AutoModelForCausalLM.from_pretrained("raghavnimbalkar/gpt2-screenplay-generator")

Local Apps

vLLM

How to use raghavnimbalkar/gpt2-screenplay-generator with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "raghavnimbalkar/gpt2-screenplay-generator"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raghavnimbalkar/gpt2-screenplay-generator",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
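
The same OpenAI-compatible endpoint can also be called from Python. What follows is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete`, and the `max_tokens=256` default, are illustrative assumptions and not part of vLLM or this model card. The server started above is assumed to be listening on `localhost:8000`.

```python
import json
import urllib.request

MODEL_ID = "raghavnimbalkar/gpt2-screenplay-generator"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=256, temperature=0.8):
    """Build a POST request for the /v1/completions route (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,      # assumed default, tune for your use case
        "temperature": temperature,    # within the range this card recommends
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first completion's text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Requires a running server; a screenplay slug is a natural prompt here.
    print(complete("INT. POLICE PRECINCT - NIGHT\n\n"))
```

Passing `base_url="http://localhost:30000"` should target a server on a different port in the same way.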

SGLang

How to use raghavnimbalkar/gpt2-screenplay-generator with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "raghavnimbalkar/gpt2-screenplay-generator" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raghavnimbalkar/gpt2-screenplay-generator",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "raghavnimbalkar/gpt2-screenplay-generator" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raghavnimbalkar/gpt2-screenplay-generator",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use raghavnimbalkar/gpt2-screenplay-generator with Docker Model Runner:
```
docker model run hf.co/raghavnimbalkar/gpt2-screenplay-generator
```

GPT-2 Small — Screenplay Scriptwriting Model

Study Context: This is the first model in a dual-architecture comparative study on screenplay generation using GPT-2 Small. This model is a full-parameter fine-tune executed on a cloud NVIDIA T4 GPU. The second model is a LoRA adapter trained entirely on consumer edge hardware — an Apple Silicon MacBook Air — using PEFT to operate within the hard constraints of a fanless, unified-memory device.

A fully fine-tuned GPT-2 Small (124M) causal language model trained end-to-end on ~94M tokens of professional screenplay corpora, with stateful MLOps checkpoint recovery from a mid-run hardware preemption event.

Model Description

This model is a full-parameter fine-tune of OpenAI's GPT-2 Small (124M parameters) for causal language modeling, specialized for screenplay and script generation. Training started from the base GPT-2 checkpoint, and all 124,439,808 parameters were unfrozen and updated; this is not a LoRA, adapter, or other PEFT-based model.

The model has internalized the highly structured formatting conventions of professional screenplays: scene slugs (INT./EXT.), character action lines, dialogue blocks, parentheticals, and production draft metadata — making it capable of generating coherent, industry-formatted script content from open-ended prompts.

| Property | Value |
|---|---|
| Base Model | GPT-2 Small (`openai-community/gpt2`) |
| Parameter Count | 124,439,808 (100% updated) |
| Architecture | Decoder-only Transformer (GPT-2) |
| Fine-tune Method | Full-Parameter Overwrite (no PEFT/LoRA) |
| Task | Causal Language Modeling / Script Generation |
| Context Window | 512 tokens (contiguous) |
| Language | English |
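
The parameter count above can be reproduced from the standard GPT-2 Small dimensions. The sketch below assumes the usual configuration (vocabulary 50,257, 1,024 positions, hidden size 768, 12 layers, output head tied to the token embedding); the function name is illustrative and these dimensions are not restated in this card.

```python
def gpt2_parameter_counts(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12):
    """Count GPT-2 parameters from its dimensions (LM head tied to token embedding)."""
    # Token and position embedding tables.
    embeddings = vocab_size * n_embd + n_positions * n_embd

    # One LayerNorm has a weight and a bias vector.
    layer_norm = 2 * n_embd

    # Attention: fused QKV projection (c_attn) plus output projection (c_proj).
    attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)

    # MLP: expansion to 4x hidden size (c_fc) and projection back (c_proj).
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)

    # Each block has two LayerNorms; one final LayerNorm follows the last block.
    per_block = 2 * layer_norm + attention + mlp
    non_embedding = n_layer * per_block + layer_norm

    return {
        "embeddings": embeddings,
        "non_embedding": non_embedding,
        "total": embeddings + non_embedding,
    }


counts = gpt2_parameter_counts()
print(counts["total"])          # -> 124439808
print(counts["non_embedding"])  # -> 85056000
```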

Training Data

The model was trained on a corpus of approximately 94 million tokens of raw, professionally formatted screenplay text files. The dataset consists of:

Standard industry-formatted .fountain / plain-text screenplay sources
Scene slugline notation (INT. LOCATION - DAY/NIGHT)
Character cues, action blocks, parentheticals, and dialogue
Production draft metadata headers and transition markers

No dataset card is available at this time. The corpus was not filtered for content rating or genre — the model reflects the full stylistic and tonal range of the training material.

Training Procedure & Infrastructure

Compute Infrastructure

| Component | Specification |
|---|---|
| Accelerator | NVIDIA T4 Cloud GPU |
| CUDA Backend | Enabled |
| Precision Strategy | FP16 Mixed Precision (`torch.cuda.amp` via HF Accelerate) |

Hyperparameters

| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | `5e-5` (linear decay) |
| per_device_train_batch_size | 4 |
| gradient_accumulation_steps | 4 |
| Effective Global Batch Size | 16 |
| Total Optimization Steps | 9,272 (1 full epoch) |
| Total FLOs (floating-point operations) | 3.876 × 10¹⁶ |
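
The batch-size and FLOs figures above fit together arithmetically. The sketch below is a back-of-the-envelope check, not the training script. It assumes a single GPU, that every sequence is a full 512 tokens, and the common estimate of 6 × (non-embedding parameters) × (tokens processed), with 85,056,000 non-embedding parameters for standard GPT-2 Small.

```python
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
SEQUENCE_LENGTH = 512              # context window stated in this card
TOTAL_STEPS = 9_272
NON_EMBEDDING_PARAMS = 85_056_000  # assumption: standard GPT-2 Small, embeddings excluded

# Sequences contributing to one optimizer step (single GPU assumed).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# Tokens per optimizer step and over the whole run, assuming full-length sequences.
tokens_per_step = effective_batch * SEQUENCE_LENGTH
tokens_seen = tokens_per_step * TOTAL_STEPS

# Rule-of-thumb training cost: forward plus backward is about 6 FLOs per parameter per token.
estimated_flos = 6 * NON_EMBEDDING_PARAMS * tokens_seen

print(effective_batch)           # -> 16
print(tokens_per_step)           # -> 8192
print(tokens_seen)               # -> 75956224
print(f"{estimated_flos:.3e}")   # -> 3.876e+16
```

Under these assumptions the run processes roughly 76M training tokens. This card does not say how that relates to the ~94M-token corpus figure, so the two numbers may be counted differently.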

MLOps Resiliency & Checkpoint Recovery

A defining characteristic of this training run is its stateful recovery from a mid-training hardware preemption event. The full timeline is documented below as an engineering reference.

Timeline

[00:00:00] → Training initiated on primary cloud instance (T4 GPU).
                Checkpoints configured to persist every 200 global steps.

[04:43:00] → HARDWARE PREEMPTION at global Step 5,600 (60.4% complete).
                Primary compute container abruptly disconnected.
                Checkpoint preserved: model.safetensors, optimizer.pt, scheduler.pt

[04:43:xx] → Hot-resume initiated on secondary cloud instance from Step 5,601.
                Full optimizer state (momentum buffers, variance estimates),
                learning rate scheduler, and gradient context fully restored.

[07:43:30] → Training complete at global Step 9,272.
                Zero loss discontinuity detected across the resume boundary.

Total aggregate compute time: 7 hours, 43 minutes, 30 seconds across both instances.

The pre-crash and post-resume validation losses at Steps 5,600 and 5,800 (see the convergence table below) show no regression across the preemption event, which is consistent with the optimizer and scheduler state having been restored intact. This suggests that the Hugging Face Trainer's native checkpoint serialization, which saves full optimizer and scheduler state, is sufficient for mid-run recovery on stateless cloud infrastructure.
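
A resume of this kind typically starts by locating the newest checkpoint directory on the replacement instance. The sketch below is illustrative and not the script used for this run. It assumes the Trainer's usual `checkpoint-<step>` directory naming, and `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

_CHECKPOINT_DIR = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None

    best_step, best_path = -1, None
    for child in root.iterdir():
        match = _CHECKPOINT_DIR.match(child.name)
        # Compare steps numerically: a string sort would rank 900 above 5600.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


# Usage sketch (requires an already constructed transformers.Trainer named `trainer`):
#   checkpoint = latest_checkpoint("output")   # "output" is a placeholder path
#   trainer.train(resume_from_checkpoint=str(checkpoint) if checkpoint else None)
```

`transformers.trainer_utils.get_last_checkpoint` offers similar lookup behaviour and may be preferable when the library is already imported.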

Training Metrics & Convergence

The model shows clear asymptotic convergence on screenplay formatting conventions and domain vocabulary across the full 9,272-step run.

| Global Step | Training Phase | Validation Loss | Notes |
|---|---|---|---|
| 200 | Baseline (early) | 1.4586 | Initial domain vocabulary acquisition |
| 2,000 | Formatting alignment | 1.3653 | Scene/dialogue structure stabilizing |
| 5,600 | Pre-crash state | 1.3305 | Checkpoint preserved at preemption |
| 5,800 | Post-resume stability | 1.3276 | Confirmed loss continuity after resume |
| 9,272 | Final (absolute termination) | 1.3194 | Convergence plateau reached |

Total loss reduction: −0.1392 across the full run (−9.5% relative improvement from baseline).

The negligible delta between Steps 5,600 and 5,800 (−0.0029) confirms that the optimizer state was fully restored and training resumed without gradient shock or instability.
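
The deltas quoted above follow directly from the table values. The short sketch below recomputes them and also converts each loss to perplexity, which assumes the reported loss is mean per-token cross-entropy in nats (an assumption; the card does not state the unit).

```python
import math

# Validation loss by global step, copied from the convergence table.
validation_loss = {200: 1.4586, 2_000: 1.3653, 5_600: 1.3305, 5_800: 1.3276, 9_272: 1.3194}

# Absolute and relative improvement from the first to the last logged step.
total_reduction = validation_loss[200] - validation_loss[9_272]
relative_reduction = total_reduction / validation_loss[200]

# Change across the resume boundary (negative means the loss kept falling).
resume_delta = validation_loss[5_800] - validation_loss[5_600]

# Perplexity = exp(loss), valid only under the natural-log assumption above.
perplexity = {step: math.exp(loss) for step, loss in validation_loss.items()}

print(round(total_reduction, 4))           # -> 0.1392
print(round(relative_reduction * 100, 1))  # -> 9.5
print(round(resume_delta, 4))              # -> -0.0029
```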

Usage & Inference

Loading the Model

from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_id = "raghavnimbalkar/gpt2-screenplay-generator"

tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)
model.eval()

Recommended Inference Parameters

The following nucleus sampling configuration is recommended to produce high-fidelity, coherent screenplay output while avoiding repetitive boilerplate:

| Parameter | Recommended Value | Notes |
|---|---|---|
| `max_length` | Up to `512` | Matches the 512-token training context; counts prompt tokens plus generated tokens |
| `temperature` | `0.75` – `0.85` | Lower = sharper dialogue; higher = creative variance |
| `top_k` | `40` or `50` | Limits vocabulary sampling pool |
| `top_p` | `0.92` – `0.95` | Nucleus sampling threshold |
| `repetition_penalty` | `1.12` – `1.15` | Critical: prevents screenplay boilerplate loops |
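
To make the `top_k` and `top_p` settings concrete, here is a simplified pure-Python illustration of how the two filters narrow a next-token distribution. It is a teaching sketch with a hypothetical function name, not the Transformers implementation, which operates on logits tensors.

```python
def filter_top_k_top_p(probs, top_k=50, top_p=0.92):
    """Return {token_id: probability} after top-k, then nucleus (top-p) filtering."""
    # Rank token ids by probability, highest first, and keep the top_k best.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]

    # Renormalize so the surviving probabilities sum to 1 before the nucleus cut.
    kept_mass = sum(p for _, p in ranked)
    ranked = [(token_id, p / kept_mass) for token_id, p in ranked]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for token_id, p in ranked:
        nucleus.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize again so the result is a valid sampling distribution.
    nucleus_mass = sum(p for _, p in nucleus)
    return {token_id: p / nucleus_mass for token_id, p in nucleus}


toy_distribution = [0.5, 0.3, 0.1, 0.06, 0.04]
print(sorted(filter_top_k_top_p(toy_distribution, top_k=4, top_p=0.9)))  # -> [0, 1, 2]
```

Lower `top_p` or `top_k` values shrink the candidate pool, which tends to make output more conservative.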

Inference Example

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_id = "raghavnimbalkar/gpt2-screenplay-generator"  

tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)
model.eval()

prompt = "INT. POLICE PRECINCT - NIGHT\n\nDetective HARRIS slams a folder on the table."

inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_length=512,
        temperature=0.80,
        top_k=50,
        top_p=0.92,
        repetition_penalty=1.13,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

Tip: A repetition_penalty in the 1.12–1.15 range is especially important for this model. Screenplay corpora contain many repeated structural tokens (INT., EXT., CUT TO:, character cues). Without a penalty, the model tends to loop on these tokens during unconstrained generation.
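
For intuition, the sketch below shows the commonly used repetition-penalty rule on a plain list of logits: scores of already-generated tokens are divided by the penalty when positive and multiplied by it when negative, so they always become less likely. This is a simplified illustration with a hypothetical function name; the library applies the same idea to tensors inside `generate`.

```python
def apply_repetition_penalty(logits, previous_token_ids, penalty=1.13):
    """Return a copy of `logits` with previously seen token ids made less likely."""
    adjusted = list(logits)
    for token_id in set(previous_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero; negative scores move further below zero.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Token 0 and token 1 were already generated; token 2 was not and is left untouched.
print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 1], penalty=2.0))
# -> [1.0, -2.0, 0.5]
```

A penalty of 1.0 leaves the logits unchanged, which is why values only slightly above 1 already have a visible effect.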

Comparison with LoRA Adapter Model

This model is one half of an ongoing comparative study. The table below contrasts both trained models across architecture, compute, and convergence dimensions.

| Property | Full-Parameter (This Model) | LoRA Adapter (Local) |
|---|---|---|
| Hardware | NVIDIA T4 (Cloud GPU) | Apple Silicon MacBook Air (MPS) |
| Fine-tune Method | Full-parameter overwrite | LoRA / PEFT (`c_attn` only) |
| Trainable Parameters | 124,439,808 (100%) | 294,912 (0.24%) |
| Epoch Coverage | 1.0 (full corpus) | 0.51 (half corpus) |
| Total Steps | 9,272 | 4,700 |
| Training Time | 7h 43m 30s | 7h 51m 02s |
| Final Eval Loss | 1.3194 | 2.4017 |
| Step Throughput | ~3.0s/step | ~6.01s/step |
| MLOps Event | Hardware preemption + stateful hot-resume | 17× speedup via LoRA over full-param attempt |

Both models trained for approximately the same wall-clock time (~7h 45m). The gap in final evaluation loss therefore reflects full-parameter updates and full corpus coverage versus adapter-based training on about half the corpus with constrained hardware, not a difference in training time. The LoRA adapter represents a deliberate trade-off: edge feasibility over convergence depth.
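
The derived rows in the comparison can be cross-checked from the raw figures. The sketch below does so in plain Python. The LoRA rank of 8 is an assumption (it is the value that reproduces 294,912 trainable parameters for `c_attn` on GPT-2 Small, but the card does not state the rank), and the epoch-coverage check assumes both runs used the same effective batch size and sequence length.

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


full_run = {"steps": 9_272, "seconds": to_seconds(7, 43, 30), "trainable": 124_439_808}
lora_run = {"steps": 4_700, "seconds": to_seconds(7, 51, 2), "trainable": 294_912}

# Average wall-clock seconds per optimization step.
full_sec_per_step = full_run["seconds"] / full_run["steps"]
lora_sec_per_step = lora_run["seconds"] / lora_run["steps"]

# Share of the base model's parameters that the adapter trains.
trainable_fraction = lora_run["trainable"] / full_run["trainable"]

# Fraction of an epoch covered, assuming identical batch size and sequence length.
epoch_coverage = lora_run["steps"] / full_run["steps"]


def lora_parameter_count(rank, d_in=768, d_out=2304, n_layer=12):
    """Parameters in LoRA A (d_in x rank) and B (rank x d_out) for c_attn in every layer."""
    return n_layer * rank * (d_in + d_out)


print(round(full_sec_per_step, 2))         # -> 3.0
print(round(lora_sec_per_step, 2))         # -> 6.01
print(round(trainable_fraction * 100, 2))  # -> 0.24
print(round(epoch_coverage, 2))            # -> 0.51
print(lora_parameter_count(rank=8))        # -> 294912 (rank 8 is an assumption)
```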

Intended Use

Intended uses:

Screenplay drafting assistance and creative ideation
Automated scene/dialogue continuation from a provided slug or action line
Style transfer and scriptwriting research
Educational exploration of domain-adaptive fine-tuning on structured text

Out-of-scope uses:

Factual question answering (this is a generative, not retrieval, model)
Production-ready script generation without human editorial review
Any use case requiring truthfulness, citation, or factual accuracy

Bias, Risks, and Limitations

The model was trained on an unfiltered corpus spanning multiple genres and tones; it may generate content reflecting biases, stereotypes, or mature themes present in its training data.
As a 124M-parameter model, it is prone to incoherent output over long sequences and may not maintain narrative or character consistency beyond a few exchanges.
The model has no instruction-following capability; it is a raw next-token predictor conditioned on screenplay-formatted text.
Users should apply content moderation filters appropriate for their deployment context.

Environmental Impact

Carbon emissions were estimated using the Machine Learning Impact Calculator.

| Property | Value |
|---|---|
| Hardware Type | NVIDIA T4 (Cloud GPU) |
| Hours Used | ~7.72 hours (across 2 instances) |
| Cloud Provider | (Not disclosed) |
| Compute Region | (Not disclosed) |
| Carbon Emitted | 0.31 kg |

Citation

If you reference this model or its training methodology in research, please cite the base model:

@article{radford2019language,
  title   = {Language Models are Unsupervised Multitask Learners},
  author  = {Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  journal = {OpenAI Blog},
  year    = {2019}
}

Model Card Contact

For questions about this fine-tune's training methodology, dataset, or inference behavior, please open an issue in this repository.
