## Overview
This pipeline was used to fine‑tune GPT‑2 small, medium, and large on abstracts from PubMed's baseline data. Models were trained on a single A100 GPU in Google Colab.
## Training
### Setup
- One epoch over 221,709 batches × 16 sequences × 1,024 tokens ≈ 3.63 billion tokens
- Identical optimizer, learning‑rate schedule, and hyper‑parameters for all models
- No additional regularization or early stopping
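The stated training-token count is straightforward to verify from the batch geometry above:

```python
# Sanity-check the training-token count:
# 221,709 batches × batch size 16 × sequence length 1,024.
batches, batch_size, seq_len = 221_709, 16, 1024
total_tokens = batches * batch_size * seq_len
print(f"{total_tokens / 1e9:.2f}B tokens")  # → 3.63B tokens
```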
### Loss
Here are the loss curves for GPT‑2 small, medium, and large fine‑tuned on PubMed abstracts over a single epoch.
## Evaluation
### Dataset
Hold‑out set of 1000 × 16 × 1024 tokens (≈ 16.4 M tokens) randomly sampled from PubMed abstracts, disjoint from the training split.
### Metrics
Cross‑entropy loss (averaged over all tokens) and derived perplexity (ppl = exp(loss)) on the hold‑out set:
| Model | Parameters | Avg CE Loss ↓ | Perplexity ↓ |
|---|---|---|---|
| gpt2‑small‑pubmed | 124 M | 2.5017 | 12.20 |
| gpt2‑medium‑pubmed | 355 M | 2.2984 | 9.96 |
| gpt2‑large‑pubmed | 774 M | 2.1863 | 8.90 |
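The perplexity column follows directly from the loss column via ppl = exp(loss), which can be checked against the reported values:

```python
import math

# (model, reported CE loss) pairs from the table above.
results = {
    "gpt2-small-pubmed": 2.5017,
    "gpt2-medium-pubmed": 2.2984,
    "gpt2-large-pubmed": 2.1863,
}
for name, ce in results.items():
    # exp(2.5017) ≈ 12.20, exp(2.2984) ≈ 9.96, exp(2.1863) ≈ 8.90
    print(f"{name}: exp({ce}) = {math.exp(ce):.2f}")
```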
### Caveats
- Perplexities are in‑domain (PubMed abstracts) and may not reflect general‑purpose LM quality
- Only one epoch of training; performance likely improves with more epochs or hyper‑parameter tuning
- Downstream biomedical benchmarks have not yet been conducted
## Usage
1) Quick‑start with the 🤗 pipeline API
```python
from transformers import pipeline
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

generator = pipeline(
    "text-generation",
    model="cwestnedge/gpt2-medium-pubmed",
    tokenizer="openai-community/gpt2-medium",
    device=device,
)

prompt = (
    "Background: The CRISPR–Cas9 system has revolutionized gene editing. "
    "In this study, we evaluate its efficacy in"
)

out = generator(
    prompt,
    max_length=200,
    do_sample=False,  # greedy decoding; set do_sample=True (with temperature/top_p) to sample
    num_return_sequences=1,
    truncation=True,
)
print(out[0]["generated_text"])
```
2) Manual load + generate for finer control
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "cwestnedge/gpt2-medium-pubmed"
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2-medium")
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

inputs = tokenizer(
    "Methods: We performed a double-blind randomized trial to assess",
    return_tensors="pt",
).to(device)

gen_ids = model.generate(
    **inputs,
    max_length=150,
    num_beams=5,             # beam search for more conservative completions
    no_repeat_ngram_size=2,  # block repeated bigrams
    early_stopping=True,
)
print(tokenizer.decode(gen_ids[0], skip_special_tokens=True))
```
3) Scoring / perplexity
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "cwestnedge/gpt2-medium-pubmed"
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2-medium")
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

text = "Tetralogy of Fallot is a rare congenital heart condition that is present at birth."
enc = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    # Passing the input ids as labels yields the mean cross-entropy over tokens.
    outputs = model(**enc, labels=enc.input_ids)

loss = outputs.loss
ppl = torch.exp(loss)
print(f"CE loss: {loss.item():.4f} → Perplexity: {ppl.item():.2f}")
```
## Model tree
Base model: `openai-community/gpt2-medium`