Libraries

How to use robowaifudev/megatron-gpt2-345m with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="robowaifudev/megatron-gpt2-345m")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("robowaifudev/megatron-gpt2-345m")
model = AutoModelForCausalLM.from_pretrained("robowaifudev/megatron-gpt2-345m", device_map="auto")
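
Neither snippet above generates text yet. As a minimal sketch, the helper below calls a text-generation pipeline and extracts the `generated_text` field from each result. The name `complete` and the default of 40 new tokens are illustrative assumptions, not settings taken from this model card.

```python
def complete(pipe, prompt, max_new_tokens=40):
    """Return the generated strings for `prompt` from a text-generation pipeline."""
    # A text-generation pipeline returns a list of dicts, one per sequence.
    # Each dict holds the prompt plus its continuation under "generated_text".
    outputs = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return [out["generated_text"] for out in outputs]
```

With the `pipe` object created above, `complete(pipe, "Once upon a time,")` should return a list containing one string that starts with the prompt.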


vLLM

How to use robowaifudev/megatron-gpt2-345m with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "robowaifudev/megatron-gpt2-345m"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "robowaifudev/megatron-gpt2-345m",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
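
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on `localhost:8000`. The function names `build_completion_request` and `request_completion` are hypothetical helpers, and the response is read assuming the usual OpenAI-style `choices[0].text` layout.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="robowaifudev/megatron-gpt2-345m",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_completion(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Passing `base_url="http://localhost:30000"` should target a server on another port, such as the SGLang default used later on this page.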

Use Docker

docker model run hf.co/robowaifudev/megatron-gpt2-345m

SGLang

How to use robowaifudev/megatron-gpt2-345m with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "robowaifudev/megatron-gpt2-345m" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "robowaifudev/megatron-gpt2-345m",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "robowaifudev/megatron-gpt2-345m" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "robowaifudev/megatron-gpt2-345m",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use robowaifudev/megatron-gpt2-345m with Docker Model Runner:
```
docker model run hf.co/robowaifudev/megatron-gpt2-345m
```

This is an archive of nvidia/megatron-gpt2-345m that contains readily available model weights (375M). Its perplexity on WikiText-103 is 19.31 (lower is better).¹ In comparison, GPT-2 with 1.5B parameters (GPT2-XL) reaches 17.48 and GPT-2 with 762M parameters (GPT2-large) reaches 22.05.²
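
For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-likelihood per token. The sketch below only illustrates that definition on made-up log-probabilities. It is not the evaluation script behind the numbers above, and the tokenization and normalization used for those numbers are not specified here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives every token probability 1/20 has perplexity 20:
# it is as uncertain as a uniform choice among 20 options.
uniform = [math.log(1 / 20)] * 50
print(round(perplexity(uniform), 2))  # 20.0
```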

References

1. Shoeybi, Mohammad, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv, 2019. https://doi.org/10.48550/ARXIV.1909.08053.
2. Radford, Alec, et al. Language Models are Unsupervised Multitask Learners. OpenAI, 2019. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.

Description

Megatron is a large, powerful transformer developed by the Applied Deep Learning Research team at NVIDIA. This particular Megatron model is a generative, left-to-right transformer in the style of GPT-2. It was trained on text sourced from Wikipedia, RealNews, OpenWebText, and CC-Stories, and it contains 345 million parameters.

Find more information at https://github.com/NVIDIA/Megatron-LM

How to run Megatron GPT2 using Transformers

Text generation

The following code shows how to use the Megatron GPT2 checkpoint and Transformers to generate text.

import torch

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("robowaifudev/megatron-gpt2-345m")

if torch.cuda.is_available():
    device = torch.device("cuda")
    model.half()
else:
    device = torch.device("cpu")
model.to(device)
model.eval()

# Generate
prompt = (
    "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith,"
)
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
output = model.generate(
    input_ids=input_ids,
    # input_ids has shape (1, prompt_length), so len(input_ids) would be 1
    max_length=input_ids.shape[1] + 128,
    do_sample=True,
    top_k=64,
    top_p=0.9,
    temperature=0.8,
    num_return_sequences=2,
    repetition_penalty=1.025
)

# Output the text
print("Prompt:", prompt)
print("*" * 3)
for i, sentence in enumerate(output):
    text = tokenizer.decode(sentence, clean_up_tokenization_spaces=True)
    print(f"{i}:", text)
    print("*" * 3)
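
The `temperature`, `top_k`, and `top_p` arguments above control how the next token is sampled. The following sketch shows one simplified way these three filters can be combined on a single list of logits. It is an illustration written for this page, not the Transformers implementation, and `filter_distribution` is a hypothetical name.

```python
import math


def filter_distribution(logits, temperature=0.8, top_k=64, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature: divide logits before the softmax (values below 1 sharpen it).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the shortest prefix whose renormalized mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}
```

Sampling then draws the next token from the returned distribution, so lowering `top_k` or `top_p` narrows the set of candidate tokens and lowering `temperature` concentrates probability on the most likely ones.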

Original code

The original Megatron code can be found here: https://github.com/NVIDIA/Megatron-LM.

Model size (Safetensors): 0.4B params
Tensor type: F16

Evaluation results (self-reported)

Perplexity on WikiText-103: 19.310
Perplexity on WikiText-2: 17.151
Perplexity on LAMBADA: 5.509
Accuracy on LAMBADA: 68.31%