Instructions to use WestCode1357/gpt-sw3-126m-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use WestCode1357/gpt-sw3-126m-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WestCode1357/gpt-sw3-126m-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("WestCode1357/gpt-sw3-126m-instruct")
model = AutoModelForCausalLM.from_pretrained("WestCode1357/gpt-sw3-126m-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use WestCode1357/gpt-sw3-126m-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WestCode1357/gpt-sw3-126m-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WestCode1357/gpt-sw3-126m-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
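
The same request can also be issued from Python. The following is a minimal sketch, assuming the vLLM server started above is listening on localhost:8000 and that the response follows the OpenAI chat-completions shape. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM; only the Python standard library is used.

```python
import json
import urllib.request

MODEL_ID = "WestCode1357/gpt-sw3-126m-instruct"


def build_chat_request(user_message, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(user_message, base_url="http://localhost:8000", timeout=60):
    """Send the request and return the reply text (needs a running server)."""
    request = build_chat_request(user_message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the reply sits under choices[0].message.content,
    # as in the OpenAI chat-completions response format.
    return body["choices"][0]["message"]["content"]


# Example call, commented out because it requires the server to be up:
# print(ask("What is the capital of France?"))
```

For a server on a different port, such as an SGLang server on port 30000, pass `base_url="http://localhost:30000"`.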

Use Docker

docker model run hf.co/WestCode1357/gpt-sw3-126m-instruct

SGLang

How to use WestCode1357/gpt-sw3-126m-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WestCode1357/gpt-sw3-126m-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WestCode1357/gpt-sw3-126m-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WestCode1357/gpt-sw3-126m-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WestCode1357/gpt-sw3-126m-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use WestCode1357/gpt-sw3-126m-instruct with Docker Model Runner:
```
docker model run hf.co/WestCode1357/gpt-sw3-126m-instruct
```

gpt-sw3-126m-instruct

The smallest GPT-SW3 instruct model (126M parameters). It loads quickly, which makes it well suited to testing and prototyping.

Size: 126M | Type: instruct | Languages: Swedish, Norwegian, Danish, Icelandic, English
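
To get a rough sense of why a model this size loads quickly, the weight footprint can be estimated from the parameter count and the tensor type. This is a back-of-the-envelope sketch, not a measurement: it uses the nominal 126M figure from the model name (the count stored in the checkpoint may differ), and it ignores activations, the KV cache, and framework overhead. The names `BYTES_PER_PARAM`, `weight_bytes`, and `to_mib` are illustrative.

```python
# Bytes used per parameter for common tensor types.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_bytes(num_params, dtype):
    """Approximate size of the weights alone, in bytes."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_mib(num_bytes):
    """Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 ** 2)


NOMINAL_PARAMS = 126_000_000  # assumption: nominal count from the model name

for dtype in ("float32", "float16"):
    size = weight_bytes(NOMINAL_PARAMS, dtype)
    print(f"{dtype}: about {to_mib(size):.0f} MiB of weights")
```

Loading in half precision halves the estimate, which is the main reason to pass a 16-bit dtype on a GPU.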

Community mirror of AI-Sweden-Models/gpt-sw3-126m-instruct

Warning and Disclaimer

This model is provided as-is for research and educational purposes. It is a community redistribution of AI Sweden's GPT-SW3, released under the same modified RAIL license.

You are responsible for any content you create using this model. Use responsibly.

The model may reflect biases from its training data and may generate inaccurate, offensive, or inappropriate content. Neither the uploader nor AI Sweden is liable for downstream misuse. Review the AI Sweden RAIL license before any production deployment.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "WestCode1357/gpt-sw3-126m-instruct"

# Pick the first available device: Apple MPS, then CUDA, then CPU.
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

# Half precision suits accelerators; on CPU it is slow and not every op
# supports it in all PyTorch versions, so fall back to float32 there.
dtype = torch.float32 if device == "cpu" else torch.float16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
model.to(device)

prompt = "Träd är fina för att"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0]))

Chat / instruct format

GPT-SW3 instruct uses special tokens. The format is:

<|endoftext|><s>User: [your message]<s>Bot: [response]<s>...

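# Reuses tokenizer, model, and device from the Usage example above.
# Special tokens that delimit the conversation turns.
eos = "<|endoftext|>"
seg = "<s>"
# End the prompt with an open "Bot: " turn so the model writes the reply.
prompt = f"{eos}{seg}User: Vad är huvudstaden i Sverige?{seg}Bot: "
inputs = tokenizer(prompt, return_tensors="pt").to(device)
out = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    eos_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, keeping special tokens visible.
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=False))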
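
For multi-turn conversations, the prompt string can be assembled programmatically. The sketch below follows the single-line format shown above; `build_prompt` and `extract_reply` are hypothetical helpers written for illustration, not part of Transformers or the GPT-SW3 release. It assumes the model ends its reply with either `<s>` or `<|endoftext|>`.

```python
EOS = "<|endoftext|>"
SEG = "<s>"


def build_prompt(turns):
    """Render (speaker, text) turns and leave an open Bot turn for the model.

    `turns` is a list such as [("User", "..."), ("Bot", "..."), ("User", "...")].
    """
    rendered = "".join(f"{SEG}{speaker}: {text}" for speaker, text in turns)
    return f"{EOS}{rendered}{SEG}Bot: "


def extract_reply(generated_text):
    """Keep only the text before the first segment or end-of-text marker."""
    for marker in (SEG, EOS):
        generated_text = generated_text.split(marker, 1)[0]
    return generated_text.strip()


history = [
    ("User", "Hej"),
    ("Bot", "Hej!"),
    ("User", "Vad är huvudstaden i Sverige?"),
]
print(build_prompt(history))
```

The string returned by `build_prompt` can be passed to the tokenizer in place of the hand-written `prompt`, and `extract_reply` can be applied to the decoded continuation to cut it off at the end of the Bot turn.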

Intended Use

⚠️ These models contain extreme bias and are NOT intended for commercial use. For scientific and research use only.

GPT-SW3 was trained on large-scale web data and may reflect harmful societal biases present in that data. It has not been aligned or safety-tuned beyond its original training. Use strictly in controlled research settings. Do not deploy in any consumer-facing or commercial product without thorough evaluation and additional safety measures.

About GPT-SW3

GPT-SW3 is developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. The models were trained on 320B tokens of Swedish, Norwegian, Danish, Icelandic, English, and code.

Original models: https://huggingface.co/AI-Sweden-Models
Project page: https://www.ai.se/en/project/gpt-sw3

Downloads last month: 979

Safetensors | Model size: 0.2B params | Tensor type: F32