How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nn-tech/MetalGPT-1-FP8")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nn-tech/MetalGPT-1-FP8")
model = AutoModelForCausalLM.from_pretrained("nn-tech/MetalGPT-1-FP8")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
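The decoded text printed above may contain a reasoning segment before the final answer. The sketch below separates the two, assuming the model inherits Qwen3's <think>...</think> markers (an assumption suggested by the qwen3 reasoning parser used with vLLM, not something this card states explicitly). The helper name split_reasoning is hypothetical.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split decoded model output into (reasoning, answer).

    Assumes Qwen3-style <think>...</think> markers around the reasoning.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag: either there is no reasoning at all, or generation
        # stopped (e.g. hit max_new_tokens) while the model was still thinking.
        if open_tag in text:
            return text.split(open_tag, 1)[1].strip(), ""
        return "", text.strip()
    # The opening tag may be missing if the chat template already emitted it
    # as part of the prompt, so fall back to everything before the close tag.
    reasoning = head.split(open_tag, 1)[-1].strip()
    return reasoning, tail.strip()


reasoning, answer = split_reasoning("<think>\nabc\n</think>\n\nHello")
print(repr(reasoning), repr(answer))
```

With a small max_new_tokens budget such as 40, the answer part may come back empty because generation can stop inside the reasoning segment.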
Description

MetalGPT-1 is built on Qwen/Qwen3-32B and incorporates both continual pre-training and supervised fine-tuning on domain-specific data from the mining and metallurgy industry.


Quantization

For convenience and improved performance, we also provide this FP8 checkpoint of the nn-tech/MetalGPT-1 model. FP8 weights take roughly half the memory of BF16 weights and enable faster inference, while preserving model quality and numerical stability.
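As a rough illustration of the memory saving, the sketch below estimates weight storage at 2 bytes per parameter (BF16) versus 1 byte per parameter (FP8), using the approximate 33B parameter count listed for this checkpoint. It is a back-of-the-envelope estimate only: it ignores the KV cache, activations, and quantization scales, and it assumes every weight is stored in FP8, whereas the checkpoint keeps some tensors in BF16.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 33e9  # approximate parameter count reported for this checkpoint

bf16_gib = weight_memory_gib(NUM_PARAMS, 2)  # BF16: 2 bytes per parameter
fp8_gib = weight_memory_gib(NUM_PARAMS, 1)   # FP8: 1 byte per parameter

print(f"BF16 weights: ~{bf16_gib:.1f} GiB")
print(f"FP8 weights:  ~{fp8_gib:.1f} GiB")
```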


vLLM usage

Start an OpenAI-compatible server:

vllm serve nn-tech/MetalGPT-1-FP8 --reasoning-parser qwen3

Then query it with the OpenAI Python client:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy",  # placeholder; vLLM only checks the key if started with --api-key
)

response = client.chat.completions.create(
    model="nn-tech/MetalGPT-1-FP8",
    messages=[
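        # System prompt (Russian): "You are a specialist in metallurgy."
        {"role": "system", "content": "Ты специалист в области металлургии."},
        # User prompt (Russian): "Name the pros and cons of the chloride and
        # sulfate technologies for nickel production."
        {"role": "user", "content": "Назови плюсы и минусы хлоридной и сульфатной технологии производства никеля."}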
    ],
    temperature=0.7,
    max_tokens=1024
)

print(response.choices[0].message.content)
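Because the server is started with a reasoning parser, the reasoning text is returned separately from the final answer. The sketch below reads both from a message object. The field name is an assumption: vLLM releases have exposed the parsed reasoning as reasoning_content, and some use reasoning, so check the documentation for your installed version. The helper name reasoning_and_answer is hypothetical, and a stand-in object replaces response.choices[0].message so the sketch runs without a server.

```python
from types import SimpleNamespace


def reasoning_and_answer(message):
    """Return (reasoning, answer) from a chat completion message.

    Assumption: the parsed reasoning lives in `reasoning_content` or
    `reasoning`, depending on the vLLM version; returns None if neither exists.
    """
    reasoning = getattr(message, "reasoning_content", None) or getattr(
        message, "reasoning", None
    )
    return reasoning, message.content


# Stand-in for response.choices[0].message (placeholder text, not model output)
demo_message = SimpleNamespace(
    reasoning_content="step-by-step notes",
    content="final answer",
)

reasoning, answer = reasoning_and_answer(demo_message)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Against a live server, the same helper would be called as reasoning_and_answer(response.choices[0].message).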
Model size: 33B params
Tensor type: BF16, F8_E4M3 (Safetensors)
Model tree for nn-tech/MetalGPT-1-FP8

Base model: Qwen/Qwen3-32B
Quantized (3): this model