	---
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- mining
	- awq
	license: cc-by-nc-sa-4.0
	language:
	- ru
	base_model: nn-tech/MetalGPT-1
	---

	## Description

	MetalGPT-1 is built on Qwen/Qwen3-32B and adds both continual pre-training and supervised fine-tuning on domain-specific data from the mining and metallurgy industry.

	---

	### Quantization

	For convenience and better efficiency, we also offer this AWQ-quantized checkpoint of the nn-tech/MetalGPT-1 model. AWQ stores the model weights in 4-bit precision, which reduces memory consumption and speeds up inference without a significant impact on quality.
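
As a rough illustration of the memory claim, the sketch below estimates weight storage at 16-bit and 4-bit precision. The parameter count of about 32 billion is an assumption read off the Qwen3-32B name, and the 16-bit baseline is likewise assumed. The estimate ignores AWQ's per-group scales and zero points, activations, and the KV cache, so real usage will be somewhat higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1024**3


# Assumption: roughly 32 billion parameters (taken from the "32B" in Qwen3-32B).
NUM_PARAMS = 32e9

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # assumed 16-bit baseline
awq_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit AWQ weights, overhead ignored

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:  ~{awq_gib:.1f} GiB")
print(f"reduction:      ~{fp16_gib / awq_gib:.0f}x")
```

Under these assumptions the weights shrink by about a factor of four, which is the main source of the memory savings described above.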

	---

	### HF Usage

	```python

	from awq import AutoAWQForCausalLM
	from transformers import AutoTokenizer
	import torch

	torch.manual_seed(42)  # fix the sampling seed for repeatable output

	model_name = "nn-tech/MetalGPT-1-AWQ"

	tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
	model = AutoAWQForCausalLM.from_quantized(
	    model_name,
	    device_map="auto",
	)

	messages = [
	    # System prompt: "You are a specialist in metallurgy."
	    {"role": "system", "content": "Ты специалист в области металлургии."},
	    # User prompt: "Name the pros and cons of the chloride and sulfate
	    # technologies for nickel production."
	    {"role": "user", "content": "Назови плюсы и минусы хлоридной и сульфатной технологии производства никеля."},
	]

	# Render the chat messages into a single prompt string
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	    # enable_thinking=False,
	)

	device = next(model.parameters()).device
	model_inputs = tokenizer([text], return_tensors="pt").to(device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=1024,
	    do_sample=True,
	    temperature=0.7,
	)

	# Strip the prompt prefix so that only newly generated tokens remain
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(
	    generated_ids,
	    skip_special_tokens=True,
	)[0]

	print(response)

	```
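
When thinking mode is left enabled, the decoded `response` may begin with a reasoning block before the final answer. The helper below is a minimal sketch for separating the two. It assumes that MetalGPT-1 keeps the Qwen3 convention of wrapping reasoning in `<think>` ... `</think>` and that these tags survive decoding. The function name `split_thinking` is hypothetical and not part of any library.

```python
def split_thinking(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumes Qwen3-style output: an optional "<think>...</think>" block
    followed by the final answer. If no closing tag is found, the whole
    text is treated as the answer.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop the opening tag, if present, and surrounding whitespace.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Hypothetical decoded output, used only to demonstrate the helper.
sample = "<think>\nCompare both routes.\n</think>\n\nChloride route: ..."
reasoning, answer = split_thinking(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

In the script above, this could be applied as `reasoning, answer = split_thinking(response)`.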

	---

	### vLLM Usage

	```bash
	vllm serve nn-tech/MetalGPT-1-AWQ --reasoning-parser qwen3

	```

	```python

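	from openai import OpenAI

	# Point the OpenAI client at the local vLLM server started above.
	# vLLM does not check the API key unless it is configured to.
	client = OpenAI(
	    base_url="http://localhost:8000/v1",
	    api_key="dummy",
	)

	response = client.chat.completions.create(
	    model="nn-tech/MetalGPT-1-AWQ",
	    messages=[
	        # System prompt: "You are a specialist in metallurgy."
	        {"role": "system", "content": "Ты специалист в области металлургии."},
	        # User prompt: "Name the pros and cons of the chloride and sulfate
	        # technologies for nickel production."
	        {"role": "user", "content": "Назови плюсы и минусы хлоридной и сульфатной технологии производства никеля."},
	    ],
	    temperature=0.7,
	    max_tokens=1024,
	)

	print(response.choices[0].message.content)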

	```
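
The request above leaves thinking mode at its default. As a hedged sketch, the helper below builds the same request with an explicit thinking switch. It assumes that the vLLM server forwards `chat_template_kwargs` to the chat template and that the model's template honors the Qwen3 `enable_thinking` flag. The function name `build_chat_request` is hypothetical.

```python
def build_chat_request(
    question: str,
    system_prompt: str = "Ты специалист в области металлургии.",
    enable_thinking: bool = True,
) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": "nn-tech/MetalGPT-1-AWQ",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "temperature": 0.7,
        "max_tokens": 1024,
        # Assumption: the server passes these kwargs to the chat template,
        # and the template accepts the Qwen3 `enable_thinking` switch.
        "extra_body": {
            "chat_template_kwargs": {"enable_thinking": enable_thinking},
        },
    }


request = build_chat_request(
    "Назови плюсы и минусы хлоридной и сульфатной технологии производства никеля.",
    enable_thinking=False,
)
print(request["extra_body"])
```

With the server running, the result could be sent as `client.chat.completions.create(**request)`.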