ruGPT-13B-4bit
These are GPTQ model files for Sberbank's ruGPT-3.5-13B model.
Technical details
The model was quantized to 4-bit with the AutoGPTQ library.
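To give a feel for what 4-bit storage means, here is a minimal sketch in plain Python. It is only a toy: it applies simple round-to-nearest quantization to one small group of weights, whereas GPTQ also compensates for rounding error using second-order information, which is not shown. The helper names (`quantize_group`, `dequantize_group`) and the sample weights are made up for illustration and are not part of AutoGPTQ. The size estimate at the end counts weights only and ignores scales, zero points, and runtime buffers.

```python
# Toy illustration only: round-to-nearest 4-bit quantization of one weight group.
# This is NOT the GPTQ algorithm; helper names and sample values are hypothetical.

def quantize_group(weights, bits=4):
    """Map floats to integer codes in [0, 2**bits - 1] with a shared scale and offset."""
    levels = 2 ** bits - 1                      # 15 distinct steps for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0     # fall back to 1.0 for a constant group
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, w_min):
    """Reconstruct approximate float weights from the integer codes."""
    return [c * scale + w_min for c in codes]


weights = [-0.75, -0.1, 0.0, 0.3, 0.75]
codes, scale, offset = quantize_group(weights)
restored = dequantize_group(codes, scale, offset)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Rough weight-only storage estimate for 13B parameters.
params = 13e9
fp16_gb = params * 16 / 8 / 1e9   # 16 bits per weight
int4_gb = params * 4 / 8 / 1e9    # 4 bits per weight

print(codes, round(max_error, 4))
print(f"fp16: ~{fp16_gb:.1f} GB, 4-bit: ~{int4_gb:.1f} GB")
```

Each reconstructed weight differs from the original by at most half a quantization step, which is the error GPTQ tries to spread more carefully across the layer.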
Examples of usage
First make sure you have AutoGPTQ installed:
GITHUB_ACTIONS=true pip install auto-gptq
Then try the following example code:
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM
repo_name = "gurgutan/ruGPT-13B-4bit"
# load tokenizer from Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=True)
# download quantized model from Hugging Face Hub and load to the first GPU
model = AutoGPTQForCausalLM.from_quantized(repo_name, device="cuda:0", use_safetensors=True, use_triton=False)
# inference with model.generate
request = "Буря мглою небо кроет"
print(tokenizer.decode(model.generate(**tokenizer(request, return_tensors="pt").to(model.device))[0]))
# alternatively, use a text-generation pipeline
pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline(request)[0]["generated_text"])
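The example above calls `model.generate` with default settings, so the length and style of the continuation depend on the generation config shipped with the checkpoint. A small wrapper, sketched below, makes it easier to pass generation options explicitly. The helper name `generate_text` and the default of 64 new tokens are assumptions made for illustration. The keyword arguments (`max_new_tokens`, `do_sample`, `temperature`, `top_p`) are standard Transformers generation options, forwarded as keywords because the AutoGPTQ wrapper passes keyword arguments through to the underlying model.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=64, **gen_kwargs):
    """Tokenize `prompt`, call `model.generate`, and decode the first returned sequence.

    Hypothetical helper: `model` and `tokenizer` are the objects loaded in the
    example above; extra keyword arguments are forwarded to `model.generate`.
    """
    # Move the tokenized inputs to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, **gen_kwargs)
    # Decode only the first sequence in the batch, dropping special tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage with the `model` and `tokenizer` from the example above:
# print(generate_text(model, tokenizer, "Буря мглою небо кроет",
#                     do_sample=True, temperature=0.7, top_p=0.9))
```

The sampling values in the commented usage line are placeholders, not settings recommended by the model authors.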
Original model: ruGPT-3.5 13B
Language model for Russian. The model has 13B parameters, as you can guess from its name. This is our biggest model so far, and it was used for training GigaChat (read more about it in the article).