Tensor size mismatch when loading the model on CPU
#2
by AMITA94 - opened
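I wanted to experiment with this model locally, so I changed only the quantization configuration, replacing load_in_8bit with load_in_8bit_fp32_cpu_offload=True, so that it could run on the CPU.
After that change, the following error is raised while the model is loading. Is there a way to fix it?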
```
  File "inference.py", line 14, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/waiker/shyoon/venv/solar_translate/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 467, in from_pretrained
    return model_class.from_pretrained(
  File "/home/waiker/shyoon/venv/solar_translate/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2777, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/waiker/shyoon/venv/solar_translate/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3118, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/waiker/shyoon/venv/solar_translate/lib/python3.8/site-packages/transformers/modeling_utils.py", line 702, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/waiker/shyoon/venv/solar_translate/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 281, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024, 8192]) in "weight" (which has shape torch.Size([8192, 8192])), this look incorrect.
```
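
One possible reading of the two shapes in the error, offered as a guess rather than a confirmed diagnosis: the checkpoint may store key/value projection weights for grouped-query attention, while the model object built by the installed transformers version expects full-width projections. The arithmetic below is a minimal sketch of that idea. The values `HIDDEN_SIZE = 8192`, `NUM_ATTENTION_HEADS = 64`, and `NUM_KEY_VALUE_HEADS = 8` are assumptions based on typical Llama-2-70B-style configs, not values quoted in the post, and `projection_shapes` is a hypothetical helper written only for illustration.

```python
# Assumed config values (typical for Llama-2-70B-style models; NOT taken from the post).
HIDDEN_SIZE = 8192
NUM_ATTENTION_HEADS = 64
NUM_KEY_VALUE_HEADS = 8


def projection_shapes(hidden_size, num_heads, num_kv_heads):
    """Return (out_features, in_features) weight shapes for q_proj and for k_proj/v_proj.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    head_dim = hidden_size // num_heads
    q_shape = (num_heads * head_dim, hidden_size)
    kv_shape = (num_kv_heads * head_dim, hidden_size)
    return q_shape, kv_shape


# With grouped-query attention, k_proj/v_proj are narrower than q_proj.
q_gqa, kv_gqa = projection_shapes(HIDDEN_SIZE, NUM_ATTENTION_HEADS, NUM_KEY_VALUE_HEADS)

# Code that ignores num_key_value_heads effectively treats it as equal to num_heads.
q_mha, kv_mha = projection_shapes(HIDDEN_SIZE, NUM_ATTENTION_HEADS, NUM_ATTENTION_HEADS)

print("k/v weight shape with grouped-query attention:", kv_gqa)
print("k/v weight shape without grouped-query attention:", kv_mha)
```

Under these assumptions the first shape is (1024, 8192) and the second is (8192, 8192), which matches the two shapes in the error message. If that is indeed the cause, the mismatch would be unrelated to the CPU offload setting, and upgrading transformers to a release whose Llama implementation reads `num_key_value_heads` might resolve it. This has not been verified against the environment in the traceback.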