What prompt template?
#2
by sdalemorrey - opened
I have no idea what prompt template or settings you're using. With the standard Llama template, the model just repeats the system and user prompts endlessly.
Mind sharing?
Same here, it can't use the standard Llama template.
[INST]{query1}[/INST]{response1}[INST]{query2}[/INST]{response2}...
Actually, I originally said that wasn't working for me, then realized my issue. That does seem to work. Thanks so much!
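For anyone landing here later, the [INST]{query}[/INST]{response} layout quoted above can be assembled with a few lines of Python. This is only a sketch: the helper name `build_prompt` and the `history` structure are made up for illustration, and the thread does not say whether extra whitespace or special tokens are expected around the tags, so none are added here.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], query: str) -> str:
    """Assemble a multi-turn prompt in the [INST]...[/INST] layout from this thread.

    `history` holds earlier (query, response) pairs; `query` is the new user turn.
    Hypothetical helper: spacing and special tokens are assumptions, not confirmed.
    """
    parts = []
    for past_query, past_response in history:
        # Each completed turn is the query wrapped in tags, followed by the response.
        parts.append(f"[INST]{past_query}[/INST]{past_response}")
    # The new query ends with [/INST] so the model continues with its response.
    parts.append(f"[INST]{query}[/INST]")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([("Hi", "Hello!")], "Write a 5000-word story."))
```

The resulting string can be passed as the prompt to whichever backend you use. If the output still loops, comparing this string against what your template actually sends is a quick first check.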
sdalemorrey changed discussion status to closed