Instructions to use BAAI/Aquila2-70B-Expr with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use BAAI/Aquila2-70B-Expr with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="BAAI/Aquila2-70B-Expr")# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("BAAI/Aquila2-70B-Expr", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use BAAI/Aquila2-70B-Expr with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "BAAI/Aquila2-70B-Expr" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "BAAI/Aquila2-70B-Expr", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/BAAI/Aquila2-70B-Expr
- SGLang
How to use BAAI/Aquila2-70B-Expr with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "BAAI/Aquila2-70B-Expr" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "BAAI/Aquila2-70B-Expr", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "BAAI/Aquila2-70B-Expr" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "BAAI/Aquila2-70B-Expr", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use BAAI/Aquila2-70B-Expr with Docker Model Runner:
docker model run hf.co/BAAI/Aquila2-70B-Expr
English | 简体中文
We opensource our Aquila2 series, now including Aquila2, the base language models, namely Aquila2-7B, Aquila2-34B and Aquila2-70B-Expr , as well as AquilaChat2, the chat models, namely AquilaChat2-7B, AquilaChat2-34B and AquilaChat2-70B-Expr, as well as the long-text chat models, namely AquilaChat2-7B-16k and AquilaChat2-34B-16k
The additional details of the Aquila model will be presented in the official technical report. Please stay tuned for updates on official channels.
Quick Start
1. Inference
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig
model_info = "BAAI/Aquila2-70B-Expr"
tokenizer = AutoTokenizer.from_pretrained(model_info, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_info, trust_remote_code=True)
model.eval()
text = "请给出10个要到北京旅游的理由。"
tokens = tokenizer.encode_plus(text)['input_ids']
tokens = torch.tensor(tokens)[None,].to(device)
stop_tokens = ["###", "[UNK]", "</s>"]
with torch.no_grad():
out = model.generate(tokens, do_sample=True, max_length=512, eos_token_id=100007, bad_words_ids=[[tokenizer.encode(token)[0] for token in stop_tokens]])[0]
out = tokenizer.decode(out.cpu().numpy().tolist())
print(out)
License
Aquila2 70B series open-source model is licensed under BAAI Aquila 70B Model Licence Agreement
- Downloads last month
- 825
