Instructions for using abacusai/Giraffe-v2-70b-32k with libraries and local apps.

Libraries

How to use abacusai/Giraffe-v2-70b-32k with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="abacusai/Giraffe-v2-70b-32k")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("abacusai/Giraffe-v2-70b-32k")
model = AutoModelForCausalLM.from_pretrained("abacusai/Giraffe-v2-70b-32k")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use abacusai/Giraffe-v2-70b-32k with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "abacusai/Giraffe-v2-70b-32k"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "abacusai/Giraffe-v2-70b-32k",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
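
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical and chosen for illustration, and the response layout (`choices[0].message.content`) is assumed to follow the OpenAI chat completions format that the server advertises.

```python
import json
import urllib.request


def build_chat_request(prompt, model="abacusai/Giraffe-v2-70b-32k"):
    """Build the JSON body for an OpenAI-compatible chat completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, base_url="http://localhost:8000"):
    """POST the prompt to a running server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumed OpenAI-style layout: the text sits under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]


# Example call (requires the server started above to be running):
# print(chat("What is the capital of France?"))
```

Passing a different `base_url` (for example `http://localhost:30000`) should let the same helper talk to any other OpenAI-compatible server on that port.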

Use Docker

docker model run hf.co/abacusai/Giraffe-v2-70b-32k

SGLang

How to use abacusai/Giraffe-v2-70b-32k with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "abacusai/Giraffe-v2-70b-32k" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "abacusai/Giraffe-v2-70b-32k",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "abacusai/Giraffe-v2-70b-32k" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "abacusai/Giraffe-v2-70b-32k",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use abacusai/Giraffe-v2-70b-32k with Docker Model Runner:
```
docker model run hf.co/abacusai/Giraffe-v2-70b-32k
```

Model Details

Model Description

We have followed up on our previous training runs related to extending the context length of Llama models. The associated GitHub repository, https://github.com/abacusai/long-context, has some basic details on our approach and metrics. We have also published a paper on arXiv that covers our experiments and analysis much more comprehensively: http://arxiv.org/abs/2308.10882

Developed by: Abacus.AI
Model type: Transformer-based autoregressive causal language model
License: Llama 2 Community License: https://github.com/facebookresearch/llama/blob/main/LICENSE
Finetuned from model: Llama V2 70B

Usage

To use this model at longer context lengths, it must be patched to interpolate positions over the longer context. Simply loading it with the AutoModel framework of transformers will not work. For full details and usage see:

https://github.com/abacusai/Long-Context

The evaluation section has detailed code for how to load and patch the model for inference (or further fine-tuning). Note in particular that max_position_embeddings is not relevant, since the patched module dynamically reallocates the position buffers as required.
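
To make the idea concrete, here is a minimal pure-Python sketch of a rotary position cache that interpolates positions and grows on demand. It is not the repository's patched module: the class name `InterpolatedRotaryCache` and the parameters `head_dim` and `base` are hypothetical, and it assumes the interpolation is a plain linear division of each position by a scale factor, which this card does not state.

```python
import math


class InterpolatedRotaryCache:
    """Illustrative rotary-angle cache with linear position interpolation."""

    def __init__(self, head_dim, scale=8.0, base=10000.0):
        self.scale = scale
        # One inverse frequency per pair of channels, as in standard RoPE.
        self.inv_freq = [
            base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)
        ]
        # angles[p][i] = (p / scale) * inv_freq[i]; starts empty and grows.
        self.angles = []

    def _grow(self, seq_len):
        # Extend the table only when a longer sequence is requested,
        # so no fixed maximum length has to be configured up front.
        for pos in range(len(self.angles), seq_len):
            scaled_pos = pos / self.scale  # assumed linear interpolation
            self.angles.append([scaled_pos * f for f in self.inv_freq])

    def cos_sin(self, seq_len):
        """Return cos and sin tables for positions 0..seq_len-1."""
        self._grow(seq_len)
        rows = self.angles[:seq_len]
        cos = [[math.cos(a) for a in row] for row in rows]
        sin = [[math.sin(a) for a in row] for row in rows]
        return cos, sin


scaled = InterpolatedRotaryCache(head_dim=8, scale=8.0)
plain = InterpolatedRotaryCache(head_dim=8, scale=1.0)
cos_scaled, _ = scaled.cos_sin(9)
cos_plain, _ = plain.cos_sin(2)
# With scale=8, position 8 receives the angles of unscaled position 1.
print(cos_scaled[8] == cos_plain[1])
```

Under this assumption, a scale of 8 maps position 32767 to 4095.875, which stays inside the 4096-token range that Llama 2 was pre-trained on.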

The tokenizer corresponding to this model is https://huggingface.co/abacusai/Giraffe-v1-Tokenizer.

Using the code in the repository, you can load this model as follows:

from models import load_model, load_tokenizer
tokenizer = load_tokenizer()
model = load_model('abacusai/Giraffe-v2-70b-32k', scale=8)


Safetensors
Model size: 69B params
Tensor type: F16
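
As a rough guide to hardware needs, the listed parameter count and tensor type give a back-of-the-envelope figure for the memory taken by the weights alone. The helper `weight_memory_gib` below is a hypothetical name used for illustration, and the estimate ignores activations, the KV cache, and framework overhead.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory for the raw weights only, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# 69B parameters stored as F16, which uses 2 bytes per parameter.
print(f"{weight_memory_gib(69e9, 2):.1f} GiB")  # prints "128.5 GiB"
```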

Paper for abacusai/Giraffe-v2-70b-32k

Giraffe: Adventures in Expanding Context Lengths in LLMs (arXiv 2308.10882, published Aug 21, 2023)