Instructions to use rhymes-ai/Aria with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use rhymes-ai/Aria with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="rhymes-ai/Aria")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("rhymes-ai/Aria")
model = AutoModelForMultimodalLM.from_pretrained("rhymes-ai/Aria")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use rhymes-ai/Aria with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rhymes-ai/Aria"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rhymes-ai/Aria",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/rhymes-ai/Aria

SGLang

How to use rhymes-ai/Aria with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rhymes-ai/Aria" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rhymes-ai/Aria",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rhymes-ai/Aria" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rhymes-ai/Aria",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use rhymes-ai/Aria with Docker Model Runner:
```
docker model run hf.co/rhymes-ai/Aria
```

rtx 4090

by mylesgoose - opened Oct 12, 2024

Discussion

mylesgoose

Oct 12, 2024

seems to run okay on 3* 24gb cards. it could describe the cat:-0 nice works.
also i updated yor code a little bit here so it works with newer pytorch verison
torch.amp.autocast(device_type=model.device.type, dtype=torch.bfloat16):

/usr/local/bin/python3 /home/myles/Desktop/Aria/aria.py
grouped_gemm is not installed, using sequential GEMM, which is slower.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:11<00:00, 1.07it/s]
The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
Starting from v4.46, the logits model output will have the same type as the model (except at train time, where it will always be FP32)
The image shows a close-up of a cat. The cat has black fur on its head and around its eyes, and white fur on the rest of its face and body. It is looking directly at the camera with a curious expression. The cat is situated on a cardboard box, with its front paws resting on the edge and a green background behind it.<|im_end|>

import os
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Set the visible devices (you can also do this in the shell before running the script)
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"

model_id_or_path = "rhymes-ai/Aria"

# Load the model and automatically distribute layers across the visible devices
model = AutoModelForCausalLM.from_pretrained(
    model_id_or_path,
    device_map="auto",  # Automatically distribute model layers
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Load the processor
processor = AutoProcessor.from_pretrained(model_id_or_path, trust_remote_code=True)

# Load an example image
image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"
image = Image.open(requests.get(image_path, stream=True).raw)

# Prepare the input message structure
messages = [
    {
        "role": "user",
        "content": [
            {"text": None, "type": "image"},
            {"text": "what is the image?", "type": "text"},
        ],
    }
]

# Prepare inputs for the model
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt")
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)

# Move all input tensors to the same device as the model
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Perform inference
with torch.inference_mode(), torch.amp.autocast(device_type=model.device.type, dtype=torch.bfloat16):
    output = model.generate(
        **inputs,
        max_new_tokens=64000,
        stop_strings=["<|im_end|>"],
        tokenizer=processor.tokenizer,
        do_sample=True,
        temperature=0.9,
    )
    output_ids = output[0][inputs["input_ids"].shape[1]:]
    result = processor.decode(output_ids, skip_special_tokens=True)

print(result)

deleted

Oct 16, 2024

This comment has been hidden

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment