How to use AnyaSchen/image2music with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="AnyaSchen/image2music")
# Load model directly
from transformers import AutoTokenizer, AutoModelForImageTextToText
tokenizer = AutoTokenizer.from_pretrained("AnyaSchen/image2music")
model = AutoModelForImageTextToText.from_pretrained("AnyaSchen/image2music")
How to use AnyaSchen/image2music with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AnyaSchen/image2music"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "AnyaSchen/image2music",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or use Docker:
docker model run hf.co/AnyaSchen/image2music
How to use AnyaSchen/image2music with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "AnyaSchen/image2music" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "AnyaSchen/image2music",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "AnyaSchen/image2music" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "AnyaSchen/image2music",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use AnyaSchen/image2music with Docker Model Runner:
docker model run hf.co/AnyaSchen/image2music
This repo contains a model that generates music from images. The generated music is returned in ABC notation and can be played back, for example, here. Note that you may need to adjust the BPM (the tempo) to make the music sound more logical and natural. The model is a fine-tuned combination of two pre-trained models: google/vit-base-patch16-224 as the encoder and sander-wood/text-to-music as the decoder. To use this model, you can write:
from PIL import Image
import requests
import torch
from transformers import AutoTokenizer, VisionEncoderDecoderModel, ViTImageProcessor

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def generate_music(model, image, tokenizer):
    # Relies on the module-level `feature_extractor` and `device` defined in this script.
    pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)
    generated_tokens = model.generate(
        pixel_values,
        max_length=300,
        num_beams=5,
        top_p=0.8,
        temperature=2.0,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Decode the first generated sequence into ABC text.
    generated_music = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
    return generated_music

path = 'AnyaSchen/image2music'
fine_tuned_model = VisionEncoderDecoderModel.from_pretrained(path).to(device)
feature_extractor = ViTImageProcessor.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

url = 'https://anandaindia.org/wp-content/uploads/2018/12/happy-man.jpg'
# Convert to RGB so grayscale or RGBA images also match the three-channel ViT input.
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
generated_music = generate_music(fine_tuned_model, image, tokenizer)
print(generated_music)
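The BPM adjustment mentioned earlier can be done by editing the generated ABC text before playing it. In ABC notation the tempo is carried by the `Q:` header field (for example `Q:1/4=120`), and the `K:` field closes the header. The following is a minimal sketch, not part of the original repo: the helper name `set_abc_tempo`, the default beat of `1/4`, and the sample tune are assumptions made for illustration, and the model's actual output may not always carry a complete header.

```python
def set_abc_tempo(abc: str, bpm: int, beat: str = "1/4") -> str:
    """Return `abc` with its Q: (tempo) header field set to `beat=bpm`.

    Hypothetical helper for illustration; it only looks at header-style
    lines and ignores inline tempo changes such as [Q:...].
    """
    tempo_line = f"Q:{beat}={bpm}"
    lines = abc.splitlines()

    # Replace an existing tempo field if there is one.
    for i, line in enumerate(lines):
        if line.startswith("Q:"):
            lines[i] = tempo_line
            return "\n".join(lines)

    # Otherwise insert it just before the K: field, which ends the ABC header.
    for i, line in enumerate(lines):
        if line.startswith("K:"):
            lines.insert(i, tempo_line)
            return "\n".join(lines)

    # No K: field found: place the tempo at the top, after X: if present.
    insert_at = 1 if lines and lines[0].startswith("X:") else 0
    lines.insert(insert_at, tempo_line)
    return "\n".join(lines)


# Made-up sample tune standing in for `generated_music`.
sample = "X:1\nL:1/8\nM:4/4\nK:D\nA2 B2 c2 d2 |"
print(set_abc_tempo(sample, 90))
```

With the script above, `set_abc_tempo(generated_music, 90)` would give a version of the tune at 90 quarter notes per minute; trying a few values is a reasonable way to find a tempo that sounds natural.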