Instructions for using ocisd4/multi-modal-llama-ocis with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ocisd4/multi-modal-llama-ocis with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="ocisd4/multi-modal-llama-ocis", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("ocisd4/multi-modal-llama-ocis", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
How to Get Started with the Model
```python
from transformers import AutoProcessor, pipeline
import librosa
from PIL import Image

model_path = "ocisd4/multi-modal-llama-ocis"

# Replace "hf_tokens" with your own Hugging Face access token if the repo requires authentication.
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True, token="hf_tokens")
pipe = pipeline(model=model_path, trust_remote_code=True, processor=processor, device_map="auto")

# The audio filename means "Which attraction is shown in the picture?"
audio, sr = librosa.load("/path/to/請問圖片中的景點是哪裡.wav", sr=16000)
# The image filename means "Tainan Confucius Temple".
image = Image.open("/path/to/台南孔廟.jpg")

turns = [
    dict(
        role="system",
        content="You are a travel expert who can accurately analyze the attractions in the pictures. All conversations should be conducted in Traditional Chinese.",
    ),
    dict(
        role="user",
        content="<|image|><|begin_of_audio|><|audio|><|end_of_audio|>",
    ),
]

y_pred = pipe({"audio": [audio], "images": [image], "turns": turns, "sampling_rate": sr}, max_new_tokens=300)
print(y_pred)  # 這張照片中的景點是台灣的「台南孔廟」。... ("The attraction in this photo is Taiwan's Tainan Confucius Temple.")
```