Magiv3

A model for comics understanding.

DISCLAIMER

This is a model duplicated from ragavsachdeva. Please refer to the original model or its paper for more information.

Usage

from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import numpy as np
import torch

def load_image(path):
    with open(path, "rb") as file:
        image = Image.open(file).convert("L").convert("RGB")
        image = np.array(image)

    return image

images = ["01.jpg", "02.jpg"]
images = [load_image(image) for image in images]

# All panels from images, not provided by model
panels = splitImagesToPanels(images)

# The generated captions for each panels, not provided by model
captions = generateCaptionsFromPanels(panels) 

model = AutoModelForCausalLM.from_pretrained('mrfish233/magiv3', torch_dtype=torch.float16, trust_remote_code=True).cuda().eval()
processor = AutoProcessor.from_pretrained('mrfish233/magiv3', trust_remote_code=True)

with torch.no_grad():
    # detections from 
    detections = model.predict_detections_and_associations(images, processor)

    # OCR for each page
    ocr_results = model.predict_ocr(images, processor)

    # get character grounding with captions provided
    grounding = model.predict_character_grounding(panels, captions, processor)

Downloads last month: 25

Safetensors

Model size

0.8B params

Tensor type

F16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mrfish233/magiv3

Base model

ragavsachdeva/magiv3

Finetuned

(1)

this model

Paper for mrfish233/magiv3

From Panels to Prose: Generating Literary Narratives from Comics

Paper • 2503.23344 • Published Mar 30, 2025