mrfish233 committed (verified)
Commit d0d3935 · 1 parent: b4f6e0d

Upload README.md with huggingface_hub

Files changed (1): README.md +44 -6

README.md CHANGED
@@ -1,11 +1,49 @@
- # Usage

  ```python
- model = AutoModelForCausalLM.from_pretrained("ragavsachdeva/magiv3", torch_dtype=torch.float16, trust_remote_code=True).cuda().eval()
- processor = AutoProcessor.from_pretrained("ragavsachdeva/magiv3", trust_remote_code=True)

- model.predict_detections_and_associations(images, processor)
- model.predict_ocr(images, processor)
- model.predict_character_grounding(images, captions, processor)

  ```
+ ---
+ base_model: ragavsachdeva/magiv3
+ ---
+
+ # Magiv3
+
+ A model for comics understanding.
+
+ ## DISCLAIMER
+
+ This model is duplicated from [ragavsachdeva](https://huggingface.co/ragavsachdeva/magiv3). Please refer to the original model or its [paper](https://arxiv.org/abs/2503.23344) for more information.
+
+ ## Usage

  ```python
+ from transformers import AutoModelForCausalLM, AutoProcessor
+ from PIL import Image
+ import numpy as np
+ import torch
+
+ def load_image(path):
+     with open(path, "rb") as file:
+         image = Image.open(file).convert("L").convert("RGB")
+         image = np.array(image)
+     return image
+
+ images = ["01.jpg", "02.jpg"]
+ images = [load_image(image) for image in images]
+
+ # Panels cropped from the pages; splitImagesToPanels is not provided by the model
+ panels = splitImagesToPanels(images)
+
+ # Captions generated for each panel; generateCaptionsFromPanels is not provided by the model
+ captions = generateCaptionsFromPanels(panels)
+
+ model = AutoModelForCausalLM.from_pretrained('mrfish233/magiv3', torch_dtype=torch.float16, trust_remote_code=True).cuda().eval()
+ processor = AutoProcessor.from_pretrained('mrfish233/magiv3', trust_remote_code=True)
+
+ with torch.no_grad():
+     # Detections and character associations for each page
+     detections = model.predict_detections_and_associations(images, processor)
+
+     # OCR for each page
+     ocr_results = model.predict_ocr(images, processor)
+
+     # Character grounding with the generated captions
+     grounding = model.predict_character_grounding(panels, captions, processor)
  ```
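
The `splitImagesToPanels` helper in the diff above is left to the user. One possible sketch: crop panels out of each page using per-page panel boxes, for example those returned by the detection step. Note this is an assumption, not the model's documented API — the function name `split_images_to_panels` and the `{"panels": [[x1, y1, x2, y2], ...]}` detection schema are hypothetical; check the original repository for the actual output format.

```python
import numpy as np

def split_images_to_panels(images, detections):
    """Crop each detected panel box out of its page image.

    `images` is a list of H x W x 3 numpy arrays. `detections` is assumed
    to be one dict per page with a "panels" key holding [x1, y1, x2, y2]
    pixel boxes -- this schema is an assumption, not confirmed by the
    model card.
    """
    panels = []
    for image, page in zip(images, detections):
        h, w = image.shape[:2]
        for x1, y1, x2, y2 in page["panels"]:
            # Clamp boxes to the image bounds before cropping
            x1, x2 = max(0, int(x1)), min(w, int(x2))
            y1, y2 = max(0, int(y1)), min(h, int(y2))
            panels.append(image[y1:y2, x1:x2])
    return panels
```

Each cropped panel can then be captioned independently before being passed to `predict_character_grounding`.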