Subh775 committed on
Commit 0108732 · verified · 1 Parent(s): a61c538

Update README.md

Files changed (1)
  1. README.md +26 -17
README.md CHANGED
@@ -42,42 +42,51 @@ Because this model relies on the custom Moondream2 architecture, you will need t
  ### Prerequisites
  Make sure you have the required libraries installed:
  ```bash
- pip install transformers pillow einops
  ```
- ### Python Inference Script
  ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from PIL import Image

- # 1. Define the model ID
  model_id = "Subh775/Perception-moondream2"

- # 2. Load the tokenizer and model
- # Note: trust_remote_code=True is required for the moondream2 architecture
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
-     model_id,
      trust_remote_code=True,
-     torch_dtype=torch.float16, # Recommended for memory efficiency
-     device_map="auto"
  )

- # 3. Load your traffic/CCTV image
- image_path = "path_to_your_traffic_image.jpg"
  image = Image.open(image_path).convert("RGB")

- # 4. Encode the image using the vision encoder
  enc_image = model.encode_image(image)

- # 5. Ask the model to describe the scene
- # We use the same prompt that the model was fine-tuned on
- prompt = "Describe this traffic scene in detail."

  answer = model.answer_question(enc_image, prompt, tokenizer)

- print("Traffic Scene Analysis:")
- print("-" * 50)
- print(answer)
  ```

  ### Prerequisites
  Make sure you have the required libraries installed:
  ```bash
+ !pip install transformers==4.44.2 "huggingface_hub<1.0" accelerate pillow einops
  ```
+
+ ### Load Tokenizer & Model
  ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from PIL import Image
+ import requests

  model_id = "Subh775/Perception-moondream2"

  tokenizer = AutoTokenizer.from_pretrained(model_id)
+
  model = AutoModelForCausalLM.from_pretrained(
+     model_id,
      trust_remote_code=True,
+     torch_dtype=torch.float16,
+     # REMOVED device_map="auto"
  )
+ # move to the GPU
+ model = model.to("cuda")
+ model.eval()
+ ```

+ # Inference
+ ```python
+ image_path = "/content/100130.jpg"
  image = Image.open(image_path).convert("RGB")

  enc_image = model.encode_image(image)

+ # Give it explicit instructions & explicitly ban the geographic bias.
+ prompt = (
+     "Describe this traffic scene in detail. Focus strictly on the vehicles, "
+     "pedestrians, infrastructure, and traffic density. Do not mention Bengaluru, "
+     "India, or any specific geographic locations."
+ )

  answer = model.answer_question(enc_image, prompt, tokenizer)

+ banned_phrases = ["in Bengaluru, India", "in Bengaluru", "Bengaluru, India,", "Bengaluru,"]
+ for banned in banned_phrases:
+     answer = answer.replace(banned, "")
+
+ print(answer.strip())
  ```
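
The post-processing step added in this commit removes the banned phrases with plain `str.replace`, which can leave a double space or a space before punctuation where a phrase used to be. A minimal sketch of a slightly more defensive variant follows. The helper name `strip_banned`, the case-insensitive matching, and the whitespace cleanup are assumptions for illustration and are not part of the committed README.

```python
import re

# Same phrases as in the README's post-processing step.
BANNED_PHRASES = ["in Bengaluru, India", "in Bengaluru", "Bengaluru, India,", "Bengaluru,"]


def strip_banned(text, banned=BANNED_PHRASES):
    """Hypothetical helper: remove banned phrases and tidy leftover spacing."""
    # Try longer phrases first so a shorter phrase never leaves a fragment behind.
    for phrase in sorted(banned, key=len, reverse=True):
        # \b keeps "in Bengaluru" from matching inside a word such as "within".
        # IGNORECASE is an assumption; the README's str.replace is case-sensitive.
        text = re.sub(r"\b" + re.escape(phrase), "", text, flags=re.IGNORECASE)
    # Collapse runs of spaces or tabs left where a phrase was removed.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Drop any space that now sits directly before punctuation.
    text = re.sub(r"[ \t]+([,.;:!?])", r"\1", text)
    return text.strip()
```

In the inference script, this would be called as `print(strip_banned(answer))` in place of the replace loop and the final `print(answer.strip())`.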