TroyDoesAI committed 18d0c39 (verified; parent: c058bed): Update README.md
*NOTE: `You:` refers to the characters you don't want the AI to speak for; the model tends to continue a conversation if you don't set the stop tokens.
The dataset consists entirely of conversations and comments about images contributed by human curators.*

### Chat Format

Given the nature of the training data, the Phi-3-Vision-128K-Instruct model is best suited to a single image input, with prompts using the chat format as follows.
You can provide the prompt as a single image with a generic template as follows:

```markdown
<|user|>\n<|image_1|>\n{prompt}<|end|>\n<|assistant|>\n
```

where the model generates the text after `<|assistant|>`. For a multi-turn conversation, the prompt can be formatted as follows:

```markdown
<|user|>\n<|image_1|>\n{prompt_1}<|end|>\n<|assistant|>\n{response_1}<|end|>\n<|user|>\n{prompt_2}<|end|>\n<|assistant|>\n
```
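If you want to see how that template expands, a minimal helper like the following can render it from `(role, text)` turns. This is a hypothetical sketch for illustration only (the helper name is ours, and in practice `processor.tokenizer.apply_chat_template` does this for you, as in the inference code below); the `\n` in the template above are emitted here as real newlines.

```python
def format_chat(turns):
    """Render (role, text) turns into the chat format above,
    ending with an open <|assistant|> tag so the model generates next."""
    out = []
    for role, text in turns:
        out.append(f"<|{role}|>\n{text}<|end|>\n")
    out.append("<|assistant|>\n")  # generation continues from here
    return "".join(out)

# Multi-turn example matching the second template:
prompt = format_chat([
    ("user", "<|image_1|>\nWhat is shown in this image?"),
    ("assistant", "A bar chart."),
    ("user", "Provide insightful questions to spark discussion."),
])
print(prompt)
```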

### Sample inference code

This code snippet shows how to quickly get started running the model on a GPU:

```python
from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "Disobedient/BlackSheep-Vision"

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto", _attn_implementation='flash_attention_2')  # use _attn_implementation='eager' to disable flash attention

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image?"},
    {"role": "assistant", "content": "The chart displays the percentage of respondents who agree with various statements about their preparedness for meetings. It shows five categories: 'Having clear and pre-defined goals for meetings', 'Knowing where to find the information I need for a meeting', 'Understanding my exact role and responsibilities when I'm invited', 'Having tools to manage admin tasks like note-taking or summarization', and 'Having more focus time to sufficiently prepare for meetings'. Each category has an associated bar indicating the level of agreement, measured on a scale from 0% to 100%."},
    {"role": "user", "content": "Provide insightful questions to spark discussion."}
]

url = "https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png"
image = Image.open(requests.get(url, stream=True).raw)

prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(prompt, [image], return_tensors="pt").to("cuda:0")

generation_args = {
    "max_new_tokens": 500,
    "temperature": 0.0,
    "do_sample": False,
}

generate_ids = model.generate(**inputs, eos_token_id=processor.tokenizer.eos_token_id, **generation_args)

# remove input tokens
generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

print(response)
```
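A note on the "remove input tokens" step above: `generate` returns each sequence with the prompt tokens echoed before the newly generated ones, so the snippet slices them off before decoding. A minimal illustration of that trimming with plain Python lists (the token ids and helper name are hypothetical, for illustration only):

```python
def trim_prompt_tokens(sequences, input_len):
    """Drop the echoed prompt ids from the front of each generated sequence."""
    return [seq[input_len:] for seq in sequences]

prompt_ids = [101, 7592, 102]             # hypothetical prompt token ids
generated = [prompt_ids + [2054, 2003]]   # model output echoes the prompt first
print(trim_prompt_tokens(generated, len(prompt_ids)))  # → [[2054, 2003]]
```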

Additional basic examples are provided [here](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct/blob/main/sample_inference.py).

### How to finetune?
We recommend taking a look at the [Phi-3 CookBook finetuning recipe for Vision](https://github.com/microsoft/Phi-3CookBook/blob/main/md/04.Fine-tuning/FineTuning_Vision.md).