Jerrick777 committed · verified
Commit f4b953a · 1 Parent(s): 3a9e490

Update README.md

Files changed (1):
  1. README.md +85 -0
README.md CHANGED
@@ -20,6 +20,91 @@ Memories-S0 is designed to address two key challenges in security video understa
  * **Extreme Efficiency:** It utilizes an innovative input token compression algorithm that dynamically prunes redundant background tokens, focusing computation on foreground objects and motion. This allows the 3B model to run efficiently on mobile/edge hardware.
  * **Post-Training:** The model employs a unique post-training strategy using Reinforcement Learning (RL) and event-based temporal shuffling to enhance sequential understanding without expensive full fine-tuning.
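The token-compression idea in the **Extreme Efficiency** bullet can be illustrated with a minimal sketch (an illustration only, not the model's actual algorithm): score each patch token by inter-frame motion and keep only the most dynamic ones, so static background drops out before attention runs.

```python
import numpy as np

def prune_static_tokens(frames: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Toy motion-based token pruning.

    frames: (T, H, W) grayscale video; each pixel stands in for one
    visual token here. For every frame transition, return the indices
    of the `keep_ratio` most dynamic tokens (largest inter-frame change).
    """
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))  # (T-1, H, W)
    motion = diffs.reshape(diffs.shape[0], -1)                  # motion score per token
    k = max(1, int(motion.shape[1] * keep_ratio))
    # argsort ascending, keep the last k columns -> top-k motion tokens
    return np.argsort(motion, axis=1)[:, -k:]
```

With this scoring, an unchanging background receives zero motion and is pruned, so downstream computation concentrates on tokens where something actually moved.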
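The event-based temporal shuffling mentioned in the **Post-Training** bullet can be sketched as a data-construction step (a hypothetical illustration, not the released training code): segment a video into event clips, shuffle them, and keep the permutation as the supervision signal for reordering.

```python
import random

def make_shuffled_sample(event_clips, seed=0):
    """Build one temporal-shuffling training sample.

    Returns the shuffled clips plus the permutation `order`, where
    shuffled[j] == event_clips[order[j]]; recovering the original
    sequence is the learning target.
    """
    rng = random.Random(seed)
    order = list(range(len(event_clips)))
    rng.shuffle(order)
    shuffled = [event_clips[i] for i in order]
    return shuffled, order

def restore(shuffled, order):
    """Undo the permutation: place each clip back at its original index."""
    original = [None] * len(order)
    for j, i in enumerate(order):
        original[i] = shuffled[j]
    return original
```

A model that can map `shuffled` back to the original order has learned something about event sequencing, which is the signal the RL post-training rewards.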
 
+ ## Installation
+
+ ```bash
+ conda create -n memories-s0 python=3.10 -y
+ conda activate memories-s0
+
+ # Install PyTorch with CUDA support
+ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+
+ # Install dependencies for the Qwen2.5-VL architecture and Flash Attention
+ pip install "transformers>=4.49.0" accelerate qwen_vl_utils
+ pip install flash-attn --no-build-isolation
+ ```
+
+ ## Inference
+
+ The following script demonstrates how to run the **Memories-S0** model. It automatically loads the weights from the official Hugging Face repository.
+
+ ```python
+ import argparse
+
+ import torch
+ from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
+ from qwen_vl_utils import process_vision_info
+
+ # Official model repository
+ MODEL_ID = "Memories-ai/security_model"
+
+
+ def run_inference(video_path, model_id=MODEL_ID):
+     # Load the model with Flash Attention 2 for efficiency
+     model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+         model_id,
+         torch_dtype=torch.bfloat16,
+         attn_implementation="flash_attention_2",
+         device_map="auto",
+     )
+
+     processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
+
+     # Define the security analysis prompt
+     prompt_text = """YOUR_PROMPT"""
+
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {"type": "video", "video": video_path},
+                 {"type": "text", "text": prompt_text},
+             ],
+         }
+     ]
+
+     # Preprocessing
+     text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+     image_inputs, video_inputs, video_kwargs = process_vision_info(messages, return_video_kwargs=True)
+
+     inputs = processor(
+         text=[text],
+         images=image_inputs,
+         videos=video_inputs,
+         padding=True,
+         return_tensors="pt",
+         **video_kwargs,
+     )
+     inputs = inputs.to("cuda")
+
+     # Generate, then decode only the newly produced tokens
+     generated_ids = model.generate(**inputs, max_new_tokens=768)
+     generated_ids_trimmed = [
+         out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+     ]
+     output_text = processor.batch_decode(
+         generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+     )
+
+     print(output_text[0])
+
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--video_path", type=str, required=True, help="Path to the input video")
+     args = parser.parse_args()
+     run_inference(args.video_path)
+ ```
+
  ## Intended Use
 
  ### Primary Use Cases