Update README.md
# Model Card for Memories-S0

**Memories-S0** is an efficient, 3-billion-parameter video understanding model designed for the security and surveillance domain. It combines synthetic training data (generated with Veo 3) with aggressive efficiency optimizations to achieve state-of-the-art performance on edge devices.

## Model Details

* **Model Name:** Memories-S0
* **Organization:** Memories.ai Research
* **Model Architecture:** 3B-parameter VideoLLM
* **Release Date:** January 2026
* **License:** Apache 2.0
* **Paper:** [Memories-S0: An Efficient and Accurate Framework for Security Video Understanding](https://memories.ai/research/Camera)
* **Code Repository:** [https://github.com/Memories-ai-labs/memories-s0](https://github.com/Memories-ai-labs/memories-s0)

### Model Description

Memories-S0 addresses two key challenges in security video understanding: the scarcity of annotated data and efficient deployment on resource-constrained devices.

* **Data Innovation:** The model is pre-trained on a large, diverse set of synthetic surveillance videos produced by video generation models such as Veo 3. Synthetic generation yields pixel-perfect annotations and covers varied scenarios (e.g., dimly lit hallways, unattended packages).
* **Extreme Efficiency:** An input token compression algorithm dynamically prunes redundant background tokens and focuses computation on foreground objects and motion, which lets the 3B model run efficiently on mobile and edge hardware.
* **Post-Training:** A post-training strategy based on Reinforcement Learning (RL) and event-based temporal shuffling improves sequential understanding without expensive full fine-tuning.

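
The token compression algorithm itself is not specified in this card. Purely as intuition, the sketch below shows one simple way redundant background tokens could be dropped: score each patch by how much it changed since the previous frame and keep only patches above a threshold. The frame-differencing rule and every name here (`motion_score`, `prune_static_tokens`, `threshold`) are assumptions for illustration, not the Memories-S0 implementation.

```python
from typing import List, Tuple

Patch = List[float]   # pixel or feature values of one patch
Frame = List[Patch]   # a frame split into a fixed number of patches


def motion_score(prev: Patch, curr: Patch) -> float:
    """Mean absolute change of a patch between two consecutive frames."""
    return sum(abs(c - p) for p, c in zip(prev, curr)) / len(curr)


def prune_static_tokens(frames: List[Frame], threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return (frame_index, patch_index) pairs for the tokens that are kept.

    Hypothetical rule: keep every patch of the first frame for scene context,
    then keep only patches whose content changed by more than `threshold`.
    """
    kept = [(0, j) for j in range(len(frames[0]))]
    for i in range(1, len(frames)):
        for j, (prev, curr) in enumerate(zip(frames[i - 1], frames[i])):
            if motion_score(prev, curr) > threshold:
                kept.append((i, j))
    return kept


# Toy clip: 3 frames x 4 patches; only patch 2 changes after the first frame.
clip = [
    [[0.0, 0.0], [0.5, 0.5], [0.2, 0.2], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.5], [0.9, 0.8], [1.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.5], [0.1, 0.3], [1.0, 1.0]],
]
kept_tokens = prune_static_tokens(clip)
total_tokens = sum(len(frame) for frame in clip)
print(f"kept {len(kept_tokens)} of {total_tokens} tokens: {kept_tokens}")
```

In this toy clip the static patches are encoded once and only the moving patch is re-encoded in later frames, so half of the tokens are skipped. A real system would likely score tokens with learned features rather than raw differences.
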
## Intended Use

### Primary Use Cases

* **Security & Surveillance:** Detecting anomalies, tracking suspicious activities, and monitoring public safety.
* **Smart Home Monitoring:** Analyzing video feeds for unusual events (e.g., falls, intruders) as benchmarked on SmartHomeBench.
* **Edge Computing:** Deploying high-performance video analysis directly on cameras or local gateways with limited memory and compute power.

### Out-of-Scope Use Cases

* General open-domain video understanding (e.g., movie classification): results may be suboptimal because the model is specialized for surveillance camera angles and events.
* Biometric identification (e.g., face recognition): this is not the primary design goal; the focus is on action and event understanding.

## Performance (SmartHomeBench)

We evaluated Memories-S0 (3B) on **SmartHomeBench**, a benchmark for smart home video anomaly detection.

Despite having only **3B parameters**, the model achieves an **F1-score of 79.21** with a simple **zero-shot** prompt, surpassing larger models such as VILA-13b and performing competitively against GPT-4o and Claude-3.5-Sonnet (which require complex Chain-of-Thought prompting).

| Model | Params | Prompting Method | Accuracy (%) | Precision (%) | Recall (%) | **F1-score (%)** |
| --- | --- | --- | --- | --- | --- | --- |
| **Memories-S0 (Ours)** | **3B** | **Zero-shot** | **71.33** | **73.04** | **86.51** | **79.21** |
| VILA-13b | 13B | Few-shot CoT | 67.17 | 69.18 | 70.57 | 69.87 |
| GPT-4o | Closed | Zero-shot | 68.41 | 80.09 | 55.16 | 65.33 |
| Gemini-1.5-Pro | Closed | Zero-shot | 57.36 | 84.34 | 25.73 | 39.43 |

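
As a quick consistency check, the F1 column can be recomputed from the Precision and Recall columns using the standard definition F1 = 2PR / (P + R). The snippet below is illustrative only; the function name `f1_score` and the dictionary layout are our own choices and are not taken from the Memories-S0 codebase.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above, in percent.
reported = {
    "Memories-S0": (73.04, 86.51),
    "VILA-13b": (69.18, 70.57),
    "GPT-4o": (80.09, 55.16),
    "Gemini-1.5-Pro": (84.34, 25.73),
}

recomputed = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
for name, value in recomputed.items():
    print(f"{name}: F1 = {value:.2f}")
```

The recomputed values agree with the reported F1 column to two decimal places. They also show where the gap comes from: GPT-4o and Gemini-1.5-Pro have higher precision, but their much lower recall pulls the harmonic mean down.
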
## Citation

If you use this model or framework in your research, please cite our technical report:

```bibtex
@techreport{memories_s0_2025,
  title = {{Memories-S0}: An Efficient and Accurate Framework for Security Video Understanding},
  author = {{Memories.ai Research}},
  institution = {Memories.ai},
  year = {2025},
  month = oct,
  url = {https://huggingface.co/Memories-ai/security_model},
  note = {Accessed: 2025-11-20}
}
```