---
title: Memory Keeper
emoji: 🍃
colorFrom: indigo
colorTo: pink
sdk: docker
pinned: false
license: mit
short_description: Turn voice and photos into memory books
tags:
- build-small-hackathon
- thousand-token-wood
- modal
- custom-ui
---
# Memory Keeper 🌲 *(Track: Thousand Token Wood)*
💻 **[GitHub Repository](https://github.com/KongaraLikhith/memory-keeper)** | 🎥 **[Watch the Demo Video](https://drive.google.com/file/d/1MCXUOhq1C8chFCno9T7GjPm8BOkAZX_X/view?usp=sharing)** | 📝 **[LinkedIn Post](https://www.linkedin.com/posts/likhith-kongara-049b87212_github-kongaralikhithmemory-keeper-memory-activity-7470707614285410304-tbUS)**
## 📖 The Story: Why I Built Memory Keeper
We all have those little moments: a beautiful sunset, a fleeting thought we record as a voice note, or a random photo that captures a specific feeling. But more often than not, these memories get lost in the endless scroll of our camera rolls or the unorganized abyss of our voice memos.
I built **Memory Keeper** for the Hugging Face "Build Small" Hackathon because I wanted a whimsical, personal digital archive that actually *understands* these fragments. I wanted a tool that could take my raw audio notes and spontaneous photos, and weave them together into beautifully structured storybooks and letters to my future self.
I chose the **Thousand Token Wood** track because this project isn't just about utility; it's about creating something deeply personal, experimental, and delightful.
## ✨ The Magic: How It Works
Memory Keeper is an entirely open-weight, multi-modal AI pipeline that acts as your personal archivist:
1. **Upload:** You upload a photo, record a voice note, or simply type a thought into the custom glassmorphic UI.
2. **Perception:** The backend immediately spins up specialized "small models" to perceive the inputs. It runs `openai/whisper-base` to transcribe the audio and `Salesforce/blip-image-captioning-base` to generate rich semantic descriptions of the photos.
3. **Synthesis:** A central orchestrator LLM (`Qwen/Qwen2.5-7B-Instruct`) takes all these pieces, looks at your history, and writes a narrative timeline, a structured story, and a personal letter summarizing the memory.
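
The three steps above can be sketched as a small orchestration function. This is a minimal illustration, not the project's actual code: `Memory`, `perceive`, `build_prompt`, and `keep_memory` are hypothetical names, and the three model calls are passed in as plain callables so the flow can be read (and tested) without loading Whisper, BLIP, or Qwen.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Memory:
    """One upload; any field may be missing (hypothetical structure)."""
    text: Optional[str] = None
    audio_path: Optional[str] = None
    image_path: Optional[str] = None


def perceive(memory: Memory,
             transcribe: Callable[[str], str],
             caption: Callable[[str], str]) -> List[str]:
    """Step 2: turn each provided input into a short text fragment."""
    fragments = []
    if memory.audio_path:
        fragments.append(f"Voice note: {transcribe(memory.audio_path)}")
    if memory.image_path:
        fragments.append(f"Photo: {caption(memory.image_path)}")
    if memory.text:
        fragments.append(f"Typed thought: {memory.text}")
    return fragments


def build_prompt(fragments: Sequence[str], history: Sequence[str]) -> str:
    """Combine earlier memories and the new fragments into one LLM prompt."""
    past = "\n".join(f"- {h}" for h in history) or "- (no earlier memories)"
    new = "\n".join(f"- {f}" for f in fragments)
    return (
        f"Earlier memories:\n{past}\n\n"
        f"New memory:\n{new}\n\n"
        "Write a narrative timeline, a structured story, and a personal letter."
    )


def keep_memory(memory: Memory,
                history: Sequence[str],
                transcribe: Callable[[str], str],
                caption: Callable[[str], str],
                generate: Callable[[str], str]) -> str:
    """Steps 2 and 3: perceive the inputs, then ask the LLM to synthesize."""
    fragments = perceive(memory, transcribe, caption)
    if not fragments:
        raise ValueError("a memory needs at least one of text, audio, or image")
    return generate(build_prompt(fragments, history))


if __name__ == "__main__":
    # Stub callables stand in for the real models in this sketch.
    result = keep_memory(
        Memory(text="Quiet evening", image_path="sunset.jpg"),
        history=["First hike of the year"],
        transcribe=lambda path: "(transcript)",
        caption=lambda path: "a sunset over hills",
        generate=lambda prompt: prompt,  # echo the prompt instead of calling an LLM
    )
    print(result)
```

In a real deployment, `transcribe`, `caption`, and `generate` would wrap the models named above; keeping them as parameters is just one way to keep the orchestration independent of where the models run.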
## 🏗️ Architecture & Deployment
To keep the application incredibly lightweight while maintaining a premium feel, the system uses a decoupled frontend-backend architecture:
- **Frontend (Hugging Face Spaces):** A completely custom HTML/CSS UI built on top of `gradio.Server`. This bypasses the standard Gradio blocks to deliver a stunning visual experience while strictly adhering to the hackathon's Gradio requirement.
- **Compute Engine (Modal):** Heavy AI perception tasks are offloaded to A10G GPUs via Modal. These endpoints are entirely serverless: they scale to zero when not in use, so no GPU resources are held while the app is idle.
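
To illustrate the split between the two halves, here is one way the frontend could package an upload for a remote perception endpoint. This is a sketch under assumptions: the URL, the JSON field names (`audio_b64`, `image_b64`), and the function name `build_perception_request` are all hypothetical, since the README does not describe the actual wire format.

```python
import base64
import json
import urllib.request
from typing import Optional

# Hypothetical placeholder; the real endpoint URL is not given in this README.
PERCEPTION_URL = "https://example.invalid/perceive"


def build_perception_request(audio: Optional[bytes] = None,
                             image: Optional[bytes] = None,
                             url: str = PERCEPTION_URL) -> urllib.request.Request:
    """Pack raw upload bytes into a JSON POST request (assumed wire format)."""
    payload = {}
    if audio is not None:
        # Binary data is base64-encoded so it can travel inside JSON.
        payload["audio_b64"] = base64.b64encode(audio).decode("ascii")
    if image is not None:
        payload["image_b64"] = base64.b64encode(image).decode("ascii")
    if not payload:
        raise ValueError("provide audio, image, or both")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(request, timeout=...)`. Because a scaled-to-zero endpoint has to start a container before it can answer, a generous timeout on the first call is probably sensible.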
### Local Development
To run the frontend locally for testing:
```bash
pip install -r requirements.txt
python app.py
```