title: Memory Keeper
emoji: π
colorFrom: indigo
colorTo: pink
sdk: docker
pinned: false
license: mit
short_description: Turn voice and photos into memory books
tags:
- build-small-hackathon
- thousand-token-wood
- modal
- custom-ui
Memory Keeper 🌲 (Track: Thousand Token Wood)
💻 GitHub Repository | 🎥 Watch the Demo Video | LinkedIn Post
The Story: Why I Built Memory Keeper
We all have those little moments: a beautiful sunset, a fleeting thought we record as a voice note, or a random photo that captures a specific feeling. But more often than not, these memories get lost in the endless scroll of our camera rolls or the unorganized abyss of our voice memos.
I built Memory Keeper for the Hugging Face "Build Small" Hackathon because I wanted a whimsical, personal digital archive that actually understands these fragments. I wanted a tool that could take my raw audio notes and spontaneous photos and weave them together into beautifully structured storybooks and letters to my future self.
I chose the Thousand Token Wood track because this project isn't just about utility; it's about creating something deeply personal, experimental, and delightful.
✨ The Magic: How It Works
Memory Keeper is an entirely open-weight, multi-modal AI pipeline that acts as your personal archivist:
- Upload: You upload a photo, record a voice note, or simply type a thought into the custom glassmorphic UI.
- Perception: The backend immediately spins up specialized "small models" to perceive the inputs. It runs openai/whisper-base to transcribe the audio and Salesforce/blip-image-captioning-base to generate rich semantic descriptions of the photos.
- Synthesis: A central orchestrator LLM (Qwen/Qwen2.5-7B-Instruct) takes all these pieces, looks at your history, and writes a narrative timeline, a structured story, and a personal letter summarizing the memory.
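The three steps above can be sketched as a small orchestration function. This is a minimal illustration, not the project's actual code: `MemoryInput`, `perceive`, `build_prompt`, and `synthesize` are hypothetical names, the prompt wording is invented, and the three models are passed in as plain callables so the flow can be read (and run) without loading any weights.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MemoryInput:
    """One uploaded memory: any mix of typed text, a voice note, and photos."""
    text: Optional[str] = None
    audio_path: Optional[str] = None
    photo_paths: List[str] = field(default_factory=list)


def perceive(memory: MemoryInput,
             transcribe: Callable[[str], str],
             caption: Callable[[str], str]) -> Dict[str, object]:
    """Perception step: turn audio and photos into text.

    `transcribe` stands in for a speech model such as openai/whisper-base,
    `caption` for an image captioner such as
    Salesforce/blip-image-captioning-base (both are assumptions here).
    """
    transcript = transcribe(memory.audio_path) if memory.audio_path else ""
    captions = [caption(path) for path in memory.photo_paths]
    return {"note": memory.text or "", "transcript": transcript, "captions": captions}


def build_prompt(perceived: Dict[str, object], history: List[str]) -> str:
    """Assemble the text handed to the orchestrator LLM (wording is illustrative)."""
    lines = [
        "You are a personal archivist. Using the material below, write a",
        "narrative timeline, a structured story, and a personal letter.",
        "",
        "Earlier memories:",
    ]
    lines += [f"- {item}" for item in history] or ["- (none)"]
    lines += ["", f"Typed note: {perceived['note']}",
              f"Voice transcript: {perceived['transcript']}",
              "Photo captions:"]
    lines += [f"- {c}" for c in perceived["captions"]] or ["- (none)"]
    return "\n".join(lines)


def synthesize(memory: MemoryInput,
               history: List[str],
               transcribe: Callable[[str], str],
               caption: Callable[[str], str],
               generate: Callable[[str], str]) -> str:
    """Synthesis step: perceive the inputs, then ask the LLM for the narrative.

    `generate` stands in for a call to an instruction-tuned LLM such as
    Qwen/Qwen2.5-7B-Instruct.
    """
    perceived = perceive(memory, transcribe, caption)
    return generate(build_prompt(perceived, history))
```

Passing the models in as callables keeps the perception and synthesis stages independent, so either one could run locally or behind a remote endpoint without changing the orchestration logic.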
Architecture & Deployment
To keep the application incredibly lightweight while maintaining a premium feel, the system uses a decoupled frontend-backend architecture:
- Frontend (Hugging Face Spaces): A completely custom HTML/CSS UI built on top of gradio.Server. This bypasses the standard Gradio blocks to deliver a stunning visual experience while strictly adhering to the hackathon's Gradio requirement.
- Compute Engine (Modal): Heavy AI perception tasks are offloaded to A10G GPUs via Modal. These endpoints are entirely serverless: they scale to zero when not in use, and because the models never load inside the Space, its memory footprint stays minimal.
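To illustrate the decoupling, here is a minimal sketch of how the frontend process might hand raw bytes to a remote perception endpoint over HTTP. Everything specific in it is an assumption: the URL is a placeholder, the JSON field names (`audio_b64`, `image_b64`) are invented, and the real project may well call Modal through its own client rather than plain HTTP. Only the Python standard library is used.

```python
import base64
import json
import urllib.request

# Placeholder address; the real endpoint URL is not given in this document.
PERCEPTION_URL = "https://example.invalid/perceive"


def build_perception_request(audio: bytes = b"",
                             image: bytes = b"",
                             url: str = PERCEPTION_URL) -> urllib.request.Request:
    """Package raw audio/image bytes as a JSON POST request.

    Binary data is base64-encoded so it can travel inside JSON.
    The field names are hypothetical.
    """
    payload = {
        "audio_b64": base64.b64encode(audio).decode("ascii"),
        "image_b64": base64.b64encode(image).decode("ascii"),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_perception(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON reply.

    The generous timeout is deliberate: an endpoint that has scaled to zero
    needs time to start a container and load models on the first call.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the frontend only builds and sends small JSON requests, it needs no GPU libraries or model weights of its own.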
Local Development
To run the frontend locally for testing:
pip install -r requirements.txt
python app.py