Spaces:

build-small-hackathon
/

memory-keeper

Running

App Files Files Community

memory-keeper / README.md

likki1715

Update README.md

cc20c37 verified 21 days ago

preview code

Raw

History Blame Contribute Delete

3 kB

	---
	title: Memory Keeper
	emoji: 🏃
	colorFrom: indigo
	colorTo: pink
	sdk: docker
	pinned: false
	license: mit
	short_description: Turn voice and photos into memory books
	tags:
	- build-small-hackathon
	- thousand-token-wood
	- modal
	- custom-ui
	---
	# Memory Keeper 🌲 (Track: Thousand Token Wood)

	💻 [GitHub Repository](https://github.com/KongaraLikhith/memory-keeper) \| 🎥 [Watch the Demo Video](https://drive.google.com/file/d/1MCXUOhq1C8chFCno9T7GjPm8BOkAZX_X/view?usp=sharing) \| 📝 [LinkedIn Post](https://www.linkedin.com/posts/likhith-kongara-049b87212_github-kongaralikhithmemory-keeper-memory-activity-7470707614285410304-tbUS)

	## 📖 The Story: Why I Built Memory Keeper

	We all have those little moments—a beautiful sunset, a fleeting thought we record as a voice note, or a random photo that captures a specific feeling. But more often than not, these memories get lost in the endless scroll of our camera rolls or the unorganized abyss of our voice memos.

	I built Memory Keeper for the Hugging Face "Build Small" Hackathon because I wanted a whimsical, personal digital archive that actually understands these fragments. I wanted a tool that could take my raw audio notes and spontaneous photos, and weave them together into beautifully structured storybooks and letters to my future self.

	I chose the Thousand Token Wood track because this project isn't just about utility; it’s about creating something deeply personal, experimental, and delightful.


	## ✨ The Magic: How It Works

	Memory Keeper is an entirely open-weight, multi-modal AI pipeline that acts as your personal archivist:
	1. Upload: You upload a photo, record a voice note, or simply type a thought into the custom glassmorphic UI.
	2. Perception: The backend immediately spins up specialized "small models" to perceive the inputs. It runs `openai/whisper-base` to transcribe the audio and `Salesforce/blip-image-captioning-base` to generate rich semantic descriptions of the photos.
	3. Synthesis: A central orchestrator LLM (`Qwen/Qwen2.5-7B-Instruct`) takes all these pieces, looks at your history, and writes a narrative timeline, a structured story, and a personal letter summarizing the memory.


	## 🏗️ Architecture & Deployment

	To keep the application incredibly lightweight while maintaining a premium feel, the system uses a decoupled frontend-backend architecture:

	- Frontend (Hugging Face Spaces): A completely custom HTML/CSS UI built on top of `gradio.Server`. This bypasses the standard Gradio blocks to deliver a stunning visual experience while strictly adhering to the hackathon's Gradio requirement.
	- Compute Engine (Modal): Heavy AI perception tasks are offloaded to A10G GPUs via Modal. These endpoints are entirely serverless, meaning they scale to zero when not in use, keeping the memory footprint minimal.

	### Local Development
	To run the frontend locally for testing:
	```bash
	pip install -r requirements.txt
	python app.py
	```