Faraday Memory: Data Ingestion Guide
Your AI memory system has built-in smart parsers designed to ingest various file formats. You do not need to convert your documents manually β the ingestion pipeline handles extraction, chunking, and vector embedding automatically based on file extensions.
π₯ Where to Drop Files
You have two primary designated "Drop Zones". The sync.py push command recursively scans these directories and all their subfolders:
- Your Obsidian Vault's Inbox or Reference Folders
00 - Inbox/01 - Raw Sources/
- The Dedicated Raw Data Directory
Faraday/ai-memory-mcp/data_raw/
π Supported Formats & Naming Rules
1. ChatGPT Exports (JSON)
- Format:
.json - Rule: The filename must contain the word
conversations. - Example:
chatgpt_conversations.jsonorconversations_2025.json - Behavior: Explodes the dump and parses it specifically tracking Human vs AI messages.
2. Gemini Exports (HTML / Takeout)
- Format:
.html - Rule: The filename must contain the word
activityorgemini. - Example:
My_Gemini_Activity.html - Behavior: Strips out Google tracking headers and parses out prompts and responses cleanly.
3. PDF Documents (Coursework, Papers, E-books)
- Format:
.pdf - Rule: Standard PDF documents (text-based).
- Example:
Linear_Algebra_Notes.pdforAttention_Is_All_You_Need.pdf - Behavior: Extracts multi-page text using PDFMiner/PyMuPDF into logical reading chunks.
4. Images & Scans (Diagrams, Receipts, Screenshots)
- Format:
.png,.jpg,.jpeg,.webp,.bmp,.tiff - Rule: Can be named anything.
- Requirements: You must have Tesseract-OCR installed on your system.
- Behavior: Automatically performs Optical Character Recognition (OCR) to rip the text out of the image and inject it into your vector database.
5. Plain Text, Emails, Code, CSVs, and Markdown
- Format:
.md,.txt,.csv,.log,.rst - Rule: Can be named anything.
- Example:
Meeting_Notes.mdoremail_invoice.txt - Behavior: The default fallback. Reads the file as plain structured text and intelligently slices it up. Large CSVs are skipped if they exceed 15MB to prevent clogging the embedding memory.
π How to Add It to the Cloud
Once your files are dropped into the folders:
- Open your terminal in the
ai-memory-mcpfolder. - Run the ingestion and cloud synchronization command:
python sync.py push - The script will:
- Identify only the new files.
- Run the appropriate parser (OCR, HTML extractor, etc.).
- Break them down into semantic chunks and generate embeddings.
- Compress the new database.
- Securely upload the compressed database to Supabase.
Your Claude mobile app (and Antigravity!) will automatically have access to this new data upon their next query!