# workflow_notes.md

[AGENTARIUM_ASSET]
Name: Viral Muse – Workflow Notes (Implementation)
Version: v1.0
Status: Draft

## Goal

Implement Viral Muse as a **dataset-driven** agent using:

- system prompt + reasoning template + personality fingerprint
- guardrails
- RAG over the 6 CSV datasets
- optional “knowledge map” layer for cross-dataset linking
- memory: user profile + project workspace

This guide assumes an orchestration runtime like **n8n**, but the logic applies to LangChain, Flowise, Dify, etc.

---

## 0) Folder sanity check (Agentarium v1)

You should have:

- `/core/` (system_prompt.md, reasoning_template.md, personality_fingerprint.md)
- `/datasets/` (6 CSVs)
- `/guardrails/guardrails.md`
- `/memory_schemas/` (2 CSV schemas + memory_rules.md)
- `/docs/` (this file + readme + use cases)

If you add a knowledge map later, put it in:

- `/datasets/knowledge_map.csv` (recommended) or `/datasets/master_grid.csv`

---

## 1) Implement the core behavior files

### 1.1 System Prompt

- Paste `/core/system_prompt.md` into your agent’s **system** message.
- This defines the agent’s role: pattern-first creative partner.

### 1.2 Reasoning Template

- Store `/core/reasoning_template.md` as internal guidance in your runtime (developer message / hidden instruction / “policy doc”).
- Your runtime should prepend it before each completion (or inject it as a “rules” section).

### 1.3 Personality Fingerprint

- Add `/core/personality_fingerprint.md` as a style constraint layer.
- Use it to keep tone consistent: compact, direct, pattern-oriented.

**Result:** the model behaves consistently even before RAG.

---

## 2) Apply guardrails

- Load `/guardrails/guardrails.md` as a rules block.
- Enforce:
  - no plagiarism / no “copy this hit song” behavior
  - no made-up dataset facts
  - no unsafe content requests
  - outputs should be structured and testable

In n8n, you typically inject guardrails as part of the prompt assembly (before the user message).

---

## 3) Prepare datasets for RAG

You have 6 CSV datasets in `/datasets/`. Best practice is to convert each row into a **retrieval document** with:

- `source_dataset`
- `row_id`
- key fields
- a short “row summary” string for embeddings

### 3.1 Minimal row-to-document format (recommended)

For each CSV row, create a text payload like:

- Title line: `[DATASET=lyric_structure_map | id=LSM_012]`
- Then: `field=value` lines (only the meaningful ones)
- Then: a compact 1–2 sentence row summary

This makes retrieval clean and avoids embedding empty columns.

---

## 4) Upsert into a Vector DB (VDB)

You can use Pinecone, Qdrant, Weaviate, Chroma, FAISS, or anything that supports:

- embedding vectors
- metadata filters
- similarity search

### 4.1 What to store per vector

**Vector record**

- `id`: stable id (ex: `lyric_structure_map:LSM_012`)
- `text`: the row-to-document payload
- `metadata`:
  - `dataset` (one of the 6)
  - tags / genre / pattern_type (if available)
  - any fields you want to filter by

### 4.2 n8n implementation (practical steps)

1) **Read file(s)**
   - Node: “Read Binary File” (or fetch from GitHub / Drive)
2) **Parse CSV**
   - Node: “Spreadsheet File” → Convert to JSON (or CSV Parse)
3) **Normalize rows**
   - Node: “Function” (build `id`, `text`, `metadata`)
4) **Create embeddings**
   - Node: “OpenAI” → Embeddings (or any embedding provider)
5) **Upsert to VDB**
   - Pinecone/Qdrant/Weaviate via:
     - a native node if available, OR
     - an “HTTP Request” node to the VDB REST API
6) **Verify**
   - Run a test query and confirm you retrieve relevant rows.
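If you want to prototype the same pipeline outside n8n first, steps 3)–6) fit in a short script. This is a sketch only: it assumes Chroma as the VDB and OpenAI embeddings (any combination from section 4 works the same way), and `row_id` / `summary` are placeholder column names, not the actual Agentarium CSV schema.

```python
# Sketch of steps 3)-6) as a standalone script.
# Assumptions: Chroma as the VDB, OpenAI embeddings; ROW_ID_FIELD and
# SUMMARY_FIELD are placeholder column names; map them to your real columns.
import csv

import chromadb
from openai import OpenAI

DATASET = "lyric_structure_map"            # one of the 6 dataset names
CSV_PATH = f"datasets/{DATASET}.csv"
ROW_ID_FIELD = "row_id"                    # placeholder
SUMMARY_FIELD = "summary"                  # placeholder

oai = OpenAI()                             # reads OPENAI_API_KEY from the environment
vdb = chromadb.PersistentClient(path="vdb")
collection = vdb.get_or_create_collection("viral_muse")

ids, texts, metadatas = [], [], []
with open(CSV_PATH, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        row_id = row[ROW_ID_FIELD]
        # Row-to-document payload from section 3.1: title line, non-empty
        # field=value lines, then the short row summary.
        fields = "\n".join(f"{k}={v}" for k, v in row.items()
                           if v and k not in (ROW_ID_FIELD, SUMMARY_FIELD))
        text = f"[DATASET={DATASET} | id={row_id}]\n{fields}\n{row.get(SUMMARY_FIELD, '')}"
        ids.append(f"{DATASET}:{row_id}")
        texts.append(text)
        metadatas.append({"dataset": DATASET, "row_id": row_id})

# Embed and upsert (batch/chunk this for large files)
emb = oai.embeddings.create(model="text-embedding-3-small", input=texts)
collection.upsert(
    ids=ids,
    documents=texts,
    embeddings=[d.embedding for d in emb.data],
    metadatas=metadatas,
)

# Verify (step 6): similarity search scoped to this dataset
q = oai.embeddings.create(model="text-embedding-3-small", input=["pre-chorus lift"])
hits = collection.query(
    query_embeddings=[q.data[0].embedding],
    n_results=5,
    where={"dataset": DATASET},
)
print(hits["ids"][0])
```

Once this returns sensible rows, replicate the same `id` / `text` / `metadata` shape inside the n8n “Function” node.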
**Tip:** store the dataset name in metadata so you can filter retrieval per task:

- “only TikTok formats” → filter dataset=`tiktok_concept_patterns`
- “structure help” → dataset=`lyric_structure_map`

---

## 5) RAG retrieval at runtime

At inference time, your agent should:

1) classify intent (hook / structure / tiktok / genre flip / audit)
2) select 1–3 datasets to query
3) retrieve top-K rows (ex: K=6–12)
4) synthesize output using retrieved rows only (no invented dataset claims)

### 5.1 Prompt assembly (runtime order)

1) System prompt
2) Guardrails
3) Reasoning template
4) Personality fingerprint
5) Memory snapshot (user profile + project workspace)
6) Retrieved context (RAG)
7) User message

---

## 6) Knowledge map / “Master Grid” (optional but recommended)

If you want cross-dataset reasoning, add a **knowledge map** file to link patterns:

### 6.1 Simple schema (CSV)

Store each link as a source → relation → target triple plus a weight and notes:

- `source_node`, `relation`, `target_node`, `weight`, `notes`

Examples:

- `tiktok_format:duet_bait` → `supports` → `viral_signal:comment_trigger`
- `structure:prechorus_lift` → `amplifies` → `viral_signal:anticipation`
- `genre_flip:reggaeton` → `prefers` → `hook_style:call_response`

### 6.2 How to use it

- Upsert the knowledge map into the same VDB (or keep it as a small local lookup table).
- When generating, retrieve:
  - primary rows from the relevant dataset(s)
  - plus 3–8 knowledge-map links that connect them
- Use those links to produce “why this works” explanations and better constraints.

---

## 7) Memory implementation (User Profile + Project Workspace)

Use the files in `/memory_schemas/`:

- `user_profile_memory.csv`
- `project_workspace_memory.csv`
- `memory_rules.md`

### 7.1 Read memory

Before responding:

- load active user profile facts (preferences, style constraints)
- load the current project workspace (objectives, constraints, next actions)

### 7.2 Write memory

After responding, write only durable facts:

- user preferences that recur
- project decisions (selected concept, chosen genre, chosen structure)
- next actions (what to test next)

**Important:** append new rows; don’t overwrite old ones.

---

## 8) Quick acceptance test (you can run it in any runtime)

Try these prompts and verify RAG is working:

1) “Give me 8 hook angles + why each is replayable.”
2) “Design a 30s TikTok loop concept. 1 prop, 1 angle.”
3) “Transform this concept into cumbia and then into alt-rock.”
4) “Audit this chorus for viral signals and give minimal fixes.”

If outputs reference your dataset concepts consistently, you’re done.

---

## 9) Common failure modes (and fixes)

- **Generic output** → increase retrieval K; tighten the prompt to require citing retrieved patterns
- **Hallucinated claims** → enforce: “If it’s not in the retrieved context, say unknown”
- **Too long** → cap variants; default to compact bullet outputs
- **Bad retrieval** → improve row-to-document summaries; add better metadata filters
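---

## 10) Appendix: runtime retrieval sketch (optional)

A minimal Python sketch of the runtime loop from section 5, under the same assumptions as the upsert sketch in section 4.2 (a Chroma collection named `viral_muse`, OpenAI for embeddings and generation). The routing keywords, chat model name, and `load_memory_snapshot()` are placeholders; swap in your own intent classifier, model, and memory loader.

```python
# Route intent -> datasets, retrieve top-K rows with a metadata filter,
# assemble the prompt in the 5.1 order, then call the model.
import chromadb
from openai import OpenAI

oai = OpenAI()
collection = chromadb.PersistentClient(path="vdb").get_or_create_collection("viral_muse")


def route_datasets(user_msg: str) -> list[str]:
    """Crude keyword routing; replace with a real intent classifier."""
    msg = user_msg.lower()
    if "tiktok" in msg or "loop" in msg:
        return ["tiktok_concept_patterns"]
    if "structure" in msg or "chorus" in msg or "hook" in msg:
        return ["lyric_structure_map"]
    return ["lyric_structure_map"]  # fallback; extend with the other datasets


def retrieve(user_msg: str, k: int = 8) -> str:
    q = oai.embeddings.create(model="text-embedding-3-small", input=[user_msg])
    hits = collection.query(
        query_embeddings=[q.data[0].embedding],
        n_results=k,
        where={"dataset": {"$in": route_datasets(user_msg)}},
    )
    return "\n\n".join(hits["documents"][0])


def load_memory_snapshot() -> str:
    # Placeholder: summarize user_profile_memory.csv + project_workspace_memory.csv here.
    return "Memory snapshot: (none yet)"


def read(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


def build_messages(user_msg: str) -> list[dict]:
    # 5.1 runtime order: system prompt, guardrails, reasoning template,
    # personality fingerprint, memory snapshot, retrieved context, user message.
    system = "\n\n".join([
        read("core/system_prompt.md"),
        read("guardrails/guardrails.md"),
        read("core/reasoning_template.md"),
        read("core/personality_fingerprint.md"),
        load_memory_snapshot(),
    ])
    user = f"Retrieved context:\n{retrieve(user_msg)}\n\nUser request:\n{user_msg}"
    return [{"role": "system", "content": system}, {"role": "user", "content": user}]


reply = oai.chat.completions.create(
    model="gpt-4o-mini",  # any chat model works
    messages=build_messages("Design a 30s TikTok loop concept. 1 prop, 1 angle."),
)
print(reply.choices[0].message.content)
```

Running the acceptance prompts from section 8 through this loop is a quick way to check that outputs cite retrieved patterns instead of generic advice.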