# workflow_notes.md

[AGENTARIUM_ASSET]
Name: Viral Muse – Workflow Notes (Implementation)
Version: v1.0
Status: Draft

## Goal

Implement Viral Muse as a **dataset-driven** agent using:

- system prompt + reasoning template + personality fingerprint
- guardrails
- RAG over the 6 CSV datasets
- optional “knowledge map” layer for cross-dataset linking
- memory: user profile + project workspace

This guide assumes an orchestration runtime like **n8n**, but the logic applies to LangChain, Flowise, Dify, etc.

---

## 0) Folder sanity check (Agentarium v1)

You should have:

- `/core/` (system_prompt.md, reasoning_template.md, personality_fingerprint.md)
- `/datasets/` (6 CSVs)
- `/guardrails/guardrails.md`
- `/memory_schemas/` (2 CSV schemas + memory_rules.md)
- `/docs/` (this file + readme + use cases)

If you add a knowledge map later, put it in:

- `/datasets/knowledge_map.csv` (recommended) or `/datasets/master_grid.csv`

---

## 1) Implement the core behavior files

### 1.1 System Prompt

- Paste `/core/system_prompt.md` into your agent’s **system** message.
- This defines the agent’s role: pattern-first creative partner.

### 1.2 Reasoning Template

- Store `/core/reasoning_template.md` as internal guidance in your runtime (developer message / hidden instruction / “policy doc”).
- Your runtime should prepend it before each completion (or inject it as a “rules” section).

### 1.3 Personality Fingerprint

- Add `/core/personality_fingerprint.md` as a style constraint layer.
- Use it to keep tone consistent: compact, direct, pattern-oriented.

**Result:** the model behaves consistently even before RAG.

---

## 2) Apply guardrails

- Load `/guardrails/guardrails.md` as a rules block.
- Enforce:
  - no plagiarism / no “copy this hit song” behavior
  - no made-up dataset facts
  - no unsafe content requests
  - outputs should be structured and testable

In n8n, you typically inject guardrails as part of the prompt assembly (before the user message).

---

## 3) Prepare datasets for RAG

You have 6 CSV datasets in `/datasets/`. Best practice is to convert each row into a **retrieval document** with:

- `source_dataset`
- `row_id`
- key fields
- a short “row summary” string for embeddings

### 3.1 Minimal row-to-document format (recommended)

For each CSV row, create a text payload like:

- Title line: `[DATASET=lyric_structure_map | id=LSM_012]`
- Then: `field=value` lines (only the meaningful ones)
- Then: a compact 1–2 sentence row summary

This makes retrieval clean and avoids embedding empty columns.

---

## 4) Upsert into a Vector DB (VDB)

You can use Pinecone, Qdrant, Weaviate, Chroma, FAISS, or anything that supports:

- embedding vectors
- metadata filters
- similarity search

### 4.1 What to store per vector

**Vector record**

- `id`: stable id (ex: `lyric_structure_map:LSM_012`)
- `text`: the row-to-document payload
- `metadata`:
  - `dataset` (one of the 6)
  - tags / genre / pattern_type (if available)
  - any fields you want to filter by

### 4.2 n8n implementation (practical steps)

1) **Read file(s)**
   - Node: “Read Binary File” (or fetch from GitHub / Drive)
2) **Parse CSV**
   - Node: “Spreadsheet File” → Convert to JSON (or CSV Parse)
3) **Normalize rows**
   - Node: “Function” (build `id`, `text`, `metadata`)
4) **Create embeddings**
   - Node: “OpenAI” → Embeddings (or any embedding provider)
5) **Upsert to VDB**
   - Pinecone/Qdrant/Weaviate via:
     - a native node if available, OR
     - an “HTTP Request” node to the VDB REST API
6) **Verify**
   - Run a test query and confirm you retrieve relevant rows.
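If you want to prototype the same pipeline outside n8n first, steps 3)–6) fit in a short script. This is a sketch only: it assumes Chroma as the VDB and OpenAI embeddings (any combination from section 4 works the same way), and `row_id` / `summary` are placeholder column names, not the actual Agentarium CSV schema.

```python
# Sketch of steps 3)-6) as a standalone script.
# Assumptions: Chroma as the VDB, OpenAI embeddings; ROW_ID_FIELD and
# SUMMARY_FIELD are placeholder column names; map them to your real columns.
import csv

import chromadb
from openai import OpenAI

DATASET = "lyric_structure_map"            # one of the 6 dataset names
CSV_PATH = f"datasets/{DATASET}.csv"
ROW_ID_FIELD = "row_id"                    # placeholder
SUMMARY_FIELD = "summary"                  # placeholder

oai = OpenAI()                             # reads OPENAI_API_KEY from the environment
vdb = chromadb.PersistentClient(path="vdb")
collection = vdb.get_or_create_collection("viral_muse")

ids, texts, metadatas = [], [], []
with open(CSV_PATH, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        row_id = row[ROW_ID_FIELD]
        # Row-to-document payload from section 3.1: title line, non-empty
        # field=value lines, then the short row summary.
        fields = "\n".join(f"{k}={v}" for k, v in row.items()
                           if v and k not in (ROW_ID_FIELD, SUMMARY_FIELD))
        text = f"[DATASET={DATASET} | id={row_id}]\n{fields}\n{row.get(SUMMARY_FIELD, '')}"
        ids.append(f"{DATASET}:{row_id}")
        texts.append(text)
        metadatas.append({"dataset": DATASET, "row_id": row_id})

# Embed and upsert (batch/chunk this for large files)
emb = oai.embeddings.create(model="text-embedding-3-small", input=texts)
collection.upsert(
    ids=ids,
    documents=texts,
    embeddings=[d.embedding for d in emb.data],
    metadatas=metadatas,
)

# Verify (step 6): similarity search scoped to this dataset
q = oai.embeddings.create(model="text-embedding-3-small", input=["pre-chorus lift"])
hits = collection.query(
    query_embeddings=[q.data[0].embedding],
    n_results=5,
    where={"dataset": DATASET},
)
print(hits["ids"][0])
```

Once this returns sensible rows, replicate the same `id` / `text` / `metadata` shape inside the n8n “Function” node.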
**Tip:** store the dataset name in metadata so you can filter retrieval per task:

- “only TikTok formats” → filter dataset=`tiktok_concept_patterns`
- “structure help” → dataset=`lyric_structure_map`

---

## 5) RAG retrieval at runtime

At inference time, your agent should:

1) classify intent (hook / structure / tiktok / genre flip / audit)
2) select 1–3 datasets to query
3) retrieve top-K rows (ex: K=6–12)
4) synthesize output using retrieved rows only (no invented dataset claims)

### 5.1 Prompt assembly (runtime order)

1) System prompt
2) Guardrails
3) Reasoning template
4) Personality fingerprint
5) Memory snapshot (user profile + project workspace)
6) Retrieved context (RAG)
7) User message

---

## 6) Knowledge map / “Master Grid” (optional but recommended)

If you want cross-dataset reasoning, add a **knowledge map** file to link patterns:

### 6.1 Simple schema (CSV)

Store each link as a source → relation → target triple plus a weight and notes:

- `source_node`, `relation`, `target_node`, `weight`, `notes`

Examples:

- `tiktok_format:duet_bait` → `supports` → `viral_signal:comment_trigger`
- `structure:prechorus_lift` → `amplifies` → `viral_signal:anticipation`
- `genre_flip:reggaeton` → `prefers` → `hook_style:call_response`

### 6.2 How to use it

- Upsert the knowledge map into the same VDB (or keep it as a small local lookup table).
- When generating, retrieve:
  - primary rows from the relevant dataset(s)
  - plus 3–8 knowledge-map links that connect them
- Use those links to produce “why this works” explanations and better constraints.

---

## 7) Memory implementation (User Profile + Project Workspace)

Use the files in `/memory_schemas/`:

- `user_profile_memory.csv`
- `project_workspace_memory.csv`
- `memory_rules.md`

### 7.1 Read memory

Before responding:

- load active user profile facts (preferences, style constraints)
- load the current project workspace (objectives, constraints, next actions)

### 7.2 Write memory

After responding, write only durable facts:

- user preferences that recur
- project decisions (selected concept, chosen genre, chosen structure)
- next actions (what to test next)

**Important:** append new rows; don’t overwrite old ones.

---

## 8) Quick acceptance test (you can run it in any runtime)

Try these prompts and verify RAG is working:

1) “Give me 8 hook angles + why each is replayable.”
2) “Design a 30s TikTok loop concept. 1 prop, 1 angle.”
3) “Transform this concept into cumbia and then into alt-rock.”
4) “Audit this chorus for viral signals and give minimal fixes.”

If outputs reference your dataset concepts consistently, you’re done.

---

## 9) Common failure modes (and fixes)

- **Generic output** → increase retrieval K; tighten the prompt to require citing retrieved patterns
- **Hallucinated claims** → enforce: “If it’s not in the retrieved context, say unknown”
- **Too long** → cap variants; default to compact bullet outputs
- **Bad retrieval** → improve row-to-document summaries; add better metadata filters
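---

## 10) Appendix: runtime retrieval sketch (optional)

A minimal Python sketch of the runtime loop from section 5, under the same assumptions as the upsert sketch in section 4.2 (a Chroma collection named `viral_muse`, OpenAI for embeddings and generation). The routing keywords, chat model name, and `load_memory_snapshot()` are placeholders; swap in your own intent classifier, model, and memory loader.

```python
# Route intent -> datasets, retrieve top-K rows with a metadata filter,
# assemble the prompt in the 5.1 order, then call the model.
import chromadb
from openai import OpenAI

oai = OpenAI()
collection = chromadb.PersistentClient(path="vdb").get_or_create_collection("viral_muse")


def route_datasets(user_msg: str) -> list[str]:
    """Crude keyword routing; replace with a real intent classifier."""
    msg = user_msg.lower()
    if "tiktok" in msg or "loop" in msg:
        return ["tiktok_concept_patterns"]
    if "structure" in msg or "chorus" in msg or "hook" in msg:
        return ["lyric_structure_map"]
    return ["lyric_structure_map"]  # fallback; extend with the other datasets


def retrieve(user_msg: str, k: int = 8) -> str:
    q = oai.embeddings.create(model="text-embedding-3-small", input=[user_msg])
    hits = collection.query(
        query_embeddings=[q.data[0].embedding],
        n_results=k,
        where={"dataset": {"$in": route_datasets(user_msg)}},
    )
    return "\n\n".join(hits["documents"][0])


def load_memory_snapshot() -> str:
    # Placeholder: summarize user_profile_memory.csv + project_workspace_memory.csv here.
    return "Memory snapshot: (none yet)"


def read(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


def build_messages(user_msg: str) -> list[dict]:
    # 5.1 runtime order: system prompt, guardrails, reasoning template,
    # personality fingerprint, memory snapshot, retrieved context, user message.
    system = "\n\n".join([
        read("core/system_prompt.md"),
        read("guardrails/guardrails.md"),
        read("core/reasoning_template.md"),
        read("core/personality_fingerprint.md"),
        load_memory_snapshot(),
    ])
    user = f"Retrieved context:\n{retrieve(user_msg)}\n\nUser request:\n{user_msg}"
    return [{"role": "system", "content": system}, {"role": "user", "content": user}]


reply = oai.chat.completions.create(
    model="gpt-4o-mini",  # any chat model works
    messages=build_messages("Design a 30s TikTok loop concept. 1 prop, 1 angle."),
)
print(reply.choices[0].message.content)
```

Running the acceptance prompts from section 8 through this loop is a quick way to check that outputs cite retrieved patterns instead of generic advice.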