# workflow_notes.md
[AGENTARIUM_ASSET]
Name: Viral Muse – Workflow Notes (Implementation)
Version: v1.0
Status: Draft

## Goal
Implement Viral Muse as a **dataset-driven** agent using:
- system prompt + reasoning template + personality fingerprint
- guardrails
- RAG over the 6 CSV datasets
- optional “knowledge map” layer for cross-dataset linking
- memory: user profile + project workspace

This guide assumes an orchestration runtime like **n8n**, but the logic applies to LangChain, Flowise, Dify, etc.

---

## 0) Folder sanity check (Agentarium v1)
You should have:
- `/core/` (system_prompt.md, reasoning_template.md, personality_fingerprint.md)
- `/datasets/` (6 CSVs)
- `/guardrails/guardrails.md`
- `/memory_schemas/` (2 CSV schemas + memory_rules.md)
- `/docs/` (this file + readme + use cases)

If you add a knowledge map later, put it in:
- `/datasets/knowledge_map.csv` (recommended) or `/datasets/master_grid.csv`

---

## 1) Implement the core behavior files
### 1.1 System Prompt
- Paste `/core/system_prompt.md` into your agent’s **system** message.
- This defines the agent’s role: pattern-first creative partner.

### 1.2 Reasoning Template
- Store `/core/reasoning_template.md` as internal guidance in your runtime (developer message / hidden instruction / “policy doc”).
- Your runtime should prepend it before each completion (or inject as a “rules” section).

### 1.3 Personality Fingerprint
- Add `/core/personality_fingerprint.md` as a style constraint layer.
- Use it to keep tone consistent: compact, direct, pattern-oriented.

**Result:** the model behaves consistently even before RAG.

---

## 2) Apply guardrails
- Load `/guardrails/guardrails.md` as a rules block.
- Enforce:
  - no plagiarism / no “copy this hit song” behavior
  - no made-up dataset facts
  - no unsafe content requests
  - outputs should be structured and testable

In n8n, you typically inject guardrails as part of prompt assembly, before the user message.

---

## 3) Prepare datasets for RAG
You have 6 CSV datasets in `/datasets/`.
Best practice is to convert each row into a **retrieval document** with:
- `source_dataset`
- `row_id`
- key fields
- a short “row summary” string for embeddings

### 3.1 Minimal row-to-document format (recommended)
For each CSV row, create a text payload like:

- Title line: `[DATASET=lyric_structure_map | id=LSM_012]`
- Then: `field=value` lines (only the meaningful ones)
- Then: a compact 1–2 sentence row summary

This makes retrieval clean and avoids embedding empty columns.
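As a sketch, the row-to-document conversion can be done in a few lines of Python (the column names `pattern`, `genre`, and `summary` are illustrative; adapt them to your actual CSV headers):

```python
import csv
import io

def row_to_document(row: dict, dataset: str, row_id: str) -> str:
    """Build a retrieval payload: title line, non-empty fields, then a summary."""
    lines = [f"[DATASET={dataset} | id={row_id}]"]
    # Keep only meaningful (non-empty) fields to avoid embedding blanks.
    for field, value in row.items():
        if value and field not in ("id", "summary"):
            lines.append(f"{field}={value}")
    # Assumes the row summary lives in a "summary" column; generate one otherwise.
    if row.get("summary"):
        lines.append(row["summary"])
    return "\n".join(lines)

# Example with a hypothetical lyric_structure_map row:
sample_csv = "id,pattern,genre,summary\nLSM_012,prechorus_lift,pop,Builds tension before the chorus.\n"
row = next(csv.DictReader(io.StringIO(sample_csv)))
doc = row_to_document(row, "lyric_structure_map", row["id"])
```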

---

## 4) Upsert into a Vector DB (VDB)
You can use Pinecone, Qdrant, Weaviate, Chroma, FAISS — anything that supports:
- embeddings vector
- metadata filters
- similarity search

### 4.1 What to store per vector
**Vector record**
- `id`: stable id (ex: `lyric_structure_map:LSM_012`)
- `text`: the row-to-document payload
- `metadata`:
  - `dataset` (one of the 6)
  - tags / genre / pattern_type (if available)
  - any fields you want to filter by
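The record above can be assembled with a small helper (the filterable field names `tags`, `genre`, and `pattern_type` are illustrative, matching the metadata suggestions above):

```python
def build_vector_record(dataset: str, row_id: str, text: str, row: dict) -> dict:
    """Assemble one record per vector: stable id, payload text, filterable metadata."""
    metadata = {"dataset": dataset}
    # Copy filterable fields only when present, so metadata stays compact.
    for key in ("tags", "genre", "pattern_type"):
        if row.get(key):
            metadata[key] = row[key]
    return {
        "id": f"{dataset}:{row_id}",  # stable id, e.g. lyric_structure_map:LSM_012
        "text": text,
        "metadata": metadata,
    }

record = build_vector_record(
    "lyric_structure_map", "LSM_012",
    "[DATASET=lyric_structure_map | id=LSM_012]\npattern=prechorus_lift",
    {"genre": "pop", "pattern_type": "structure"},
)
```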

### 4.2 n8n implementation (practical steps)
1) **Read file(s)**
   - Node: “Read Binary File” (or fetch from GitHub / Drive)
2) **Parse CSV**
   - Node: “Spreadsheet File” → Convert to JSON (or CSV Parse)
3) **Normalize rows**
   - Node: “Function” (build `id`, `text`, `metadata`)
4) **Create embeddings**
   - Node: “OpenAI” → Embeddings (or any embedding provider)
5) **Upsert to VDB**
   - Pinecone/Qdrant/Weaviate via:
     - native node if available, OR
     - “HTTP Request” node to the VDB REST API
6) **Verify**
   - Run a test query and confirm you retrieve relevant rows.

**Tip:** store the dataset name in metadata so you can filter retrieval per task:
- “only TikTok formats” → filter dataset=`tiktok_concept_patterns`
- “structure help” → dataset=`lyric_structure_map`

---

## 5) RAG retrieval at runtime
At inference time, your agent should:
1) classify intent (hook / structure / tiktok / genre flip / audit)
2) select 1–3 datasets to query
3) retrieve top-K rows (ex: K=6–12)
4) synthesize output using retrieved rows only (no invented dataset claims)
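Steps 1 and 2 can start as a simple keyword router; no classifier model is required at first. The dataset names below are the two named in the filtering tip above, and the keyword lists are illustrative:

```python
# Illustrative intent -> dataset routing; extend with your remaining datasets.
INTENT_DATASETS = {
    "hook": ["lyric_structure_map"],
    "structure": ["lyric_structure_map"],
    "tiktok": ["tiktok_concept_patterns"],
}

def select_datasets(user_message: str) -> list:
    """Pick 1-3 datasets to query via keyword-based intent classification."""
    msg = user_message.lower()
    selected = []
    for intent, datasets in INTENT_DATASETS.items():
        if intent in msg:
            for d in datasets:
                if d not in selected:
                    selected.append(d)
    # Fall back to all known datasets when no intent keyword matches.
    return selected[:3] or sorted({d for ds in INTENT_DATASETS.values() for d in ds})

picked = select_datasets("Design a 30s TikTok loop concept")
```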

### 5.1 Prompt assembly (runtime order)
1) System prompt
2) Guardrails
3) Reasoning template
4) Personality fingerprint
5) Memory snapshot (user profile + project workspace)
6) Retrieved context (RAG)
7) User message
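The assembly order above is plain string concatenation; a minimal sketch, with the section contents as placeholders:

```python
def assemble_prompt(system_prompt, guardrails, reasoning, personality,
                    memory_snapshot, retrieved_context, user_message):
    """Concatenate the layers in the fixed runtime order from section 5.1."""
    sections = [
        system_prompt, guardrails, reasoning, personality,
        memory_snapshot, retrieved_context,
    ]
    # Skip empty layers (e.g. no memory yet) and separate the rest clearly.
    context = "\n\n---\n\n".join(s for s in sections if s)
    return [
        {"role": "system", "content": context},
        {"role": "user", "content": user_message},
    ]

messages = assemble_prompt(
    "You are Viral Muse...", "Guardrails: no plagiarism...",
    "Reason pattern-first...", "Tone: compact, direct.",
    "", "Retrieved rows: ...", "Give me 8 hook angles",
)
```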

---

## 6) Knowledge map / “Master Grid” (optional but recommended)
If you want cross-dataset reasoning, add a **knowledge map** file to link patterns:

### 6.1 Simple schema (CSV)
Store links as triplets:
- `source_node`, `relation`, `target_node`, `weight`, `notes`

Examples:
- `tiktok_format:duet_bait` → `supports` → `viral_signal:comment_trigger`
- `structure:prechorus_lift` → `amplifies` → `viral_signal:anticipation`
- `genre_flip:reggaeton` → `prefers` → `hook_style:call_response`
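Loading the triplets and pulling the links that touch a retrieved node is a small lookup; a sketch with the example rows above inlined for illustration:

```python
import csv
import io

KNOWLEDGE_MAP_CSV = """source_node,relation,target_node,weight,notes
tiktok_format:duet_bait,supports,viral_signal:comment_trigger,0.8,
structure:prechorus_lift,amplifies,viral_signal:anticipation,0.9,
genre_flip:reggaeton,prefers,hook_style:call_response,0.7,
"""

def links_for(node: str, csv_text: str, limit: int = 8) -> list:
    """Return up to `limit` links where the node appears on either side, strongest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    hits = [r for r in rows if node in (r["source_node"], r["target_node"])]
    hits.sort(key=lambda r: float(r["weight"] or 0), reverse=True)
    return hits[:limit]

links = links_for("viral_signal:anticipation", KNOWLEDGE_MAP_CSV)
```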

### 6.2 How to use it
- Upsert the knowledge map into the same VDB (or keep as a small local lookup table).
- When generating, retrieve:
  - primary rows from the relevant dataset(s)
  - plus 3–8 knowledge-map links that connect them
- Use those links to produce “why this works” explanations and better constraints.

---

## 7) Memory implementation (User Profile + Project Workspace)
Use the files in `/memory_schemas/`:
- `user_profile_memory.csv`
- `project_workspace_memory.csv`
- `memory_rules.md`

### 7.1 Read memory
Before responding:
- load active user profile facts (preferences, style constraints)
- load current project workspace (objectives, constraints, next actions)

### 7.2 Write memory
After responding, write only durable facts:
- user preferences that recur
- project decisions (selected concept, chosen genre, chosen structure)
- next actions (what to test next)

**Important:** append new rows; don’t overwrite old ones.
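Append-only writes can be sketched with the stdlib `csv` module. The column names here are illustrative; use the actual headers defined in `/memory_schemas/`:

```python
import csv
import datetime
import os
import tempfile

def append_memory(path: str, fact_type: str, content: str) -> None:
    """Append one durable fact as a new row; never rewrite existing rows."""
    file_exists = os.path.exists(path) and os.path.getsize(path) > 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if not file_exists:
            writer.writerow(["timestamp", "fact_type", "content"])  # header once
        writer.writerow([datetime.datetime.now().isoformat(), fact_type, content])

# Demo in a temp location so the sketch is runnable anywhere:
demo_path = os.path.join(tempfile.mkdtemp(), "project_workspace_memory.csv")
append_memory(demo_path, "decision", "chosen genre: cumbia")
append_memory(demo_path, "next_action", "test hook variant B")
```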

---

## 8) Quick acceptance test (you can run in any runtime)
Try these prompts and verify RAG is working:

1) “Give me 8 hook angles + why each is replayable.”  
2) “Design a 30s TikTok loop concept. 1 prop, 1 angle.”  
3) “Transform this concept into cumbia and then into alt-rock.”  
4) “Audit this chorus for viral signals and give minimal fixes.”

If outputs reference your dataset concepts consistently, you’re done.

---

## 9) Common failure modes (and fixes)
- **Generic output** → increase retrieval K; tighten prompt to require citing retrieved patterns
- **Hallucinated claims** → enforce: “If not in retrieved context, say unknown”
- **Too long** → cap variants; default to compact bullet outputs
- **Bad retrieval** → improve row-to-document summaries; add better metadata filters