frankbrsrk committed on
Commit 7066ead · verified · 1 Parent(s): 540b060

Upload viral_muse_dataset_card.md

  1. datasets./viral_muse_dataset_card.md +107 -0
datasets./viral_muse_dataset_card.md ADDED
@@ -0,0 +1,107 @@
+ ---
+ dataset_name: viral-muse-vectorized-kg
+ pretty_name: Viral Muse Vectorized Knowledge Graph
+ version: v1.0.0
+ created_at_utc: 2025-12-13
+ license: CC-BY-4.0
+ tags:
+ - knowledge-graph
+ - master-grid
+ - synthetic-data
+ - retrieval
+ - agentic
+ - content-patterns
+ task_categories:
+ - text-retrieval
+ - text-generation
+ languages:
+ - en
+ ---
+
+ # Viral Muse Vectorized Knowledge Graph (KG)
+
+ This dataset is a **synthetic, structured Master Grid** built for retrieval and linking workflows.
+ It contains:
+
+ - **Atoms** (nodes): normalized semantic units derived from source rows
+ - **Edges** (relations): deterministic internal links, platform-alignment links, and rule-based semantic links
+ - **Knowledge Map**: dataset registry (schemas, atomization strategy, enabled rules)
+
+ ## Files
+
+ - `viral_muse_atoms_master_v1.csv`
+ - `viral_muse_edges_master_v1.csv`
+ - `viral_muse_knowledge_map_v1.csv`
+ - `viral_muse_kg_manifest_v1.json`
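A minimal loading sketch, using only the Python standard library. The file names come from the list above; the local folder passed as `data_dir`, the helper names `read_csv_rows` and `load_kg`, and the UTF-8 encoding are assumptions made for illustration. The manifest's internal structure is not described in this card, so it is returned as parsed JSON without interpretation.

```python
import csv
import json
from pathlib import Path

# File names as listed in the card; the directory layout is an assumption.
ATOMS_FILE = "viral_muse_atoms_master_v1.csv"
EDGES_FILE = "viral_muse_edges_master_v1.csv"
KNOWLEDGE_MAP_FILE = "viral_muse_knowledge_map_v1.csv"
MANIFEST_FILE = "viral_muse_kg_manifest_v1.json"


def read_csv_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_kg(data_dir):
    """Load all four files from data_dir (a hypothetical local download folder)."""
    base = Path(data_dir)
    with open(base / MANIFEST_FILE, encoding="utf-8") as handle:
        manifest = json.load(handle)
    return {
        "atoms": read_csv_rows(base / ATOMS_FILE),
        "edges": read_csv_rows(base / EDGES_FILE),
        "knowledge_map": read_csv_rows(base / KNOWLEDGE_MAP_FILE),
        "manifest": manifest,
    }
```

All CSV values arrive as strings with this approach, so numeric columns such as `weight` and `confidence` need an explicit conversion before use.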
+
+ ## What’s inside
+
+ ### Atom counts
+
+ {
+ "creative_partner_advice_map": 406,
+ "genre_transformation_rules": 169,
+ "lyric_structure_map": 300,
+ "tiktok_concept_patterns": 299,
+ "viral_pattern_signals": 308,
+ "viral_potential_rated": 616
+ }
+
+ Total atoms: **2098**
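The stated total can be checked against the per-key counts, and the same counts can be recomputed from a loaded atoms file. The sketch below assumes that the keys in the block above are values of the `source_dataset_id` column; the card does not say so explicitly. `count_by` and `matches_card` are illustrative helper names.

```python
from collections import Counter

# Counts copied from the card.
ATOM_COUNTS = {
    "creative_partner_advice_map": 406,
    "genre_transformation_rules": 169,
    "lyric_structure_map": 300,
    "tiktok_concept_patterns": 299,
    "viral_pattern_signals": 308,
    "viral_potential_rated": 616,
}
STATED_TOTAL_ATOMS = 2098


def count_by(rows, column):
    """Count rows per distinct value of `column`."""
    return dict(Counter(row[column] for row in rows))


def matches_card(rows, column="source_dataset_id"):
    """True if the per-value counts in `rows` equal the card's figures.

    Assumes the card's keys are values of `column` (not confirmed by the card).
    """
    return count_by(rows, column) == ATOM_COUNTS


total_atoms = sum(ATOM_COUNTS.values())
print(total_atoms == STATED_TOTAL_ATOMS)  # True: the six counts sum to 2098
```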
+
+ ### Edge counts by relation type
+
+ {
+ "guided_by": 203,
+ "has_bundle": 308,
+ "platform_aligned": 7440,
+ "platform_multi_aligned": 260,
+ "semantic_related": 857
+ }
+
+ Total edges: **9068**
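The edge distribution is heavily skewed toward one relation type, which matters when traversing the graph. The sketch below recomputes the total and the share of each relation type from the figures above; `filter_edges` is a hypothetical helper for restricting a traversal to selected relation types.

```python
# Counts copied from the card.
EDGE_COUNTS = {
    "guided_by": 203,
    "has_bundle": 308,
    "platform_aligned": 7440,
    "platform_multi_aligned": 260,
    "semantic_related": 857,
}

total_edges = sum(EDGE_COUNTS.values())
shares = {rel: count / total_edges for rel, count in EDGE_COUNTS.items()}

# Print relation types from most to least frequent.
for rel, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{rel:24s}{share:6.1%}")


def filter_edges(edges, relation_types):
    """Keep only edge rows whose relation_type is in relation_types."""
    wanted = set(relation_types)
    return [edge for edge in edges if edge["relation_type"] in wanted]
```

`platform_aligned` accounts for roughly 82% of all edges, so an unfiltered neighbor expansion will mostly follow platform-alignment links. Filtering by `relation_type` is one way to reach the sparser `semantic_related` or `guided_by` links.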
+
+ ## Schema
+
+ ### atoms_master columns
+
+ `atom_id`, `source_dataset_id`, `source_row_id`, `atom_category`, `primary_label`, `secondary_labels`, `attributes`, `tags`, `provenance`, `confidence`, `status`, `notes`
+
+ ### edges_master columns
+
+ `edge_id`, `from_atom_id`, `to_atom_id`, `relation_type`, `relation_subtype`, `weight`, `confidence`, `source_rule_ids`, `notes`
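A header check against the column lists above can catch a truncated or mismatched file before ingestion. This is a sketch: `check_header` is an illustrative helper, and the in-memory CSV at the end is made-up sample data, not taken from the dataset.

```python
import csv
import io

# Column names copied from the schema section of the card.
ATOM_COLUMNS = (
    "atom_id", "source_dataset_id", "source_row_id", "atom_category",
    "primary_label", "secondary_labels", "attributes", "tags",
    "provenance", "confidence", "status", "notes",
)
EDGE_COLUMNS = (
    "edge_id", "from_atom_id", "to_atom_id", "relation_type",
    "relation_subtype", "weight", "confidence", "source_rule_ids", "notes",
)


def check_header(fieldnames, expected):
    """Report expected columns that are missing and columns that are not expected."""
    actual = list(fieldnames or [])
    return {
        "missing": [col for col in expected if col not in actual],
        "unexpected": [col for col in actual if col not in expected],
    }


# Made-up two-column file to show the report shape.
sample = io.StringIO("edge_id,from_atom_id\ne1,a1\n")
report = check_header(csv.DictReader(sample).fieldnames, EDGE_COLUMNS)
print(report["missing"])
```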
+
+ ## Intended use
+
+ Use this KG to:
+
+ - retrieve “best matching” patterns (atoms) for a given request
+ - follow relationships to adjacent patterns (edges) for expansion
+ - power ranking, recommendation, and structured reasoning inside agent pipelines
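The retrieve-then-expand flow described above can be sketched as follows. Several things here are assumptions rather than properties of the dataset: token overlap on `primary_label` is only a stand-in for vector similarity, edges are treated as directed from `from_atom_id` to `to_atom_id`, neighbors are ranked by `weight`, and the atom and edge rows are made-up toy data.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each from_atom_id to its (to_atom_id, relation_type, weight) tuples."""
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge["from_atom_id"]].append(
            (edge["to_atom_id"], edge["relation_type"], float(edge["weight"]))
        )
    return adjacency


def score(query, atom):
    """Jaccard token overlap with primary_label; a stand-in for vector similarity."""
    query_tokens = set(query.lower().split())
    label_tokens = set(atom["primary_label"].lower().split())
    union = query_tokens | label_tokens
    return len(query_tokens & label_tokens) / len(union) if union else 0.0


def retrieve_and_expand(query, atoms, edges, top_k=1, max_neighbors=3):
    """Return the top_k atoms for the query, each with its heaviest outgoing edges."""
    ranked = sorted(atoms, key=lambda atom: score(query, atom), reverse=True)[:top_k]
    adjacency = build_adjacency(edges)
    results = []
    for atom in ranked:
        neighbors = sorted(adjacency[atom["atom_id"]], key=lambda n: n[2], reverse=True)
        results.append((atom["atom_id"], neighbors[:max_neighbors]))
    return results


# Toy rows for illustration only; real IDs and labels will differ.
toy_atoms = [
    {"atom_id": "a1", "primary_label": "slow build chorus hook"},
    {"atom_id": "a2", "primary_label": "duet reply trend"},
    {"atom_id": "a3", "primary_label": "key change bridge"},
]
toy_edges = [
    {"from_atom_id": "a1", "to_atom_id": "a2", "relation_type": "platform_aligned", "weight": "0.9"},
    {"from_atom_id": "a1", "to_atom_id": "a3", "relation_type": "semantic_related", "weight": "0.4"},
]
print(retrieve_and_expand("chorus hook", toy_atoms, toy_edges))
```

In a real pipeline the `score` function would be replaced by a vector-store query, and the adjacency lookup by whichever graph store or index holds the edges.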
+
+ ## Recommended ingestion pattern
+
+ 1) **Vector DB**: upsert **atoms** as documents
+ - Vector ID recommendation: `viral_muse:<source_dataset_id>:<atom_id>`
+ - Attach metadata: `source_dataset_id`, `source_row_id`, `atom_category`, `confidence`, `status`, `version`
+
+ 2) **Graph layer**: ingest **edges** into a graph store (or adjacency index)
+ - Use `edge_id` as primary key
+ - Validate that `from_atom_id` and `to_atom_id` exist in atoms
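The two ingestion steps above can be sketched without committing to a particular vector database or graph store. The document shape (`id`, `text`, `metadata`), the choice of `primary_label` as the text to embed, and the helper names are assumptions. `version` is not an atom column, so it is passed in separately here, defaulting to the card's `v1.0.0`.

```python
from collections import Counter

# Metadata fields recommended by the card that exist as atom columns.
METADATA_FIELDS = ("source_dataset_id", "source_row_id", "atom_category", "confidence", "status")


def vector_id(atom):
    """Build the recommended ID: viral_muse:<source_dataset_id>:<atom_id>."""
    return f"viral_muse:{atom['source_dataset_id']}:{atom['atom_id']}"


def to_document(atom, version="v1.0.0"):
    """Turn an atom row into a generic upsert payload (shape is an assumption)."""
    metadata = {field: atom[field] for field in METADATA_FIELDS}
    metadata["version"] = version  # dataset version, not an atom column
    return {"id": vector_id(atom), "text": atom["primary_label"], "metadata": metadata}


def duplicate_edge_ids(edges):
    """Return edge_id values that occur more than once (they cannot be primary keys)."""
    counts = Counter(edge["edge_id"] for edge in edges)
    return sorted(edge_id for edge_id, n in counts.items() if n > 1)


def dangling_edges(atoms, edges):
    """Return edge_id values whose from_atom_id or to_atom_id is not a known atom."""
    known = {atom["atom_id"] for atom in atoms}
    return [
        edge["edge_id"]
        for edge in edges
        if edge["from_atom_id"] not in known or edge["to_atom_id"] not in known
    ]
```

Running both checks before loading the graph layer keeps referential problems out of the store; an empty list from each means the edges file is consistent with the atoms file.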
+
+ ## Data origin & safety
+
+ - **Synthetic**: not collected from real users
+ - **No personal data**
+ - **Non-operational**: patterns are for creative/retrieval workflows
+
+ ## Limitations
+
+ - Semantic connectivity for creative partner advice atoms (CPA, `creative_partner_advice_map`) is intentionally conservative (low shared vocabulary)
+ - Genre transformation rules (GTR) are present as atoms but are only minimally linked in v1; linking is to be expanded with transformation-specific rules
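To see how these limitations affect a particular workflow, connectivity can be measured per source dataset. The sketch below counts how many edge endpoints fall on atoms of each `source_dataset_id`, optionally for a single relation type. The helper name is illustrative, and nothing here asserts what the real figures are.

```python
from collections import Counter


def edge_endpoints_per_dataset(atoms, edges, relation_type=None):
    """Count edge endpoints per source_dataset_id, optionally for one relation type."""
    dataset_of = {atom["atom_id"]: atom["source_dataset_id"] for atom in atoms}
    # Start every dataset at zero so unlinked datasets still appear in the result.
    counts = Counter({dataset: 0 for dataset in dataset_of.values()})
    for edge in edges:
        if relation_type is not None and edge["relation_type"] != relation_type:
            continue
        for endpoint in (edge["from_atom_id"], edge["to_atom_id"]):
            if endpoint in dataset_of:
                counts[dataset_of[endpoint]] += 1
    return dict(counts)
```

Dividing each count by the number of atoms in that dataset gives an average degree, which makes sparsely linked datasets easy to spot.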
+
+ ## Citation / attribution
+
+ If you use this dataset publicly, attribute:
+ **Agentarium / Viral Muse (synthetic)** (CC-BY-4.0)
+