File size: 6,609 Bytes
c9f5b32
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
# Utilizes TOON (Token-Oriented Object Notation) for token efficiency and structured output.

LABELING_PROMPT_TEMPLATE = """
You are an AI Factuality Assessment Agent operating under the "Ali Arsanjani Factuality Factors" framework. 
Your goal is to mass-label video content, quantifying "Veracity Vectors" and "Modality Alignment".

**INPUT DATA:**
- **User Caption:** "{caption}"
- **Audio Transcript:** "{transcript}"
- **Visuals:** (Provided in video context)

**INSTRUCTIONS:**
1.  **Grounding:** Cross-reference claims in the transcript with your internal knowledge base (and tools if active).
2.  **Chain of Thought (<thinking>):** You MUST think step-by-step inside a `<thinking>` block before generating output.
    *   Analyze *Visual Integrity* (Artifacts, edits).
    *   Analyze *Audio Integrity* (Voice cloning, sync).
    *   Analyze *Modality Alignment* (Does video match audio? Does caption match content? Does audio match caption?).
    *   Analyze *Logic* (Fallacies, gaps).
    *   **Classify Tags:** Identify 3-5 relevant tags (e.g., "political", "celebrity", "targeting", "satire", "news").
    *   Determine *Disinformation* classification.
3.  **Output Format:** Output strictly in **TOON** format (Token-Oriented Object Notation) as defined below.

**CRITICAL CONSTRAINTS:** 
- Do NOT repeat the input data.
- START your response IMMEDIATELY with the `<thinking>` tag.
- **DO NOT use Markdown code blocks.** (Output plain text only).
- Use strict `Key : Type [ Count ] {{ Headers }} :` format followed by data lines.
- Strings containing commas MUST be quoted.
- ALL scores must be filled (use 0 if unsure, do not leave blank).
- **MODALITY SCORING:** You must provide 3 distinct alignment scores: Video-Audio, Video-Caption, and Audio-Caption.

**TOON SCHEMA:**
{toon_schema}

{score_instructions}

**RESPONSE:**
<thinking>
"""

SCORE_INSTRUCTIONS_REASONING = """
**Constraints:** 
1. Provide specific reasoning for EACH score in the `vectors` and `modalities` tables.
2. Ensure strings are properly quoted.
"""

SCORE_INSTRUCTIONS_SIMPLE = """
**Constraint:** Focus on objective measurements. Keep text concise.
"""

# Updated Schema based on user requirements - Ensure explicit newlines
SCHEMA_SIMPLE = """summary: text[1]{text}:
"Brief neutral summary of the video events"

tags: list[1]{keywords}:
"political, celebrity, deepfake, viral"

vectors: scores[1]{visual,audio,source,logic,emotion}:
(Int 1-10),(Int 1-10),(Int 1-10),(Int 1-10),(Int 1-10)
*Scale: 1=Fake/Malicious, 10=Authentic/Neutral*

modalities: scores[1]{video_audio_score,video_caption_score,audio_caption_score}:
(Int 1-10),(Int 1-10),(Int 1-10)
*Scale: 1=Mismatch, 10=Perfect Match*

factuality: factors[1]{accuracy,gap,grounding}:
(Verified/Misleading/False),"Missing evidence description","Grounding check results"

disinfo: analysis[1]{class,intent,threat}:
(None/Misinfo/Disinfo/Satire),(Political/Commercial/None),(Deepfake/Recontextualization/None)

final: assessment[1]{score,reasoning}:
(Int 1-100),"Final synthesis of why this score was given"
"""

SCHEMA_REASONING = """
summary: text[1]{text}:
"Brief neutral summary of the video events"

tags: list[1]{keywords}:
"political, celebrity, deepfake, viral"

vectors: details[5]{category,score,reasoning}:
Visual,(Int 1-10),"Reasoning for visual score"
Audio,(Int 1-10),"Reasoning for audio score"
Source,(Int 1-10),"Reasoning for source credibility"
Logic,(Int 1-10),"Reasoning for logical consistency"
Emotion,(Int 1-10),"Reasoning for emotional manipulation"

modalities: details[3]{category,score,reasoning}:
VideoAudio,(Int 1-10),"Reasoning for video-to-audio alignment"
VideoCaption,(Int 1-10),"Reasoning for video-to-caption alignment"
AudioCaption,(Int 1-10),"Reasoning for audio-to-caption alignment"

factuality: factors[1]{accuracy,gap,grounding}:
(Verified/Misleading/False),"Missing evidence description","Grounding check results"

disinfo: analysis[1]{class,intent,threat}:
(None/Misinfo/Disinfo/Satire),(Political/Commercial/None),(Deepfake/Recontextualization/None)

final: assessment[1]{score,reasoning}:
(Int 1-100),"Final synthesis of why this score was given"
"""

# ==========================================
# Fractal Chain of Thought (FCoT) Prompts
# ==========================================

FCOT_MACRO_PROMPT = """
**Fractal Chain of Thought - Stage 1: Macro-Scale Hypothesis (Wide Aperture)**

You are analyzing a video for factuality.
**Context:** Caption: "{caption}" | Transcript: "{transcript}"

1. **Global Scan**: Observe the video, audio, and caption as a whole entity.
2. **Context Aperture**: Wide. Assess the overall intent (Humor, Information, Political, Social) and the setting.
3. **Macro Hypothesis**: Formulate a high-level hypothesis about the veracity. (e.g., "The video is likely authentic but the caption misrepresents the location" or "The audio quality suggests synthetic generation").

**Objective**: Maximize **Coverage** (broadly explore potential angles of manipulation).

**Output**: A concise paragraph summarizing the "Macro Hypothesis".
"""

FCOT_MESO_PROMPT = """
**Fractal Chain of Thought - Stage 2: Meso-Scale Expansion (Recursive Verification)**

**Current Macro Hypothesis**: "{macro_hypothesis}"

**Action**: Zoom In. Decompose the hypothesis into specific verification branches.
Perform the following checks recursively:

1. **Visual Branch**: Look for specific artifacts, lighting inconsistencies, cuts, or deepfake signs.
2. **Audio Branch**: Analyze lip-sync, background noise consistency, and voice tonality.
3. **Logical Branch**: Does the visual evidence strictly support the caption's claim? Are there logical fallacies?

**Dual-Objective Self-Correction**:
- **Faithfulness**: Do not hallucinate details not present in the video.
- **Coverage**: Did you miss any subtle cues?

**Output**: Detailed "Micro-Observations" for each branch. If you find contradictions to the Macro Hypothesis, note them explicitly as **"Self-Correction"**.
"""

FCOT_SYNTHESIS_PROMPT = """
**Fractal Chain of Thought - Stage 3: Inter-Scale Consensus & Synthesis**

**Action**: Integrate your Macro Hypothesis and Micro-Observations.
- **Consensus Check**: If Micro-Observations contradict the Macro Hypothesis, prioritize the Micro evidence (Self-Correction).
- **Compression**: Synthesize the findings into the final structured format.
- **Tags**: Assign 3-5 high-level tags (e.g., "political", "fabricated", "humor").

**Output Format**:
Strictly fill out the following TOON schema based on the consensus. Do not include markdown code blocks.

**TOON SCHEMA**:
{toon_schema}

{score_instructions}
"""