Spaces: Patricksturg/silicon-sampling-dashboard

Patricksturg committed on Mar 13

Commit 3a7eac5 (verified) · 1 Parent(s): 054b0d1

Update cogtest prompts: checklist respondent, label-free analyst, merge-by-issue synthesis

Files changed (1)

dashboard_backend.py +44 -27

@@ -116,19 +116,45 @@ Work through the question naturally, thinking aloud as you go."""
 COGTEST_RESPONDENT_USER = """WHO YOU ARE:
 {backstory}
 SURVEY QUESTION:
 {question_text}
 RESPONSE OPTIONS: {response_options}
-Think aloud as you work through answering this question:
-1. COMPREHENSION: Put the question in your own words. What is it asking you?
-2. RETRIEVAL: What from your life or experience are you drawing on?
-3. JUDGEMENT: Are you having to weigh up or combine different things?
-4. RESPONSE MAPPING: Which option fits best? How well does it capture what you want to say? Is anything missing?
 5. CHOSEN ANSWER: State your final answer.
-6. CONFIDENCE: Rate 1-5 how confident you are in that answer."""
 COGTEST_ANALYST_SYSTEM = """You are an expert in cognitive interviewing and survey methodology.
 You are reviewing a think-aloud transcript from a cognitive interview.
@@ -144,25 +170,17 @@ RESPONDENT THINK-ALOUD TRANSCRIPT:
 Identify ONLY problems that are clearly evidenced in this transcript. Do NOT go looking for problems that are not there. Most transcripts will have only 1-2 genuine problems, and some will have none. If the respondent navigated the question without difficulty, return an empty problems list.
-A problem must be supported by specific evidence in the transcript - something the respondent actually said or demonstrably struggled with. Do not infer problems that the respondent did not experience.
-When a genuine problem is found, classify it using whichever of these categories fits best:
-- COMPREHENSION PROBLEMS: Respondent misunderstood or misinterpreted terms
-- DOUBLE-BARRELLED: Question asks about two distinct things requiring different answers
-- RETRIEVAL DIFFICULTIES: Respondent struggled to recall or estimate
-- RESPONSE MAPPING FAILURES: Answer did not fit the available options
-- SOCIAL DESIRABILITY / SENSITIVITY: Respondent hedged or showed discomfort
-- PRESUPPOSITION FAILURES: Question assumed something untrue for this respondent
-- TAUTOLOGY / LOGICAL PROBLEMS: Circularity between question wording and response options
-If the problem does not fit any category above, use a short descriptive label.
 For each problem:
 - Cite specific evidence from the transcript (what the respondent actually said)
 - Rate SEVERITY 1-10: the degree to which this problem would distort the accuracy of the answer (1 = negligible distortion, 10 = answer rendered meaningless)
 Return ONLY valid JSON, no markdown fencing:
-{{"respondent_id": "{respondent_id}", "problems": [{{"type": "...", "description": "...", "evidence": "...", "severity": 7}}]}}"""
 COGTEST_SYNTHESIS_SYSTEM = """You are a senior survey methodologist writing a cognitive testing report.
 You are synthesising the results of individual transcript analyses from a cognitive interview study."""
@@ -172,22 +190,21 @@ COGTEST_SYNTHESIS_USER = """SURVEY QUESTION:
 RESPONSE OPTIONS: {response_options}
-Below are the coded problems identified by an analyst for each respondent's think-aloud transcript.
-Each problem has a severity score (1-10) indicating the degree to which it would distort accuracy of the answer.
-Your job is to aggregate these into a summary of the distinct problems found across respondents.
-Use the SAME problem type labels as the individual analyses (e.g. if analysts coded "DOUBLE-BARRELLED", use that label, not a paraphrase).
 INDIVIDUAL ANALYSES:
 {analyses_block}
-For each distinct problem type that was identified:
-- Use the same type label as in the individual analyses
 - List which respondents showed evidence of it
-- Calculate the mean severity score across respondents who had this problem
 - Summarise the evidence across respondents
 Return ONLY valid JSON, no markdown fencing:
-{{"question_id": "synthesis", "problems_detected": [{{"type": "...", "description": "...", "respondents_affected": [...], "mean_severity": 6.5, "evidence_summary": "..."}}]}}"""
 # ============================================================================
 # EXPERT REVIEW PIPELINE PROMPTS

 COGTEST_RESPONDENT_USER = """WHO YOU ARE:
 {backstory}
+You are this person. Respond in first person throughout (I, me, my). Do not break character or refer to yourself in the third person.
 SURVEY QUESTION:
 {question_text}
 RESPONSE OPTIONS: {response_options}
+Think aloud as you work through answering this question, following the stages below. At each stage, consider whether you experience any of the specific difficulties described. Not all of the difficulties listed below will apply — only mention those you actually experience as you work through the question.
+1. COMPREHENSION: Put the question in your own words. What is it actually asking you?
+   As you do this, consider:
+   - DOUBLE-BARRELLED: Is the question asking about two or more distinct things at once, where you might want to give different answers to each part?
+   - VAGUE OR AMBIGUOUS TERMS: Are any words or phrases unclear, open to multiple interpretations, or likely to mean different things to different people?
+   - PRESUPPOSITION: Does the question assume something about you or your situation that may not be true — for example, that you have used a service, hold an opinion, or have had a particular experience?
+   - LEADING OR LOADED: Does the phrasing of the question push you toward a particular answer, for example through emotive language, built-in justifications, or a question structure that implies one answer is more natural or correct?
+2. RETRIEVAL: What from your life or experience are you drawing on to answer this?
+   As you do this, consider:
+   - RECALL DIFFICULTY: Is it hard to remember the information the question asks about — for example because it happened a long time ago, happened many times, or requires counting or estimating over a long period?
+   - REFERENCE PERIOD PROBLEMS: Does the question specify a time frame that is difficult to recall over, or is the time frame vague or missing entirely?
+3. JUDGEMENT: Are you having to weigh up, combine, or summarise different things to arrive at a single answer?
+   As you do this, consider:
+   - COMPLEX AGGREGATION: Does the question require you to combine multiple different experiences, views, or considerations into a single summary response in a way that feels forced or oversimplified?
+   - KNOWLEDGE DEFICIT: Is the question asking you to make a judgement about something you don't feel you know enough about to answer meaningfully?
+4. RESPONSE MAPPING: Look at the response options. Which one fits best? How well does it capture what you actually want to say?
+   As you do this, consider:
+   - POOR FIT: Does your answer not map clearly onto any of the available options, forcing you to choose one that doesn't quite capture what you mean?
+   - MISSING OPTIONS: Is there a response you want to give that isn't available — for example a "not applicable" or "don't know" option, or a middle category?
+   - UNEQUAL INTERVALS: Do the response categories feel unevenly spaced, so that some options cover a much wider range of views or experiences than others?
 5. CHOSEN ANSWER: State your final answer.
+6. CONFIDENCE: Rate your confidence 1-5, where 1 means you felt very unsure about your answer and 5 means you felt completely certain."""
 COGTEST_ANALYST_SYSTEM = """You are an expert in cognitive interviewing and survey methodology.
 You are reviewing a think-aloud transcript from a cognitive interview.
 Identify ONLY problems that are clearly evidenced in this transcript. Do NOT go looking for problems that are not there. Most transcripts will have only 1-2 genuine problems, and some will have none. If the respondent navigated the question without difficulty, return an empty problems list.
+A problem must be supported by specific evidence in the transcript — something the respondent actually said or demonstrably struggled with. Do not infer problems that the respondent did not experience.
 For each problem:
+- Describe the problem in plain language (what went wrong for this respondent)
 - Cite specific evidence from the transcript (what the respondent actually said)
 - Rate SEVERITY 1-10: the degree to which this problem would distort the accuracy of the answer (1 = negligible distortion, 10 = answer rendered meaningless)
+Do NOT classify problems into named categories or assign type labels. Just describe what happened.
 Return ONLY valid JSON, no markdown fencing:
+{{"respondent_id": "{respondent_id}", "problems": [{{"description": "...", "evidence": "...", "severity": 7}}]}}"""
 COGTEST_SYNTHESIS_SYSTEM = """You are a senior survey methodologist writing a cognitive testing report.
 You are synthesising the results of individual transcript analyses from a cognitive interview study."""
 RESPONSE OPTIONS: {response_options}
+Below are the problems identified by an analyst for each respondent's think-aloud transcript. Each problem has a severity score (1-10) indicating the degree to which it would distort accuracy of the answer. Your job is to aggregate these into a summary of the DISTINCT problems found across respondents.
+Focus on the UNDERLYING ISSUE. If multiple respondents describe the same fundamental problem with the question — even in different words or from different angles — merge them into a single problem. Two descriptions count as the same problem if fixing one would also fix the other.
 INDIVIDUAL ANALYSES:
 {analyses_block}
+For each distinct underlying problem:
+- Describe the problem clearly
 - List which respondents showed evidence of it
+- Calculate the mean severity score across all respondents who had this problem
 - Summarise the evidence across respondents
 Return ONLY valid JSON, no markdown fencing:
+{{"question_id": "synthesis", "problems_detected": [{{"description": "...", "respondents_affected": [...], "mean_severity": 6.5, "evidence_summary": "..."}}]}}"""
 # ============================================================================
 # EXPERT REVIEW PIPELINE PROMPTS
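
The new analyst and synthesis prompts drop the `type` field and rely on `description`, `evidence`, `severity`, and a model-computed `mean_severity`. The sketch below illustrates how those JSON shapes could be rendered and sanity-checked downstream. It is not taken from `dashboard_backend.py`: the helper names (`render_schema_example`, `parse_analysis`, `severity_bounds`, `mean_is_plausible`) and the shortened `ANALYST_JSON_TAIL` constant are hypothetical. It also assumes the templates are filled with `str.format`, which the doubled braces suggest but the diff does not show. Because the synthesis step merges problems by underlying issue, an exact mean cannot be recomputed without knowing which analyst problems were merged, so the check only tests whether the reported mean falls inside the observed severity range.

```python
import json

# Hypothetical, shortened stand-in for the JSON line that closes the analyst prompt.
ANALYST_JSON_TAIL = (
    '{{"respondent_id": "{respondent_id}", "problems": '
    '[{{"description": "...", "evidence": "...", "severity": 7}}]}}'
)


def render_schema_example(respondent_id):
    """Fill the placeholder; str.format collapses '{{' and '}}' to single braces."""
    return ANALYST_JSON_TAIL.format(respondent_id=respondent_id)


def parse_analysis(raw):
    """Parse one analyst reply, keeping problems whose severity is a number in 1-10."""
    data = json.loads(raw)
    kept = []
    for problem in data.get("problems", []):
        severity = problem.get("severity")
        is_number = isinstance(severity, (int, float)) and not isinstance(severity, bool)
        if is_number and 1 <= severity <= 10:
            kept.append(problem)
    return {"respondent_id": data.get("respondent_id"), "problems": kept}


def severity_bounds(analyses, respondents_affected):
    """Return (min, max) severity over the affected respondents, or None if none found."""
    affected = set(respondents_affected)
    severities = [
        problem["severity"]
        for analysis in analyses
        if analysis["respondent_id"] in affected
        for problem in analysis["problems"]
    ]
    return (min(severities), max(severities)) if severities else None


def mean_is_plausible(reported_mean, analyses, respondents_affected):
    """A reported mean outside the observed severity range cannot be correct."""
    bounds = severity_bounds(analyses, respondents_affected)
    return bounds is not None and bounds[0] <= reported_mean <= bounds[1]


if __name__ == "__main__":
    first = parse_analysis(render_schema_example("R01"))
    second = {
        "respondent_id": "R02",
        "problems": [{"description": "x", "evidence": "y", "severity": 3}],
    }
    print(mean_is_plausible(6.5, [first, second], ["R01", "R02"]))
```

A range check of this kind only catches gross arithmetic errors in the model's output. If exact means matter, one option is to have the synthesis step return the per-respondent severities it merged and compute the mean in code.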