
BtB-ExpC committed on Feb 12, 2025

Commit

2a11ca0

1 Parent(s): d5dd05e

improving prompts

Files changed (4)

chains/distractors/distractors_chain.py +4 -4
config/system_prompt_texts.py +260 -0
config/templates.py +44 -93
test samples.md +17 -3

chains/distractors/distractors_chain.py CHANGED

@@ -47,25 +47,25 @@ class DistractorsChain(BaseModel):
         tasks.append(run_brainstorm(
             self.template_distractors_brainstorm_1,
             self.llm_brainstorm_1,
-            "T1-1"
         ))
         # Template 1, LLM 2
         tasks.append(run_brainstorm(
             self.template_distractors_brainstorm_1,
             self.llm_brainstorm_2,
-            "T1-2"
         ))
         # Template 2, LLM 1
         tasks.append(run_brainstorm(
             self.template_distractors_brainstorm_2,
             self.llm_brainstorm_1,
-            "T2-1"
         ))
         # Template 2, LLM 2
         tasks.append(run_brainstorm(
             self.template_distractors_brainstorm_2,
             self.llm_brainstorm_2,
-            "T2-2"
         ))
         # Kick them off concurrently

         tasks.append(run_brainstorm(
             self.template_distractors_brainstorm_1,
             self.llm_brainstorm_1,
+            "T1-L1"
         ))
         # Template 1, LLM 2
         tasks.append(run_brainstorm(
             self.template_distractors_brainstorm_1,
             self.llm_brainstorm_2,
+            "T1-L2"
         ))
         # Template 2, LLM 1
         tasks.append(run_brainstorm(
             self.template_distractors_brainstorm_2,
             self.llm_brainstorm_1,
+            "T2-L1"
         ))
         # Template 2, LLM 2
         tasks.append(run_brainstorm(
             self.template_distractors_brainstorm_2,
             self.llm_brainstorm_2,
+            "T2-L2"
         ))
         # Kick them off concurrently
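
The hunk above builds one task per (template, LLM) pair and labels it `T<template>-L<llm>`. A minimal sketch of that pattern follows, assuming the tasks are awaited with `asyncio.gather` (the diff does not show how they are started). The `run_brainstorm` stub, `brainstorm_all`, and the string stand-ins for templates and LLMs are hypothetical and only illustrate the labelling scheme.

```python
import asyncio
from itertools import product


async def run_brainstorm(template: str, llm: str, label: str) -> tuple[str, str]:
    """Hypothetical stand-in for the real coroutine: returns the label and a fake result."""
    await asyncio.sleep(0)  # yield control, as a real LLM call would
    return label, f"{template} via {llm}"


async def brainstorm_all(templates: list[str], llms: list[str]) -> dict[str, str]:
    # One task per (template, LLM) combination, labelled "T<template>-L<llm>".
    tasks = [
        run_brainstorm(template, llm, f"T{t_idx}-L{l_idx}")
        for (t_idx, template), (l_idx, llm) in product(
            enumerate(templates, start=1), enumerate(llms, start=1)
        )
    ]
    # gather runs the coroutines concurrently and preserves the input order.
    return dict(await asyncio.gather(*tasks))


results = asyncio.run(brainstorm_all(["template_1", "template_2"], ["llm_1", "llm_2"]))
print(list(results))  # ['T1-L1', 'T1-L2', 'T2-L1', 'T2-L2']
```

Deriving the label from the loop indices keeps it in step with the pair it describes, which hand-written literals such as "T1-2" do not guarantee.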

config/system_prompt_texts.py ADDED

@@ -0,0 +1,260 @@

+# config/system_prompt_texts.py
+template_standardize_exercise_text = """
+"""
+template_standardize_studytext_text = """
+"""
+template_diagnose_double_negation_text = """
+Analyze a multiple-choice exercise (Question- or Statement-type) for the presence of double negatives: either two negations in the question/statement itself, or a negation in the question/statement AND in an answer option.
+Here are some examples of double negatives:
+<example>
+    <exercise>
+        <prompt>
+            <type>Vraag</type>
+            <text>Wat is geen veel voorkomend symptoom (volgens de enquêteuitslag) van niet gelukkig zijn?</text>
+        </prompt>
+        <options>
+            1. Gezondheidsproblemen
+            2. Weinig tijd voor ontspanning
+            3. Vaak alleen zijn
+            4. Veel te doen hebben op je werk
+        </options>
+        <correct_answer>4</correct_answer>
+    </exercise>
+    <double_negative>
+        First negation: "Wat is geen" in prompt
+        Second negation: "van niet gelukkig zijn" in prompt
+        Explanation: Two negations in the prompt ("geen" and "niet") form a double negation.
+    </double_negative>
+</example>
+<example>
+    <exercise>
+        <prompt>
+            <type>Stelling</type>
+            <text>Expertfolio wordt niet aangeboden door ENI.</text>
+        </prompt>
+        <options>
+            1. Deze stelling is correct
+            2. Deze stelling is niet correct
+        </options>
+        <correct_answer>1</correct_answer>
+    </exercise>
+    <double_negative>
+        First negation: "wordt niet aangeboden" in prompt
+        Second negation: "is niet correct" in option 2
+        Explanation: Interpreted together (as the student would in their head, trying to pick the correct answer option), they form a statement with a double negation: "De stelling dat Expertfolio niet wordt aangeboden is niet correct"
+    </double_negative>
+</example>
+<example>
+    <exercise>
+        <prompt>
+            <type>Vraag</type>
+            <text>Welk aspect hoort niet bij eenzaamheid?</text>
+        </prompt>
+        <options>
+            1. Betekenisvolle relaties hebben
+            2. Depressiviteit en angst
+            3. Veel alleen zijn
+            4. Geen lijfelijk contact hebben
+        </options>
+        <correct_answer>1</correct_answer>
+    </exercise>
+    <double_negative>
+        First negation: "hoort niet bij" in prompt
+        Second negation: "Geen lijfelijk contact" in option 4
+        Explanation: Together, these create a double negative - "Geen lichamelijk contact hebben hoort niet bij eenzaamheid"
+    </double_negative>
+</example>
+If it's obvious that there is or isn't a double negative in this exercise, just give a short one-sentence diagnosis on this.
+If the issue is more nuanced, take more time to do some reasoning first, and give your diagnosis only then.
+"""
+template_diagnose_correct_answer_stands_out_text = """
+You evaluate a multiple-choice exercise to determine if the correct answer
+stands out inappropriately compared to the distractors. If the correct answer is significantly
+longer, much more specific, or grammatically or otherwise structurally different, this is undesirable. This is because any clear pattern in the answer options which distinguishes the correct answer from the other options, makes it easier for the students to guess correctly regardless of their factual knowledge. So, we are looking for cases where the correct answer differs from the rest in a cosmetic, meta-, visual or otherwise superficial way, which doesn't require factual understanding to spot. It is your task to diagnose such
+cases.
+Here are some examples of cases where the correct answer stands out inappropriately:
+<examples>
+    <example>
+        <exercise>
+            <context>
+                De volgende afbeelding komt uit een onderzoek over eenzaamheid dat in 2012 is uitgevoerd.
+            </context>
+            <prompt>
+                <type>Vraag</type>
+                <text>Bij welke groep komt eenzaamheid volgens dit onderzoek het vaakst voor?</text>
+            </prompt>
+            <options>
+                1. Gehandicapten
+                2. Mantelzorgers
+                3. Mensen met langdurige psychische aandoeningen
+                4. Sporters
+            </options>
+            <correct_answer>3</correct_answer>
+        </exercise>
+        <answer_consistency_analysis>
+            Issue: Length difference
+            Pattern: All distractors are single words, while correct answer is a multi-word phrase
+            Diagnosis: The longer length of the correct answer makes it stand out inappropriately
+        </answer_consistency_analysis>
+    </example>
+    <example>
+       <exercise>
+           <prompt>
+               <type>Vraag</type>
+               <text>Wat is alimentatie?</text>
+           </prompt>
+           <options>
+               1. Geld dat betaald moet worden na een scheiding
+               2. Een lening van de overheid
+               3. Een maandelijkse bijdrage aan liefdadigheid
+               4. Een belastingteruggave
+           </options>
+           <correct_answer>1</correct_answer>
+       </exercise>
+       <answer_consistency_analysis>
+           Issue: Grammatical structure difference
+           Pattern: All distractors start with "Een", while correct answer starts with "Geld"
+           Diagnosis: The different grammatical structure of the correct answer makes it stand out undesirably (a superficial pattern that could hint at the answer)
+       </answer_consistency_analysis>
+    </example>
+    <example>
+        <exercise>
+            <prompt>
+                <type>Vraag</type>
+                <text>Welke onderwijskundige benadering wordt hier beschreven: "Leerlingen werken samen in kleine groepen en hebben elk een eigen rol en verantwoordelijkheid binnen de groep"?</text>
+            </prompt>
+            <options>
+                1. Een activerende methode
+                2. Jigsaw cooperative learning
+                3. Benadering waarbij gedrag belangrijk is
+                4. Leren als informatieverwerking
+            </options>
+            <correct_answer>2</correct_answer>
+        </exercise>
+        <answer_consistency_analysis>
+            Issue: Level of specificity difference
+            Pattern: While distractors use general educational terms that could apply to many approaches, the correct answer uses a very specific named methodology
+            Impact: The precise, technical term in the correct answer stands out against the more general educational concepts in the distractors
+        </answer_consistency_analysis>
+    </example>
+</examples>
+Your only focus is to accurately diagnose this issue of an inappropriately different correct answer, no need to provide a fix. Really take your time to arrive at the correct diagnosis, weighing if the pattern is clear enough or not.
+Do some reasoning first, and give your diagnosis then.
+"""
+template_diagnose_distractor_clearly_wrong_text = """
+"""
+template_diagnose_distractor_partially_correct_text = """
+"""
+diagnose_scorecard_template_text = """
+"""
+template_distractors_brainstorm_1_text = """
+"""
+template_distractors_brainstorm_2_text = """
+"""
+template_consolidate_distractors_text = """
+"""
+template_gen_prompt_a_text = """
+"""
+template_gen_prompt_b_text = """
+"""
+template_sanitize_learning_objectives_text = """
+"""
+XML_templates= [
+"""
+<example>
+    <exercise>
+        <prompt>
+            <type>Stelling</type>
+            <text></text>
+        </prompt>
+        <options>
+            1. Deze stelling is correct
+            2. Deze stelling is niet correct
+        </options>
+        <correct_answer></correct_answer>
+    </exercise>
+    <double_negative>
+        -
+        -
+    </double_negative>
+</example>
+"""
+    ,
+"""
+<example>
+    <exercise>
+        <prompt>
+            <type>Vraag</type>
+            <text></text>
+        </prompt>
+        <options>
+            1.
+            2.
+            3.
+            4.
+        </options>
+        <correct_answer></correct_answer>
+    </exercise>
+    <double_negative>
+        -
+        -
+    </double_negative>
+</example>
+"""
+    ,
+"""
+"""
+]
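
The `XML_templates` entries above are blank skeletons of the `<exercise>` layout used in the prompt examples. As an illustration only, a small helper could fill that layout from structured data. This is a sketch, not part of the commit: `render_exercise` and its parameters are hypothetical names, and the `<double_negative>` block is left out because it holds free-form analysis.

```python
from textwrap import indent


def render_exercise(prompt_type: str, text: str, options: list[str], correct: int) -> str:
    """Render an exercise in the <exercise> XML layout used by the prompt examples."""
    # Number the options from 1, matching the "1. ..." style in the prompts.
    numbered = "\n".join(f"{i}. {option}" for i, option in enumerate(options, start=1))
    return (
        "<exercise>\n"
        "    <prompt>\n"
        f"        <type>{prompt_type}</type>\n"
        f"        <text>{text}</text>\n"
        "    </prompt>\n"
        "    <options>\n"
        f"{indent(numbered, ' ' * 8)}\n"
        "    </options>\n"
        f"    <correct_answer>{correct}</correct_answer>\n"
        "</exercise>"
    )


xml = render_exercise(
    "Stelling",
    "Expertfolio wordt niet aangeboden door ENI.",
    ["Deze stelling is correct", "Deze stelling is niet correct"],
    1,
)
print(xml)
```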

config/templates.py CHANGED

@@ -1,5 +1,22 @@
 # config/templates.py
 from langchain_core.prompts.chat import ChatPromptTemplate
 template_standardize_exercise = ChatPromptTemplate(
     messages=[
@@ -28,46 +45,7 @@ template_standardize_studytext = ChatPromptTemplate(
 template_diagnose_double_negation = ChatPromptTemplate(
     messages=[
-        ("system", """Analyze a multiple-choice exercise for the presence of double negatives: either two negations in the question/statement itself, or a negation in the question/statement AND in an answer option.
-        Here are some examples of double negatives:
-        <example 1>
-        <exercise>
-        Stelling
-        Expertfolio wordt niet aangeboden door ENI.
-        Keuzeopties:
-        1. Deze stelling is niet correct
-        2. Deze stelling is correct
-        Correct antwoord:
-        1. Deze stelling is niet correct
-        </exercise>
-        <double negative explanation>
-        The statement itself contains one negation (wordt 'niet' aangeboden), and one answer option contains another (is 'niet' correct). Interpreted together, this forms a statement with a double negation ('het is niet correct dat Expertfolio niet wordt aangeboden' is een dubbele ontkenning).
-        </double negative explanation>
-        </example 1>
-        <example 2>
-        <exercise>
-        Vraag
-        Welk aspect hoort niet bij eenzaamheid?
-        Keuzeopties:
-        1. Betekenisvolle relaties hebben
-        2. Depressiviteit en angst
-        3. Veel alleen zijn
-        4. Geen lijfelijk contact hebben
-        Correct antwoord:
-        1. Betekenisvolle relaties hebben
-        </exercise>
-        <double negative explanation>
-        The question itself contains one negation  (hoort 'niet' bij), and an answer option contains the second ('Geen' lijfelijk contact). Together, the resulting statement contains a double negation ('Geen lichamelijk contact hebben hoort niet bij eenzaamheid').
-        </double negative explanation>
-        </example 2>.
-        If it's obvious that there is or isn't a double negative in this exercise, just give a short one-sentence diagnosis on this.
-        If the issue is more nuanced, take more time to do some reasoning first, and give your diagnosis only after."""),
         ("human", "{standardized_exercise}")
     ],
     input_variables=["standardized_exercise"]
@@ -75,53 +53,7 @@ template_diagnose_double_negation = ChatPromptTemplate(
 template_diagnose_correct_answer_stands_out = ChatPromptTemplate(
     messages=[
-        ("system", """You evaluate a multiple-choice exercise to determine if the correct answer
-        stands out too much compared to the distractors. If the correct answer is significantly
-        longer, more detailed, or structurally or grammatically different, this is undesirable. Identify such
-        cases.
-        Here are some examples of cases where the correct answer stands out:
-        <example where the correct answer is much longer>
-        <exercise>
-        Theorie:
-        De volgende afbeelding komt uit een onderzoek over eenzaamheid dat in 2012 is uitgevoerd.
-        Vraag:
-        Bij welke groep komt eenzaamheid volgens dit onderzoek het vaakst voor?
-        1. Gehandicapten
-        2. Mantelzorgers
-        3. Mensen met langdurige psychische aandoeningen
-        4. Sporters
-        Correct antwoord:
-        3. Mensen met langdurige psychische aandoeningen.
-        </exercise>
-        <explanation how the correct answer stands out>
-        Alle afleiders zijn 1 woord (kort), terwijl het correcte antwoord een zin is (duidelijk langer).
-        </explanation how the correct answer stands out>
-        </example where X>
-        <example where the correct answer is grammatically different>
-        <exercise>
-        Vraag: Wat is alimentatie?
-        1. Geld dat betaald moet worden na een scheiding
-        2. Een lening van de overheid
-        3. Een maandelijkse bijdrage aan liefdadigheid
-        4. Een belastingteruggave
-        Correct antwoord:
-        1. Geld dat betaald moet worden na een scheiding of als men niet meer samen is met de andere ouder van de kinderen.
-        </exercise>
-        <explanation how the correct answer stands out>
-        Alle afleiders beginnen met "Een", maar het correcte antwoord begint anders.
-        </explanation how the correct answer stands out>
-        </example where the correct answer is grammatically different>
-        Your only focus is to accurately diagnose this issue, no need to provide a fix. Really take your time to arrive at the correct diagnosis.
-        Do some reasoning first, and give your diagnosis then."""),
         ("human", "{standardized_exercise}")
     ],
     input_variables=["standardized_exercise"]
@@ -222,7 +154,7 @@ template_distractors_brainstorm_2 = ChatPromptTemplate(
                    "Those are the two bounds of the spectrum range we aim to operate between during brainstorming.\n"
                    "So, through the above process of picking some júst faulty distractors in the context of the given question, both barely too correct and barely too obviously false, you establish the two bounds of acceptable distractors. When brainstorming, don't play it entirely safe though; when in doubt about where exactly on the spectrum the distractors would lie, just list the distractors you came up with anyway.\n\n"
                    "Next, in the brainstorming phase, it's most important that you get really creative and really try to think outside the box, to come up with the required potential alternative answer options to the exercise. We want to approach this task from all different angles, "
-                   "to arrive at a varied selection of options, to serve as inspiration for a later stage of final selection (not now) to make the exercise the best it can be. For now, carry out the above-described prep in writing, then draft the list of{intermediate_distractors_specification} alternative distractors (in the same language as the existing exercise)."),
         ("human", "{standardized_exercise}")
     ],
     input_variables=["standardized_exercise", "intermediate_distractors_specification"]
@@ -262,7 +194,15 @@ template_gen_prompt_a = ChatPromptTemplate(
         - Use exactly the same terminology that's used in the study text
         - Mirror also the general language level of the study text. If the text is written with very simple words, then the learning objectives should be also written in very simple words
         - Mirror also the voice of the text (passive or active voice) and the perspective of the text (second or third person)
-        - Are as concise as can be: they contain the smallest possible knowledge element. A learning objective does not combine multiple facts, but rather isolates individual facts
         - Avoid absolute terms that overstate their universality, like 'always' and 'never', unless that actually is true 100% of the time (usually there are exceptions to every rule, so account for those in your phrasing)
         - Alternatively avoid vague terms that make what they wanna say too meaningless, like 'can', 'could', 'might' and 'may' (many things 'can', 'could' or 'might be', this doesn't say much)
         - Also avoid subjective terms like 'often', 'sometimes', 'many', 'few', 'common', 'rare'. Instead, make more specific and falsifiable claims like 'in most cases' or 'A is more common than B'
@@ -279,6 +219,17 @@ template_gen_prompt_a = ChatPromptTemplate(
     input_variables=["standardized_text"]
 )
 template_gen_prompt_b = ChatPromptTemplate(
     messages=[
         ("system", """
@@ -372,11 +323,11 @@ template_gen_prompt_b = ChatPromptTemplate(
 template_sanitize_learning_objectives = ChatPromptTemplate(
     messages=[
         ("system", "You are given an output of a brainstorming session that led to the generation of learning objectives. Your task is to "
-                   "turn this output into a neat numbered list of just the learning objectives, nothing else. Do not translate or otherwise edit the learning objectives, just relay them as a list.\n"
                    "<example of a perfect list>\n"
-                   "1. De student weet dat de neus een zintuig is.\n"
-                   "2. De student weet dat de tong een zintuig is.\n"
-                   "3. De student weet dat de huid een zintuig is.\n"
                    "</example of a perfect list>"),
         ("human", "Here is the output:\n "
                   "{raw_output}")

 # config/templates.py
 from langchain_core.prompts.chat import ChatPromptTemplate
+# config/templates.py
+from config.system_prompt_texts import (
+    template_standardize_exercise_text,
+    template_standardize_studytext_text,
+    template_diagnose_double_negation_text,
+    template_diagnose_correct_answer_stands_out_text,
+    template_diagnose_distractor_clearly_wrong_text,
+    template_diagnose_distractor_partially_correct_text,
+    diagnose_scorecard_template_text,
+    template_distractors_brainstorm_1_text,
+    template_distractors_brainstorm_2_text,
+    template_consolidate_distractors_text,
+    template_gen_prompt_a_text,
+    template_gen_prompt_b_text,
+    template_sanitize_learning_objectives_text,
+)
 template_standardize_exercise = ChatPromptTemplate(
     messages=[
 template_diagnose_double_negation = ChatPromptTemplate(
     messages=[
+        ("system", template_diagnose_double_negation_text),
         ("human", "{standardized_exercise}")
     ],
     input_variables=["standardized_exercise"]
 template_diagnose_correct_answer_stands_out = ChatPromptTemplate(
     messages=[
+        ("system", template_diagnose_correct_answer_stands_out_text),
         ("human", "{standardized_exercise}")
     ],
     input_variables=["standardized_exercise"]
                    "Those are the two bounds of the spectrum range we aim to operate between during brainstorming.\n"
                    "So, through the above process of picking some júst faulty distractors in the context of the given question, both barely too correct and barely too obviously false, you establish the two bounds of acceptable distractors. When brainstorming, don't play it entirely safe though; when in doubt about where exactly on the spectrum the distractors would lie, just list the distractors you came up with anyway.\n\n"
                    "Next, in the brainstorming phase, it's most important that you get really creative and really try to think outside the box, to come up with the required potential alternative answer options to the exercise. We want to approach this task from all different angles, "
+                   "to arrive at a varied selection of options, to serve as inspiration for a later stage of final selection (not now) to make the exercise the best it can be. For now, carry out the above-described prep in writing, then draft the list of{intermediate_distractors_specification}alternative distractors (in the same language as the existing exercise)."),
         ("human", "{standardized_exercise}")
     ],
     input_variables=["standardized_exercise", "intermediate_distractors_specification"]
         - Use exactly the same terminology that's used in the study text
         - Mirror also the general language level of the study text. If the text is written with very simple words, then the learning objectives should be also written in very simple words
         - Mirror also the voice of the text (passive or active voice) and the perspective of the text (second or third person)
+        - Are as **specific** as can be: they contain the smallest possible knowledge element. A learning objective does not combine multiple facts, but rather isolates individual facts
+        <illustration of 'specific'>
+    <bad example: not specific enough>
+    </bad example: not specific enough>
+    <good example: states isolated fact>
+    </good example: states isolated fact>
+</illustration of 'specific'>
         - Avoid absolute terms that overstate their universality, like 'always' and 'never', unless that actually is true 100% of the time (usually there are exceptions to every rule, so account for those in your phrasing)
         - Alternatively avoid vague terms that make what they wanna say too meaningless, like 'can', 'could', 'might' and 'may' (many things 'can', 'could' or 'might be', this doesn't say much)
         - Also avoid subjective terms like 'often', 'sometimes', 'many', 'few', 'common', 'rare'. Instead, make more specific and falsifiable claims like 'in most cases' or 'A is more common than B'
     input_variables=["standardized_text"]
 )
+"""
+<illustration of 'specific'>
+    <bad example: not specific enough>
+    </bad example: not specific enough>
+    <good example: states isolated fact>
+    </good example: states isolated fact>
+</illustration of 'specific'>
+"""
 template_gen_prompt_b = ChatPromptTemplate(
     messages=[
         ("system", """
 template_sanitize_learning_objectives = ChatPromptTemplate(
     messages=[
         ("system", "You are given an output of a brainstorming session that led to the generation of learning objectives. Your task is to "
+                   "turn this output into a neat clean list of just the learning objectives, nothing else. Do not translate or otherwise edit the learning objectives, just relay them as a list.\n"
                    "<example of a perfect list>\n"
+                   "De student weet dat de neus een zintuig is.\n"
+                   "De student weet dat de tong een zintuig is.\n"
+                   "De student weet dat de huid een zintuig is.\n"
                    "</example of a perfect list>"),
         ("human", "Here is the output:\n "
                   "{raw_output}")
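
Once the system prompts live in plain string constants, any `{...}` in them is still read as a template variable when the string is passed to `ChatPromptTemplate` (assuming its default f-string formatting). The sketch below uses only the standard library to list the placeholders in a text, so a stray brace in a prompt can be caught early. The `placeholders` helper and the shortened sample strings are hypothetical. The last lines show how a value with its own surrounding spaces fits the `of{intermediate_distractors_specification}alternative` fragment; that the real value carries those spaces is an assumption.

```python
from string import Formatter


def placeholders(text: str) -> set[str]:
    """Return the field names that str.format-style parsing finds in text."""
    return {name for _, name, _, _ in Formatter().parse(text) if name is not None}


# Shortened, hypothetical stand-ins for a system prompt and the human message.
system_text = "Analyze a multiple-choice exercise.\n<example>...</example>"
human_text = "{standardized_exercise}"

print(placeholders(system_text))  # set()
print(placeholders(human_text))   # {'standardized_exercise'}

# Assumption: the specification value brings its own leading and trailing space.
fragment = "draft the list of{intermediate_distractors_specification}alternative distractors"
filled = fragment.format(intermediate_distractors_specification=" 3 ")
print(filled)  # draft the list of 3 alternative distractors
```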

test samples.md CHANGED

@@ -1,4 +1,4 @@
-# Exercises
 ## False positives
 ### 1
 Theorie:
@@ -34,7 +34,7 @@ Correct antwoord:
 ### 3
 ---
-## Double negative
 ### 1
 Stelling:
 Voor een volledig overzicht van organisaties die ondersteuning bieden bij eenzaamheid kun je beter niet terecht bij het WMO-loket.
@@ -46,7 +46,21 @@ Correct antwoord:
 2. Deze stelling is niet correct.
 ### 2
 Stelling:
 De bovenste holle ader (*Vena Cava Superior*) komt niet uit in de rechterboezem.

+# Exercises Test Set (don't contaminate prompts)
 ## False positives
 ### 1
 Theorie:
 ### 3
 ---
+## Clear double negative
 ### 1
 Stelling:
 Voor een volledig overzicht van organisaties die ondersteuning bieden bij eenzaamheid kun je beter niet terecht bij het WMO-loket.
 2. Deze stelling is niet correct.
+## Clear jumping out answer (specificity pattern)
+### 1
+<prompt>
+<type>Vraag</type>
+<text>Hoe heet de natuurkundige wet die magnetisch gedrag van paramagnetische materialen beschrijft?</text>
+</prompt>
+<options>
+1. Wet van verandering
+2. Wet van magnetisme
+3. Wet van Curie-Weiss
+4. Wet van faseovergangen
+</options>
+<correct_answer>3</correct_answer>
 ### 2
 Stelling:
 De bovenste holle ader (*Vena Cava Superior*) komt niet uit in de rechterboezem.
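
The new heading marks these samples as a test set that should stay out of the prompts. One way to guard that is a whitespace- and case-insensitive substring check of test snippets against the prompt constants. This is a sketch under assumptions: `find_contamination`, the dictionary of prompt texts, and the snippet list are hypothetical, and nothing in the commit shows such a check.

```python
def find_contamination(prompt_texts: dict[str, str], test_snippets: list[str]) -> list[tuple[str, str]]:
    """Return (prompt name, snippet) pairs where a test snippet occurs in a prompt text."""

    def normalize(text: str) -> str:
        # Collapse whitespace and lowercase so line wrapping does not hide a match.
        return " ".join(text.split()).lower()

    hits = []
    for name, text in prompt_texts.items():
        haystack = normalize(text)
        for snippet in test_snippets:
            needle = normalize(snippet)
            if needle and needle in haystack:
                hits.append((name, snippet))
    return hits


# Hypothetical data: one prompt constant and two snippets, one of which leaks.
prompt_texts = {
    "template_diagnose_double_negation_text": "<text>Welk aspect hoort niet\n bij eenzaamheid?</text>",
}
test_snippets = ["Wet van Curie-Weiss", "Welk aspect hoort niet bij eenzaamheid?"]

hits = find_contamination(prompt_texts, test_snippets)
print(hits)
```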