mikaelmp committed on
Commit
ebc3a7c
·
verified ·
1 Parent(s): 1bdef6a

BIG UPDATE: separate parse and validate methods

Browse files
researchsimulation/InteractiveInterviewChatbot.py CHANGED
@@ -21,35 +21,7 @@ def parse_question_with_llm(question, respondent_names, processor_llm):
21
 
22
  ### User Input:
23
  {question}
24
-
25
- ### Permitted Topics Scope:
26
- The respondents may only answer questions related to the following general topics:
27
-
28
- - Demographics: Age, location, education, family background, life events.
29
- - Values & Beliefs: Family responsibility, independence, hard work, gender equality, spirituality, simplicity, mental health, traditional vs modern values.
30
- - Career & Aspirations: Education, career goals, entrepreneurship, financial independence, stability, ambition, and personal development.
31
- - Influences & Role Models: Family members, mentors, public figures, influencers.
32
- - Interests & Hobbies: Sports, music, fitness, cooking, creative arts, gaming, travel, entertainment content, podcasts, leisure.
33
- - Health & Lifestyle: Physical health, fitness, diet, skincare, self-care, mental wellbeing, lifestyle balance.
34
- - Social Media & Technology: Social media usage, digital content, influencer interests, technology habits.
35
- - Personal Relationships: Family, friends, romantic relationships, support systems, social circles.
36
- - Future Outlook: Career plans, financial security, personal growth, family goals, confidence building.
37
- - Social & Societal Issues: Gender equality, societal expectations, economic issues, tradition vs freedom, social development.
38
- - Lifestyle Preferences: Food preferences, fashion, routines, spending habits, religious or cultural practices.
39
- - Personal Growth & Development: Maturity, emotional regulation, responsibility, adaptability, self-improvement, learning mindset.
40
-
41
- If a question is not strictly relevant to the topics, immediately return "INVALID" as the question instead of extracting a question.
42
-
43
- ### IMPORTANT RULES:
44
- - Only extract questions that fall within the **Permitted Topics Scope** above.
45
- - If the question is not strictly relevant to the Permitted Topics, return "INVALID" as the question instead of extracting a question.
46
- - For **each respondent**, if the question directed to them is out of scope, return "INVALID" as their question.
47
- - Even if one or more respondents are addressed by name, **do not answer** unless their question is within scope.
48
- - Do not infer or soften language to make it in-scope — the original question must already fit the topics.
49
- - Do not assume or infer intent beyond these boundaries.
50
- - When extracting the question, if any American English spelling, phrasing, or vocabulary is detected, automatically convert it into correct British English spelling, grammar, and usage before returning it.
51
- - Use standard British English conventions for all words (e.g. organise, behaviour, licence, travelling, programme, aeroplane, etc.).
52
- - DO NOT mirror the original spelling if it contains American English — always apply British spelling rules.
53
  ### Instructions:
54
  1. Identify **each respondent being addressed**.
55
  The respondents available are {respondent_names}. If these names are mistyped, then ensure that you match the names to the ones available.
@@ -58,9 +30,9 @@ def parse_question_with_llm(question, respondent_names, processor_llm):
58
  3. Ensure extracted questions **match the original structure**.
59
  4. If no respondent is explicitly addressed, return "General" as the respondent name.
60
  5. If the question is posed to all respondents, return "All" as the respondent name.
61
- 6. Ensure that you follow the formatting rules exactly. THIS IS EXTREMELY IMPORTANT.
62
 
63
- ### Formatting Rules:
64
  Provide the output strictly in this format:
65
  - Respondent: <Respondent Name>
66
  Question: <Extracted Question>
@@ -72,11 +44,6 @@ def parse_question_with_llm(question, respondent_names, processor_llm):
72
  response = processor_llm.invoke(prompt)
73
  chatgpt_output = response.content.strip()
74
  logging.info(f"LLM Parsed Output: {chatgpt_output}")
75
-
76
- # Handle out-of-scope detection
77
- if chatgpt_output.startswith("INVALID"):
78
- logging.warning("Question rejected: out of scope.")
79
- return None
80
 
81
  parsed_questions = {}
82
  respondent_name = "General"
@@ -94,6 +61,51 @@ def parse_question_with_llm(question, respondent_names, processor_llm):
94
 
95
  return parsed_questions
96
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
97
  def ask_interview_question(respondent_agents_dict, last_active_agent, question, processor_llm):
98
  """
99
  Handles both individual and group interview questions while tracking conversation flow.
@@ -107,17 +119,21 @@ def ask_interview_question(respondent_agents_dict, last_active_agent, question,
107
  logging.info(f"Available respondents: {agent_names}")
108
  print(f"Available respondents: {agent_names}")
109
 
110
- # Use OpenAI LLM to parse question into individual respondent-specific sub-questions
 
 
111
  parsed_questions = parse_question_with_llm(question, str(agent_names), processor_llm)
112
-
113
  if not parsed_questions:
114
- logging.warning("No parsed questions returned. Exiting function.")
115
- return "**PreData Moderator**: No valid respondents were detected for this question."
116
 
117
- for resp_name, extracted_question in parsed_questions.items():
 
 
118
  if extracted_question == "INVALID":
119
- logging.warning("Invalid question detected during parsing.")
120
- return "**PreData Moderator**: The question is invalid. Please ask another question."
 
 
121
 
122
  if len(parsed_questions) > 1:
123
  logging.warning("More than one respondent specified. Exiting function.")
 
21
 
22
  ### User Input:
23
  {question}
24
+
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
25
  ### Instructions:
26
  1. Identify **each respondent being addressed**.
27
  The respondents available are {respondent_names}. If these names are mistyped, then ensure that you match the names to the ones available.
 
30
  3. Ensure extracted questions **match the original structure**.
31
  4. If no respondent is explicitly addressed, return "General" as the respondent name.
32
  5. If the question is posed to all respondents, return "All" as the respondent name.
33
+ 6. Ensure that you follow the **Formatting Rules** exactly. THIS IS EXTREMELY IMPORTANT.
34
 
35
+ ### **Formatting Rules**:
36
  Provide the output strictly in this format:
37
  - Respondent: <Respondent Name>
38
  Question: <Extracted Question>
 
44
  response = processor_llm.invoke(prompt)
45
  chatgpt_output = response.content.strip()
46
  logging.info(f"LLM Parsed Output: {chatgpt_output}")
 
 
 
 
 
47
 
48
  parsed_questions = {}
49
  respondent_name = "General"
 
61
 
62
  return parsed_questions
63
 
64
def validate_question_topics(parsed_questions, processor_llm):
    """Validate each parsed question against the permitted topic scope.

    For every respondent/question pair, the LLM is asked to either return
    the literal string "INVALID" (question outside the allowed topics) or
    the same question rewritten in British English spelling and phrasing.

    Args:
        parsed_questions: Mapping of respondent name -> extracted question text.
        processor_llm: LLM client exposing ``invoke(prompt)`` and returning a
            response object with a ``.content`` string (LangChain-style).

    Returns:
        dict: Respondent name -> validated (British-English) question, or the
        string "INVALID" for out-of-scope questions.
    """

    def _validate(question):
        # One LLM round-trip per question; the whole topic scope is embedded
        # in the prompt so the model can judge relevance on its own.
        validation_prompt = f"""
You are a senior research analyst. Your job is to **validate** whether a market research question is within the allowed topic scope and convert it to **British English** spelling, grammar, and phrasing.

### Question:
{question}

### Permitted Topics Scope:
The respondents may only answer questions related to the following general topics:

- Demographics: Age, location, education, family background, life events.
- Values & Beliefs: Family responsibility, independence, hard work, gender equality, spirituality, simplicity, mental health, traditional vs modern values.
- Career & Aspirations: Education, career goals, entrepreneurship, financial independence, stability, ambition, and personal development.
- Influences & Role Models: Family members, mentors, public figures, influencers.
- Interests & Hobbies: Sports, music, fitness, cooking, creative arts, gaming, travel, entertainment content, podcasts, leisure.
- Health & Lifestyle: Physical health, fitness, diet, skincare, self-care, mental wellbeing, lifestyle balance.
- Social Media & Technology: Social media usage, digital content, influencer interests, technology habits.
- Personal Relationships: Family, friends, romantic relationships, support systems, social circles.
- Future Outlook: Career plans, financial security, personal growth, family goals, confidence building.
- Social & Societal Issues: Gender equality, societal expectations, economic issues, tradition vs freedom, social development.
- Lifestyle Preferences: Food preferences, fashion, routines, spending habits, religious or cultural practices.
- Personal Growth & Development: Maturity, emotional regulation, responsibility, adaptability, self-improvement, learning mindset.

### Validation Instructions:
- If the question is not strictly relevant to any of the above, return exactly: "INVALID"
- If valid, return the **same question**, rewritten in **British English** if necessary.

### Output:
<Validated question in British English, or "INVALID">
"""
        reply = processor_llm.invoke(validation_prompt)
        # Strip surrounding whitespace so downstream == "INVALID" checks work.
        return reply.content.strip()

    return {name: _validate(text) for name, text in parsed_questions.items()}
108
+
109
  def ask_interview_question(respondent_agents_dict, last_active_agent, question, processor_llm):
110
  """
111
  Handles both individual and group interview questions while tracking conversation flow.
 
119
  logging.info(f"Available respondents: {agent_names}")
120
  print(f"Available respondents: {agent_names}")
121
 
122
+ # Use OpenAI LLM to parse questions into individual respondent-specific sub-questions and validate them
123
+
124
+ # Step 1: Parse question
125
  parsed_questions = parse_question_with_llm(question, str(agent_names), processor_llm)
 
126
  if not parsed_questions:
127
+ return ["**PreData Moderator**: No valid respondents were detected for this question."]
 
128
 
129
+ # Step 2: Validate question content (scope + spelling)
130
+ validated_questions = validate_question_topics(parsed_questions, processor_llm)
131
+ for resp_name, extracted_question in validated_questions.items():
132
  if extracted_question == "INVALID":
133
+ return ["**PreData Moderator**: The question is invalid. Please ask another question."]
134
+
135
+ # Use validated questions from this point on
136
+ parsed_questions = validated_questions
137
 
138
  if len(parsed_questions) > 1:
139
  logging.warning("More than one respondent specified. Exiting function.")