Spaces: gabejavitt/agentCourse

gabejavitt committed on Nov 4, 2025

Commit a7f41e5 · verified · 1 Parent(s): e45f08b

Update app.py

Files changed (1)

app.py +157 -156

app.py CHANGED

@@ -1251,176 +1251,177 @@ except Exception as e:
 # ====================================================
 # --- (Original Template Code - Mock Questions Version) ---
-def run_and_submit_all( profile: gr.OAuthProfile | None): # Corrected type hint
     """
-    Fetches MOCK questions, runs the BasicAgent on them, simulates submission prep,
-    and displays the results. DOES NOT SUBMIT.
     """
-    space_id = os.getenv("SPACE_ID")
-    username = profile.username if profile else "local_test_user"
-    print(f"User: {username}{'' if profile else ' (dummy)'}")
-    # Check if global agent initialized
-    if not agent:
-        return "FATAL ERROR: Global agent failed to initialize. Check logs.", None
-    print("Using globally instantiated agent.")
-    agent_code = f"httpsS://huggingface.co/spaces/{space_id}/tree/main" if space_id else "local_run" # Corrected URL
-    print(f"Agent code URL: {agent_code}")
-    print("--- USING MOCK QUESTIONS ---")
-    # --- MOCK QUESTIONS ---
-    #
-    # vvv  PASTE YOUR FULL LIST OF 20 MOCK QUESTIONS HERE  vvv
-    #
-    mock_questions_data = [
-        {
-            "task_id": "mock_level1_001",
-            "question": r"""Here's a fun riddle that I'd like you to try.\n\nAn adventurer exploring an ancient tomb came across a horde of gold coins, all neatly stacked in columns. As he reached to scoop them into his backpack, a mysterious voice filled the room. \"You have fallen for my trap adventurer,\" the voice began, and suddenly the doorway to the chamber was sealed by a heavy rolling disk of stone. The adventurer tried to move the stone disk but was unable to budge the heavy stone. Trapped, he was startled when the voice again spoke. \n\n\"If you solve my riddle, I will reward you with a portion of my riches, but if you are not clever, you will never leave this treasure chamber. Before you are 200 gold coins. I pose a challenge to you, adventurer. Within these stacks of coins, all but 30 are face-up. You must divide the coins into two piles, one is yours, and one is mine. You may place as many coins as you like in either pile. You may flip any coins over, but you may not balance any coins on their edges. For every face-down coin in your pile, you will be rewarded with two gold coins. But be warned, if both piles do not contain the same number of face-down coins, the door will remain sealed for all eternity!\"\n\nThe adventurer smiled, as this would be an easy task. All he had to do was flip over every coin so it was face down, and he would win the entire treasure! As he moved to the columns of coins, however, the light suddenly faded, and he was left in total darkness. The adventurer reached forward and picked up one of the coins, and was shocked when he realized that both sides felt almost the same. Without the light, he was unable to determine which side of the coin was heads and which side was tails. He carefully replaced the coin in its original orientation and tried to think of a way to solve the puzzle. Finally, out of desperation, the adventurer removed 30 coins to create his pile. 
-He then carefully flipped over each coin in his pile, so its orientation was inverted from its original state.\n\n\"I've finished,\" he said, and the lights returned. Looking at the two piles, he noticed that the larger pile contained 14 face-down coins.\n\nWhat was the outcome for the adventurer? If he failed the challenge, please respond with \"The adventurer died.\" Otherwise, please provide the number of coins the adventurer won at the conclusion of the riddle. If the adventurer won any coins, provide your response as the number of coins, with no other text."""
-        },
-        {
-            "task_id": "mock_level1_002",
-            "question": r"""If you use some of the letters in the given Letter Bank to spell out the sentence "I am a penguin halfway to the moon", which of the remaining unused letters would have to be changed to spell out, "The moon is made of cheese"? Return a comma-separated alphabetized list.\nLetter Bank: {OAMFETIMPECRFSHTDNIWANEPNOFAAIYOOMGUTNAHHLNEHCME}"""
-        },
-        {
-            "task_id": "mock_level1_003",
-            "question": r"""A data annotator stayed up too late creating test questions to check that a system was working properly and submitted several questions with mathematical errors. On nights when they created 15 test questions, they made 1 error. On nights when they created fewer than 15 questions, they also corrected 3 errors. On nights they created 20 questions, they made 0 errors. On nights when they created 25 or more, they made 4 errors. Over the course of five nights, the worker produced a total of 6 errors. When asked how many nights they created 15 questions, they gave three possible numbers as responses. What are the three numbers, presented in the format x, y, z in ascending order?"""
-        },
-        {
-            "task_id": "mock_level1_004",
-            "question": r"""Please solve the following crossword:\n\n|1|2|3|4|5|\n|6| | | | |\n|7| | | | |\n|8| | | | |\n|X|9| | | |\n\nI have indicated by numbers where the hints start, so you should replace numbers and spaces by the answers.\nAnd X denotes a black square that isn\u2019t to fill.\n\nACROSS\n- 1 Wooden strips on a bed frame\n- 6 _ Minhaj, Peabody-winning comedian for "Patriot Act"\n- 7 Japanese city of 2.6+ million\n- 8 Stopwatch, e.g.\n- 9 Pain in the neck\n\nDOWN\n- 1 Quick drink of whiskey\n- 2 Eye procedure\n- 3 "Same here," in a three-word phrase\n- 4 Already occupied, as a seat\n- 5 Sarcastically critical commentary. Answer by concatenating the characters you choose to fill the crossword, in row-major order."""
-        },
-        {
-            "task_id": "mock_level1_005",
-            "question": r"""I wanted to make another batch of cherry melomel. I remember liking the last recipe I tried, but I can't remember it off the top of my head. It was from the Reddit, r/mead. I remember that the user who made it had a really distinct name, I think it was StormBeforeDawn. Could you please look up the recipe for me? I'm not sure if it has been changed, so please make sure that the recipe you review wasn't updated after July 14, 2022. That's the last time I tried the recipe.\n\nWhat I want to know is how many cherries I'm supposed to use. I'm making a 10-gallon batch in two 5-gallon carboys. Please just respond with the integer number of pounds of whole cherries with pits that are supposed to be used for a 10-gallon batch."""
-        },
-        {
-            "task_id": "mock_level1_006",
-            "question": r"""Verify each of the following ISBN 13 numbers:\n\n1. 9783518188156\n2. 9788476540746\n3. 9788415091004\n4. 9788256014590\n5. 9782046407331\n\nIf any are invalid, correct them by changing the final digit. Then, return the list, comma separated, in the same order as in the question."""
-        },
-        {
-            "task_id": "mock_level1_007",
-            "question": r"""A porterhouse by any other name is centered around a letter. What does Three Dog Night think about the first natural number that starts with that letter? Give the first line from the lyrics that references it."""
-        },
-        {
-            "task_id": "mock_level1_008",
-            "question": r"""Bob has genome type Aa, and Linda has genome type Aa. Assuming that a child of theirs also has a child with someone who also has genome type Aa, what is the probability that Bob and Linda's grandchild will have Genome type Aa? Write the answer as a percentage, rounding to the nearest integer if necessary."""
-        },
-        {
-            "task_id": "mock_level1_009",
-            "question": r"""An array of candy is set out to choose from including gumballs, candy corn, gumdrops, banana taffy, chocolate chips, and gummy bears. There is one bag of each type of candy. The gumballs come in red, orange, yellow, green, blue, and brown. The candy corn is yellow, white, and orange. The gumdrops are red, green, purple, yellow, and orange. The banana taffy is yellow. The chocolate chips are brown and white. The gummy bears are red, green, yellow, and orange. Five people pass through and each selects one bag. The first selects one with only primary colors. The second selects one with no primary colors. The third selects one with all the primary colors. The fourth selects one that has neither the most nor the least colors of the remaining bags. The fifth selects the one with their favorite color, green. A second bag of the candy the first person chose is added to the remaining bag of candy. Which two candies are in the remaining bag after the addition? Give me them in a comma separated list, in alphabetical order"""
-        },
-        {
-            "task_id": "mock_level1_010",
-            "question": r"""In the year 2020, where were koi fish found in the watershed with the id 02040203? Give only the name of the pond, lake, or stream where the fish were found, and not the name of the city or county."""
-        },
-        {
-            "task_id": "mock_level1_011",
-            "question": r"""In Sonia Sanchez\u2019s poem \u201cfather\u2019s voice\u201d, what primary colour is evoked by the imagery in the beginning of the tenth stanza? Answer with a capitalized word."""
-        },
-        {
-            "task_id": "mock_level1_012",
-            "question": r"""According to Papers with Code, what was the name of the first model to go beyond 70% of accuracy on ImageNet ?"""
-        },
-        {
-            "task_id": "mock_level1_013",
-            "question": r"""What is the dimension of the boundary of the tame twindragon rounded to two decimal places?"""
-        },
-        {
-            "task_id": "mock_level1_014",
-            "question": r"""In what year was the home village of the subject of British Museum item #Bb,11.118 founded?"""
-        },
-        {
-            "task_id": "mock_level1_015",
-            "question": r"""What is the ISSN of the journal that included G. Scott's potato article that mentioned both a fast food restaurant and a Chinese politician in the title in a 2012 issue?"""
-        },
-        {
-            "task_id": "mock_level1_016",
-            "question": r"""VNV Nation has a song that shares its title with the nickname of Louis XV. What album was it released with?"""
-        },
-        {
-            "task_id": "mock_level1_017",
-            "question": r"""If I combine a Beatle's first name and a type of beer, in what category and year of Nobel Prize do I have a winner? Answer using the format CATEGORY, YEAR."""
-        },
-        {
-            "task_id": "mock_level1_018",
-            "question": r"""In the version of NumPy where the numpy.msort function was deprecated, which attribute was added to the numpy.polynomial package's polynomial classes?"""
-        },
-        {
-            "task_id": "mock_level1_019",
-            "question": r"""A word meaning dramatic or theatrical forms a species of duck when appended with two letters and then duplicated. What is that word?"""
-        },
-        {
-            "task_id": "mock_level1_020",
-            "question": r"""As of August 2023, how many in-text citations on the West African Vodun Wikipedia page reference a source that was cited using Scopus?"""
-        }
-    ]
-    questions_data = mock_questions_data
-    print(f"Using {len(questions_data)} mock questions.")
-    results_log, answers_payload = [], []
-    print(f"Running agent on {len(questions_data)} mock questions...")
-    for i, item in enumerate(questions_data):
-        task_id, question_text = item.get("task_id"), item.get("question")
-        if not task_id or question_text is None: print(f"Skipping mock item {i+1}"); continue
-        print(f"\n--- Running Mock Task {i+1} (ID: {task_id}) ---")
         try:
-            file_path = item.get("file_path")
-            question_text_with_context = question_text
-            if file_path:
-                 question_text_with_context = f"{question_text}\n\n[Attached File: {file_path}]"
-                 print(f"Q includes file: {file_path}")
-            submitted_answer = agent(question_text_with_context)
-            submitted_answer_str = str(submitted_answer) if submitted_answer is not None else ""
-            answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer_str})
-            results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer_str})
-            print(f"--- Mock Task {task_id} Complete ---")
         except Exception as e:
-             print(f"FATAL ERROR on mock task {task_id}: {e}")
-             import traceback; traceback.print_exc()
-             submitted_answer = f"AGENT CRASH: {e}"
-             answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
-             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
-    if not answers_payload: return "Agent produced no answers.", pd.DataFrame(results_log)
-    status_update = f"Finished mock run. Processed {len(answers_payload)} answers for '{username}'."
-    print(status_update); print("--- MOCK RUN - SUBMISSION SKIPPED ---")
-    final_status = "--- Mock RUN COMPLETE ---\n" + status_update + "\nSubmission SKIPPED." # Corrected typo
-    results_df = pd.DataFrame(results_log); results_df['Correct'] = 'N/A (Mock)'
-    return final_status, results_df
-# --- Build Gradio Interface ---
 with gr.Blocks() as demo:
-    gr.Markdown("# GAIA Agent - MOCK TEST (Groq Llama3.1)")
-    gr.Markdown("""
-        **Instructions:** Click 'Run Mock Evaluation'.
-        **Notes:** Uses Groq (Llama-3.3-70b Executor). Ensure `GROQ_API_KEY` secret/env var exists. **DOES NOT** fetch official Qs or submit. Check logs for details.
-        """)
     gr.LoginButton()
-    run_button = gr.Button("Run Mock Evaluation")
-    status_output = gr.Textbox(label="Run Status / Mock Result", lines=5, interactive=False)
-    results_table = gr.DataFrame(label="Mock Qs, Agent Answers, Results", wrap=True)
-    run_button.click(fn=run_and_submit_all, outputs=[status_output, results_table])
 if __name__ == "__main__":
     print("\n" + "-"*30 + " App Starting " + "-"*30)
-    space_host_startup = os.getenv("SPACE_ID"); space_id_startup = os.getenv("SPACE_ID") # Corrected variable name
-    if space_host_startup: print(f"✅ SPACE_HOST: {space_host_startup}\n   Runtime URL: https://{space_host_startup}.hf.space")
-    else: print("ℹ️ No SPACE_HOST (local?).")
-    if space_id_startup: print(f"✅ SPACE_ID: {space_id_startup}\n   Repo URL: https://huggingface.co/spaces/{space_id_startup}\n   Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
-    else: print("ℹ️ No SPACE_ID (local?).")
-    try: script_dir = os.path.dirname(os.path.realpath(__file__))
-    except NameError: script_dir = os.getcwd()
-    print(f"Script directory: {script_dir}")
-    print(f"CWD: {os.getcwd()}")
-    try: print("Files in CWD:", os.listdir("."))
-    except FileNotFoundError: print("Warning: CWD listing failed.")
     print("-"*(60 + len(" App Starting ")) + "\n")
-    print("Launching Gradio Interface...")
-    demo.queue().launch(debug=True, share=False)

 # ====================================================
 # --- (Original Template Code - Mock Questions Version) ---
+def run_and_submit_all( profile: gr.OAuthProfile | None):
     """
+    Fetches all questions, runs the BasicAgent on them, submits all answers,
+    and displays the results.
     """
+    # --- Determine HF Space Runtime URL and Repo URL ---
+    space_id = os.getenv("SPACE_ID") # Get the SPACE_ID for sending link to the code
+    if profile:
+        username = profile.username
+        print(f"User logged in: {username}")
+    else:
+        print("User not logged in.")
+        return "Please Login to Hugging Face with the button.", None
+    api_url = DEFAULT_API_URL
+    questions_url = f"{api_url}/questions"
+    submit_url = f"{api_url}/submit"
+    # 1. Instantiate Agent (modify this part to create your agent)
+    try:
+        agent = BasicAgent()
+    except Exception as e:
+        print(f"Error instantiating agent: {e}")
+        return f"Error initializing agent: {e}", None
+    # When the app runs as a Hugging Face Space, this link points to your codebase (useful for others, so please keep it public)
+    agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
+    print(agent_code)
+    # 2. Fetch Questions
+    print(f"Fetching questions from: {questions_url}")
+    try:
+        response = requests.get(questions_url, timeout=15)
+        response.raise_for_status()
+        questions_data = response.json()
+        if not questions_data:
+             print("Fetched questions list is empty.")
+             return "Fetched questions list is empty or invalid format.", None
+        print(f"Fetched {len(questions_data)} questions.")
+    except requests.exceptions.JSONDecodeError as e:  # must precede RequestException, its base class
+        print(f"Error decoding JSON response from questions endpoint: {e}")
+        print(f"Response text: {response.text[:500]}")
+        return f"Error decoding server response for questions: {e}", None
+    except requests.exceptions.RequestException as e:
+        print(f"Error fetching questions: {e}")
+        return f"Error fetching questions: {e}", None
+    except Exception as e:
+        print(f"An unexpected error occurred fetching questions: {e}")
+        return f"An unexpected error occurred fetching questions: {e}", None
+    # 3. Run your Agent
+    results_log = []
+    answers_payload = []
+    print(f"Running agent on {len(questions_data)} questions...")
+    for item in questions_data:
+        task_id = item.get("task_id")
+        question_text = item.get("question")
+        if not task_id or question_text is None:
+            print(f"Skipping item with missing task_id or question: {item}")
+            continue
         try:
+            submitted_answer = agent(question_text)
+            answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
+            results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
         except Exception as e:
+             print(f"Error running agent on task {task_id}: {e}")
+             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
+    if not answers_payload:
+        print("Agent did not produce any answers to submit.")
+        return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
+    # 4. Prepare Submission
+    submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
+    status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
+    print(status_update)
+    # 5. Submit
+    print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
+    try:
+        response = requests.post(submit_url, json=submission_data, timeout=60)
+        response.raise_for_status()
+        result_data = response.json()
+        final_status = (
+            f"Submission Successful!\n"
+            f"User: {result_data.get('username')}\n"
+            f"Overall Score: {result_data.get('score', 'N/A')}% "
+            f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
+            f"Message: {result_data.get('message', 'No message received.')}"
+        )
+        print("Submission successful.")
+        results_df = pd.DataFrame(results_log)
+        return final_status, results_df
+    except requests.exceptions.HTTPError as e:
+        error_detail = f"Server responded with status {e.response.status_code}."
+        try:
+            error_json = e.response.json()
+            error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
+        except requests.exceptions.JSONDecodeError:
+            error_detail += f" Response: {e.response.text[:500]}"
+        status_message = f"Submission Failed: {error_detail}"
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+    except requests.exceptions.Timeout:
+        status_message = "Submission Failed: The request timed out."
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+    except requests.exceptions.RequestException as e:
+        status_message = f"Submission Failed: Network error - {e}"
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+    except Exception as e:
+        status_message = f"An unexpected error occurred during submission: {e}"
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+# --- Build Gradio Interface using Blocks ---
 with gr.Blocks() as demo:
+    gr.Markdown("# Basic Agent Evaluation Runner")
+    gr.Markdown(
+        """
+        **Instructions:**
+        1.  Please clone this space, then modify the code to define your agent's logic, tools, required packages, etc.
+        2.  Log in to your Hugging Face account using the button below. This uses your HF username for submission.
+        3.  Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
+        ---
+        **Disclaimers:**
+        Once you click the "submit" button, it can take quite some time (this is the time the agent needs to go through all the questions).
+        This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance, to avoid the long wait after clicking the submit button, you could cache the answers and submit them in a separate action, or answer the questions asynchronously.
+        Please note that this version requires an OpenAI Key to run.
+        """
+    )
     gr.LoginButton()
+    run_button = gr.Button("Run Evaluation & Submit All Answers")
+    status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
+    # Removed max_rows=10 from DataFrame constructor
+    results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
+    run_button.click(
+        fn=run_and_submit_all,
+        outputs=[status_output, results_table]
+    )
 if __name__ == "__main__":
     print("\n" + "-"*30 + " App Starting " + "-"*30)
+    # Check for SPACE_HOST and SPACE_ID at startup for information
+    space_host_startup = os.getenv("SPACE_HOST")
+    space_id_startup = os.getenv("SPACE_ID") # Get SPACE_ID at startup
+    if space_host_startup:
+        print(f"✅ SPACE_HOST found: {space_host_startup}")
+        print(f"   Runtime URL should be: https://{space_host_startup}.hf.space")
+    else:
+        print("ℹ️  SPACE_HOST environment variable not found (running locally?).")
+    if space_id_startup: # Print repo URLs if SPACE_ID is found
+        print(f"✅ SPACE_ID found: {space_id_startup}")
+        print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
+        print(f"   Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
+    else:
+        print("ℹ️  SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
     print("-"*(60 + len(" App Starting ")) + "\n")
+    print("Launching Gradio Interface for Basic Agent Evaluation...")
+    demo.launch(debug=True, share=False)
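
Review note: the new `run_and_submit_all` runs the agent and submits in one step, drops the `str()` coercion of answers that the removed mock version had, and builds `agent_code` even when `SPACE_ID` is unset. The sketch below shows one way those pieces could be separated so they can be exercised without network access. It is only an illustration under assumptions: the helper names `agent_code_url`, `run_agent`, and `build_submission` are hypothetical and do not exist in app.py, and the `"local_run"` placeholder is borrowed from the removed line, not from any documented requirement of the scoring API.

```python
from __future__ import annotations

from typing import Callable, Optional


def agent_code_url(space_id: Optional[str]) -> str:
    """Return the Space's tree URL, or a placeholder when SPACE_ID is unset."""
    if space_id:
        return f"https://huggingface.co/spaces/{space_id}/tree/main"
    return "local_run"  # assumption: same fallback the removed mock version used


def run_agent(
    agent: Callable[[str], object], questions: list[dict]
) -> tuple[list[dict], list[dict]]:
    """Run the agent on every well-formed question.

    Returns (answers_payload, results_log). Failed tasks appear only in the
    log, mirroring the behaviour of the new code in the diff.
    """
    answers_payload: list[dict] = []
    results_log: list[dict] = []
    for item in questions:
        task_id = item.get("task_id")
        question_text = item.get("question")
        if not task_id or question_text is None:
            continue  # skip malformed items
        try:
            raw = agent(question_text)
            # Coerce to str so the payload stays JSON-friendly
            answer = "" if raw is None else str(raw)
            answers_payload.append({"task_id": task_id, "submitted_answer": answer})
        except Exception as e:
            answer = f"AGENT ERROR: {e}"
        results_log.append(
            {"Task ID": task_id, "Question": question_text, "Submitted Answer": answer}
        )
    return answers_payload, results_log


def build_submission(
    username: str, space_id: Optional[str], answers_payload: list[dict]
) -> dict:
    """Assemble the dict that the diff posts to the submit endpoint."""
    return {
        "username": username.strip(),
        "agent_code": agent_code_url(space_id),
        "answers": answers_payload,
    }


# Usage with a stub agent (no network, no real model)
def stub_agent(question: str) -> int:
    return len(question)


payload, log = run_agent(stub_agent, [{"task_id": "demo_1", "question": "2 + 2?"}])
print(build_submission("some_user ", None, payload))
```

With this split, the Gradio callback would only need to fetch the questions, call `run_agent`, and post the result of `build_submission`, which also makes it easier to cache answers and submit them in a separate action as the disclaimer text suggests.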