Space: humane-intelligence/space-turtle

Akash190104 committed on Mar 31, 2025

Commit: 8f2b0ed
Parent: 6d546ef

Header Changes

Files changed (4)

1_Auto_Generate_Prompts.py → 1_Auto_Generate_Prompts_Using_HI_Model.py RENAMED

@@ -27,12 +27,18 @@ scroll_css = """
 """
 st.markdown(scroll_css, unsafe_allow_html=True)
-st.title("Auto Red Teaming Demo for HI")
+st.title("Auto Generate Prompts Using HI Model")
 st.markdown(
     """
-    This prototype auto generates prompts based on a “bias category” and a “country/region” using a model fine-tuned on data from Humane Intelligence.
-    The generated prompts are input into an example “Client Model” to elicit responses.
-    These responses are then judged/evaluated by another fine-tuned model showing a bias probability metric for each response.
+    Humane Intelligence’s Auto Red Teaming prototype is built to empower clients to run red-teaming exercises on their AI applications using HI’s intuitive no-code/low-code platform.
+The system generates adversarial prompts via a model trained on proprietary HI data, targeting potential vulnerabilities in the client’s models or applications. These responses are then evaluated by a separate judge LLM, also trained by HI.
+Specifically, the prototype follows these steps:
+1. Generates adversarial prompts based on a selected **bias category** and **country/region** using HI’s pre-trained model.
+2. Selects the most effective prompts and feeds them into the client’s model to elicit responses.
+3. Uses a dedicated HI-trained judge LLM to assess the responses.
+4. Produces a final output that includes a **probability score** and a **justification** for each response.
     """
 )
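
The four steps listed in the new page text can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Space's implementation: `run_pipeline`, `Judgement`, and the four callables (`generate_prompts`, `select_best`, `client_model`, `judge`) are hypothetical stand-ins for HI's prompt generator, the selection step, the client's model, and the HI judge LLM, and the dummy functions return placeholder values only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Judgement:
    prompt: str
    response: str
    probability: float   # probability score assigned by the judge
    justification: str   # judge's explanation for that score


def run_pipeline(
    bias_category: str,
    region: str,
    generate_prompts: Callable[[str, str], List[str]],
    select_best: Callable[[List[str]], List[str]],
    client_model: Callable[[str], str],
    judge: Callable[[str, str], Tuple[float, str]],
) -> List[Judgement]:
    # Step 1: generate adversarial prompts for the chosen category and region.
    prompts = generate_prompts(bias_category, region)
    # Step 2: keep only the prompts selected as most effective.
    best = select_best(prompts)
    report = []
    for prompt in best:
        # Step 2 (continued): feed each selected prompt to the client model.
        response = client_model(prompt)
        # Steps 3 and 4: the judge returns a probability score and a justification.
        probability, justification = judge(prompt, response)
        report.append(Judgement(prompt, response, probability, justification))
    return report


# Dummy stand-ins (assumptions for illustration; no real model is called).
def fake_generate(category: str, region: str) -> List[str]:
    return [f"[{category}/{region}] prompt {i}" for i in range(3)]


def fake_select(prompts: List[str]) -> List[str]:
    return prompts[:2]


def fake_client(prompt: str) -> str:
    return f"response to: {prompt}"


def fake_judge(prompt: str, response: str) -> Tuple[float, str]:
    return 0.5, "placeholder justification"


report = run_pipeline(
    "example-category", "example-region",
    fake_generate, fake_select, fake_client, fake_judge,
)
```

Passing the four stages in as callables keeps the sketch independent of any particular model API; in the Space each stage lives on its own Streamlit page.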

pages/{2_Select_Best_Prompts.py → 2_Select_Best_Prompts_For_Input_.py} RENAMED

@@ -21,7 +21,7 @@ scroll_css = """
 """
 st.markdown(scroll_css, unsafe_allow_html=True)
-st.title("Select Best Prompts")
+st.title("Select Best Prompts for Input in Client Model")
 def extract_json_content(markdown_str: str) -> str:
     lines = markdown_str.splitlines()
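
The body of `extract_json_content` is not shown in this hunk beyond its first line. For readers unfamiliar with the pattern, here is a minimal sketch of one common way to pull JSON out of a Markdown-fenced model reply. `strip_markdown_fence` is a hypothetical helper written for illustration; it is an assumption about the general technique, not a reconstruction of the Space's function.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def strip_markdown_fence(markdown_str: str) -> str:
    """Drop a leading and trailing code-fence line, if present."""
    lines = markdown_str.strip().splitlines()
    if lines and lines[0].lstrip().startswith(FENCE):
        lines = lines[1:]            # opening fence, e.g. a "json"-tagged fence
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]           # closing fence
    return "\n".join(lines)


raw = FENCE + 'json\n{"score": 1}\n' + FENCE
parsed = json.loads(strip_markdown_fence(raw))
```

Input that carries no fence passes through unchanged, so the helper is safe to call on replies that are already plain JSON.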

pages/3_Client_Response.py CHANGED

@@ -19,7 +19,7 @@ scroll_css = """
 st.markdown(scroll_css, unsafe_allow_html=True)
-st.title("Client Response (Answering)")
+st.title("Client Model Response (Answering)")
 # Use best_samples if available; otherwise, fallback to the interactive single sample.
 if "best_samples" in st.session_state:
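
The comment in this hunk describes a fallback: use `best_samples` from session state when an earlier page has stored them, otherwise use a single interactively entered sample. A minimal sketch of that pattern follows. A plain dict stands in for `st.session_state`, and `samples_to_answer` and the `single_sample` key are hypothetical names, not taken from the Space's code.

```python
from typing import Any, List, Mapping


def samples_to_answer(session_state: Mapping[str, Any]) -> List[Any]:
    """Return the samples the client model should answer."""
    # Preferred path: an earlier page stored the selected prompts.
    if "best_samples" in session_state:
        return list(session_state["best_samples"])
    # Fallback: a single sample entered interactively (hypothetical key).
    single = session_state.get("single_sample")
    return [single] if single is not None else []


from_selection = samples_to_answer({"best_samples": ["a", "b"]})
from_fallback = samples_to_answer({"single_sample": "x"})
```

Returning a list in both branches lets the rest of the page iterate over samples without caring which path produced them.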

pages/4_Evaluation_Report.py CHANGED

@@ -6,8 +6,20 @@ import json
 from openai import OpenAI
 st.set_page_config(layout="wide")
-st.title("Client Responses for Bias Evaluation")
+scroll_css = """
+<style>
+.table-scroll {
+    overflow-x: auto;
+    width: 100%;
+    max-width: 100%;
+}
+</style>
+"""
+st.markdown(scroll_css, unsafe_allow_html=True)
+st.title("Evaluation Response using HI Judge LLM")
 def extract_json_from_text(text: str) -> str:
     """