Spaces: tyang4/Ecodata

tyang4 committed on Jul 1

Commit 8484435 (verified) · 1 Parent(s): 84e9622

Update src/streamlit_app.py

Files changed (1)

src/streamlit_app.py +22 -14

@@ -376,36 +376,44 @@ def evaluate_dataset_with_gpt(subtask: str, df: pd.DataFrame, client=openai_clie
     column_info = {col: str(df[col].dtype) for col in selected_cols}
     sample_rows = df.head(3)[selected_cols].to_dict(orient="records")   # take 3 example rows
-    prompt = f"""
 You are a data‑validation assistant. Decide whether the dataset below is useful for the research subtask.
 ===== TASK =====
-Subtask: "{subtask}"
 ===== DATASET PREVIEW =====
-Schema (first {len(selected_cols)} columns):
-{json.dumps(column_info, indent=2)}
 Sample rows (3 max):
-{json.dumps(sample_rows, indent=2)}
 ===== OUTPUT INSTRUCTIONS (follow strictly) =====
-Case A – Relevant:
 • Write exactly two sentences, each no more than 30 words.
 • Summarize what the dataset contains and why it helps the subtask.
 • Do not mention column names or list individual rows.
-Case B – Not relevant:
-• Write one or two sentences, each no more than 30 words, **describing only what the dataset contains**.
-• Do **not** mention the subtask, relevance, suitability, limitations, or missing information (avoid phrases like “not related,” “does not focus,” “irrelevant,” etc.).
-• After the sentences, output the header **Additionally, here are some external resources you might find helpful:** on a new line. Format your output in markdown as:
 - [Name of Source](URL)
 • Then list 2–3 bullet points, each on its own line, starting with “- ” followed immediately by a URL likely to contain the needed data.
 • No additional commentary.
-General rules:
-Plain text only — no code fences. Markdown link syntax (`[text](url)`) is allowed.
 """
     rsp = client.chat.completions.create(

     column_info = {col: str(df[col].dtype) for col in selected_cols}
     sample_rows = df.head(3)[selected_cols].to_dict(orient="records")   # take 3 example rows
+prompt = f"""
 You are a data‑validation assistant. Decide whether the dataset below is useful for the research subtask.
 ===== TASK =====
+Subtask: "{{subtask}}"
 ===== DATASET PREVIEW =====
+Schema (first {{len(selected_cols)}} columns):
+{{json.dumps(column_info, indent=2)}}
 Sample rows (3 max):
+{{json.dumps(sample_rows, indent=2)}}
 ===== OUTPUT INSTRUCTIONS (follow strictly) =====
+First, begin your response with one of the following labels on its own line:
+Relevant
+or
+Irrelevant
+Then follow the appropriate instruction below based on your decision.
+---
+If you choose "Relevant":
 • Write exactly two sentences, each no more than 30 words.
 • Summarize what the dataset contains and why it helps the subtask.
 • Do not mention column names or list individual rows.
+---
+If you choose "Irrelevant":
+• Write one or two sentences, each no more than 30 words, describing only what the dataset contains.
+• Do NOT mention the subtask, relevance, suitability, limitations, or missing information (avoid phrases like “not related,” “does not focus,” “irrelevant,” etc.).
+• After the sentences, output the header **Additionally, here are some external resources you might find helpful:** on a new line. Format your output in markdown as:
 - [Name of Source](URL)
 • Then list 2–3 bullet points, each on its own line, starting with “- ” followed immediately by a URL likely to contain the needed data.
 • No additional commentary.
+Plain text only — no code fences. Markdown link syntax (`[text](url)`) is allowed.
 """
     rsp = client.chat.completions.create(
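
Review note on the placeholders: the added lines keep the `f"""` prefix but double the braces (`{{subtask}}`, `{{json.dumps(...)}}`). In an f-string, doubled braces are an escape for a literal brace, so those expressions would not be evaluated, and the model would receive the text `{subtask}` rather than the subtask itself. Whether the function later post-processes the prompt (for example with `str.format`) is not visible in this hunk, so the following is only a minimal sketch of the difference; the example values are hypothetical. The new `prompt = f"""` line also appears to have lost its four-space indentation relative to the removed line, which is worth checking as well.

```python
import json

# Hypothetical example values, not taken from the app.
subtask = "estimate bird migration timing"
column_info = {"species": "object", "count": "int64"}

# Doubled braces inside an f-string produce literal braces;
# the name between them is NOT evaluated.
escaped = f'Subtask: "{{subtask}}"'

# Single braces interpolate the value, as the removed lines did.
interpolated = f'Subtask: "{subtask}"'

# Arbitrary expressions also need single braces to be evaluated.
escaped_schema = f"{{json.dumps(column_info, indent=2)}}"
interpolated_schema = f"{json.dumps(column_info, indent=2)}"

print(escaped)        # literal placeholder text
print(interpolated)   # placeholder replaced by the value
```

A later `escaped.format(subtask=subtask)` would fill in a simple name like `subtask`, but `str.format` cannot evaluate expressions such as `len(selected_cols)` or `json.dumps(...)`, so single braces in the f-string are likely what was intended.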
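
Review note on the new label line: the prompt now asks the model to begin with `Relevant` or `Irrelevant` on its own line, which suggests the caller is expected to branch on that first line. The code that consumes `rsp` is outside this hunk, so the helper below is a hypothetical sketch of how the label might be separated from the body; `split_label` is not a function from the app.

```python
from typing import Optional, Tuple


def split_label(response_text: str) -> Tuple[Optional[str], str]:
    """Split a model reply into (label, body).

    Returns the label in lowercase ("relevant" or "irrelevant") when the
    first line matches one of them, otherwise (None, full stripped text).
    """
    lines = response_text.strip().splitlines()
    if not lines:
        return None, ""

    # Tolerate quotes or markdown bold around the label, e.g. **Relevant**.
    first = lines[0].strip().strip('"*').lower()
    if first in ("relevant", "irrelevant"):
        body = "\n".join(lines[1:]).strip()
        return first, body

    # The model ignored the instruction; hand back the text unchanged.
    return None, response_text.strip()


label, body = split_label("Irrelevant\nThe dataset lists city bus routes.")
print(label)
print(body)
```

Comparing the whole first line, rather than using a substring check, avoids treating `Irrelevant` as a match for `Relevant`.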