{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "7839d3bf", "metadata": {}, "outputs": [], "source": [ "def prompt_return(reference_summary, generated_summary, subclaims_json, difficulty_level):\n", " return f'''\n", " **SYSTEM / ROLE INSTRUCTION:**\n", "\n", "> You are a medical linguistics evaluator specializing in readability control of Spanish medical texts.\n", "> You will assess whether omitted subclaims (those with `result = 0`) from a generated summary are reasonably excluded based on readability simplification (easy/intermediate/hard).\n", "\n", "> Criteria:\n", "> * **Easy:** suitable for non-medical readers; focus on main story and outcomes; omit measurements, anatomy, and technical tests.\n", "> * **Intermediate:** moderate medical detail; keep main findings but simplify phrasing.\n", "> * **Hard:** close to clinical summary; high precision, moderate technical detail.\n", ">\n", "> You must provide a **judgment table**, a **numerical reasonableness score (0–5)**, and an **overall explanation**.\n", "\n", "---\n", "\n", "**INPUT:**\n", "\n", "**Reference summary:**\n", "{reference_summary}\n", "\n", "**Generated summary ({difficulty_level}):**\n", "{generated_summary}\n", "\n", "**Subclaims and results:**\n", "{subclaims_json}\n", "\n", "---\n", "\n", "**TASK:**\n", "1. Examine all subclaims with `\"result\": 0` (i.e., not supported in the generated summary).\n", "2. For each omitted subclaim, decide if omission is **reasonable** (yes/no/borderline).\n", "3. Provide a short explanation (≤2 sentences) for each.\n", "4. Assign a **numerical reasonableness score (0–5)**:\n", "\n", " * **5** = All omissions reasonable (excellent simplification)\n", " * **4** = Mostly reasonable; minor omissions could be improved\n", " * **3** = Some omissions reduce clarity or omit key ideas\n", " * **2** = Many key omissions or poor balance\n", " * **1** = Major content loss; poor summary\n", " * **0** = Incoherent simplification or severe distortion\n", "5. Give an **overall explanation** (3–5 sentences) summarizing your reasoning.\n", "\n", "---\n", "\n", "**OUTPUT FORMAT (strict):**\n", "\n", "```json\n", "{{\n", " \"evaluation_table\": [\n", " {{\n", " \"id\": ,\n", " \"subclaim\": \"\",\n", " \"reasonable_omission\": \"\",\n", " \"explanation\": \"\"\n", " }}\n", " ],\n", " \"reasonableness_score\": <0-5>,\n", " \"overall_explanation\": \"\"\n", "}}\n", "```\n", " '''" ] }, { "cell_type": "code", "execution_count": null, "id": "c94fe25a", "metadata": {}, "outputs": [], "source": [ "def return_promptsV2(reference_summary, generated_summary, subclaims_json, difficulty_level):\n", " prompt=f'''\n", " **SYSTEM / ROLE INSTRUCTION:**\n", " You are a **medical readability evaluator**.\n", " Your task is to judge whether omitted subclaims (those with `\"result\": 0\"`) from a generated summary are *reasonably omitted* based on the intended **readability level**: *easy*, *intermediate*, or *hard*.\n", " You evaluate this from the standpoint of clarity, faithfulness, and readability goals.\n", "\n", " ---\n", "\n", " ### **READABILITY GUIDELINES**\n", "\n", " | Level | Target Audience | Content Expectation | Technical Detail Allowed |\n", " | :--------------- | :--------------------------------------- | :-------------------------------------------------------------- | :--------------------------------------------------------------- |\n", " | **Easy** | General public | Focus on main events, outcomes, and diagnoses in plain Spanish. | Minimal — avoid measurements, anatomy, and test results. |\n", " | **Intermediate** | Educated lay readers or medical students | Include key findings and procedures in simplified form. | Moderate — basic terms and causes allowed. |\n", " | **Hard** | Medical professionals | Retain most technical information and precision. | High — measurements, anatomy, and test interpretations expected. |\n", "\n", " ---\n", "\n", " ### **INPUT FIELDS**\n", "\n", " **Reference summary:**\n", " {reference_summary}\n", "\n", " **Generated summary ({difficulty_level}):**\n", " {generated_summary}\n", "\n", " **Subclaims and results:**\n", " {subclaims_json}\n", "\n", " ---\n", "\n", " ### **TASK INSTRUCTIONS**\n", "\n", " 1. Focus on subclaims with `\"result\": 0\"` (not supported by the generated summary).\n", " 2. For each omitted subclaim:\n", "\n", " * Decide whether omission is **reasonable** given the readability level.\n", " * Label as: `\"yes\"`, `\"no\"`, or `\"borderline\"`.\n", " * Write a brief justification (1–2 sentences).\n", " 3. After individual evaluations, assign a **reasonableness score (0–5)** using this scale:\n", "\n", " * **5** = All omissions appropriate for target readability.\n", " * **4** = Minor omissions could improve completeness.\n", " * **3** = Some omissions reduce understanding or medical clarity.\n", " * **2** = Many important omissions harm faithfulness.\n", " * **1** = Major omissions misrepresent case.\n", " * **0** = Summary fails to reflect key medical information.\n", " 4. End with an **overall explanation (3–5 sentences)** describing:\n", "\n", " * The main reasoning behind the score.\n", " * Whether the summary fits its intended readability level.\n", " * Suggestions for improvement if needed.\n", "\n", " ---\n", "\n", " ### **OUTPUT FORMAT (strict JSON)**\n", "\n", " ```json\n", " {{\n", " \"evaluation_table\": [\n", " {{\n", " \"id\": ,\n", " \"subclaim\": \"\",\n", " \"reasonable_omission\": \"\",\n", " \"explanation\": \"\"\n", " }}\n", " ],\n", " \"reasonableness_score\": <0-5>,\n", " \"overall_explanation\": \"\"\n", " }}\n", " ```\n", " '''\n", " return prompt" ] }, { "cell_type": "code", "execution_count": null, "id": "0162eddf", "metadata": {}, "outputs": [], "source": [ "def return_prompts_attribution(reference_full_text, generated_summary, subclaims_json, difficulty_level):\n", " return f'''\n", "### **SYSTEM / ROLE INSTRUCTION**\n", "\n", "You are a **medical factuality and attribution evaluator**.\n", "You will assess whether **unsupported subclaims** in a generated summary (those with `\"result\": 0\"`) are *reasonable additions* based on the readability level (*easy / intermediate / hard*).\n", "\n", "The goal is to determine whether these **extra pieces of information** are acceptable simplifications or *hallucinations* that reduce factual faithfulness.\n", "\n", "---\n", "\n", "### **READABILITY & ATTRIBUTION GUIDELINES**\n", "\n", "| Level | Audience | Content Goal | Allowable Additions |\n", "| :--------------- | :------------------------------- | :--------------------------------------------------------------------- | :--------------------------------------------------------------------------------- |\n", "| **Easy** | General public | Simplify and clarify events | Allow general background info or lay explanations, but not new facts or diagnoses. |\n", "| **Intermediate** | Educated layperson / med student | Add brief clarifications or causal context if consistent with the text | Allow inferred, non-contradictory context; avoid adding unconfirmed data. |\n", "| **Hard** | Medical professional | Maintain factual precision | No additions; everything must be supported by source text. |\n", "\n", "---\n", "\n", "### **INPUT FIELDS**\n", "\n", "**Reference full text:**\n", "{reference_full_text}\n", "\n", "**Generated summary ({difficulty_level}):**\n", "{generated_summary}\n", "\n", "**Subclaims and results:**\n", "{subclaims_json}\n", "\n", "---\n", "\n", "### **TASK INSTRUCTIONS**\n", "\n", "1. Focus only on subclaims with `\"result\": 0\"` (not supported by the input text).\n", "2. For each unsupported subclaim:\n", "\n", " * Judge whether adding it is **reasonable** for the given readability level.\n", " * Choose one of: `\"reasonable addition\"`, `\"unnecessary but harmless\"`, `\"misleading / hallucinated\"`.\n", " * Provide a **1–2 sentence justification** explaining your reasoning.\n", "3. After all evaluations, assign a **numerical attribution score (0–5)**:\n", "\n", " * **5** = All additions are reasonable or harmless simplifications.\n", " * **4** = Mostly reasonable; minor harmless additions.\n", " * **3** = Some misleading or unjustified additions.\n", " * **2** = Many factual inaccuracies.\n", " * **1** = Serious hallucinations; distorts source meaning.\n", " * **0** = Highly unfaithful; mostly invented content.\n", "4. End with an **overall explanation (3–5 sentences)** summarizing your reasoning and suggestions.\n", "\n", "---\n", "\n", "### **OUTPUT FORMAT (strict JSON)**\n", "\n", "```json\n", "{{\n", " \"evaluation_table\": [\n", " {{\n", " \"id\": ,\n", " \"subclaim\": \"\",\n", " \"evaluation\": \"\",\n", " \"explanation\": \"\"\n", " }}\n", " ],\n", " \"attribution_score\": <0-5>,\n", " \"overall_explanation\": \"\"\n", "}}\n", "```\n", "'''" ] }, { "cell_type": "code", "execution_count": null, "id": "efec346c", "metadata": {}, "outputs": [], "source": [ "def revised_results(reference_summary, generated_summary, list_of_missing_subclaims, difficulty_level):\n", " return f'''\n", "### **SYSTEM / ROLE INSTRUCTION**\n", "\n", "You are a **medical text rewriting assistant** that improves summaries while maintaining the intended readability level (*easy / intermediate / hard*).\n", "You will receive:\n", "\n", "* The **original reference summary** (the factual source)\n", "* The **current generated summary**\n", "* A list of **important missing subclaims** to be reintroduced\n", "* The **target readability level**\n", "\n", "Your task:\n", "Revise the generated summary so that it **adds the missing information** naturally, while keeping:\n", "\n", "* The same **tone, vocabulary, and sentence simplicity** of the given readability level.\n", "* Logical **flow and coherence**.\n", "* No extra, invented information beyond what’s in the reference summary.\n", "\n", "---\n", "\n", "### **INPUT FIELDS**\n", "\n", "**Reference summary:**\n", "{reference_summary}\n", "\n", "**Current generated summary ({difficulty_level}):**\n", "{generated_summary}\n", "\n", "**Missing important subclaims to add back:**\n", "{list_of_missing_subclaims}\n", "\n", "**Target readability level:**\n", "{difficulty_level}\n", "\n", "\n", "---\n", "\n", "### **TASK INSTRUCTIONS**\n", "\n", "1. Integrate the missing subclaims **smoothly** into the generated summary.\n", "2. Do **not** add any new facts beyond those listed.\n", "3. Maintain the **same readability level**:\n", "\n", " * **Easy:** conversational, short sentences, no jargon.\n", " * **Intermediate:** light medical terms, brief explanations.\n", " * **Hard:** concise clinical tone with correct terminology.\n", "4. Keep the summary approximately the same length; avoid redundancy.\n", "5. Ensure the resulting text remains **fluent, coherent, and faithful** to the reference summary.\n", "\n", "---\n", "\n", "### **OUTPUT FORMAT**\n", "\n", "```json\n", "{{\n", " \"revised_summary\": \"\",\n", " \"explanation\": \"\"\n", "}}\n", "```\n", "\n", "'''" ] }, { "cell_type": "code", "execution_count": null, "id": "b5d5ad90", "metadata": {}, "outputs": [], "source": [ "from openai import OpenAI\n", "import json\n", "file_path = \"/home/mshahidul/api_new.json\"\n", "with open(file_path, \"r\") as file:\n", " api_keys = json.load(file)\n", "\n", "openai_api_key = api_keys.get(\"openai\")\n", "\n", "client = OpenAI(api_key=openai_api_key)\n", "def openai_return(prompt):\n", " response = client.chat.completions.create(\n", " model=\"gpt-5-mini\",\n", " messages=[\n", " {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n", " {\"role\": \"user\", \"content\": prompt}\n", " ]\n", " )\n", " cleaned_response = response.choices[0].message.content.strip().replace(\"```json\", \"\").replace(\"```\", \"\")\n", " return json.loads(cleaned_response)" ] }, { "cell_type": "code", "execution_count": null, "id": "f3706ef0", "metadata": {}, "outputs": [], "source": [ "import json\n", "file_path = \"/home/mshahidul/readctrl/data/training_data_subclaim_verifier/synthetic_data_es_subclaims_100.json\"\n", "\n", "with open(file_path, 'r') as f:\n", " synthetic_data = json.load(f)\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "7b691bbe", "metadata": {}, "outputs": [], "source": [ "with open(\"/home/mshahidul/readctrl/data/testing_data_gs/multiclinsum_gs_train_es.json\", \"r\") as f_train:\n", " multiclinsum_gs_train_es = json.load(f_train)\n", "dat_full_text={}\n", "dat_summary={}\n", "for item in multiclinsum_gs_train_es:\n", " dat_full_text[item['id']]=item['fulltext']\n", " dat_summary[item['id']]=item['summary']" ] }, { "cell_type": "code", "execution_count": null, "id": "49f435b1", "metadata": {}, "outputs": [], "source": [ "# /home/mshahidul/readctrl/results/dataset_quality_check/resonability_check_100_gpt5_completeness.json\n", "\n", "\n", "\n", "with open(\"/home/mshahidul/readctrl/results/dataset_quality_check/resonability_check_100_gpt5_completeness.json\", 'r') as f:\n", " readability_reasoning = json.load(f)\n", "# readability_reasoning[0].keys() # dict_keys(['id', 'difficulty_level', 'prompt'])\n", "# readability_reasoning[0]['prompt'].keys() # dict_keys(['evaluation_table', 'reasonableness_score', 'overall_explanation'])\n", "reason_info={}\n", "for item in readability_reasoning:\n", " id=item['id']\n", " difficulty_level=item['difficulty_level']\n", " data_temp=item['prompt']\n", " for _data in data_temp['evaluation_table']:\n", " if _data['reasonable_omission'] == \"no\":\n", " key=(id, difficulty_level)\n", " if key not in reason_info:\n", " reason_info[key]=[]\n", " reason_info[key].append(_data['subclaim'])" ] }, { "cell_type": "code", "execution_count": null, "id": "d74f2582", "metadata": {}, "outputs": [], "source": [ "file_path_qwen3_32B = \"/home/mshahidul/readctrl/results/dataset_quality_check/subclaim_verifier_results_100_qwen3-32B.json\"\n", "\n", "with open(file_path_qwen3_32B, 'r') as f:\n", " qwen3_32B_results = json.load(f)\n", "\n", "# print(qwen3_32B_results[0]['completeness']['results'])\n", "print(qwen3_32B_results[0].keys())\n", "print(qwen3_32B_results[0]['completeness']['results'])" ] }, { "cell_type": "code", "execution_count": null, "id": "4e8a38e1", "metadata": {}, "outputs": [], "source": [ "# dict_keys(['id', 'full_text', 'ref_summary', 'readability_versions'])\n", "# print(f\"Full text: {synthetic_data[0]['full_text']}\")\n", "import os\n", "# def revised_results(reference_summary, generated_summary, list_of_missing_subclaims, difficulty_level):\n", "res=[]\n", "temp=\"\"\n", "save_path = \"/home/mshahidul/readctrl/results/dataset_quality_check/results_revised_100_gpt5.json\"\n", "if os.path.exists(save_path):\n", " with open(save_path, 'r') as f:\n", " res = json.load(f)\n", "existing_check=set((entry['id'], entry['difficulty_level']) for entry in res)\n", "print(f\"Resuming from {len(res)} entries\")\n", "import tqdm\n", "for ind in tqdm.tqdm(range(0,100)):\n", " for version in [\"easy\", \"intermediate\", \"hard\"]:\n", " reference_summary = (f\"{synthetic_data[ind]['ref_summary']['text']}\")\n", " generated_summary = (f\"{synthetic_data[ind]['readability_versions'][version]['text']}\")\n", " if (synthetic_data[ind]['id'],version) in existing_check:\n", " continue\n", " if (synthetic_data[ind]['id'],version) not in reason_info:\n", " continue\n", " subclaims_results = reason_info[(synthetic_data[ind]['id'],version)]\n", " prompt = revised_results(reference_summary, generated_summary, subclaims_results, version)\n", " print(prompt)\n", " assert False\n", " ans=openai_return(prompt)\n", " res.append({\n", " \"id\": synthetic_data[ind]['id'],\n", " \"difficulty_level\": version,\n", " \"prompt\": prompt,\n", " \"response\": ans\n", " })\n", " \n", " if len(res)%2==0:\n", " print(f\"Completed {len(res)} out of 300\")\n", " with open(save_path, 'w') as outfile:\n", " json.dump(res, outfile, indent=2)\n", " temp=res\n", " assert False\n", " # print(prompt)\n", " # assert False\n", "with open(save_path, 'w') as outfile:\n", " json.dump(res, outfile, indent=2)" ] }, { "cell_type": "code", "execution_count": null, "id": "b89ff032", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "ff82e523", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "unsloth", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }