{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "7839d3bf",
"metadata": {},
"outputs": [],
"source": [
"def prompt_return(reference_summary, generated_summary, subclaims_json, difficulty_level):\n",
" return f'''\n",
" **SYSTEM / ROLE INSTRUCTION:**\n",
"\n",
"> You are a medical linguistics evaluator specializing in readability control of Spanish medical texts.\n",
"> You will assess whether omitted subclaims (those with `result = 0`) from a generated summary are reasonably excluded based on readability simplification (easy/intermediate/hard).\n",
"\n",
"> Criteria:\n",
"> * **Easy:** suitable for non-medical readers; focus on main story and outcomes; omit measurements, anatomy, and technical tests.\n",
"> * **Intermediate:** moderate medical detail; keep main findings but simplify phrasing.\n",
"> * **Hard:** close to clinical summary; high precision, moderate technical detail.\n",
">\n",
"> You must provide a **judgment table**, a **numerical reasonableness score (0–5)**, and an **overall explanation**.\n",
"\n",
"---\n",
"\n",
"**INPUT:**\n",
"\n",
"**Reference summary:**\n",
"{reference_summary}\n",
"\n",
"**Generated summary ({difficulty_level}):**\n",
"{generated_summary}\n",
"\n",
"**Subclaims and results:**\n",
"{subclaims_json}\n",
"\n",
"---\n",
"\n",
"**TASK:**\n",
"1. Examine all subclaims with `\"result\": 0` (i.e., not supported in the generated summary).\n",
"2. For each omitted subclaim, decide if omission is **reasonable** (yes/no/borderline).\n",
"3. Provide a short explanation (≤2 sentences) for each.\n",
"4. Assign a **numerical reasonableness score (0–5)**:\n",
"\n",
" * **5** = All omissions reasonable (excellent simplification)\n",
" * **4** = Mostly reasonable; minor omissions could be improved\n",
" * **3** = Some omissions reduce clarity or omit key ideas\n",
" * **2** = Many key omissions or poor balance\n",
" * **1** = Major content loss; poor summary\n",
" * **0** = Incoherent simplification or severe distortion\n",
"5. Give an **overall explanation** (3–5 sentences) summarizing your reasoning.\n",
"\n",
"---\n",
"\n",
"**OUTPUT FORMAT (strict):**\n",
"\n",
"```json\n",
"{{\n",
" \"evaluation_table\": [\n",
" {{\n",
" \"id\": <subclaim_id>,\n",
" \"subclaim\": \"<text>\",\n",
" \"reasonable_omission\": \"<yes / no / borderline>\",\n",
" \"explanation\": \"<why it was or wasn't reasonable to omit>\"\n",
" }}\n",
" ],\n",
" \"reasonableness_score\": <0-5>,\n",
" \"overall_explanation\": \"<short paragraph explaining judgment>\"\n",
"}}\n",
"```\n",
" '''"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c94fe25a",
"metadata": {},
"outputs": [],
"source": [
"def return_promptsV2(reference_summary, generated_summary, subclaims_json, difficulty_level):\n",
" prompt=f'''\n",
" **SYSTEM / ROLE INSTRUCTION:**\n",
" You are a **medical readability evaluator**.\n",
" Your task is to judge whether omitted subclaims (those with `\"result\": 0\"`) from a generated summary are *reasonably omitted* based on the intended **readability level**: *easy*, *intermediate*, or *hard*.\n",
" You evaluate this from the standpoint of clarity, faithfulness, and readability goals.\n",
"\n",
" ---\n",
"\n",
" ### **READABILITY GUIDELINES**\n",
"\n",
" | Level | Target Audience | Content Expectation | Technical Detail Allowed |\n",
" | :--------------- | :--------------------------------------- | :-------------------------------------------------------------- | :--------------------------------------------------------------- |\n",
" | **Easy** | General public | Focus on main events, outcomes, and diagnoses in plain Spanish. | Minimal — avoid measurements, anatomy, and test results. |\n",
" | **Intermediate** | Educated lay readers or medical students | Include key findings and procedures in simplified form. | Moderate — basic terms and causes allowed. |\n",
" | **Hard** | Medical professionals | Retain most technical information and precision. | High — measurements, anatomy, and test interpretations expected. |\n",
"\n",
" ---\n",
"\n",
" ### **INPUT FIELDS**\n",
"\n",
" **Reference summary:**\n",
" {reference_summary}\n",
"\n",
" **Generated summary ({difficulty_level}):**\n",
" {generated_summary}\n",
"\n",
" **Subclaims and results:**\n",
" {subclaims_json}\n",
"\n",
" ---\n",
"\n",
" ### **TASK INSTRUCTIONS**\n",
"\n",
" 1. Focus on subclaims with `\"result\": 0\"` (not supported by the generated summary).\n",
" 2. For each omitted subclaim:\n",
"\n",
" * Decide whether omission is **reasonable** given the readability level.\n",
" * Label as: `\"yes\"`, `\"no\"`, or `\"borderline\"`.\n",
" * Write a brief justification (1–2 sentences).\n",
" 3. After individual evaluations, assign a **reasonableness score (0–5)** using this scale:\n",
"\n",
" * **5** = All omissions appropriate for target readability.\n",
" * **4** = Minor omissions could improve completeness.\n",
" * **3** = Some omissions reduce understanding or medical clarity.\n",
" * **2** = Many important omissions harm faithfulness.\n",
" * **1** = Major omissions misrepresent case.\n",
" * **0** = Summary fails to reflect key medical information.\n",
" 4. End with an **overall explanation (3–5 sentences)** describing:\n",
"\n",
" * The main reasoning behind the score.\n",
" * Whether the summary fits its intended readability level.\n",
" * Suggestions for improvement if needed.\n",
"\n",
" ---\n",
"\n",
" ### **OUTPUT FORMAT (strict JSON)**\n",
"\n",
" ```json\n",
" {{\n",
" \"evaluation_table\": [\n",
" {{\n",
" \"id\": <subclaim_id>,\n",
" \"subclaim\": \"<text>\",\n",
" \"reasonable_omission\": \"<yes | no | borderline>\",\n",
" \"explanation\": \"<short reason>\"\n",
" }}\n",
" ],\n",
" \"reasonableness_score\": <0-5>,\n",
" \"overall_explanation\": \"<concise paragraph>\"\n",
" }}\n",
" ```\n",
" '''\n",
" return prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0162eddf",
"metadata": {},
"outputs": [],
"source": [
"def return_prompts_attribution(reference_full_text, generated_summary, subclaims_json, difficulty_level):\n",
" return f'''\n",
"### **SYSTEM / ROLE INSTRUCTION**\n",
"\n",
"You are a **medical factuality and attribution evaluator**.\n",
"You will assess whether **unsupported subclaims** in a generated summary (those with `\"result\": 0\"`) are *reasonable additions* based on the readability level (*easy / intermediate / hard*).\n",
"\n",
"The goal is to determine whether these **extra pieces of information** are acceptable simplifications or *hallucinations* that reduce factual faithfulness.\n",
"\n",
"---\n",
"\n",
"### **READABILITY & ATTRIBUTION GUIDELINES**\n",
"\n",
"| Level | Audience | Content Goal | Allowable Additions |\n",
"| :--------------- | :------------------------------- | :--------------------------------------------------------------------- | :--------------------------------------------------------------------------------- |\n",
"| **Easy** | General public | Simplify and clarify events | Allow general background info or lay explanations, but not new facts or diagnoses. |\n",
"| **Intermediate** | Educated layperson / med student | Add brief clarifications or causal context if consistent with the text | Allow inferred, non-contradictory context; avoid adding unconfirmed data. |\n",
"| **Hard** | Medical professional | Maintain factual precision | No additions; everything must be supported by source text. |\n",
"\n",
"---\n",
"\n",
"### **INPUT FIELDS**\n",
"\n",
"**Reference full text:**\n",
"{reference_full_text}\n",
"\n",
"**Generated summary ({difficulty_level}):**\n",
"{generated_summary}\n",
"\n",
"**Subclaims and results:**\n",
"{subclaims_json}\n",
"\n",
"---\n",
"\n",
"### **TASK INSTRUCTIONS**\n",
"\n",
"1. Focus only on subclaims with `\"result\": 0\"` (not supported by the input text).\n",
"2. For each unsupported subclaim:\n",
"\n",
" * Judge whether adding it is **reasonable** for the given readability level.\n",
" * Choose one of: `\"reasonable addition\"`, `\"unnecessary but harmless\"`, `\"misleading / hallucinated\"`.\n",
" * Provide a **1–2 sentence justification** explaining your reasoning.\n",
"3. After all evaluations, assign a **numerical attribution score (0–5)**:\n",
"\n",
" * **5** = All additions are reasonable or harmless simplifications.\n",
" * **4** = Mostly reasonable; minor harmless additions.\n",
" * **3** = Some misleading or unjustified additions.\n",
" * **2** = Many factual inaccuracies.\n",
" * **1** = Serious hallucinations; distorts source meaning.\n",
" * **0** = Highly unfaithful; mostly invented content.\n",
"4. End with an **overall explanation (3–5 sentences)** summarizing your reasoning and suggestions.\n",
"\n",
"---\n",
"\n",
"### **OUTPUT FORMAT (strict JSON)**\n",
"\n",
"```json\n",
"{{\n",
" \"evaluation_table\": [\n",
" {{\n",
" \"id\": <subclaim_id>,\n",
" \"subclaim\": \"<text>\",\n",
" \"evaluation\": \"<reasonable addition | unnecessary but harmless | misleading / hallucinated>\",\n",
" \"explanation\": \"<short justification>\"\n",
" }}\n",
" ],\n",
" \"attribution_score\": <0-5>,\n",
" \"overall_explanation\": \"<concise summary of your judgment>\"\n",
"}}\n",
"```\n",
"'''"
]
},
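  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3f91c2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch (not part of the original pipeline): the evaluator prompts above\n",
    "# demand strict JSON, so a light validator can reject malformed model replies\n",
    "# before they enter downstream analysis. The helper name `validate_evaluation`\n",
    "# and the parameterized score key are assumptions; pass\n",
    "# score_key=\"attribution_score\" when checking the attribution prompt's output.\n",
    "def validate_evaluation(parsed, score_key=\"reasonableness_score\"):\n",
    "    \"\"\"Return True if `parsed` matches the shared evaluator output schema.\"\"\"\n",
    "    if not isinstance(parsed, dict):\n",
    "        return False\n",
    "    table = parsed.get(\"evaluation_table\")\n",
    "    if not isinstance(table, list):\n",
    "        return False\n",
    "    # Every row must carry the fields common to all three evaluator prompts.\n",
    "    for row in table:\n",
    "        if not {\"id\", \"subclaim\", \"explanation\"} <= set(row):\n",
    "            return False\n",
    "    score = parsed.get(score_key)\n",
    "    return isinstance(score, int) and 0 <= score <= 5 and isinstance(parsed.get(\"overall_explanation\"), str)"
   ]
  },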
{
"cell_type": "code",
"execution_count": null,
"id": "efec346c",
"metadata": {},
"outputs": [],
"source": [
"def revised_results(reference_summary, generated_summary, list_of_missing_subclaims, difficulty_level):\n",
" return f'''\n",
"### **SYSTEM / ROLE INSTRUCTION**\n",
"\n",
"You are a **medical text rewriting assistant** that improves summaries while maintaining the intended readability level (*easy / intermediate / hard*).\n",
"You will receive:\n",
"\n",
"* The **original reference summary** (the factual source)\n",
"* The **current generated summary**\n",
"* A list of **important missing subclaims** to be reintroduced\n",
"* The **target readability level**\n",
"\n",
"Your task:\n",
"Revise the generated summary so that it **adds the missing information** naturally, while keeping:\n",
"\n",
"* The same **tone, vocabulary, and sentence simplicity** of the given readability level.\n",
"* Logical **flow and coherence**.\n",
"* No extra, invented information beyond what’s in the reference summary.\n",
"\n",
"---\n",
"\n",
"### **INPUT FIELDS**\n",
"\n",
"**Reference summary:**\n",
"{reference_summary}\n",
"\n",
"**Current generated summary ({difficulty_level}):**\n",
"{generated_summary}\n",
"\n",
"**Missing important subclaims to add back:**\n",
"{list_of_missing_subclaims}\n",
"\n",
"**Target readability level:**\n",
"{difficulty_level}\n",
"\n",
"\n",
"---\n",
"\n",
"### **TASK INSTRUCTIONS**\n",
"\n",
"1. Integrate the missing subclaims **smoothly** into the generated summary.\n",
"2. Do **not** add any new facts beyond those listed.\n",
"3. Maintain the **same readability level**:\n",
"\n",
" * **Easy:** conversational, short sentences, no jargon.\n",
" * **Intermediate:** light medical terms, brief explanations.\n",
" * **Hard:** concise clinical tone with correct terminology.\n",
"4. Keep the summary approximately the same length; avoid redundancy.\n",
"5. Ensure the resulting text remains **fluent, coherent, and faithful** to the reference summary.\n",
"\n",
"---\n",
"\n",
"### **OUTPUT FORMAT**\n",
"\n",
"```json\n",
"{{\n",
" \"revised_summary\": \"<the new version of the summary, rewritten with the added subclaims>\",\n",
" \"explanation\": \"<brief note explaining how the missing subclaims were added while preserving readability>\"\n",
"}}\n",
"```\n",
"\n",
"'''"
]
},
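  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7e2d4a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Smoke-test sketch for the template above, using invented toy Spanish inputs\n",
    "# (placeholders only, not real data) to confirm the f-string interpolates all\n",
    "# four fields where expected.\n",
    "_demo_prompt = revised_results(\n",
    "    reference_summary=\"Paciente con dolor torácico; se diagnosticó angina.\",\n",
    "    generated_summary=\"El paciente tenía dolor en el pecho.\",\n",
    "    list_of_missing_subclaims=[\"Se diagnosticó angina.\"],\n",
    "    difficulty_level=\"easy\",\n",
    ")\n",
    "print(_demo_prompt[:400])"
   ]
  },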
{
"cell_type": "code",
"execution_count": null,
"id": "b5d5ad90",
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"import json\n",
"file_path = \"/home/mshahidul/api_new.json\"\n",
"with open(file_path, \"r\") as file:\n",
" api_keys = json.load(file)\n",
"\n",
"openai_api_key = api_keys.get(\"openai\")\n",
"\n",
"client = OpenAI(api_key=openai_api_key)\n",
"def openai_return(prompt):\n",
" response = client.chat.completions.create(\n",
" model=\"gpt-5-mini\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
" {\"role\": \"user\", \"content\": prompt}\n",
" ]\n",
" )\n",
" cleaned_response = response.choices[0].message.content.strip().replace(\"```json\", \"\").replace(\"```\", \"\")\n",
" return json.loads(cleaned_response)"
]
},
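  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1d8f3e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional hardening sketch: the fence-stripping in openai_return is brittle\n",
    "# when the model wraps its JSON in extra prose. One alternative, under the\n",
    "# assumption that the reply contains exactly one top-level object, is to slice\n",
    "# out the outermost {...} span before parsing. `extract_json` is an invented\n",
    "# helper name for illustration.\n",
    "def extract_json(text):\n",
    "    \"\"\"Parse the first top-level JSON object found in `text`.\"\"\"\n",
    "    start = text.find(\"{\")\n",
    "    end = text.rfind(\"}\")\n",
    "    if start == -1 or end < start:\n",
    "        raise ValueError(\"no JSON object found in model reply\")\n",
    "    return json.loads(text[start:end + 1])"
   ]
  },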
{
"cell_type": "code",
"execution_count": null,
"id": "f3706ef0",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"file_path = \"/home/mshahidul/readctrl/data/training_data_subclaim_verifier/synthetic_data_es_subclaims_100.json\"\n",
"\n",
"with open(file_path, 'r') as f:\n",
" synthetic_data = json.load(f)\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7b691bbe",
"metadata": {},
"outputs": [],
"source": [
"with open(\"/home/mshahidul/readctrl/data/testing_data_gs/multiclinsum_gs_train_es.json\", \"r\") as f_train:\n",
" multiclinsum_gs_train_es = json.load(f_train)\n",
"dat_full_text={}\n",
"dat_summary={}\n",
"for item in multiclinsum_gs_train_es:\n",
" dat_full_text[item['id']]=item['fulltext']\n",
" dat_summary[item['id']]=item['summary']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49f435b1",
"metadata": {},
"outputs": [],
"source": [
"# /home/mshahidul/readctrl/results/dataset_quality_check/resonability_check_100_gpt5_completeness.json\n",
"\n",
"\n",
"\n",
"with open(\"/home/mshahidul/readctrl/results/dataset_quality_check/resonability_check_100_gpt5_completeness.json\", 'r') as f:\n",
" readability_reasoning = json.load(f)\n",
"# readability_reasoning[0].keys() # dict_keys(['id', 'difficulty_level', 'prompt'])\n",
"# readability_reasoning[0]['prompt'].keys() # dict_keys(['evaluation_table', 'reasonableness_score', 'overall_explanation'])\n",
"reason_info={}\n",
"for item in readability_reasoning:\n",
" id=item['id']\n",
" difficulty_level=item['difficulty_level']\n",
" data_temp=item['prompt']\n",
" for _data in data_temp['evaluation_table']:\n",
" if _data['reasonable_omission'] == \"no\":\n",
" key=(id, difficulty_level)\n",
" if key not in reason_info:\n",
" reason_info[key]=[]\n",
" reason_info[key].append(_data['subclaim'])"
]
},
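  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4a7b2c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspection of what was just collected: how many (id, difficulty_level) pairs\n",
    "# have at least one unreasonable omission, and how they split across levels.\n",
    "from collections import Counter\n",
    "level_counts = Counter(level for (_doc_id, level) in reason_info)\n",
    "print(f\"{len(reason_info)} (id, level) pairs with unreasonable omissions\")\n",
    "print(dict(level_counts))"
   ]
  },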
{
"cell_type": "code",
"execution_count": null,
"id": "d74f2582",
"metadata": {},
"outputs": [],
"source": [
"file_path_qwen3_32B = \"/home/mshahidul/readctrl/results/dataset_quality_check/subclaim_verifier_results_100_qwen3-32B.json\"\n",
"\n",
"with open(file_path_qwen3_32B, 'r') as f:\n",
" qwen3_32B_results = json.load(f)\n",
"\n",
"# print(qwen3_32B_results[0]['completeness']['results'])\n",
"print(qwen3_32B_results[0].keys())\n",
"print(qwen3_32B_results[0]['completeness']['results'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e8a38e1",
"metadata": {},
"outputs": [],
"source": [
"# dict_keys(['id', 'full_text', 'ref_summary', 'readability_versions'])\n",
"# print(f\"Full text: {synthetic_data[0]['full_text']}\")\n",
"import os\n",
"# def revised_results(reference_summary, generated_summary, list_of_missing_subclaims, difficulty_level):\n",
"res=[]\n",
"temp=\"\"\n",
"save_path = \"/home/mshahidul/readctrl/results/dataset_quality_check/results_revised_100_gpt5.json\"\n",
"if os.path.exists(save_path):\n",
" with open(save_path, 'r') as f:\n",
" res = json.load(f)\n",
"existing_check=set((entry['id'], entry['difficulty_level']) for entry in res)\n",
"print(f\"Resuming from {len(res)} entries\")\n",
"import tqdm\n",
"for ind in tqdm.tqdm(range(0,100)):\n",
" for version in [\"easy\", \"intermediate\", \"hard\"]:\n",
" reference_summary = (f\"{synthetic_data[ind]['ref_summary']['text']}\")\n",
" generated_summary = (f\"{synthetic_data[ind]['readability_versions'][version]['text']}\")\n",
" if (synthetic_data[ind]['id'],version) in existing_check:\n",
" continue\n",
" if (synthetic_data[ind]['id'],version) not in reason_info:\n",
" continue\n",
" subclaims_results = reason_info[(synthetic_data[ind]['id'],version)]\n",
" prompt = revised_results(reference_summary, generated_summary, subclaims_results, version)\n",
" print(prompt)\n",
" assert False\n",
" ans=openai_return(prompt)\n",
" res.append({\n",
" \"id\": synthetic_data[ind]['id'],\n",
" \"difficulty_level\": version,\n",
" \"prompt\": prompt,\n",
" \"response\": ans\n",
" })\n",
" \n",
" if len(res)%2==0:\n",
" print(f\"Completed {len(res)} out of 300\")\n",
" with open(save_path, 'w') as outfile:\n",
" json.dump(res, outfile, indent=2)\n",
" temp=res\n",
" assert False\n",
" # print(prompt)\n",
" # assert False\n",
"with open(save_path, 'w') as outfile:\n",
" json.dump(res, outfile, indent=2)"
]
},
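  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9c5a1f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Spot-check one revision produced by the loop above. This assumes the model\n",
    "# honored the strict output format of revised_results, i.e. each response\n",
    "# carries 'revised_summary' and 'explanation' keys.\n",
    "if res:\n",
    "    sample = res[0]\n",
    "    print(sample['id'], sample['difficulty_level'])\n",
    "    print(sample['response'].get('revised_summary', '')[:300])\n",
    "    print(sample['response'].get('explanation', ''))"
   ]
  },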
{
"cell_type": "code",
"execution_count": null,
"id": "b89ff032",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff82e523",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "unsloth",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}