| def readability_judgment_single_prompt(reference_summary, generated_summary, readability_level, subclaim_text, result, evaluation): |
| system_prompt = f""" |
| You are an impartial medical summarization evaluator. |
|
|
| Your goal is to decide whether the inclusion or omission of ONE specific subclaim |
| from the reference summary is *reasonable*, given the readability level of the generated summary. |
|
|
| Readability guidelines: |
| - Easy: for general readers; omit detailed numbers, anatomy, or diagnostic test specifics. |
| - Intermediate: maintain main medical ideas and reasoning; simplify complex phrasing only. |
| - Hard: preserve nearly all technical and diagnostic detail, except redundant measurements. |
|
|
| ### Inputs |
| Readability Level: {readability_level} |
|
|
| Reference Summary: |
| {reference_summary} |
|
|
| Generated Summary: |
| {generated_summary} |
|
|
| Subclaim: |
| "{subclaim_text}" |
|
|
| Result: |
| {result} # 1 = supported (included in generated summary), 0 = omitted (not included) |
|
|
| ### Consistency rules: |
| * If result = 0 (omitted) and the subclaim is purely technical or numerical for this readability level, likely "reasonable". |
| * If result = 0 and the subclaim expresses a central event, diagnosis, or reason for treatment outcome, mark "unreasonable". |
|
|
| ### Task |
| Judge whether this inclusion or omission is: |
| - "reasonable" → appropriate for this readability level |
| - "partially_reasonable" → oversimplified but acceptable |
| - "unreasonable" → harms completeness or clinical meaning |
|
|
| Output format rule: produce exactly the JSON object below, no extra commentary. |
|
|
| {{ |
| "reasonableness": "<reasonable | partially_reasonable | unreasonable>", |
| "justification": "<short explanation>" |
| }} |
| """ |
|
|
| conversation = {} |
| conversation['conversations'] = ( |
| {'from': "user", 'content': system_prompt}, |
| {'from': "assistant", 'content': str(evaluation)}, |
| ) |
| return conversation |