| You are a medical domain expert and dataset generator for claim verification tasks. | |
| TASK: | |
| Given a medical passage, generate a high-quality synthetic dataset for training a medical claim verification model. | |
| GOAL: | |
| 1. Extract multiple atomic subclaims from the passage. | |
| 2. Create both: | |
| - supported subclaims (fully supported by the text) | |
| - not_supported subclaims (contradicted, not mentioned, or partially incorrect) | |
| 3. Ensure diversity in claim types: | |
| - definition claims | |
| - causal claims | |
| - treatment effectiveness claims | |
| - dosage-related claims | |
| - statistical claims | |
| - risk factor claims | |
| - diagnostic claims | |
| - prognosis claims | |
| 4. Claims must be medically realistic and plausible. | |
| 5. Do NOT hallucinate extreme or absurd facts. | |
| 6. Keep claims atomic (single fact per claim). | |
| 7. Do not copy sentences verbatim from the passage — rephrase them. | |
| 8. Maintain balanced classes (~50% supported, ~50% not_supported). | |
| OUTPUT FORMAT (STRICT JSON): | |
| { | |
| "passage_id": "<unique_id>", | |
| "passage": "<original passage>", | |
| "subclaims": [ | |
| { | |
| "claim_id": "C1", | |
| "claim_text": "<atomic subclaim>", | |
| "label": "<supported or not_supported>" | |
| } | |
| ] | |
| } | |
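For anyone consuming the generated records, the output format above can be checked mechanically. The following is a minimal sketch, assuming each model response arrives as one raw JSON string; the function name `validate_record` and the wording of the error messages are illustrative choices, not part of the prompt.

```python
import json

# The two label values named in the output format.
ALLOWED_LABELS = {"supported", "not_supported"}


def validate_record(raw: str) -> list[str]:
    """Return a list of structural problems in one generated record (empty if none)."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]

    problems = []
    # Top-level string fields from the output format.
    for key in ("passage_id", "passage"):
        if not isinstance(record.get(key), str) or not record[key].strip():
            problems.append(f"missing or empty '{key}'")

    subclaims = record.get("subclaims")
    if not isinstance(subclaims, list) or not subclaims:
        problems.append("'subclaims' must be a non-empty list")
        return problems

    # Per-claim fields: claim_id, claim_text, label.
    for i, claim in enumerate(subclaims):
        if not isinstance(claim, dict):
            problems.append(f"subclaims[{i}] is not an object")
            continue
        for key in ("claim_id", "claim_text"):
            if not isinstance(claim.get(key), str) or not claim[key].strip():
                problems.append(f"subclaims[{i}]: missing or empty '{key}'")
        if claim.get("label") not in ALLOWED_LABELS:
            problems.append(f"subclaims[{i}]: label must be one of {sorted(ALLOWED_LABELS)}")
    return problems
```

A record that passes returns an empty list; anything else can be logged and regenerated rather than silently added to the training set.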
| LABELING RULES: | |
| SUPPORTED: | |
| - The claim must be directly entailed by the passage. | |
| NOT_SUPPORTED cases: | |
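| - Contradiction: the passage states the opposite of the claim | |
| - Missing_info: the claim's content is not mentioned in the passage | |
| - Exaggeration: the claim is stronger than the statement the passage makes | |
| - Wrong_dosage: a dosage number in the claim is changed from the passage's value | |
| - Wrong_population: the claim names the wrong age, gender, or patient group | |
| - Temporal_distortion: the claim gives the wrong duration or timeline | |
| - Fabricated_statistic: the claim cites a number that does not appear in the passage | |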
| QUALITY CONTROL: | |
| - Generate at least 12 subclaims per passage. | |
| - Include diverse not_supported reasons. | |
| - Keep every claim medically plausible, including the not_supported ones. | |
| - Ensure linguistic diversity in claims. | |
| - Output only the JSON object; do not include explanations outside it. | |
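The two countable quality-control rules (the subclaim minimum and the class balance) can be verified after generation. The sketch below is one hedged way to do it, assuming the record has already been parsed into a Python dict; `check_quality` is a hypothetical helper, and the 15-point tolerance is an assumed reading of "~50%", since the prompt does not define how close to 50% is close enough.

```python
from collections import Counter

MIN_SUBCLAIMS = 12        # from the quality-control list
BALANCE_TOLERANCE = 0.15  # assumption: "~50%" read as 35%-65% supported


def check_quality(record: dict) -> list[str]:
    """Return quality-control issues for one parsed record (empty if none)."""
    labels = [claim.get("label") for claim in record.get("subclaims", [])]
    issues = []

    # Rule: at least 12 subclaims per passage.
    if len(labels) < MIN_SUBCLAIMS:
        issues.append(f"only {len(labels)} subclaims; need at least {MIN_SUBCLAIMS}")

    # Rule: roughly balanced supported / not_supported classes.
    if labels:
        share = Counter(labels)["supported"] / len(labels)
        if abs(share - 0.5) > BALANCE_TOLERANCE:
            issues.append(f"supported share {share:.0%} is outside the balance tolerance")
    return issues
```

The remaining rules (diverse not_supported reasons, linguistic diversity, medical plausibility) are not captured here; the output format carries no reason field, so checking them would need human review or a separate classifier.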