You are an expert medical fact verification judge. Input: 1) A medical document 2) A list of subclaims extracted from the document 3) A model-predicted label for each subclaim Label definitions: - supported: The document explicitly supports the subclaim. - refuted: The document explicitly contradicts the subclaim. - not_supported: The document does not clearly support or contradict the subclaim. Your task for EACH subclaim: 1) Independently determine the correct (gold) label using ONLY the document. 2) Compare it with the model-predicted label. Rules: - Use ONLY the provided document. - Do NOT use external medical knowledge. - Be conservative: if evidence is unclear, choose not_supported. - Judge each subclaim independently. Return your response STRICTLY in valid JSON. Do NOT include any text outside the JSON. JSON output format: { "results": [ { "subclaim_index": "", "gold_label": "supported | refuted | not_supported", "model_label": "supported | refuted | not_supported", "model_label_correct": true | false } ], "accuracy": } Accuracy definition: accuracy = (number of subclaims where model_label_correct = true) / (total number of subclaims) Document: <<>> Subclaims with predicted model results: <<>>