Commit e01a1c5 (verified, parent 78a77be) by SP2001, committed on Feb 19, 2025: Update README.md

# UTAustin-AIHealth

Welcome to **UTAustin-AIHealth**, a hub dedicated to advancing research in medical AI.
This repo contains the **MedHallu** dataset, which underpins our recent work:

**MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models**

MedHallu is a rigorously designed benchmark for evaluating how well large language models detect hallucinations in medical question-answering tasks.
The dataset is organized into two configurations (subsets):

- **pqa_labeled:** 1,000 high-quality, human-annotated samples derived from PubMedQA.
- **pqa_artificial:** 9,000 samples generated from PubMedQA via an automated pipeline.
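
If a detector's verdicts are treated as binary decisions (hallucinated vs. faithful), they can be scored like any binary classifier. The sketch below assumes gold labels where 1 marks a hallucinated answer and 0 a faithful one; that label convention, the function name `detection_scores`, and the choice of precision, recall, and F1 are illustrative assumptions, not an official MedHallu evaluation script.

```python
from typing import Dict, Sequence


def detection_scores(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and F1 for binary hallucination detection.

    Hypothetical helper. Assumed label convention (not from this README):
    1 = hallucinated answer, 0 = faithful answer.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count outcomes, treating "hallucinated" (1) as the positive class.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not MedHallu results.
    gold = [1, 1, 0, 0, 1, 0]
    pred = [1, 0, 0, 1, 1, 0]
    # Here tp=2, fp=1, fn=1, so precision, recall, and F1 all equal 2/3.
    print(detection_scores(gold, pred))
```

Treating hallucinated answers as the positive class keeps the metric focused on whether the detector catches hallucinations, rather than rewarding it for labeling everything faithful.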

---

## Setup Environment

To work with the MedHallu dataset, install the Hugging Face `datasets` library with pip:

```bash
pip install datasets
```

## How to Use MedHallu

- **Downloading the Dataset:**
  ```python
  from datasets import load_dataset

  # Load the 'pqa_labeled' configuration: 1,000 high-quality, human-annotated samples.
  medhallu_labeled = load_dataset("UTAustin-AIHealth/MedHallu", "pqa_labeled")

  # Load the 'pqa_artificial' configuration: 9,000 samples generated via an automated pipeline.
  medhallu_artificial = load_dataset("UTAustin-AIHealth/MedHallu", "pqa_artificial")

  # Without a `split` argument, each call returns a DatasetDict keyed by split name.
  print(medhallu_labeled)
  ```
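
To plug the loaded samples into an evaluation pipeline, each one has to be turned into a query for the model under test, and the model's reply mapped back to a label. The sketch below shows one hypothetical way to do that. `build_detection_prompt`, `parse_verdict`, the prompt wording, and the yes/no reply format are all illustrative assumptions; this README does not list the dataset's column names, so inspect the loaded data's columns before deciding which fields to pass in.

```python
from typing import Optional


def build_detection_prompt(question: str, answer: str, knowledge: Optional[str] = None) -> str:
    """Format a yes/no hallucination-detection query (hypothetical template)."""
    parts = []
    if knowledge:
        # Optional supporting context; whether to include it is an assumption.
        parts.append(f"Context: {knowledge}")
    parts.append(f"Question: {question}")
    parts.append(f"Answer: {answer}")
    parts.append("Is this answer hallucinated? Reply with 'Yes' or 'No'.")
    return "\n".join(parts)


def parse_verdict(reply: str) -> int:
    """Map a model reply to 1 (hallucinated) or 0 (faithful).

    Hypothetical convention: any reply starting with "yes" (case-insensitive,
    ignoring surrounding whitespace) counts as a hallucination verdict.
    """
    return 1 if reply.strip().lower().startswith("yes") else 0


if __name__ == "__main__":
    # Toy strings for illustration only; not taken from MedHallu.
    prompt = build_detection_prompt(
        question="Toy question for illustration?",
        answer="Toy answer for illustration.",
    )
    print(prompt)
    print(parse_verdict("Yes, it is hallucinated."))
```

Keeping prompt construction and reply parsing as separate functions makes it easy to swap in a different template or answer format without touching the scoring step.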

---

## License

This dataset and associated resources are distributed under the [MIT License](https://opensource.org/license/mit/).

---

## Citations

If you find MedHallu useful in your research, please consider citing our work:

```bibtex
@misc{MedHallu,
  title={MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models},
  author={},
  booktitle={},
  year={2025},
  publisher={}
}
```