Dataset Description

Overview

This document describes the six datasets used in this research project: ViMedAQA, MedMCQA, MedAB QA, Mimic_ex, YouMed, and ViWiki.

Dataset 1: ViMedAQA (filtered to heart-related questions)

Description

ViMedAQA: A Vietnamese Medical Abstractive Question-Answering Dataset and Findings of Large Language Model

Source

Statistics

  • Total samples: 1456
  • Average text length: [Number] tokens
  • Max text length: [Number] tokens
  • Min text length: [Number] tokens
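The length figures above are still placeholders. One way they could be computed is sketched below. The helper `length_stats` is hypothetical, and whitespace splitting stands in for the tokenizer, which this document does not name, so real token counts will differ.

```python
from statistics import mean


def length_stats(texts):
    """Return sample count and average/max/min token counts as a dict.

    Tokens are approximated by whitespace splitting. This is an assumption:
    the project's actual tokenizer is not named in this document.
    """
    lengths = [len(text.split()) for text in texts]
    if not lengths:
        raise ValueError("texts must not be empty")
    return {
        "total_samples": len(lengths),
        "avg_tokens": mean(lengths),
        "max_tokens": max(lengths),
        "min_tokens": min(lengths),
    }


# Toy inputs for illustration only, not taken from the dataset.
stats = length_stats(["a b c", "a b", "a b c d"])
print(stats)
```

Swapping `text.split()` for the project's tokenizer call would give the figures that belong in the list above.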

Format

{
  "question_idx": "body-part_2201",
  "question": "Khi hình dạng liềm ở góc móng chân tay biến mất thì có thể là dấu hiệu của những tình trạng nào?",
  "answer": "Khi hình dạng liềm ở gốc móng biến mất thì có thể là dấu hiệu của suy dinh dưỡng, trầm cảm hay thiếu máu.",
  "context": "Bạn có nhìn thấy những đường cong nhỏ tròn màu trắng ở gốc móng tay của bạn nhưng không phải ai cũng có chúng. Hầu hết sự có mặt hay không có chúng không có nghĩa lý gì và chúng có thể được ẩn dưới da của bạn. Nếu chúng biến mất, đó có thể là dấu hiệu của tình trạng: - Suy dinh dưỡng.\n- Trầm cảm.\n- Thiếu máu.",
  "title": "Bất thường của móng tay chân - Móng không có hình liềm ở gốc móng",
  "keyword": "Móng tay chân",
  "topic": 0,
  "article_url": "https://youmed.vn/tin-tuc/nhung-bat-thuong-ve-mong-tay-chan/",
  "author": "Bác sĩ Hoàng Thị Việt Trinh",
  "author_url": "https://youmed.vn/tin-tuc/bac-si/bac-si-hoang-thi-viet-trinh/"
}
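A minimal sketch of how a record in this format might be checked on load. The field names are copied from the sample record above; the function `parse_record` and the assumption that each record arrives as one JSON string are illustrative, not part of the dataset's tooling.

```python
import json

# Field names copied from the sample record above.
EXPECTED_FIELDS = {
    "question_idx", "question", "answer", "context", "title",
    "keyword", "topic", "article_url", "author", "author_url",
}


def parse_record(line):
    """Parse one JSON record and fail loudly if an expected field is missing."""
    record = json.loads(line)
    missing = EXPECTED_FIELDS - set(record)
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    return record


# Placeholder record with empty values, for illustration only.
sample = json.dumps({field: "" for field in EXPECTED_FIELDS})
record = parse_record(sample)
print(sorted(record))
```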

Preprocessing Steps

  1. Text cleaning: Remove special characters, normalize whitespace
  2. Tokenization: Using [tokenizer name]
  3. Length filtering: Remove texts shorter than [X] tokens
  4. Label encoding: Convert labels to numeric format
  5. Data splitting: 80% train, 10% validation, 10% test
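Steps 1 and 5 above can be sketched as follows. This is an illustration under stated assumptions, not the project's pipeline: `clean_text` only normalizes whitespace, because the document does not say which special characters are removed, and the shuffle seed and rounding rule in `split_dataset` are arbitrary choices.

```python
import random
import re


def clean_text(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def split_dataset(samples, seed=42):
    """Shuffle, then split into 80% train, 10% validation, 10% test.

    Sizes are rounded down for train and validation; the remainder
    goes to test. The seed value is an arbitrary assumption.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * 0.8)
    n_val = int(len(shuffled) * 0.1)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 1456 is the sample count listed under Statistics; indices stand in for records.
train, val, test = split_dataset(range(1456))
print(len(train), len(val), len(test))  # prints: 1164 145 147
```

Whitespace normalization leaves Vietnamese diacritics untouched, which matters here because stripping non-ASCII characters would corrupt the text.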

Dataset 2: MedMCQA (filtered to heart-related questions)

Description

MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

Source

Statistics

  • Total samples: 2144

Format

{
  "id": "405b7c79-b6ac-4407-977c-e5595bba56c4",
  "question": "A 46-year-old man presents with diffuse chest pain at rest and recent history of cough, fever, and rhinorrhea lasting for 3 days.",
  "options": {
    "opa": "Acute pericarditis",
    "opb": "Constrictive pericarditis",
    "opc": "Takotsubo-cardiomyopathy",
    "opd": "Cor pulmonale"
  },
  "correct_option": 0,
  "choice_type": "single",
  "explanation": "Ans. (a) Acute pericarditis. The tracing reveals sinus rhythm at approximately 75 beats/min. The PR interval is prolonged to 200 milliseconds, consistent with borderline first-degree AV block. The QRS axis and intervals are normal. ST elevations with concave upward morphology are seen in I and aVL, II and aVF, and V2 through V6. No Q waves are present. Furthermore, subtle PR-segment depression is seen in leads I and II. The differential diagnosis for ST-segment elevation includes, among other things, acute myocardial infarction, pericarditis, and left ventricular aneurysm. In this case, the upward concavity of the ST segment, the PR-segment depression, the lack of Q waves, and the diffuse nature of the ST-segment elevation in more than one coronary artery distribution make pericarditis the likely etiology. Patients with pericarditis will complain of chest pain, typically described as sharp and pleuritic. Radiation is to the trapezius ridge. The pain is improved with sitting up and leaning forward and worsened by leaning backward.",
  "subject_name": "Medicine",
  "topic_name": "Electrocardiography"
}
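A short sketch of how the `correct_option` index might be resolved to its option text. It assumes the index is zero-based and follows the order `opa`, `opb`, `opc`, `opd`, which is consistent with the sample above (index 0 paired with "Ans. (a)") but is not stated in this document. The function name is hypothetical.

```python
# Assumed ordering of option keys; index 0 maps to "opa".
OPTION_KEYS = ["opa", "opb", "opc", "opd"]


def correct_answer_text(record):
    """Return the text of the option selected by record["correct_option"]."""
    key = OPTION_KEYS[record["correct_option"]]
    return record["options"][key]


# Options and index copied from the sample record above.
record = {
    "options": {
        "opa": "Acute pericarditis",
        "opb": "Constrictive pericarditis",
        "opc": "Takotsubo-cardiomyopathy",
        "opd": "Cor pulmonale",
    },
    "correct_option": 0,
}
print(correct_answer_text(record))  # prints: Acute pericarditis
```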

Preprocessing Steps

[List preprocessing steps]

Dataset 3: MedAB QA

Description

A multiple-choice QA dataset crawled from an online examination.

Statistics

  • Total samples: 1150

Format

{
  "question": "Áp lực tĩnh mạch trung tâm được đo ở............và thường bằng............:",
  "options": {
    "A": "Nhĩ trái; 0 mmHg",
    "B": "Nhĩ phải; 12 cm H2O",
    "C": "Tĩnh mạch chủ trên; -2 mmHg",
    "D": "Tĩnh mạch dưới đòn; 0 mmHg",
    "E": "Nhĩ phải; 0 mmHg"
  },
  "answer": "B"
}
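As a sketch, a record in this format could be turned into a plain-text prompt and its answer letter resolved to the option text. Both helper names are hypothetical, and the prompt layout is an illustrative choice rather than the one used in the project.

```python
def format_prompt(record):
    """Render the question followed by its lettered options, one per line."""
    lines = [record["question"]]
    for letter in sorted(record["options"]):
        lines.append(f"{letter}. {record['options'][letter]}")
    return "\n".join(lines)


def answer_text(record):
    """Return the option text that the answer letter points to."""
    return record["options"][record["answer"]]


# Record copied from the sample above.
record = {
    "question": "Áp lực tĩnh mạch trung tâm được đo ở............và thường bằng............:",
    "options": {
        "A": "Nhĩ trái; 0 mmHg",
        "B": "Nhĩ phải; 12 cm H2O",
        "C": "Tĩnh mạch chủ trên; -2 mmHg",
        "D": "Tĩnh mạch dưới đòn; 0 mmHg",
        "E": "Nhĩ phải; 0 mmHg",
    },
    "answer": "B",
}
print(format_prompt(record))
print(answer_text(record))  # prints: Nhĩ phải; 12 cm H2O
```

Note that this sample has five options (A to E), so code written for the four-option MedMCQA format should not be reused here without adjustment.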

Dataset 4: Mimic_ex

Description

Mimic_ex: A dataset derived from the MIMIC-III database, focusing on medical examinations and related data.

Source

Statistics

  • Total samples: 44914

Format

 baby girl is a 1,385 gram, former 30 and week premature baby, born to an 18 year old, gravida i, para 0, now i, mother with prenatal serologies as follows: a positive, antibody negative, rpr nonreactive, hepatitis b surface antigen negative; gbs unknown. pregnancy was complicated by pprom on when the mother was transferred from hospital to . mother received betamethasone times two as well as ampicillin and erythromycin. she progressed to a spontaneous vaginal delivery on the morning of . the baby emerged vigorous with spontaneous cry; apgars of eight and nine. she was warm, dried and bulb suctioned in the delivery room and brought to the neonatal intensive care unit for further management for prematurity. physical examination: weight 1,385 grams (25th to 50th percentile); length 38 cms (10 to 25 percentile); head circumference 27.5 cms (10 to 25 percentile). she was an active, alert infant, pink, appropriate for gestational age of 31 weeks. anterior fontanel was open and flat with some molding and caput. no dysmorphism. lungs clear to auscultation. heart regular rate and rhythm without murmurs. abdomen was soft without hepatosplenomegaly or masses. hips were stable. premature female genitalia. extremities were well perfused. hospital course: 1.) respiratory: baby girl remained stable on room air throughout her neonatal intensive care unit stay at . she had one apnea and bradycardia episode on day of life five, requiring mild stimulation. 2.) cardiovascular: baby girl had seemed hemodynamically stable throughout her neonatal intensive care unit stay. she had no murmurs on examination. 3.) fluids, electrolytes and nutrition: baby girl had gradually been advanced to total fluids of 150 cc per kg per day; currently tolerating breast milk 22, maintaining good blood glucose. her admission weight was 1,385 grams; her weight on day of life seven prior to discharge was 1,445 grams. 
gastrointestinal: baby girl ' bilirubin level peaked on day of life three at 8.3, at which time phototherapy was initiated. subsequently, her bilirubin level was 4.2 on day of life six, at which time the phototherapy was discontinued. her rebound bili on day of life seven was 5.1. infectious disease: baby girl was initiated on ampicillin and gentamycin for rule out sepsis. her blood culture remained negative at 48 hours at which time the antibiotics were discontinued. hematology: the patient's initial hematocrit was 42.8 and required no transfusions during this admission. neurology: baby girl had a screening head ultrasound on day of life seven which was negative. condition at transfer: baby girl has been stable on room air and hemodynamically stable, tolerating full feeds of breast milk 22. discharge disposition: baby girl is being discharged to special care nursery. care and recommendations: feeds at discharge: total fluids of 150 cc per kg per day with breast milk 24. medications: none. state newborn screen: sent. follow-up appointment: recommended in two to three days after discharge from the neonatal intensive care unit. discharge diagnoses: prematurity at 31 weeks. rule out sepsis. , m.d. dictated by: medquist36 Procedure: Parenteral infusion of concentrated nutritional substances Enteral infusion of concentrated nutritional substances Other phototherapy Diagnoses: Observation for suspected infectious condition Single liveborn, born in hospital, delivered without mention of cesarean section Neonatal jaundice associated with preterm delivery Other preterm infants, 1,250-1,499 grams 29-30 completed weeks of gestation
allergies: penicillins attending: chief complaint: cc: major surgical or invasive procedure: stereotactic brain biopsy, neuronavigation guided tumor resection. 

Dataset 5: YouMed

Description

YouMed: QA pairs crawled from the Q&A page of the YouMed website.

Source

Statistics

  • Total samples: 309

Dataset 6: ViWiki (filtered to heart-related articles)

Description

ViWiki: Articles crawled from Vietnamese Wikipedia.

Source

Statistics

  • Total samples: 250
