Upload folder using huggingface_hub
Browse files
README.md
CHANGED
|
@@ -22,7 +22,8 @@ NV-Reason-CXR-3B is a specialized vision-language model designed for medical rea
|
|
| 22 |
|
| 23 |
This model is for research and development only.
|
| 24 |
|
| 25 |
-
|
|
|
|
| 26 |
|
| 27 |
## Quick start
|
| 28 |
|
|
@@ -111,18 +112,18 @@ This model is designed for research and educational purposes only and should not
|
|
| 111 |
Huggingface: 10/27/2025 via https://huggingface.co/NVIDIA
|
| 112 |
|
| 113 |
## Model Architecture:
|
| 114 |
-
**Architecture Type:** Transformer
|
| 115 |
-
**Network Architecture:** Vision-Language Model based on Qwen2.5-VL architecture with medical reasoning capabilities
|
| 116 |
|
| 117 |
This model was developed by fine-tuning Qwen2.5-VL-3B using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) for enhanced medical reasoning.
|
| 118 |
**Number of model parameters:** 3B
|
| 119 |
|
| 120 |
|
| 121 |
## Input:
|
| 122 |
-
**Input Type(s):** Image, Text
|
| 123 |
-
**Input Format(s):** Medical images (JPEG, PNG), Text prompts (string)
|
| 124 |
-
**Input Parameters:** Two-Dimensional (2D) images with accompanying text queries (1D)
|
| 125 |
-
**Other Properties Related to Input:** Supports frontal chest X-ray images with flexible scaling. Accepts natural language prompts for medical queries, follow-up questions, and reasoning requests. Input images are automatically processed without specific size constraints.
|
| 126 |
|
| 127 |
### Input Specifications:
|
| 128 |
- **Medical Images:** Chest X-ray images in standard medical imaging formats
|
|
@@ -130,10 +131,10 @@ This model was developed by fine-tuning Qwen2.5-VL-3B using Supervised Fine-Tuni
|
|
| 130 |
- **Interactive Dialogue:** Support for follow-up questions and clarification requests
|
| 131 |
|
| 132 |
## Output:
|
| 133 |
-
**Output Type(s):** Text
|
| 134 |
-
**Output Format:** Structured reasoning with XML-like tags
|
| 135 |
-
**Output Parameters:** One-Dimensional (1D) Natural language reasoning and analysis
|
| 136 |
-
**Other Properties Related to Output:** Outputs contain structured thinking processes enclosed in `<thinking>` tags showing step-by-step medical reasoning, followed by concise answers in `<answer>` tags. This format enables transparency in the model's diagnostic reasoning process and supports educational use cases.
|
| 137 |
|
| 138 |
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (GPU cores) and software frameworks (CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
|
| 139 |
|
|
@@ -165,32 +166,6 @@ Large-scale chest X-ray datasets including MIMIC-CXR, ChestXRay14, and CheXpert.
|
|
| 165 |
* Image
|
| 166 |
* Text
|
| 167 |
|
| 168 |
-
**Image Training Data Size:**
|
| 169 |
-
* Less than a Million Images
|
| 170 |
-
|
| 171 |
-
**Text Training Data Size:**
|
| 172 |
-
* Less than a Billion Tokens
|
| 173 |
-
|
| 174 |
-
**Data Collection Method by dataset:**
|
| 175 |
-
* Hybrid: Human, Automatic/Sensors
|
| 176 |
-
|
| 177 |
-
**Labeling Method by dataset:**
|
| 178 |
-
* Hybrid: Human, Synthetic
|
| 179 |
-
|
| 180 |
-
## Testing Dataset:
|
| 181 |
-
**Data Collection Method by dataset:**
|
| 182 |
-
* Hybrid: Human, Automatic/Sensors
|
| 183 |
-
|
| 184 |
-
**Labeling Method by dataset:**
|
| 185 |
-
* Hybrid: Human, Synthetic
|
| 186 |
-
|
| 187 |
-
## Evaluation Dataset:
|
| 188 |
-
**Data Collection Method by dataset:**
|
| 189 |
-
* Hybrid: Human, Automatic/Sensors
|
| 190 |
-
|
| 191 |
-
**Labeling Method by dataset:**
|
| 192 |
-
* Hybrid: Human, Synthetic
|
| 193 |
-
|
| 194 |
## Inference:
|
| 195 |
**Acceleration Engine:** PyTorch, Transformers
|
| 196 |
**Test Hardware:**
|
|
|
|
| 22 |
|
| 23 |
This model is for research and development only.
|
| 24 |
|
| 25 |
+
💻 [\[Github code\]](https://github.com/NVIDIA-Medtech/NV-Reason-CXR)
|
| 26 |
+
🩻 [\[Web Demo\]](https://huggingface.co/spaces/nvidia/nv-reason-cxr)
|
| 27 |
|
| 28 |
## Quick start
|
| 29 |
|
|
|
|
| 112 |
Huggingface: 10/27/2025 via https://huggingface.co/NVIDIA
|
| 113 |
|
| 114 |
## Model Architecture:
|
| 115 |
+
- **Architecture Type:** Transformer
|
| 116 |
+
- **Network Architecture:** Vision-Language Model based on Qwen2.5-VL architecture with medical reasoning capabilities
|
| 117 |
|
| 118 |
This model was developed by fine-tuning Qwen2.5-VL-3B using Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) for enhanced medical reasoning.
|
| 119 |
**Number of model parameters:** 3B
|
| 120 |
|
| 121 |
|
| 122 |
## Input:
|
| 123 |
+
- **Input Type(s):** Image, Text
|
| 124 |
+
- **Input Format(s):** Medical images (JPEG, PNG), Text prompts (string)
|
| 125 |
+
- **Input Parameters:** Two-Dimensional (2D) images with accompanying text queries (1D)
|
| 126 |
+
- **Other Properties Related to Input:** Supports frontal chest X-ray images with flexible scaling. Accepts natural language prompts for medical queries, follow-up questions, and reasoning requests. Input images are automatically processed without specific size constraints.
|
| 127 |
|
| 128 |
### Input Specifications:
|
| 129 |
- **Medical Images:** Chest X-ray images in standard medical imaging formats
|
|
|
|
| 131 |
- **Interactive Dialogue:** Support for follow-up questions and clarification requests
|
| 132 |
|
| 133 |
## Output:
|
| 134 |
+
- **Output Type(s):** Text
|
| 135 |
+
- **Output Format:** Structured reasoning with XML-like tags
|
| 136 |
+
- **Output Parameters:** One-Dimensional (1D) Natural language reasoning and analysis
|
| 137 |
+
- **Other Properties Related to Output:** Outputs contain structured thinking processes enclosed in `<thinking>` tags showing step-by-step medical reasoning, followed by concise answers in `<answer>` tags. This format enables transparency in the model's diagnostic reasoning process and supports educational use cases.
|
| 138 |
|
| 139 |
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (GPU cores) and software frameworks (CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
|
| 140 |
|
|
|
|
| 166 |
* Image
|
| 167 |
* Text
|
| 168 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 169 |
## Inference:
|
| 170 |
**Acceleration Engine:** PyTorch, Transformers
|
| 171 |
**Test Hardware:**
|