Update README.md
This model is a fine-tuned version of [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it).
It has been trained using [TRL](https://github.com/huggingface/trl).

[ClinicalIntelligence/saama_gemma](https://huggingface.co/ClinicalIntelligence/saama_gemma) is a fine-tuned MedGemma model that transforms unstructured clinical narratives, such as discharge notes, into structured, SDTM-aligned datasets (e.g., Adverse Events, Medical History, Procedures). Trained on an SME-curated dataset derived from MIMIC-III, the model treats clinical data extraction as a complex reasoning task, explicitly evaluating assertion, temporality, and causality to generate accurate, traceable JSON outputs. By learning regulatory semantics directly, it significantly outperforms base models in domain grounding and schema consistency. Current limitations include context-window constraints for lengthy notes, handling of rare abbreviations, and resolution of multi-domain entities.
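For orientation, one extracted Adverse Event record might look like the sketch below. The exact output schema comes from the SME-curated training data and is not published in this card, so the variable names (standard SDTM AE variables) and every value shown are illustrative assumptions only:

```python
# Hypothetical example of a single extracted Adverse Event (AE) record.
# The real schema is defined by the fine-tuning data, not by this card;
# field names here follow standard SDTM AE variable naming for illustration.
example_ae_record = {
    "domain": "AE",               # SDTM domain code
    "AETERM": "nausea",           # reported adverse event term
    "AESTDTC": "2023-04-12",      # start date/time (ISO 8601)
    "AESER": "N",                 # serious-event flag
    "source_text": "Patient reported nausea starting on April 12.",  # traceability
}
```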
## Installation

```
pip install -U transformers
```
## Quick start

**Note:** adjust the `max_new_tokens` parameter as needed; it is set to 3000 by default.

```python
import re

from transformers import pipeline

generator = pipeline(
    # ... (task, model id, and other arguments elided in the diff shown here;
    # the prompt `prefix` and the input `unstructured_text` are likewise elided)
)

output = generator(
    [{"role": "user", "content": prefix + unstructured_text}],
    return_full_text=False,
    max_new_tokens=3000,
)[0]
llm_output = output["generated_text"]
```
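The `import re` at the top of the Quick start suggests the JSON payload is extracted from the model's free-text generation. A minimal sketch of such post-processing follows; the fenced-block convention, the `extract_json` helper, and the field names in the mock response are assumptions for illustration, not part of the model card:

```python
import json
import re

def extract_json(llm_output: str) -> dict:
    """Pull the first JSON object out of the model's generated text.

    Handles both bare JSON and JSON wrapped in a ```json fenced block.
    """
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", llm_output, re.DOTALL)
    payload = fenced.group(1) if fenced else llm_output
    match = re.search(r"\{.*\}", payload, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

# Mock model response with hypothetical field names:
llm_output = 'Here is the record:\n```json\n{"domain": "AE", "term": "nausea"}\n```'
record = extract_json(llm_output)
print(record["domain"])  # prints "AE"
```

In production you would validate the parsed record against the target SDTM schema rather than trusting the raw parse.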