Update README.md
README.md
CHANGED
@@ -14,17 +14,17 @@ short_description: Synthetic medical training data for Australian healthcare AI
 **Synthetic medical training data for Australian healthcare AI.**

-We build PHI-free synthetic document libraries that look and behave like real clinical PDFs
+We build PHI-free synthetic document libraries that look and behave like real clinical PDFs - so teams can train OCR, layout-aware extraction, and clinical NLP models without waiting 18 months for ethics approval.

 ## What we ship

-- 🇦🇺 **[Synthetic Australian Medical Documents](https://huggingface.co/datasets/RootCauseAnalytics/synthetic-australian-medical-documents-sample)**
+- 🇦🇺 **[Synthetic Australian Medical Documents](https://huggingface.co/datasets/RootCauseAnalytics/synthetic-australian-medical-documents-sample)** - 5,000 PDFs across 45 document types, modelled on NSW Health practice. Pre-labelled with structured ground truth and pixel-precise bounding boxes for every field. Free 50-document sample available; full library commercially licensable.
 - 🛠️ **Custom commissions** - bespoke document mixes, additional jurisdictions, or hospital-specific branding for teams with niche training needs.
 - ⚙️ **Generator licences** - run the synthesis pipeline yourself with source code and seeds for unlimited internal generation.

 ## Why synthetic

-The bottleneck for medical document AI in Australia is training data. Real hospital PDFs are locked behind the Privacy Act. Generic synthetic medical text has no layout, no scans, no labels
+The bottleneck for medical document AI in Australia is training data. Real hospital PDFs are locked behind the Privacy Act. Generic synthetic medical text has no layout, no scans, no labels - useless for vision-language models like LayoutLMv3, Donut, or DocFormer. Public datasets like MIMIC are US-centric and increasingly restricted.

 We sit in that gap: **visually realistic, jurisdiction-specific, fully labelled, zero PHI risk**.
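
The README mentions pixel-precise bounding boxes alongside layout-aware models such as LayoutLMv3. LayoutLM-family models generally expect box coordinates rescaled to a 0-1000 integer grid rather than raw pixels, so a consumer of such labels would likely need a conversion step. The following is a minimal sketch of that step only: the `FieldLabel` record, its field names, the sample values, and the 1240x1754 page size are all assumptions for illustration and are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldLabel:
    """Hypothetical label record; the real dataset schema may differ."""
    name: str
    text: str
    bbox: tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


def normalize_bbox(bbox, page_width, page_height):
    """Rescale a pixel box to the 0-1000 integer grid used by LayoutLM-style models."""
    x0, y0, x1, y1 = bbox
    # Reject boxes that are inverted or fall outside the page.
    if not (0 <= x0 <= x1 <= page_width and 0 <= y0 <= y1 <= page_height):
        raise ValueError(f"bbox {bbox} lies outside a {page_width}x{page_height} page")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (
        1000 * x0 // page_width,
        1000 * y0 // page_height,
        1000 * x1 // page_width,
        1000 * y1 // page_height,
    )


# Invented example values; the page size is an assumed rendering resolution.
label = FieldLabel(name="patient_name", text="Jane Citizen", bbox=(120, 340, 480, 372))
print(normalize_bbox(label.bbox, page_width=1240, page_height=1754))
# -> (96, 193, 387, 212)
```

Validating boxes before rescaling is a design choice in this sketch: it surfaces label or page-size mismatches early instead of passing out-of-range coordinates to a model.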