GhostCanary committed on
Commit ff4ec16 · verified · 1 Parent(s): e141c90

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -14,17 +14,17 @@ short_description: Synthetic medical training data for Australian healthcare AI

  **Synthetic medical training data for Australian healthcare AI.**

- We build PHI-free synthetic document libraries that look and behave like real clinical PDFs — so teams can train OCR, layout-aware extraction, and clinical NLP models without waiting 18 months for ethics approval.
+ We build PHI-free synthetic document libraries that look and behave like real clinical PDFs - so teams can train OCR, layout-aware extraction, and clinical NLP models without waiting 18 months for ethics approval.

  ## What we ship

- - 🇦🇺 **[Synthetic Australian Medical Documents](https://huggingface.co/datasets/RootCauseAnalytics/synthetic-australian-medical-documents-sample)** — 5,000 PDFs across 45 document types, modelled on NSW Health practice. Pre-labelled with structured ground truth and pixel-precise bounding boxes for every field. Free 50-document sample available; full library commercially licensable.
+ - 🇦🇺 **[Synthetic Australian Medical Documents](https://huggingface.co/datasets/RootCauseAnalytics/synthetic-australian-medical-documents-sample)** - 5,000 PDFs across 45 document types, modelled on NSW Health practice. Pre-labelled with structured ground truth and pixel-precise bounding boxes for every field. Free 50-document sample available; full library commercially licensable.
  - 🛠️ **Custom commissions** — bespoke document mixes, additional jurisdictions, or hospital-specific branding for teams with niche training needs.
  - ⚙️ **Generator licences** — run the synthesis pipeline yourself with source code and seeds for unlimited internal generation.

  ## Why synthetic

- The bottleneck for medical document AI in Australia is training data. Real hospital PDFs are locked behind the Privacy Act. Generic synthetic medical text has no layout, no scans, no labels — useless for vision-language models like LayoutLMv3, Donut, or DocFormer. Public datasets like MIMIC are US-centric and increasingly restricted.
+ The bottleneck for medical document AI in Australia is training data. Real hospital PDFs are locked behind the Privacy Act. Generic synthetic medical text has no layout, no scans, no labels - useless for vision-language models like LayoutLMv3, Donut, or DocFormer. Public datasets like MIMIC are US-centric and increasingly restricted.

  We sit in that gap: **visually realistic, jurisdiction-specific, fully labelled, zero PHI risk**.