# 📑 MedAI Processing – Request Examples
Base URL of the Space:
**`https://binkhoale1812-medai-processing.hf.space`**
This Space processes medical datasets into a centralised fine-tuning format (JSONL + CSV) with optional augmentations such as **paraphrasing**, **back-translation**, **style standardisation**, **de-identification**, and **deduplication**.
---
## 🔹 1. Process HealthCareMagic
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "augment": {
      "paraphrase_ratio": 0.1,
      "backtranslate_ratio": 0.05,
      "paraphrase_outputs": false,
      "style_standardize": true,
      "deidentify": true,
      "dedupe": true,
      "max_chars": 5000
    },
    "sample_limit": 2000,
    "seed": 42
  }' \
  https://binkhoale1812-medai-processing.hf.space/process/healthcaremagic
```
---
## 🔹 2. Process iCliniq
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "augment": {
      "paraphrase_ratio": 0.2,
      "backtranslate_ratio": 0.1,
      "paraphrase_outputs": true,
      "style_standardize": true,
      "deidentify": true,
      "dedupe": true,
      "max_chars": 5000
    },
    "sample_limit": 1500,
    "seed": 123
  }' \
  https://binkhoale1812-medai-processing.hf.space/process/icliniq
```
---
## 🔹 3. Process PubMedQA (Labelled)
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "augment": {
      "paraphrase_ratio": 0.05,
      "backtranslate_ratio": 0.02,
      "paraphrase_outputs": false,
      "style_standardize": true,
      "deidentify": false,
      "dedupe": true,
      "max_chars": 8000
    },
    "sample_limit": 1000,
    "seed": 99
  }' \
  https://binkhoale1812-medai-processing.hf.space/process/pubmedqa_l
```
---
## 🔹 4. Process PubMedQA (Unlabelled)
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "augment": {
      "paraphrase_ratio": 0.05,
      "backtranslate_ratio": 0.05,
      "paraphrase_outputs": false,
      "style_standardize": true,
      "deidentify": true,
      "dedupe": true,
      "max_chars": 7000,
      "consistency_check_ratio": 0.01,
      "distill_fraction": 0.1
    },
    "sample_limit": 500,
    "seed": 7
  }' \
  https://binkhoale1812-medai-processing.hf.space/process/pubmedqa_u
```
---
## 🔹 5. Process PubMedQA (Map)
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "augment": {
      "paraphrase_ratio": 0.1,
      "backtranslate_ratio": 0.05,
      "paraphrase_outputs": true,
      "style_standardize": true,
      "deidentify": true,
      "dedupe": true,
      "max_chars": 6000
    },
    "sample_limit": 1200,
    "seed": 2024
  }' \
  https://binkhoale1812-medai-processing.hf.space/process/pubmedqa_map
```
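
The five processing requests above differ only in the dataset slug at the end of the URL and in the JSON payload. As a convenience, they could be wrapped in a small shell helper along the lines of the sketch below. The function names `endpoint_url` and `process_dataset`, and the idea of keeping each payload in its own JSON file, are assumptions made for illustration and are not part of the Space itself.

```bash
# Hypothetical helper functions; the names are illustrative only.
BASE_URL="https://binkhoale1812-medai-processing.hf.space"

# Print the processing endpoint for a dataset slug (e.g. icliniq, pubmedqa_l).
endpoint_url() {
  printf '%s/process/%s\n' "$BASE_URL" "$1"
}

# POST the JSON payload stored in a file to the endpoint for a dataset slug.
process_dataset() {
  local dataset="$1" payload_file="$2"
  curl -X POST \
    -H "Content-Type: application/json" \
    --data-binary @"$payload_file" \
    "$(endpoint_url "$dataset")"
}
```

With a payload saved as, say, `icliniq.json`, the call would be `process_dataset icliniq icliniq.json`.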
---
## 🔹 6. Check Current Job Status
```bash
curl https://binkhoale1812-medai-processing.hf.space/status
```
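
For a long-running job it may be handier to poll the status endpoint than to re-run the command by hand. The sketch below is one way to do that. Because the shape of the `/status` response is not documented here, it simply prints the raw body on each attempt rather than parsing it. The function names, the default attempt count, and the default interval are all arbitrary assumptions.

```bash
# Hypothetical polling helper; names and defaults are illustrative only.
STATUS_URL="https://binkhoale1812-medai-processing.hf.space/status"

# Fetch the raw status body once.
fetch_status() {
  curl -sS "$STATUS_URL"
}

# Print the status `attempts` times, sleeping `interval` seconds between calls.
poll_status() {
  local attempts="${1:-5}" interval="${2:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    printf '[%s/%s] ' "$i" "$attempts"
    fetch_status
    printf '\n'
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

For example, `poll_status 6 30` would print the status six times at 30-second intervals.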
---
## 🔹 7. List Generated Artifacts
```bash
curl https://binkhoale1812-medai-processing.hf.space/files
```
---
# ✅ Notes
* Each run writes both a `.jsonl` and a `.csv` file to `cache/outputs/` and also uploads them to the Google Drive folder with ID `1JvW7its63E58fLxurH8ZdhxzdpcMrMbt`.
* `augment` options can be adjusted per dataset:
  * `paraphrase_ratio` – fraction of rows to paraphrase (0–1)
  * `backtranslate_ratio` – fraction of rows to back-translate
  * `paraphrase_outputs` – whether to also augment model answers
  * `style_standardize` – enforce a neutral, clinical style
  * `deidentify` – redact PHI (emails, phone numbers, URLs, IP addresses)
  * `dedupe` – skip duplicate pairs
  * `consistency_check_ratio` – fraction of rows that receive a lightweight QA sanity check
  * `distill_fraction` – fraction of unlabelled rows for which pseudo-labels are generated
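
Since the ratio options are fractions rather than percentages, a quick check before sending a request can catch a value such as `10` that was meant as `0.1`. The sketch below is a hypothetical client-side check, not something the Space provides. It assumes every ratio-style option shares the 0–1 range stated for `paraphrase_ratio`, and it accepts only plain decimal notation.

```bash
# Hypothetical pre-flight check: succeed only if the argument is a number in [0, 1].
check_ratio() {
  awk -v r="$1" 'BEGIN {
    if (r !~ /^[0-9]*\.?[0-9]+$/) exit 1   # reject anything that is not a plain decimal
    exit !(r >= 0 && r <= 1)               # status 0 when in range, 1 otherwise
  }'
}
```

A caller might run `check_ratio 0.05 || echo "ratio out of range" >&2` for each ratio before building the payload.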