Commit fb6b1e8 · Parent(s): becf438

Upd syntax

Files changed:
- README.md +8 -8
- vi/processing.py +1 -1
README.md
CHANGED
@@ -1,17 +1,17 @@
 ---
-title:
+title: Medical Processing
 emoji: ⚕️
 colorFrom: green
 colorTo: pink
 sdk: docker
 pinned: false
 license: apache-2.0
-short_description: Data processing
+short_description: Data processing. Derived from 500k medical knowledge mix
 ---

 ## 🚀 Quick Access

-[HF Space](https://huggingface.co/spaces/
+[HF Space](https://huggingface.co/spaces/MedSwin/medai-processing)

 [MedDialog-100k](https://huggingface.co/datasets/MedAI-COS30018/MedDialog-EN-100k)

@@ -38,12 +38,12 @@ short_description: Data processing with en-vi translation. Derived from 500k mi
 - **Response Validation**: Invalid response detection and retry logic (max 3 attempts)
 - **Quality Guards**: Length/semantic validation for backtranslation outputs

-### 🇻🇳 Vietnamese Translation
+<!-- ### 🇻🇳 Vietnamese Translation
 - **Complete Translation**: All text fields translated when Vietnamese mode is enabled
 - **Quality Validation**: Translation quality checks with fallback to original text
 - **SFT Format**: `instruction`, `input`, `output` fields translated
 - **RAG Format**: `question`, `answer`, `context` fields translated
-- **Sanitization**: Repetition reduction and whitespace normalization
+- **Sanitization**: Repetition reduction and whitespace normalization -->

 ### 📊 SFT Data Enrichment
 - **Multiple Answer Variants**: 2-3 different answers per question for better reasoning
@@ -125,7 +125,7 @@ The system tracks comprehensive statistics:
 ## 🔧 Usage

 ### Web Interface
-1. Visit the [HF Space](https://huggingface.co/spaces/
+1. Visit the [HF Space](https://huggingface.co/spaces/MedSwin/medai-processing)
 2. Select dataset and processing mode (SFT/RAG)
 3. Enable Vietnamese translation if needed
 4. Click process button
@@ -133,7 +133,7 @@ The system tracks comprehensive statistics:
 ### API Usage
 ```bash
 # SFT Processing with Vietnamese translation
-curl -X POST "https://huggingface.co/spaces/
+curl -X POST "https://huggingface.co/spaces/MedSwin/medai-processing/process/healthcaremagic" \
 -H "Content-Type: application/json" \
 -d '{
 "augment": {
@@ -149,7 +149,7 @@ curl -X POST "https://huggingface.co/spaces/MedVietAI/processing/process/healthc
 }'

 # RAG Processing
-curl -X POST "https://huggingface.co/spaces/
+curl -X POST "https://huggingface.co/spaces/MedSwin/medai-processing/rag/healthcaremagic" \
 -H "Content-Type: application/json" \
 -d '{
 "vietnamese_translation": true
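
For readers who would rather script the RAG call than use curl, the request in the updated README can be sketched in Python roughly as follows. This is a minimal sketch, not the project's client: the URL, the `Content-Type` header, and the `vietnamese_translation` field come from the diff above, while `BASE_URL` and `build_request` are hypothetical names introduced here. The README body is truncated in the diff, so the real payload may need more fields. The request is only built, not sent.

```python
import json
import urllib.request

# Base URL as it appears on the new side of the README diff.
BASE_URL = "https://huggingface.co/spaces/MedSwin/medai-processing"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request; hypothetical helper."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Only the field visible in the diff is included; the full body is not shown there.
rag_request = build_request("rag/healthcaremagic", {"vietnamese_translation": True})

# To actually send it (requires network access and a running Space):
# with urllib.request.urlopen(rag_request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and body before hitting the endpoint.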
vi/processing.py
CHANGED
@@ -99,7 +99,7 @@ def _validate_vi_translation(original: str, translated: str) -> bool:
     # If no Vietnamese characters but significantly different from original, accept it
     # (some translations might not have Vietnamese diacritics)
     if len(translated) > len(original) * 0.5 and len(translated) < len(original) * 2.0:
-
+        return True

     return False

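
To make the effect of the `return True` line concrete, here is a minimal, self-contained sketch of a validator with the same final branch. Only the length-ratio check and the two `return` statements come from the hunk above. The empty-string guard and the diacritic heuristic (`has_vietnamese_marks`) are assumptions, since the earlier part of `_validate_vi_translation` is not shown in the diff.

```python
import unicodedata


def has_vietnamese_marks(text: str) -> bool:
    """Rough heuristic (assumption): any combining mark or the letter đ/Đ."""
    decomposed = unicodedata.normalize("NFD", text)
    return any(ch in "đĐ" or unicodedata.combining(ch) for ch in decomposed)


def validate_vi_translation(original: str, translated: str) -> bool:
    """Sketch of the validator; only the last branch mirrors the diff."""
    # Assumed guard: an empty translation is never valid.
    if not translated.strip():
        return False

    # Assumed earlier check: text with diacritics is treated as Vietnamese.
    if has_vietnamese_marks(translated):
        return True

    # From the diff: accept output without diacritics when its length is
    # between 0.5x and 2x the length of the original.
    if len(translated) > len(original) * 0.5 and len(translated) < len(original) * 2.0:
        return True

    return False
```

With these assumptions, `validate_vi_translation("headache", "dau dau")` is accepted through the length-ratio branch, while a two-character reply to a full sentence is rejected.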