LiamKhoaLe committed on
Commit fb6b1e8 · 1 Parent(s): becf438

Upd syntax

Files changed (2)
  1. README.md +8 -8
  2. vi/processing.py +1 -1
README.md CHANGED
@@ -1,17 +1,17 @@
  ---
- title: MedVietAI Processing
+ title: Medical Processing
  emoji: ⚕️
  colorFrom: green
  colorTo: pink
  sdk: docker
  pinned: false
  license: apache-2.0
- short_description: Data processing with en-vi translation. Derived from 500k mi
+ short_description: Data processing. Derived from 500k medical knowledge mix
  ---
 
  ## 🚀 Quick Access
 
- [HF Space](https://huggingface.co/spaces/MedVietAI/processing)
+ [HF Space](https://huggingface.co/spaces/MedSwin/medai-processing)
 
  [MedDialog-100k](https://huggingface.co/datasets/MedAI-COS30018/MedDialog-EN-100k)
 
@@ -38,12 +38,12 @@ short_description: Data processing with en-vi translation. Derived from 500k mi
  - **Response Validation**: Invalid response detection and retry logic (max 3 attempts)
  - **Quality Guards**: Length/semantic validation for backtranslation outputs
 
- ### 🇻🇳 Vietnamese Translation
+ <!-- ### 🇻🇳 Vietnamese Translation
  - **Complete Translation**: All text fields translated when Vietnamese mode is enabled
  - **Quality Validation**: Translation quality checks with fallback to original text
  - **SFT Format**: `instruction`, `input`, `output` fields translated
  - **RAG Format**: `question`, `answer`, `context` fields translated
- - **Sanitization**: Repetition reduction and whitespace normalization
+ - **Sanitization**: Repetition reduction and whitespace normalization -->
 
  ### 📊 SFT Data Enrichment
  - **Multiple Answer Variants**: 2-3 different answers per question for better reasoning
@@ -125,7 +125,7 @@ The system tracks comprehensive statistics:
  ## 🔧 Usage
 
  ### Web Interface
- 1. Visit the [HF Space](https://huggingface.co/spaces/MedVietAI/processing)
+ 1. Visit the [HF Space](https://huggingface.co/spaces/MedSwin/medai-processing)
  2. Select dataset and processing mode (SFT/RAG)
  3. Enable Vietnamese translation if needed
  4. Click process button
@@ -133,7 +133,7 @@ The system tracks comprehensive statistics:
  ### API Usage
  ```bash
  # SFT Processing with Vietnamese translation
- curl -X POST "https://huggingface.co/spaces/MedVietAI/processing/process/healthcaremagic" \
+ curl -X POST "https://huggingface.co/spaces/MedSwin/medai-processing/process/healthcaremagic" \
  -H "Content-Type: application/json" \
  -d '{
  "augment": {
@@ -149,7 +149,7 @@ curl -X POST "https://huggingface.co/spaces/MedVietAI/processing/process/healthc
  }'
 
  # RAG Processing
- curl -X POST "https://huggingface.co/spaces/MedVietAI/processing/rag/healthcaremagic" \
+ curl -X POST "https://huggingface.co/spaces/MedSwin/medai-processing/rag/healthcaremagic" \
  -H "Content-Type: application/json" \
  -d '{
  "vietnamese_translation": true
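For readers scripting against the renamed Space, the RAG `curl` call in the README hunk can be mirrored with Python's standard library. This is a minimal sketch under stated assumptions: the base URL and `/rag/healthcaremagic` path are copied from the post-rename README line, and only the `vietnamese_translation` field is sent because any other body fields are cut off in the diff. The helper name `build_rag_request` is hypothetical. The request is built but not sent.

```python
import json
import urllib.request

# Base URL copied from the README's post-rename curl example.
BASE_URL = "https://huggingface.co/spaces/MedSwin/medai-processing"


def build_rag_request(dataset: str, vietnamese_translation: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the README's RAG curl call.

    Hypothetical helper: only `vietnamese_translation` is visible in the diff,
    so any other body fields the endpoint may accept are omitted here.
    """
    payload = {"vietnamese_translation": vietnamese_translation}
    return urllib.request.Request(
        url=f"{BASE_URL}/rag/{dataset}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_rag_request("healthcaremagic")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping construction separate from sending lets the URL and payload be checked offline before any network call is made.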
vi/processing.py CHANGED
@@ -99,7 +99,7 @@ def _validate_vi_translation(original: str, translated: str) -> bool:
  # If no Vietnamese characters but significantly different from original, accept it
  # (some translations might not have Vietnamese diacritics)
  if len(translated) > len(original) * 0.5 and len(translated) < len(original) * 2.0:
- return True
+ return True
 
  return False
 
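The length-ratio acceptance condition in `_validate_vi_translation` can be tested on its own. This minimal sketch uses a hypothetical function, `length_ratio_ok`, and exposes the hunk's hard-coded 0.5 and 2.0 factors as parameters. It covers only the length condition visible in the hunk, not the Vietnamese-diacritics check that the surrounding comments refer to.

```python
def length_ratio_ok(original: str, translated: str,
                    low: float = 0.5, high: float = 2.0) -> bool:
    """Return True when len(translated) lies strictly between
    low * len(original) and high * len(original), matching the `if` in the hunk.

    Hypothetical standalone version; the bounds default to the hunk's 0.5 and 2.0.
    """
    n = len(original)
    # Chained comparison: both bounds are strict, as in the original condition.
    return n * low < len(translated) < n * high


print(length_ratio_ok("abcd", "abc"))  # True: 2.0 < 3 < 8.0
print(length_ratio_ok("abcd", "ab"))   # False: 2 is not strictly greater than 2.0
```

Because both bounds are strict, an empty `original` rejects every translation, since no length can be strictly between 0 and 0.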