Minibase committed on
Commit f5ad16e · verified · 1 Parent(s): 5e0d5e6

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +82 -31
README.md CHANGED
@@ -34,17 +34,17 @@ model-index:
       config: mixed-domains
       split: test
     metrics:
-    - type: completeness-score
-      value: 0.640
-      name: Overall Completeness
     - type: pii-detection-rate
-      value: 0.203
+      value: 1.000
       name: PII Detection Rate
+    - type: completeness-score
+      value: 0.650
+      name: Completeness Score
     - type: semantic-preservation
-      value: 0.109
+      value: 0.811
       name: Semantic Preservation
     - type: latency
-      value: 492.4
+      value: 477.0
       name: Average Latency (ms)
 ---

@@ -83,26 +83,42 @@ model-index:

 1. **Install llama.cpp** (if not already installed):
    ```bash
+   # Clone and build llama.cpp
    git clone https://github.com/ggerganov/llama.cpp
-   cd llama.cpp && make
+   cd llama.cpp
+   make
+
+   # Return to project directory
+   cd ../de-id-small
    ```

-2. **Download and run the model**:
+2. **Download the GGUF model**:
    ```bash
-   # Download model files
+   # Download model files from HuggingFace
    wget https://huggingface.co/Minibase/DeId-Small/resolve/main/model.gguf
    wget https://huggingface.co/Minibase/DeId-Small/resolve/main/deid_inference.py
+   wget https://huggingface.co/Minibase/DeId-Small/resolve/main/config.json
+   wget https://huggingface.co/Minibase/DeId-Small/resolve/main/tokenizer_config.json
+   wget https://huggingface.co/Minibase/DeId-Small/resolve/main/generation_config.json
+   ```

-   # Make executable and run
-   chmod +x run_server.sh
-   ./run_server.sh
+3. **Start the model server**:
+   ```bash
+   # Start llama.cpp server with the GGUF model
+   ../llama.cpp/llama-server \
+     -m model.gguf \
+     --host 127.0.0.1 \
+     --port 8000 \
+     --ctx-size 2048 \
+     --n-gpu-layers 0 \
+     --chat-template
    ```

-3. **Make API calls**:
+4. **Make API calls**:
    ```python
    import requests

-   # De-identify text
+   # De-identify text via REST API
    response = requests.post("http://127.0.0.1:8000/completion", json={
        "prompt": "Instruction: De-identify this text by replacing all personal information with placeholders.\n\nInput: Patient John Smith, born 1985-03-15, lives at 123 Main St.\n\nResponse: ",
        "max_tokens": 256,
@@ -110,22 +126,63 @@ model-index:
    })

    result = response.json()
-   print(result["content"])  # "Patient [FIRSTNAME_1] [LASTNAME_1], born [DOB_1], lives at [BUILDINGNUMBER_1] [STREET_1]."
+   print(result["content"])
+   # Output: "Patient [FIRSTNAME_1] [LASTNAME_1], born [DOB_1], lives at [BUILDINGNUMBER_1] [STREET_1]."
    ```

-### Python Client
+### Python Client (Recommended)

 ```python
+# Download and use the provided Python client
 from deid_inference import DeIdClient

-# Initialize client
+# Initialize client (connects to local server)
 client = DeIdClient()

-# De-identify text
+# De-identify sensitive text
 sensitive_text = "Dr. Sarah Johnson called from (555) 123-4567 about patient Michael Brown."
 clean_text = client.deidentify_text(sensitive_text)

-print(clean_text)  # "Dr. [FIRSTNAME_1] [LASTNAME_1] called from [PHONE_1] about patient [FIRSTNAME_2] [LASTNAME_2]."
+print(clean_text)
+# Output: "Dr. [FIRSTNAME_1] [LASTNAME_1] called from [PHONE_1] about patient [FIRSTNAME_2] [LASTNAME_2]."
+
+# Batch processing
+texts = [
+    "Employee John Doe earns $85,000 annually.",
+    "Contact jane.smith@company.com for details."
+]
+clean_texts = client.deidentify_batch(texts)
+print(clean_texts)
+# Output: ["Employee [FIRSTNAME_1] Doe earns [CURRENCYSYMBOL_1][AMOUNT_1] annually.", "Contact [EMAIL_1] for details."]
+```
+
+### Direct llama.cpp Usage
+
+```python
+# Alternative: Use llama.cpp directly without server
+import subprocess
+import json
+
+def deidentify_with_llama_cpp(text: str) -> str:
+    prompt = f"Instruction: De-identify this text by replacing all personal information with placeholders.\n\nInput: {text}\n\nResponse: "
+
+    # Run llama.cpp directly
+    cmd = [
+        "../llama.cpp/llama-cli",
+        "-m", "model.gguf",
+        "--prompt", prompt,
+        "--ctx-size", "2048",
+        "--n-predict", "256",
+        "--temp", "0.1",
+        "--log-disable"
+    ]
+
+    result = subprocess.run(cmd, capture_output=True, text=True, cwd=".")
+    return result.stdout.strip()
+
+# Usage
+result = deidentify_with_llama_cpp("Patient Sarah Johnson, DOB 05/12/1980.")
+print(result)
 ```

 ## 📊 Benchmarks & Performance
@@ -135,9 +192,9 @@ print(clean_text) # "Dr. [FIRSTNAME_1] [LASTNAME_1] called from [PHONE_1] about
 | Metric | Score | Description |
 |--------|-------|-------------|
 | **PII Detection Rate** | **100%** | **Perfect detection when PII is present in input** |
-| **Completeness Score** | **67.0%** | **Percentage of texts fully de-identified** |
-| Semantic Preservation | 10.9% | How well original meaning is preserved |
-| **Average Latency** | **484ms** | **Response time performance** |
+| **Completeness Score** | **65.0%** | **Percentage of texts fully de-identified** |
+| **Semantic Preservation** | **81.1%** | **How well original meaning is preserved** |
+| **Average Latency** | **477ms** | **Response time performance** |

 ### Performance Insights

@@ -370,17 +427,11 @@ If you use DeId-Small in your research, please cite:
 }
 ```

-## 📞 Contact & Community
+## 🤝 Community & Support

 - **Website**: [minibase.ai](https://minibase.ai)
-- **Discord Community**: [Join our Discord](https://discord.com/invite/BrJn4D2Guh)
-- **GitHub Issues**: [Report bugs or request features](https://github.com/minibase-ai/deid-small/issues)
-- **Email**: hello@minibase.ai
-
-### Support
-- 📖 **Documentation**: [docs.minibase.ai](https://docs.minibase.ai)
-- 💬 **Community Forum**: [forum.minibase.ai](https://forum.minibase.ai)
-- 🐛 **Bug Reports**: [GitHub Issues](https://github.com/minibase-ai/deid-small/issues)
+- **Discord**: [Join our community](https://discord.com/invite/BrJn4D2Guh)
+- **Documentation**: [docs.minibase.ai](https://docs.minibase.ai)

 ## 📋 License
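The diff's examples all rely on one fixed instruction format for the prompt. Since both the REST example and the direct llama.cpp example rebuild it by hand, it can be factored into a small helper; the function name `build_deid_prompt` is illustrative (not part of the released code), but the template string matches the one in the diff.

```python
def build_deid_prompt(text: str) -> str:
    """Wrap raw text in the instruction format used by the README's examples."""
    return (
        "Instruction: De-identify this text by replacing all personal "
        f"information with placeholders.\n\nInput: {text}\n\nResponse: "
    )

prompt = build_deid_prompt("Patient John Smith, born 1985-03-15.")
print(prompt.startswith("Instruction:"))  # True
```

Centralizing the template keeps the server and CLI paths from drifting apart if the instruction wording changes.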
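The benchmark table defines the completeness score as the percentage of texts fully de-identified. A minimal offline sketch of how such a metric could be computed is below; the regexes and the names `is_fully_deidentified` / `completeness_score` are assumptions for illustration, not the released evaluation code, and a real evaluation would use far richer PII detectors.

```python
import re

# Crude detectors for PII left behind after de-identification (assumed
# patterns; a real harness would use a NER model or fuller regex set).
RESIDUAL_PII = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),       # ISO dates
    re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}"),     # US phone numbers
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),     # email addresses
]

def is_fully_deidentified(text: str) -> bool:
    """True if none of the residual-PII patterns match."""
    return not any(p.search(text) for p in RESIDUAL_PII)

def completeness_score(outputs: list[str]) -> float:
    """Fraction of model outputs with no residual PII detected."""
    if not outputs:
        return 0.0
    return sum(is_fully_deidentified(t) for t in outputs) / len(outputs)

outputs = [
    "Patient [FIRSTNAME_1] [LASTNAME_1], born [DOB_1].",
    "Call me at (555) 123-4567.",  # phone number slipped through
]
print(completeness_score(outputs))  # 0.5
```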
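The expected outputs in the diff use indexed placeholders ([FIRSTNAME_1], [FIRSTNAME_2], ...), where the index distinguishes different entities of the same type. A quick way to inspect that convention in a model output is to count distinct indices per placeholder type; `placeholder_counts` is a hypothetical helper sketched here, not part of `deid_inference.py`.

```python
import re

PLACEHOLDER = re.compile(r"\[([A-Z]+)_(\d+)\]")

def placeholder_counts(text: str) -> dict[str, int]:
    """Count distinct entity indices per placeholder type in a model output."""
    seen: dict[str, set[str]] = {}
    for kind, idx in PLACEHOLDER.findall(text):
        seen.setdefault(kind, set()).add(idx)
    return {kind: len(ids) for kind, ids in seen.items()}

out = ("Dr. [FIRSTNAME_1] [LASTNAME_1] called from [PHONE_1] "
       "about patient [FIRSTNAME_2] [LASTNAME_2].")
print(placeholder_counts(out))  # {'FIRSTNAME': 2, 'LASTNAME': 2, 'PHONE': 1}
```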