HoangHa committed on
Commit 8f561e9 · verified · 1 Parent(s): 8dbf6f0

Upload README.md with huggingface_hub

Files changed (1): README.md (+33 −1)
README.md CHANGED
@@ -78,12 +78,15 @@ model-index:
 
  A **350M-parameter language model** fine-tuned for extracting Personally Identifiable Information (PII) from medical and general-domain text across **17 languages**. Built on [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M) with a two-stage training pipeline: supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO).
 
+ **[Try it in your browser →](https://huggingface.co/spaces/Meddies/meddies-pii-extractor)** — no setup required, runs entirely client-side via WebGPU.
+
  ## Highlights
 
  - **17 languages**: English, Vietnamese, French, German, Spanish, Lao, Thai, Burmese, Indonesian, Filipino, Malay, Tamil, Portuguese, Russian, Chinese, Japanese, Korean
  - **7 PII entity types**: `address`, `company_name`, `date`, `email_address`, `human_name`, `id_number`, `phone_number`
- - **350M params** — runs on consumer GPUs, edge devices, and CPU inference
+ - **350M params** — runs on consumer GPUs, edge devices, and [in the browser](https://huggingface.co/spaces/Meddies/meddies-pii-extractor)
  - **Structured JSON output** — directly usable without post-processing
+ - **ONNX available** — quantized exports (fp32/fp16/q4/q8) at [Meddies/meddies-pii-onnx](https://huggingface.co/Meddies/meddies-pii-onnx) for Transformers.js & ONNX Runtime
 
  ## Quick Start
 
@@ -134,6 +137,35 @@ output = llm.chat(messages, sampling_params=sampling)
  print(output[0].outputs[0].text)
  ```
 
+ ### Using Transformers.js (browser / Node.js)
+
+ ```javascript
+ import { pipeline } from "@huggingface/transformers";
+
+ const extractor = await pipeline("text-generation", "Meddies/meddies-pii-onnx", {
+   dtype: "q4",
+   device: "webgpu", // or "wasm" for broader compatibility
+ });
+
+ const messages = [
+   { role: "system", content: "Extract <address>, <company_name>, <email_address>, <human_name>, <phone_number>, <id_number>, <date>" },
+   { role: "user", content: "Patient John Smith, DOB 03/15/1985, contact: john.smith@email.com" },
+ ];
+
+ const output = await extractor(messages, { max_new_tokens: 512, do_sample: false });
+ console.log(output[0].generated_text.at(-1).content);
+ ```
+
+ ### Using ONNX Runtime (Python)
+
+ ```python
+ from optimum.onnxruntime import ORTModelForCausalLM
+ from transformers import AutoTokenizer
+
+ model = ORTModelForCausalLM.from_pretrained("Meddies/meddies-pii-onnx")
+ tokenizer = AutoTokenizer.from_pretrained("Meddies/meddies-pii-onnx")
+ ```
+
  ## Model Details
 
  | | |
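The README additions above advertise structured JSON output that is "directly usable without post-processing". As a minimal downstream sketch — assuming, hypothetically, that the model returns a JSON object mapping each entity type to a list of extracted strings (the exact schema is not shown in this diff) — such output could be consumed like this:

```python
import json

# Hypothetical model response: the real schema may differ; the diff only
# states that the model emits structured JSON over these entity types.
raw_output = """
{
  "human_name": ["John Smith"],
  "date": ["03/15/1985"],
  "email_address": ["john.smith@email.com"]
}
"""

entities = json.loads(raw_output)

# Flatten the extraction into (entity_type, value) pairs for downstream use,
# e.g. redaction or audit logging.
pairs = [(etype, value) for etype, values in entities.items() for value in values]
for etype, value in pairs:
    print(f"{etype}: {value}")
```

Because the output is plain JSON, no regex post-processing is needed; a schema check (e.g. restricting keys to the seven documented entity types) is still a sensible guard in production.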