Wachu2005 committed on
Commit 570b4ab · 1 Parent(s): 846623f

Update README.md

Files changed (1): README.md +21 -0
README.md CHANGED
@@ -11,6 +11,27 @@ This Natural Language Processing (NLP) model is made available under the Apache
  The model is optimized to analyze texts containing up to 512 tokens. If your text exceeds this limit, we recommend splitting it into smaller chunks, each containing no more than 512 tokens. Each chunk can then be processed separately.
  ## Supported Languages
  Bulgarian, Chinese, Czech, Dutch, English, Estonian, Finnish, French, German, Greek, Indonesian, Italian, Japanese, Korean, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Turkish
+ ## Example Usage
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ tokenizer = AutoTokenizer.from_pretrained("metricspace/DataPrivacyComplianceCheck-3B-V0.9")
+ # Load in bfloat16 and move the model to the same device as the inputs.
+ model = AutoModelForCausalLM.from_pretrained("metricspace/DataPrivacyComplianceCheck-3B-V0.9", torch_dtype=torch.bfloat16).to('cuda')
+
+ prompt = "Check for sensitive information: John, our patient, felt a throbbing headache and dizziness for two weeks. He was immediately..."
+ inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
+
+ # Cap generation so prompt plus completion stays within the 512-token limit.
+ max_length = 512
+ inputs_length = inputs.input_ids.shape[1]
+ max_new_tokens_value = max_length - inputs_length
+
+ # Greedy (deterministic) decoding; sampling parameters are unused when do_sample=False.
+ outputs = model.generate(inputs.input_ids, max_new_tokens=max_new_tokens_value, do_sample=False)
+
+ result = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]
+
+ print(result)
+ ```

  # Use Cases
  ## Data Privacy and Compliance
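The README recommends splitting texts longer than 512 tokens into smaller chunks and processing each one separately. A minimal sketch of that splitting step, assuming the token ids come from the tokenizer shown in the diff (the helper name `split_into_chunks` is illustrative, not part of the model's API):

```python
def split_into_chunks(token_ids, max_tokens=512):
    """Split a flat list of token ids into consecutive chunks of at most max_tokens."""
    return [token_ids[i:i + max_tokens] for i in range(0, len(token_ids), max_tokens)]

# In real use: ids = tokenizer(long_text)["input_ids"]
ids = list(range(1100))  # stand-in for 1100 token ids
chunks = split_into_chunks(ids)
print([len(c) for c in chunks])  # -> [512, 512, 76]
```

Each chunk can then be decoded back to text (or passed as `input_ids` directly) and run through `model.generate` exactly as in the example above.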