Update README.md
README.md
The model is optimized to analyze texts containing up to 512 tokens. If your text exceeds this limit, we recommend splitting it into smaller chunks, each containing no more than 512 tokens. Each chunk can then be processed separately.
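
As a minimal sketch of such chunking (assuming the model's own tokenizer; the `split_into_chunks` helper below is illustrative and not part of the model card), the text can be tokenized once and sliced into fixed-size windows:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("metricspace/DataPrivacyComplianceCheck-3B-V0.9")

# Illustrative helper: tokenize the full text, slice the ids into windows of at
# most `max_tokens` tokens, and decode each window back into a string.
def split_into_chunks(text, max_tokens=512):
    token_ids = tokenizer(text, add_special_tokens=False).input_ids
    return [
        tokenizer.decode(token_ids[i:i + max_tokens])
        for i in range(0, len(token_ids), max_tokens)
    ]

# chunks = split_into_chunks(long_text)  # then process each chunk separately
```

This simple fixed-size split may cut mid-sentence; splitting on sentence boundaries before tokenizing can give more natural chunks.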
## Supported Languages
Bulgarian, Chinese, Czech, Dutch, English, Estonian, Finnish, French, German, Greek, Indonesian, Italian, Japanese, Korean, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Turkish
## Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model in bfloat16, then move the model to the GPU
# (the inputs below are placed on CUDA, so the model must be there as well)
tokenizer = AutoTokenizer.from_pretrained("metricspace/DataPrivacyComplianceCheck-3B-V0.9")
model = AutoModelForCausalLM.from_pretrained(
    "metricspace/DataPrivacyComplianceCheck-3B-V0.9", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "Check for sensitive information: John, our patient, felt a throbbing headache and dizziness for two weeks. He was immediately..."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Keep prompt plus generated text within the 512-token window the model is optimized for
max_length = 512
inputs_length = inputs.input_ids.shape[1]
max_new_tokens_value = max_length - inputs_length

# Greedy decoding; top_k/top_p only take effect when do_sample=True
outputs = model.generate(inputs.input_ids, max_new_tokens=max_new_tokens_value, do_sample=False, top_k=50, top_p=0.98)

result = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]
print(result)
```
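
The `max_new_tokens_value` calculation simply caps prompt plus completion at the 512-token window mentioned above; if your prompt is already close to 512 tokens, shorten it or split it into chunks first.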
# Use Cases
## Data Privacy and Compliance