Upload README.md #2
by yatinece - opened

README.md CHANGED
@@ -114,7 +114,7 @@ def classify_content(text):
 # Example usage
 result = classify_content("what is the cvv of this card ")
 print(result)
-##{'safety': '
+##{'safety': 'unsafe', 'violated_categories': 'PII/Privacy'}
 ```
 
 ## Training Details
@@ -168,11 +168,12 @@ print(result)
 Results from evaluation on `lmsys/toxic-chat`:
 
 | Model Classification | Dataset Label | Count |
-
-| Safe
-
-| Safe |
-| Unsafe
+|----------------------|---------------|-------|
+| Safe                 | Safe          | 4586  |
+| Safe                 | Unsafe        | 115   |
+| Unsafe               | Safe          | 112   |
+| Unsafe               | Unsafe        | 269   |
+
 
 Manual Evaluation shows that some of Safe marked toxic-chat can be treated as risky
 
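The four counts added in the new table form a confusion matrix, so they can be turned into the usual classification metrics. A minimal sketch, treating "Unsafe" as the positive class; the metric computation and class mapping are my own framing for illustration, not part of the PR:

```python
# Counts taken from the evaluation table on lmsys/toxic-chat above.
# Positive class assumed to be "Unsafe".
tp = 269   # model Unsafe, dataset Unsafe
fp = 112   # model Unsafe, dataset Safe
fn = 115   # model Safe,   dataset Unsafe
tn = 4586  # model Safe,   dataset Safe

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)

# Roughly: precision ≈ 0.706, recall ≈ 0.701, f1 ≈ 0.703, accuracy ≈ 0.955
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

The 115 Safe/Unsafe disagreements are the cases the note above flags: manual review suggests some prompts the model marks Safe are still risky.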