leannetanyt committed on
Commit 2a3ee0b · verified · 1 Parent(s): 2e231e4

Update README.md

Files changed (1)
  1. README.md +70 -5
README.md CHANGED
@@ -1,5 +1,70 @@
- ---
- license: other
- license_name: govtech-singapore
- license_link: LICENSE
- ---
+ ---
+ license: other
+ license_name: govtech-singapore
+ license_link: LICENSE
+ language:
+ - en
+ - ms
+ - ta
+ - zh
+ pipeline_tag: text-classification
+ tags:
+ - classifier
+ - safety
+ - moderation
+ - multilingual
+ ---
+
+ # LionGuard 2.1
+ LionGuard 2.1 is a multilingual content moderation classifier tuned for English/Singlish, Chinese, Malay, and Tamil in the Singapore context.
+
+ It passes Gemini `gemini-embedding-001` embeddings to a multi-head classifier, which returns fine-grained scores for the following categories:
+ - Overall safety (`binary`)
+ - Hate (`hateful_l1`, `hateful_l2`)
+ - Insults (`insults`)
+ - Sexual content (`sexual_l1`, `sexual_l2`)
+ - Physical violence (`physical_violence`)
+ - Self-harm (`self_harm_l1`, `self_harm_l2`)
+ - Other misconduct (`all_other_misconduct_l1`, `all_other_misconduct_l2`)
+
+ ---
+
+ # Taxonomy
+
+ | S/N | Category | Level | Description | Example |
+ |-----|----------|-------|-------------|---------|
+ | **1** | Hate | **1 (Discriminatory)** | Derogatory or generalized negative statements targeting a protected group. | “All Chinese people are so scheming.”<br>“Malays are always late and lazy.”<br>“Indians are too noisy; I avoid them whenever possible.” |
+ | | | **2 (Hate Speech)** | Explicit calls for harm or violence against a protected group. | “We should kick all Chinese people out of the country.”<br>“Burn down their temple…”<br>“Kill the Indians in Little India; they don’t belong here.” |
+ | **2** | Insults | *no sub-levels* | Demeans, humiliates, or mocks without referencing a protected trait. | “You’re so dumb, you probably failed your PSLE.”<br>“Your singing is worse than karaoke at a dodgy KTV.” |
+ | **3** | Sexual | **1 (Not appropriate for minors)** | Mild-to-moderate sexual content. | “Let’s talk about how condoms work.”<br>“We had a one-night stand last weekend.” |
+ | | | **2 (Not appropriate for all ages)** | Explicit or graphic sexual content. | “Always think they very yandao … but actually is just one ** bin.” |
+ | **4** | Physical Violence | *no sub-levels* | Glorification of violence or threats of physical harm. | “If I see him at Clarke Quay, I’m going to whack him.”<br>“Someone should plant a bomb at that office…” |
+ | **5** | Self-Harm | **1 (Ideation)** | Expressions of suicidal thoughts or encouragement of self-harm. | “I’m so stressed … I feel like ending it all.”<br>“Failing my poly exams made me want to cut myself.” |
+ | | | **2 (Action / Suicide)** | Descriptions of ongoing or imminent self-harm behavior. | “I’ve locked myself in my room and taken a bunch of pills.”<br>“I’m on the rooftop at Toa Payoh, ready to jump.” |
+ | **6** | All Other Misconduct | **1 (Generally not socially accepted)** | Unethical or immoral behavior not necessarily illegal. | “Let’s spread fake rumours about her …”<br>“How to secretly record someone’s private conversation?” |
+ | | | **2 (Illegal activities)** | Instructions or credible threats of serious harm; facilitation of crimes. | “Anyone know where to buy illegal knives in Geylang?”<br>“Let’s hack that e-commerce site to get credit card details.” |
+
+ ---
+
+ # Usage
+
+ ```python
+ import os
+ import numpy as np
+ from transformers import AutoModel
+ from google import genai
+
+ # Load the classifier directly from the Hugging Face Hub
+ model = AutoModel.from_pretrained("govtech/lionguard-2.1", trust_remote_code=True)
+
+ # Texts to classify (placeholder input; replace with your own)
+ texts = ["Example text to classify"]
+
+ # Get embeddings (supply your own Gemini API key via GEMINI_API_KEY)
+ client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
+ result = client.models.embed_content(model="gemini-embedding-001", contents=texts)
+ embeddings = np.array([emb.values for emb in result.embeddings])
+
+ # Run inference on the embeddings
+ results = model.predict(embeddings)
+ ```
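
The usage snippet in this diff stops at `model.predict(embeddings)` and does not show what the returned value looks like. As a minimal sketch of one possible post-processing step, assume the scores for a single text are available as a dict keyed by the category names listed in the README, each a float in [0, 1]. That shape is an assumption, not something this change documents. `CATEGORY_KEYS`, `flag_categories`, the 0.5 default threshold, and the example numbers below are all hypothetical and for illustration only.

```python
from typing import Dict, List

# Category keys copied from the README's category list.
CATEGORY_KEYS = [
    "binary",
    "hateful_l1", "hateful_l2",
    "insults",
    "sexual_l1", "sexual_l2",
    "physical_violence",
    "self_harm_l1", "self_harm_l2",
    "all_other_misconduct_l1", "all_other_misconduct_l2",
]


def flag_categories(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the category keys whose score meets or exceeds the threshold.

    Assumption: `scores` maps category keys to floats in [0, 1] for one text.
    Keys missing from `scores` are treated as 0.0. The 0.5 default is a
    placeholder, not a recommended operating point.
    """
    return [key for key in CATEGORY_KEYS if scores.get(key, 0.0) >= threshold]


# Made-up scores for a single text, used only to exercise the helper.
example_scores = {"binary": 0.91, "insults": 0.84, "hateful_l1": 0.12}
flagged = flag_categories(example_scores)
print(flagged)
```

In practice, a threshold would likely be chosen per category on held-out data rather than shared across all heads, since the level 1 and level 2 categories differ in severity.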