BaoLocTown committed on
Commit e0bb6f9 · 1 Parent(s): ee40ff7

[ADD] files

Files changed (3)
  1. README.md +6 -6
  2. app.py +150 -0
  3. requirements.txt +2 -0
README.md CHANGED
@@ -1,13 +1,13 @@
 ---
-title: Gliner Vn
-emoji: 🦀
-colorFrom: blue
-colorTo: yellow
+title: GLiNER-Multi-PII
+emoji: 💻
+colorFrom: pink
+colorTo: blue
 sdk: gradio
-sdk_version: 5.49.1
+sdk_version: 4.20.1
 app_file: app.py
 pinned: false
 license: apache-2.0
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,150 @@
+from typing import Any, Dict
+from gliner import GLiNER
+import gradio as gr
+
+model = GLiNER.from_pretrained("BaoLocTown/gliner-vn-demo")
+
+examples = [  # Vietnamese sample job ad: XYZ Co., Ltd. hiring for a Sales Staff role; contact 0903 123 456 or tuyendung@xyz.com
+    [
+        "Công ty TNHH XYZ tuyển dụng vị trí **Nhân viên Kinh doanh** với yêu cầu tốt nghiệp đại học, kỹ năng giao tiếp tốt và đam mê xây dựng mối quan hệ khách hàng. Quyền lợi hấp dẫn, lương cạnh tranh, thưởng theo doanh số và cơ hội thăng tiến. Liên hệ: **0903 123 456** hoặc **[tuyendung@xyz.com](mailto:tuyendung@xyz.com)**.",
+        "person, company, phone, job title",
+        0.5,
+        False,
+    ],
+]
+
+
+def ner(
+    text: str, labels: str, threshold: float, nested_ner: bool
+) -> Dict[str, Any]:
+    labels = [label.strip() for label in labels.split(",") if label.strip()]
+    return {
+        "text": text,
+        "entities": [
+            {
+                "entity": entity["label"],
+                "word": entity["text"],
+                "start": entity["start"],
+                "end": entity["end"],
+                "score": entity["score"],
+            }
+            for entity in model.predict_entities(
+                text, labels, flat_ner=not nested_ner, threshold=threshold
+            )
+        ],
+    }
+
+
+with gr.Blocks(title="GLiNER-M-v2.1") as demo:
+    gr.Markdown(
+        """
+# GLiNER-PII (Personally Identifiable Information extraction)
+
+GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which, despite their flexibility, are costly and large for resource-constrained scenarios.
+
+The model has been trained by fine-tuning urchade/gliner_multi-v2.1 on the urchade/synthetic-pii-ner-mistral-v1 dataset.
+
+## Links
+
+* Model: https://huggingface.co/urchade/gliner_multi_pii-v1
+* All GLiNER models: https://huggingface.co/models?library=gliner
+* Paper: https://arxiv.org/abs/2311.08526
+* Repository: https://github.com/urchade/GLiNER
+"""
+    )
+    with gr.Accordion("How to run this model locally", open=False):
+        gr.Markdown(
+            """
+## Installation
+To use this model, you must install the GLiNER Python library:
+```
+pip install gliner
+```
+
+## Usage
+Once you've installed the GLiNER library, you can import the GLiNER class. You can then load this model using `GLiNER.from_pretrained` and predict entities with `predict_entities`.
+"""
+        )
+        gr.Code(
+            '''
+from gliner import GLiNER
+
+model = GLiNER.from_pretrained("urchade/gliner_multi_pii-v1")
+
+text = """
+Harilala Rasoanaivo, un homme d'affaires local d'Antananarivo, a enregistré une nouvelle société nommée "Rasoanaivo Enterprises" au Lot II M 92 Antohomadinika. Son numéro est le +261 32 22 345 67, et son adresse électronique est harilala.rasoanaivo@telma.mg. Il a fourni son numéro de sécu 501-02-1234 pour l'enregistrement.
+"""
+
+labels = ["work", "booking number", "personally identifiable information", "driver licence", "person", "book", "full address", "company", "actor", "character", "email", "passport number", "Social Security Number", "phone number"]
+entities = model.predict_entities(text, labels)
+
+for entity in entities:
+    print(entity["text"], "=>", entity["label"])
+''',
+            language="python",
+        )
+        gr.Code(
+            """
+Harilala Rasoanaivo => person
+Rasoanaivo Enterprises => company
+Lot II M 92 Antohomadinika => full address
++261 32 22 345 67 => phone number
+harilala.rasoanaivo@telma.mg => email
+501-02-1234 => Social Security Number
+"""
+        )
+
+    input_text = gr.Textbox(
+        value=examples[0][0], label="Text input", placeholder="Enter your text here"
+    )
+    with gr.Row() as row:
+        labels = gr.Textbox(
+            value=examples[0][1],
+            label="Labels",
+            placeholder="Enter your labels here (comma separated)",
+            scale=2,
+        )
+        threshold = gr.Slider(
+            0,
+            1,
+            value=0.3,
+            step=0.01,
+            label="Threshold",
+            info="Lower the threshold to increase how many entities get predicted.",
+            scale=1,
+        )
+        nested_ner = gr.Checkbox(
+            value=examples[0][3],
+            label="Nested NER",
+            info="Allow for nested NER?",
+            scale=0,
+        )
+    output = gr.HighlightedText(label="Predicted Entities")
+    submit_btn = gr.Button("Submit")
+    examples = gr.Examples(
+        examples,
+        fn=ner,
+        inputs=[input_text, labels, threshold, nested_ner],
+        outputs=output,
+        cache_examples=True,
+    )
+
+    # Submitting
+    input_text.submit(
+        fn=ner, inputs=[input_text, labels, threshold, nested_ner], outputs=output
+    )
+    labels.submit(
+        fn=ner, inputs=[input_text, labels, threshold, nested_ner], outputs=output
+    )
+    threshold.release(
+        fn=ner, inputs=[input_text, labels, threshold, nested_ner], outputs=output
+    )
+    submit_btn.click(
+        fn=ner, inputs=[input_text, labels, threshold, nested_ner], outputs=output
+    )
+    nested_ner.change(
+        fn=ner, inputs=[input_text, labels, threshold, nested_ner], outputs=output
+    )
+
+demo.queue()
+demo.launch(debug=True)
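The `ner` callback in app.py does two things around the model call: it turns the comma-separated "Labels" textbox into a list, and it reshapes each prediction into the `{"text": ..., "entities": [...]}` dict that the app hands to `gr.HighlightedText`. The sketch below isolates that post-processing so it can be checked without downloading a model. It assumes predictions carry the `label`, `text`, `start`, `end` and `score` keys that app.py already reads; the helper names `parse_labels` and `to_highlighted_text` and the sample prediction are hypothetical, written by hand in place of real `model.predict_entities` output.

```python
from typing import Any, Dict, List


def parse_labels(raw: str) -> List[str]:
    """Split a comma-separated label string, trimming spaces and dropping empty entries."""
    return [label.strip() for label in raw.split(",") if label.strip()]


def to_highlighted_text(text: str, predictions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Reshape GLiNER-style prediction dicts into the dict shape app.py returns to the UI."""
    return {
        "text": text,
        "entities": [
            {
                "entity": p["label"],  # label name shown in the highlight legend
                "word": p["text"],     # matched surface string
                "start": p["start"],   # character offset where the match begins
                "end": p["end"],       # character offset where the match ends (exclusive)
                "score": p["score"],   # model confidence, passed through unchanged
            }
            for p in predictions
        ],
    }


if __name__ == "__main__":
    text = "Contact: 0903 123 456"
    # Hypothetical prediction, hand-written in place of model.predict_entities(...)
    fake_predictions = [
        {"label": "phone", "text": "0903 123 456", "start": 9, "end": 21, "score": 0.91}
    ]
    print(parse_labels("person, company, phone, job title"))
    print(to_highlighted_text(text, fake_predictions))
```

Stripping each label matters for inputs like the demo's `"person, company, phone, job title"`, where a plain `split(",")` would yield `" company"` with a leading space.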
requirements.txt ADDED
@@ -0,0 +1,2 @@
+gliner
+scipy==1.12
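The "Nested NER" checkbox in app.py is forwarded as `flat_ner=not nested_ner`. Conceptually, flat decoding keeps only non-overlapping spans, while nested decoding also accepts a span that lies entirely inside another one. The sketch below illustrates that difference with a greedy highest-score-first selection. It is an assumption-labeled illustration of the concept, not GLiNER's actual decoder; the tuple layout `(start, end, label, score)`, the function names and the candidate scores are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate layout: (start, end, label, score), with end exclusive.
Span = Tuple[int, int, str, float]


def overlaps(a: Span, b: Span) -> bool:
    """True if the half-open intervals [start, end) share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def nested_or_disjoint(a: Span, b: Span) -> bool:
    """True if the spans do not touch, or one lies fully inside the other."""
    if not overlaps(a, b):
        return True
    return (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1])


def greedy_decode(candidates: List[Span], flat: bool) -> List[Span]:
    """Keep spans highest-score first, rejecting any that conflict with spans already kept."""
    kept: List[Span] = []
    for span in sorted(candidates, key=lambda s: s[3], reverse=True):
        if flat:
            ok = all(not overlaps(span, k) for k in kept)  # flat: no overlap at all
        else:
            ok = all(nested_or_disjoint(span, k) for k in kept)  # nested: containment allowed
        if ok:
            kept.append(span)
    return sorted(kept, key=lambda s: s[0])  # report in text order (stable for ties)


if __name__ == "__main__":
    # Offsets into "Rasoanaivo Enterprises"; scores are made up for illustration.
    candidates = [(0, 22, "company", 0.9), (0, 10, "person", 0.6), (5, 15, "person", 0.4)]
    print(greedy_decode(candidates, flat=True))   # [(0, 22, 'company', 0.9)]
    print(greedy_decode(candidates, flat=False))  # [(0, 22, 'company', 0.9), (0, 10, 'person', 0.6)]
```

In nested mode the `person` span inside the `company` span survives, while the partially overlapping `(5, 15)` candidate is still rejected in both modes.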