Ihor committed
Commit 03cedc7 · verified · 1 Parent(s): e34b34e

Update README.md

  1. README.md +147 -3
README.md CHANGED
@@ -1,3 +1,147 @@
- ---
- license: apache-2.0
- ---
+ ```yaml
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - microsoft/deberta-v3-small
+ - HuggingFaceTB/SmolLM2-135M-Instruct
+ pipeline_tag: token-classification
+ tags:
+ - NER
+ - encoder
+ - decoder
+ - GLiNER
+ - information-extraction
+ ```
+ ![gliner-decoder](image.png)
+
+ **GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.
+ This architecture combines:
+
+ * An **encoder** for representing entity spans
+ * A **decoder** for generating label names
+
+ This hybrid approach enables new use cases such as **entity linking** and expands GLiNER’s capabilities.
+ By integrating large modern decoders trained on vast datasets, GLiNER can draw on their **richer knowledge capacity** while maintaining competitive inference speed.
+
+ ---
+
+ ## Key Features
+
+ * **Open ontology**: Works even when the label set is not known in advance
+ * **Multi-label entity recognition**: Assigns multiple labels to a single entity
+ * **Entity linking**: Handles large label sets via constrained generation
+ * **Knowledge expansion**: Benefits from the knowledge stored in large decoder models
+ * **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER
+
+ ---
+
+ ## Installation
+
+ Update to the latest version of GLiNER:
+
+ ```bash
+ pip install -U gliner
+ ```
+
+ ---
+
+ ## Usage
+
+ ```python
+ from gliner import GLiNER
+
+ # Load the pretrained model
+ model = GLiNER.from_pretrained("gliner-decoder-small-v1.0")
+
+ text = (
+     "Apple was founded as Apple Computer Company on April 1, 1976, "
+     "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
+     "develop and sell Wozniak's Apple I personal computer."
+ )
+
+ # Entity labels to extract
+ labels = ["person", "other"]
+
+ # Run extraction with a 0.3 score threshold, requesting one generated sequence
+ model.run(text, labels, threshold=0.3, num_gen_sequences=1)
+ ```
+
+ ---
+
+ ### Example Output
+
+ ```json
+ [
+   [
+     {
+       "start": 21,
+       "end": 26,
+       "text": "Apple",
+       "label": "other",
+       "score": 0.6795641779899597,
+       "generated labels": ["Organization"]
+     },
+     {
+       "start": 47,
+       "end": 60,
+       "text": "April 1, 1976",
+       "label": "other",
+       "score": 0.44296327233314514,
+       "generated labels": ["Date"]
+     },
+     {
+       "start": 65,
+       "end": 78,
+       "text": "Steve Wozniak",
+       "label": "person",
+       "score": 0.9934439659118652,
+       "generated labels": ["Person"]
+     },
+     {
+       "start": 80,
+       "end": 90,
+       "text": "Steve Jobs",
+       "label": "person",
+       "score": 0.9725918769836426,
+       "generated labels": ["Person"]
+     },
+     {
+       "start": 107,
+       "end": 119,
+       "text": "Ronald Wayne",
+       "label": "person",
+       "score": 0.9964536428451538,
+       "generated labels": ["Person"]
+     }
+   ]
+ ]
+ ```
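The output above is a list with one inner list of entity dictionaries per input text. As a minimal sketch of how such a result might be post-processed, the snippet below groups entity texts by their decoder-generated label. It works on a trimmed, hard-coded copy of the example output rather than calling the model; the helper name `group_by_generated_label` and the `min_score` parameter are hypothetical and not part of GLiNER.

```python
from collections import defaultdict

# Trimmed copy of the example output: one inner list per input text.
results = [
    [
        {"text": "Apple", "label": "other", "score": 0.6795641779899597,
         "generated labels": ["Organization"]},
        {"text": "April 1, 1976", "label": "other", "score": 0.44296327233314514,
         "generated labels": ["Date"]},
        {"text": "Steve Wozniak", "label": "person", "score": 0.9934439659118652,
         "generated labels": ["Person"]},
    ]
]


def group_by_generated_label(entities, min_score=0.0):
    """Map each generated label to the entity texts that received it.

    Hypothetical helper: entities scoring below `min_score` are skipped.
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] < min_score:
            continue
        for generated in entity.get("generated labels", []):
            grouped[generated].append(entity["text"])
    return dict(grouped)


# Keep only the more confident spans from the first (and only) input text.
grouped = group_by_generated_label(results[0], min_score=0.5)
print(grouped)
```

Raising `min_score` above the `threshold` passed to the model is one way to tighten results after the fact without re-running inference.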
+
+ ---
+
+ ### Restricting the Decoder
+
+ To restrict the decoder to a predefined set of labels, pass them via `gen_constraints`:
+
+ ```python
+ model.run(
+     text, labels,
+     threshold=0.3,
+     num_gen_sequences=1,
+     gen_constraints=[
+         "organization", "organization type", "city",
+         "technology", "date", "person"
+     ]
+ )
+ ```
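A common way to implement this kind of constraint is a prefix tree (trie) over the tokenized labels: at each decoding step, only tokens that continue some allowed label are permitted. The sketch below illustrates the idea only. It is not GLiNER's implementation, it splits labels on whitespace instead of using the decoder's tokenizer, and the `LabelTrie` class and its `allowed_next` method are hypothetical names.

```python
class LabelTrie:
    """Prefix tree over labels, tokenized on whitespace for illustration."""

    END = "<end>"  # marker meaning "a complete label ends here"

    def __init__(self, labels):
        self.root = {}
        for label in labels:
            node = self.root
            for token in label.split():
                node = node.setdefault(token, {})
            node[self.END] = {}

    def allowed_next(self, prefix_tokens):
        """Return the tokens that may follow `prefix_tokens`, sorted."""
        node = self.root
        for token in prefix_tokens:
            node = node.get(token)
            if node is None:
                return []  # the prefix does not match any allowed label
        return sorted(node)


trie = LabelTrie([
    "organization", "organization type", "city",
    "technology", "date", "person",
])

print(trie.allowed_next([]))                # first tokens of all labels
print(trie.allowed_next(["organization"]))  # stop here, or continue with "type"
print(trie.allowed_next(["planet"]))        # no allowed label starts this way
```

In a real decoder, the list returned at each step would be used to mask out every other token before sampling, so generation can only ever spell out one of the allowed labels.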
+
+ ---
+
+ ## Performance Tips
+
+ Two label trie implementations are available.
+ For a **faster, memory-efficient C++ version**, install **Cython**:
+
+ ```bash
+ pip install cython
+ ```
+
+ This can significantly improve performance and reduce memory usage, especially with millions of labels.
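To confirm that Cython is importable in the current environment, a quick standard-library check can help. This only verifies the installation; how GLiNER chooses between the two trie implementations is not described here, and the helper name `cython_available` is hypothetical.

```python
import importlib.util


def cython_available():
    """Return True if the Cython package can be found in this environment."""
    return importlib.util.find_spec("Cython") is not None


print("Cython installed:", cython_available())
```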