Serega6678 committed
Commit 3b046ee · verified · 1 Parent(s): d0171d5

Update README.md

  1. README.md +79 -0
README.md CHANGED
@@ -1,3 +1,82 @@
  ---
  license: mit
+ datasets:
+ - numind/NuNER
+ library_name: gliner
+ language:
+ - en
+ pipeline_tag: token-classification
+ tags:
+ - entity recognition
+ - NER
+ - named entity recognition
+ - zero shot
+ - zero-shot
  ---
+
+ NuZero is a family of zero-shot entity recognition models inspired by [GLiNER](https://huggingface.co/papers/2311.08526) and built with insights we gathered throughout our work on [NuNER](https://huggingface.co/collections/numind/nuner-token-classification-and-ner-backbones-65e1f6e14639e2a465af823b).
+
+ The key differences between NuZero Token Long and GLiNER are:
+ * **4096 context window!** vs 512 context. This allows processing a page at a time vs a paragraph!
+ * The possibility to **detect entities that are longer than 12 tokens**, as NuZero Token operates on the token level rather than on the span level.
+ * The NuZero family is trained on a **diverse dataset tailored for real-life use cases**.
+
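The second point in the list above can be illustrated with a toy sketch. This is not the GLiNER or NuZero decoding code: `enumerate_spans`, `merge_token_tags`, and the `MAX_SPAN_WIDTH` constant are hypothetical names introduced here, and the per-token tags are hand-written rather than model output. It only shows why a span-enumerating approach with a width cap cannot propose a 15-token entity, while merging per-token labels can.

```python
from typing import List, Optional, Tuple

# Assumed span-width cap, taken from the "12 tokens" figure in the list above.
MAX_SPAN_WIDTH = 12


def enumerate_spans(num_tokens: int, max_width: int = MAX_SPAN_WIDTH) -> List[Tuple[int, int]]:
    """Span-level view: every candidate (start, end) pair, end inclusive, up to max_width tokens."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start, min(start + max_width, num_tokens))
    ]


def merge_token_tags(tags: List[Optional[str]]) -> List[Tuple[int, int, str]]:
    """Token-level view: merge runs of identical per-token labels into (start, end, label).

    Simplification: two adjacent entities with the same label are merged into one.
    """
    entities = []
    start = None
    for i, tag in enumerate(tags + [None]):  # trailing None closes the last run
        if start is not None and tag != tags[start]:
            entities.append((start, i - 1, tags[start]))
            start = None
        if start is None and tag is not None:
            start = i
    return entities


# Hand-written tags: one 15-token "award" entity surrounded by untagged tokens.
tags = [None] + ["award"] * 15 + [None]
long_entity = merge_token_tags(tags)
candidate_spans = enumerate_spans(len(tags))
print(long_entity)
print((1, 15) in candidate_spans)
```

In this sketch the token-level merge recovers the 15-token entity, while the capped span enumeration never generates a candidate that wide.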
+ <p align="center">
+ <img src="zero_shot_performance_unzero_token_long.png">
+ </p>
+
+ ## Installation & Usage
+
+ ```
+ !pip install gliner
+ ```
+
+ **NuZero requires labels to be lower-cased**
+
+ ```python
+ from gliner import GLiNER
+
+ model = GLiNER.from_pretrained("numind/NuZero_token_long_context")
+
+ # NuZero requires labels to be lower-cased!
+ labels = ["person", "award", "date", "competitions", "teams"]
+ labels = [l.lower() for l in labels]
+
+ text = """
+
+ """
+
+ entities = model.predict_entities(text, labels)
+
+ for entity in entities:
+     print(entity["text"], "=>", entity["label"])
+ ```
+
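Because the model expects lower-cased labels, it can help to normalize a user-supplied label list once and keep a mapping back to the original spelling. The helper below is a minimal sketch: `normalize_labels` is a hypothetical name and not part of the `gliner` package, and stripping whitespace and dropping duplicates are extra assumptions beyond the lower-casing requirement stated above.

```python
from typing import Dict, Iterable, List, Tuple


def normalize_labels(labels: Iterable[str]) -> Tuple[List[str], Dict[str, str]]:
    """Lower-case labels, drop duplicates, and remember the first original spelling."""
    lowered: List[str] = []
    original_by_lowered: Dict[str, str] = {}
    for label in labels:
        key = label.strip().lower()
        # Skip empty labels and labels already seen in a different casing.
        if key and key not in original_by_lowered:
            original_by_lowered[key] = label
            lowered.append(key)
    return lowered, original_by_lowered


lowered, original = normalize_labels(["Person", "AWARD", "person", "Date"])
print(lowered)
print(original)
```

The `lowered` list is what would be passed as `labels` to `model.predict_entities(text, labels)`, and `original[entity["label"]]` maps a predicted label back to the caller's spelling.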
+ ## Fine-tuning
+
+
+
+
+ ## Citation
+ ### This work
+ ```bibtex
+ @misc{bogdanov2024nuner,
+   title={NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data},
+   author={Sergei Bogdanov and Alexandre Constantin and Timothée Bernard and Benoit Crabbé and Etienne Bernard},
+   year={2024},
+   eprint={2402.15343},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
+ ### Previous work
+ ```bibtex
+ @misc{zaratiana2023gliner,
+   title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
+   author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
+   year={2023},
+   eprint={2311.08526},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```