---
library_name: gliner2
---
## Model Description

GLiNER2 extends the original GLiNER architecture to support multi-task information extraction with a schema-driven interface. This base model provides efficient CPU-based inference while maintaining high accuracy across diverse extraction tasks.

**Key Features:**
- Multi-task capability: NER, classification, and structured extraction
- Schema-driven interface with field types and constraints
- CPU-first design for fast inference without GPU requirements
- Fully local processing: inference runs on your machine with no calls to external services

## Installation

```bash
pip install gliner2
```

## Usage

### Entity Extraction

```python
from gliner2 import GLiNER2

# Load the model
extractor = GLiNER2.from_pretrained("fastino/gliner2-multi-v1")

# Extract entities
text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday."
result = extractor.extract_entities(text, ["company", "person", "product", "location"])

print(result)
# Output: {'entities': {'company': ['Apple'], 'person': ['Tim Cook'], 'product': ['iPhone 15'], 'location': ['Cupertino']}}
```
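Downstream code often wants one row per entity rather than a nested dictionary. The following is a minimal sketch, assuming the result has the `{'entities': {label: [spans]}}` shape shown in the output comment above. `flatten_entities` is a hypothetical helper, not part of the `gliner2` package.

```python
from typing import Dict, List, Tuple


def flatten_entities(result: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Turn {'entities': {label: [spans]}} into a flat list of (label, span) pairs."""
    pairs = []
    for label, spans in result.get("entities", {}).items():
        for span in spans:
            pairs.append((label, span))
    return pairs


# Result copied from the example output above, so no model is needed to run this
result = {
    "entities": {
        "company": ["Apple"],
        "person": ["Tim Cook"],
        "product": ["iPhone 15"],
        "location": ["Cupertino"],
    }
}

pairs = flatten_entities(result)
for label, span in pairs:
    print(f"{label}: {span}")
```

The flat list is convenient for writing rows to a CSV file or a database table.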

### Text Classification

```python
# Reuses the `extractor` loaded in the Entity Extraction example
# Single-label classification
result = extractor.classify_text(
    "This laptop has amazing performance but terrible battery life!",
    {"sentiment": ["positive", "negative", "neutral"]}
)
print(result)
# Output: {'sentiment': 'negative'}

# Multi-label classification
result = extractor.classify_text(
    "Great camera quality, decent performance, but poor battery life.",
    {
        "aspects": {
            "labels": ["camera", "performance", "battery", "display", "price"],
            "multi_label": True,
            "cls_threshold": 0.4
        }
    }
)
print(result)
# Output: {'aspects': ['camera', 'performance', 'battery']}
```
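To build intuition for `cls_threshold`, here is a minimal sketch that assumes the threshold acts as a per-label score cutoff in multi-label mode. That behavior is an assumption, not something confirmed here, and whether the library's comparison is inclusive is also unknown. `select_labels` is a hypothetical helper and the scores are made up for illustration; they were not produced by the model.

```python
from typing import Dict, List


def select_labels(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep every label whose score is at or above the threshold."""
    return [label for label, score in scores.items() if score >= threshold]


# Made-up scores for illustration only; not model output
scores = {
    "camera": 0.91,
    "performance": 0.55,
    "battery": 0.78,
    "display": 0.12,
    "price": 0.30,
}

selected = select_labels(scores, threshold=0.4)  # three labels pass
strict = select_labels(scores, threshold=0.6)    # "performance" drops out
print(selected)
print(strict)
```

Under this assumption, raising the threshold trades recall for precision: fewer labels are returned, but each one has a higher score.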

### Structured Data Extraction

```python
# Reuses the `extractor` loaded in the Entity Extraction example
text = "iPhone 15 Pro Max with 256GB storage, A17 Pro chip, priced at $1199."

result = extractor.extract_json(
    text,
    {
        "product": [
            "name::str::Full product name and model",
            "storage::str::Storage capacity",
            "processor::str::Chip or processor information",
            "price::str::Product price with currency"
        ]
    }
)

print(result)
# Output: {
#     'product': [{
#         'name': 'iPhone 15 Pro Max',
#         'storage': '256GB',
#         'processor': 'A17 Pro chip',
#         'price': '$1199'
#     }]
# }
```
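The field strings in the schema above appear to follow a `name::dtype::description` pattern. The sketch below builds and checks such strings programmatically, which can help when schemas are generated from configuration. `FieldSpec`, `parse_field`, and `format_field` are hypothetical helpers, not part of `gliner2`, and they handle only the three-part form shown above; any other forms the library may accept are out of scope.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    name: str
    dtype: str
    description: str


def parse_field(spec: str) -> FieldSpec:
    """Split 'name::dtype::description' into its three parts."""
    parts = spec.split("::", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 'name::dtype::description', got {spec!r}")
    return FieldSpec(*(part.strip() for part in parts))


def format_field(field: FieldSpec) -> str:
    """Join a FieldSpec back into the 'name::dtype::description' form."""
    return "::".join(field)


fields = [
    FieldSpec("name", "str", "Full product name and model"),
    FieldSpec("price", "str", "Product price with currency"),
]

# Build a schema dictionary in the same shape as the example above
schema = {"product": [format_field(f) for f in fields]}
print(schema)
```

A dictionary built this way has the same shape as the second argument passed to `extract_json` in the example above.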

### Multi-Task Schema Composition

```python
# Reuses the `extractor` loaded in the Entity Extraction example
# Combine all extraction types
schema = (extractor.create_schema()
    .entities({
        "person": "Names of people or individuals",
        "company": "Organization or business names",
        "product": "Products or services mentioned"
    })
    .classification("sentiment", ["positive", "negative", "neutral"])
    .structure("product_info")
        .field("name", dtype="str")
        .field("price", dtype="str")
        .field("features", dtype="list")
)

text = "Apple CEO Tim Cook unveiled the iPhone 15 Pro for $999."
results = extractor.extract(text, schema)

print(results)
# Output: {
#     'entities': {'person': ['Tim Cook'], 'company': ['Apple'], 'product': ['iPhone 15 Pro']},
#     'sentiment': 'positive',
#     'product_info': [{'name': 'iPhone 15 Pro', 'price': '$999', 'features': [...]}]
# }
```

## Model Details

- **Model Type:** Bidirectional Transformer Encoder (BERT-based)
- **Parameters:** 205M
- **Input:** Text sequences
- **Output:** Entities, classifications, and structured data
- **Architecture:** Based on GLiNER with multi-task extensions
- **Training Data:** Multi-domain datasets for NER, classification, and structured extraction

## Performance

This model is optimized for:
- Fast CPU inference (no GPU required)
- Low latency applications
- Resource-constrained environments
- Multi-task extraction scenarios
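
To check latency on your own hardware, a small timing harness can help. This is a minimal sketch using only the standard library; `measure_latency` is a hypothetical helper, and the stand-in workload exists only so the example runs without the model installed. It does not report any measured GLiNER2 numbers.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(fn: Callable[[str], object], texts: List[str], warmup: int = 1) -> dict:
    """Time fn on each text and return latency statistics in milliseconds."""
    for text in texts[:warmup]:
        fn(text)  # warm-up calls are not timed
    timings = []
    for text in texts:
        start = time.perf_counter()
        fn(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": len(timings),
        "mean_ms": statistics.mean(timings),
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
    }


# Stand-in workload so the sketch runs without the model installed
samples = ["first sample", "second sample", "third sample"]
stats = measure_latency(lambda t: t.upper(), samples)
print(stats)
```

With the model loaded as in the Usage section, pass `lambda t: extractor.extract_entities(t, ["company", "person"])` as `fn` to time real inference.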

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{zaratiana2025gliner2efficientmultitaskinformation,
      title={GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface}, 
      author={Urchade Zaratiana and Gil Pasternak and Oliver Boyd and George Hurn-Maloney and Ash Lewis},
      year={2025},
      eprint={2507.18546},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.18546}, 
}
```

## License

This project is licensed under the Apache License 2.0.

## Links

- **Repository:** https://github.com/fastino-ai/GLiNER2
- **Paper:** https://arxiv.org/abs/2507.18546
- **Organization:** [Fastino AI](https://fastino.ai)