	---
	library_name: gliner2
	---
	## Model Description

	GLiNER2 extends the original GLiNER architecture to support multi-task information extraction through a schema-driven interface. This large variant is intended for challenging extraction tasks, where it offers improved accuracy while keeping CPU-based inference efficient.

	Key Features:
	- Multi-task capability: NER, classification, and structured extraction
	- Schema-driven interface with field types and constraints
	- Enhanced accuracy for complex and ambiguous extraction scenarios
	- CPU-first design for inference without GPU requirements
	- 100% local processing with no dependency on external services

	## Installation

	```bash
	pip install gliner2
	```

	## Usage

	### Entity Extraction

	```python
	from gliner2 import GLiNER2

	# Load the model
	extractor = GLiNER2.from_pretrained("fastino/gliner2-large-v1")

	# Extract entities with descriptions for higher precision
	text = "Patient received 400mg ibuprofen for severe headache at 2 PM."
	result = extractor.extract_entities(
	    text,
	    {
	        "medication": "Names of drugs, medications, or pharmaceutical substances",
	        "dosage": "Specific amounts like '400mg', '2 tablets', or '5ml'",
	        "symptom": "Medical symptoms, conditions, or patient complaints",
	        "time": "Time references like '2 PM', 'morning', or 'after lunch'"
	    }
	)

	print(result)
	# Output: {'entities': {'medication': ['ibuprofen'], 'dosage': ['400mg'], 'symptom': ['severe headache'], 'time': ['2 PM']}}
	```
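
The result maps each label to a list of matched strings. As a minimal sketch of downstream handling, assuming the output shape shown in the comment above, the hypothetical helper `locate_entities` (not part of gliner2) recovers character offsets for each extracted string with plain `str.find`. It only finds the first occurrence of each string, so repeated mentions would need more careful matching.

```python
from typing import Dict, List, Tuple


def locate_entities(
    text: str, result: Dict[str, Dict[str, List[str]]]
) -> List[Tuple[str, str, int, int]]:
    """Return (label, span_text, start, end) tuples sorted by start offset.

    Hypothetical helper, not part of gliner2. It assumes the result has the
    {'entities': {label: [strings]}} shape shown in the example output.
    """
    located = []
    for label, spans in result.get("entities", {}).items():
        for span in spans:
            start = text.find(span)  # first occurrence only
            if start != -1:
                located.append((label, span, start, start + len(span)))
    return sorted(located, key=lambda item: item[2])


text = "Patient received 400mg ibuprofen for severe headache at 2 PM."
# Copied from the example output above rather than produced by a model call.
result = {
    "entities": {
        "medication": ["ibuprofen"],
        "dosage": ["400mg"],
        "symptom": ["severe headache"],
        "time": ["2 PM"],
    }
}

for label, span, start, end in locate_entities(text, result):
    print(f"{label:<10} {span!r} [{start}:{end}]")
```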

	### Text Classification

	```python
	# Single-label classification
	result = extractor.classify_text(
	    "This laptop has amazing performance but terrible battery life!",
	    {"sentiment": ["positive", "negative", "neutral"]}
	)
	print(result)
	# Output: {'sentiment': 'negative'}

	# Multi-label classification
	result = extractor.classify_text(
	    "Great camera quality, decent performance, but poor battery life.",
	    {
	        "aspects": {
	            "labels": ["camera", "performance", "battery", "display", "price"],
	            "multi_label": True,
	            "cls_threshold": 0.4
	        }
	    }
	)
	print(result)
	# Output: {'aspects': ['camera', 'performance', 'battery']}
	```
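
The `cls_threshold` value in the multi-label example reads most naturally as a score cutoff. As a rough sketch of that idea only, the snippet below keeps every label whose score meets the cutoff in multi-label mode and only the top-scoring label otherwise. The scores are invented for illustration, and `select_labels` is a hypothetical helper, not gliner2's internal logic.

```python
from typing import Dict, List


def select_labels(
    scores: Dict[str, float], threshold: float = 0.5, multi_label: bool = False
) -> List[str]:
    """Pick labels from per-label scores (illustrative, not the library's code)."""
    if not scores:
        return []
    if multi_label:
        # Keep every label at or above the cutoff, in the original label order.
        return [label for label, score in scores.items() if score >= threshold]
    # Single-label mode: keep only the highest-scoring label.
    return [max(scores, key=scores.get)]


# Invented scores, chosen so the selection matches the example output above.
scores = {"camera": 0.91, "performance": 0.48, "battery": 0.87, "display": 0.05, "price": 0.12}

print(select_labels(scores, threshold=0.4, multi_label=True))
print(select_labels(scores))
```

Raising the cutoff in this sketch from 0.4 to 0.5 would drop `performance`, which shows how a lower threshold trades precision for recall on borderline labels.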

	### Structured Data Extraction

	```python
	# Financial document processing
	text = """
	Transaction Report: Goldman Sachs processed a $2.5M equity trade for Tesla Inc.
	on March 15, 2024. Commission: $1,250. Status: Completed.
	"""

	result = extractor.extract_json(
	    text,
	    {
	        "transaction": [
	            "broker::str::Financial institution or brokerage firm",
	            "amount::str::Transaction amount with currency",
	            "security::str::Stock, bond, or financial instrument",
	            "date::str::Transaction date",
	            "commission::str::Fees or commission charged",
	            "status::str::Transaction status",
	            "type::[equity|bond|option|future|forex]::str::Type of financial instrument"
	        ]
	    }
	)

	print(result)
	# Output: {
	#     'transaction': [{
	#         'broker': 'Goldman Sachs',
	#         'amount': '$2.5M',
	#         'security': 'Tesla Inc.',
	#         'date': 'March 15, 2024',
	#         'commission': '$1,250',
	#         'status': 'Completed',
	#         'type': 'equity'
	#     }]
	# }
	```
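
Each field string in the schema packs a name, an optional bracketed list of allowed values, a type, and a description into one `::`-separated string. To make that layout explicit, here is a small stand-alone parser. It is a sketch inferred from the two string shapes used in this example, not gliner2's own parser, and `FieldSpec` and `parse_field_spec` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldSpec:
    name: str
    dtype: Optional[str] = None
    description: Optional[str] = None
    choices: Optional[List[str]] = None


def parse_field_spec(spec: str) -> FieldSpec:
    """Split 'name::[a|b]::dtype::description' into its parts (illustrative only)."""
    parts = spec.split("::")
    field = FieldSpec(name=parts[0].strip())
    rest = [part.strip() for part in parts[1:]]
    # An optional "[a|b|c]" segment directly after the name lists allowed values.
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        field.choices = [choice.strip() for choice in rest.pop(0)[1:-1].split("|")]
    if rest:
        field.dtype = rest.pop(0)
    if rest:
        # Re-join in case the description itself contains "::".
        field.description = "::".join(rest)
    return field


broker = parse_field_spec("broker::str::Financial institution or brokerage firm")
kind = parse_field_spec(
    "type::[equity|bond|option|future|forex]::str::Type of financial instrument"
)
print(broker)
print(kind)
```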

	### Multi-Task Schema Composition

	```python
	# Comprehensive legal contract analysis
	contract_text = """
	Service Agreement between TechCorp LLC and DataSystems Inc., effective January 1, 2024.
	Monthly fee: $15,000. Contract term: 24 months with automatic renewal.
	Termination clause: 30-day written notice required.
	"""

	schema = (extractor.create_schema()
	    .entities(["company", "date", "duration", "fee"])
	    .classification("contract_type", ["service", "employment", "nda", "partnership"])
	    .structure("contract_terms")
	        .field("parties", dtype="list")
	        .field("effective_date", dtype="str")
	        .field("monthly_fee", dtype="str")
	        .field("term_length", dtype="str")
	        .field("renewal", dtype="str", choices=["automatic", "manual", "none"])
	        .field("termination_notice", dtype="str")
	)

	results = extractor.extract(contract_text, schema)

	print(results)
	# Output: {
	#     'entities': {
	#         'company': ['TechCorp LLC', 'DataSystems Inc.'],
	#         'date': ['January 1, 2024'],
	#         'duration': ['24 months'],
	#         'fee': ['$15,000']
	#     },
	#     'contract_type': 'service',
	#     'contract_terms': [{
	#         'parties': ['TechCorp LLC', 'DataSystems Inc.'],
	#         'effective_date': 'January 1, 2024',
	#         'monthly_fee': '$15,000',
	#         'term_length': '24 months',
	#         'renewal': 'automatic',
	#         'termination_notice': '30-day written notice'
	#     }]
	# }
	```
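
Because the combined result is a plain dictionary, it can be checked before being passed downstream. The sketch below is a hypothetical validation step, not a gliner2 feature. It confirms that each extracted `contract_terms` record carries the fields declared in the schema and that `renewal` holds one of its declared choices. The sample record is copied from the example output above, and `validate_terms` is an illustrative name.

```python
from typing import Any, Dict, List

# Field names and choices mirror the schema defined in the example above.
EXPECTED_FIELDS = [
    "parties",
    "effective_date",
    "monthly_fee",
    "term_length",
    "renewal",
    "termination_notice",
]
RENEWAL_CHOICES = {"automatic", "manual", "none"}


def validate_terms(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    for name in EXPECTED_FIELDS:
        if record.get(name) in (None, "", []):
            problems.append(f"missing field: {name}")
    renewal = record.get("renewal")
    if renewal and renewal not in RENEWAL_CHOICES:
        problems.append(f"unexpected renewal value: {renewal!r}")
    return problems


# Copied from the example output above rather than produced by a model call.
results = {
    "contract_type": "service",
    "contract_terms": [{
        "parties": ["TechCorp LLC", "DataSystems Inc."],
        "effective_date": "January 1, 2024",
        "monthly_fee": "$15,000",
        "term_length": "24 months",
        "renewal": "automatic",
        "termination_notice": "30-day written notice",
    }],
}

for record in results["contract_terms"]:
    print(validate_terms(record))
```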

	## Model Details

	- Model Type: Bidirectional Transformer Encoder (BERT-based)
	- Parameters: 340M
	- Input: Text sequences
	- Output: Entities, classifications, and structured data
	- Architecture: Based on GLiNER with multi-task extensions (large variant)
	- Training Data: Multi-domain datasets for NER, classification, and structured extraction

	## Performance

	This large model provides:
	- Enhanced accuracy on complex extraction tasks
	- Better performance on ambiguous or difficult cases
	- Improved handling of specialized domains (medical, legal, financial)
	- Efficient CPU inference (GPU optional for faster processing)
	- Superior multi-task performance

	## Use Cases

	The large model excels in:
	- Medical information extraction
	- Legal document analysis
	- Financial document processing
	- Complex multi-entity scenarios
	- High-precision extraction requirements
	- Domain-specific applications

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@misc{zaratiana2025gliner2efficientmultitaskinformation,
	    title={GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface},
	    author={Urchade Zaratiana and Gil Pasternak and Oliver Boyd and George Hurn-Maloney and Ash Lewis},
	    year={2025},
	    eprint={2507.18546},
	    archivePrefix={arXiv},
	    primaryClass={cs.CL},
	    url={https://arxiv.org/abs/2507.18546},
	}
	```

	## License

	This project is licensed under the Apache License 2.0.

	## Links

	- Repository: https://github.com/fastino-ai/GLiNER2
	- Paper: https://arxiv.org/abs/2507.18546
	- Organization: [Fastino AI](https://fastino.ai)