GLiNER2 Base v1 Q4_K - Zero-Shot Entity Recognition
GLiNER2 Base v1 Q4_K is a Termite split GGUF export of fastino/gliner2-base-v1 for zero-shot named entity recognition and relation-oriented extraction workflows.
Built by antflydb for use with Termite, a standalone ML inference service for embeddings, chunking, reranking, and recognition.
Architecture
Text + labels -> DeBERTa encoder GGUF -> GLiNER2 span head GGUF -> labeled spans
- Encoder: DeBERTa-style encoder from fastino/gliner2-base-v1, exported as encoder.gguf.
- Head: GLiNER2 span/head sidecar exported as gliner_head.gguf.
- Quantization: eligible encoder, embedding, relative-position, and head tensors are stored as Q4_K; small normalization/bias tensors remain dense.
- Bundle format: termite_bundle.json marks this as a gliner2_split_bundle/v1.
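A quick pre-flight check can confirm that a downloaded directory has the split-bundle layout described above. The sketch below is only an illustration: the function name `missing_bundle_files` is hypothetical, and the required-file list is taken from the file names mentioned in this card, not from Termite's own loader.

```python
from pathlib import Path

# File names taken from this model card. Termite itself may require more
# (for example tokenizer files), so treat this list as an assumption.
REQUIRED_FILES = ("encoder.gguf", "gliner_head.gguf", "termite_bundle.json")


def missing_bundle_files(bundle_dir):
    """Return the required bundle files that are absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty return value means all three files named in this card are present; it says nothing about whether their contents are valid.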
Intended Uses
- Zero-shot named entity recognition with caller-provided labels
- Entity extraction for Antfly indexes and document pipelines
- Lightweight local recognition experiments through Termite
- Relation extraction workflows that build on GLiNER2 entity spans
How to Use with Termite
termite recognize ./gliner2-base-v1-q4_k \
"John works at Google in California." \
--label person \
--label organization \
--label location \
--backend native \
--graph-runtime partitioned
Example output from local validation:
{
"entities": [[
{"text": "John", "label": "person", "score": 0.99997926},
{"text": "Google", "label": "organization", "score": 0.9999995},
{"text": "California.", "label": "location", "score": 0.9748632}
]]
}
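Downstream code usually wants a flat list of confident spans. The following is a minimal sketch of post-processing the JSON shown above, assuming that `entities` holds one inner list per input text; the helper name `confident_entities` and the `min_score` threshold are hypothetical and not part of Termite.

```python
import json


def confident_entities(raw_json, min_score=0.5):
    """Flatten recognize-style output and keep spans at or above min_score."""
    payload = json.loads(raw_json)
    kept = []
    # "entities" is a list of lists; we assume one inner list per input text.
    for text_index, spans in enumerate(payload["entities"]):
        for span in spans:
            if span["score"] >= min_score:
                kept.append((text_index, span["text"], span["label"], span["score"]))
    return kept


# The sample below is the example output from this card, copied verbatim.
sample = """
{
  "entities": [[
    {"text": "John", "label": "person", "score": 0.99997926},
    {"text": "Google", "label": "organization", "score": 0.9999995},
    {"text": "California.", "label": "location", "score": 0.9748632}
  ]]
}
"""

for item in confident_entities(sample, min_score=0.98):
    print(item)
```

With a threshold of 0.98 the location span is dropped, since its score is about 0.975. Note that the sample span text includes the trailing period in "California."; callers who need clean surface forms may want to strip punctuation themselves.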
Export Details
This bundle was created from the local Termite model cache:
termite export /Users/timkaye/.termite/models/recognizers/fastino/gliner2-base-v1 \
--target gguf \
--format q4_k \
--output /private/tmp/gliner2-q4k-export-full-bundle/encoder.gguf
This export uses the full q4_k pass, including quantized token embeddings, relative-position embeddings, and the GLiNER head position embedding. Running this bundle requires a Termite build whose native recognizer supports rank-3 quantized embedding tables.
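For a rough sense of what Q4_K storage costs, the sketch below estimates the size of a single quantized tensor. It assumes the standard GGML Q4_K layout, in which weights are packed into super-blocks of 256 values occupying 144 bytes each (4.5 bits per weight). The helper name and the example shape are hypothetical and are not read from this bundle.

```python
QK_K = 256              # weights per Q4_K super-block (GGML convention)
Q4_K_BLOCK_BYTES = 144  # 4 (two fp16 scales) + 12 (sub-block scales) + 128 (4-bit quants)


def q4_k_nbytes(rows, cols):
    """Estimate Q4_K storage for a rows x cols tensor quantized along each row."""
    if cols % QK_K != 0:
        raise ValueError("row length must be a multiple of 256 for Q4_K")
    return rows * (cols // QK_K) * Q4_K_BLOCK_BYTES


# Hypothetical 1024 x 768 weight matrix, for illustration only.
quantized = q4_k_nbytes(1024, 768)
dense_f32 = 1024 * 768 * 4
print(quantized, dense_f32, dense_f32 / quantized)
```

The multiple-of-256 constraint on row length may be one reason only some tensors are eligible for Q4_K, while small normalization and bias tensors stay dense.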
GGUF Files
| File | Description | Local size |
|---|---|---|
| encoder.gguf | DeBERTa encoder with 74 Q4_K tensors, including token and relative-position embeddings | ~102 MB |
| gliner_head.gguf | GLiNER2 span/head sidecar with 12 Q4_K tensors, including count_embed.pos_embedding.weight | ~27 MB |
| termite_bundle.json | Termite split-bundle marker | <1 KB |
Additional files include tokenizer.json, tokenizer_config.json, special_tokens_map.json, added_tokens.json, config.json, gliner_config.json, and model_manifest.json.
Validation
The exported bundle was inspected and run locally with Termite:
termite smoke /private/tmp/gliner2-q4k-export-full-bundle test --inspect-only
Inspection found a parseable DeBERTa GGUF with no unsupported tensor types and no missing required tensors.
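Independently of Termite, the fixed-size GGUF header can be read with a few lines of Python as a basic sanity check. This is a sketch under stated assumptions (GGUF version 2 or later, little-endian, with 64-bit counts); it is not how `termite smoke` performs its inspection, and the function name `read_gguf_header` is hypothetical.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout assumed: 4-byte magic, uint32 version, uint64 tensor count,
    # uint64 metadata key/value count, all little-endian.
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

The tensor count in the header covers every tensor in the file, including the dense normalization and bias tensors, so it is expected to be larger than the Q4_K tensor counts listed in the table above.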
termite recognize /private/tmp/gliner2-q4k-export-full-bundle \
"John works at Google in California." \
--label person \
--label organization \
--label location \
--backend native \
--graph-runtime partitioned
Limitations
- This is a Termite split GGUF bundle, not a generic Transformers checkpoint.
- The current package is intended for Termite native inference. Metal support depends on a Termite build with GGUF quantized embedding lookup support.
- Small tensors such as normalization weights and biases remain dense where Q4_K is not appropriate.
- Accuracy and label behavior inherit the limitations of fastino/gliner2-base-v1 and the caller-provided label set.
Citation
If you use this bundle, cite the upstream GLiNER2 model and the underlying DeBERTa backbone as appropriate for your work.