Libraries

How to use antflydb/gliner2-base-v1-q4_k with GLiNER2:

from gliner2 import GLiNER2

extractor = GLiNER2.from_pretrained("antflydb/gliner2-base-v1-q4_k")

# Extract entities
text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday."
result = extractor.extract_entities(text, ["company", "person", "product", "location"])

print(result)

llama-cpp-python

How to use antflydb/gliner2-base-v1-q4_k with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="antflydb/gliner2-base-v1-q4_k",
	filename="encoder.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Local Apps

llama.cpp

How to use antflydb/gliner2-base-v1-q4_k with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf antflydb/gliner2-base-v1-q4_k
# Run inference directly in the terminal:
llama-cli -hf antflydb/gliner2-base-v1-q4_k

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf antflydb/gliner2-base-v1-q4_k
# Run inference directly in the terminal:
llama-cli -hf antflydb/gliner2-base-v1-q4_k

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf antflydb/gliner2-base-v1-q4_k
# Run inference directly in the terminal:
./llama-cli -hf antflydb/gliner2-base-v1-q4_k

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf antflydb/gliner2-base-v1-q4_k
# Run inference directly in the terminal:
./build/bin/llama-cli -hf antflydb/gliner2-base-v1-q4_k

Use Docker

docker model run hf.co/antflydb/gliner2-base-v1-q4_k

Ollama
How to use antflydb/gliner2-base-v1-q4_k with Ollama:
```
ollama run hf.co/antflydb/gliner2-base-v1-q4_k
```

Unsloth Studio

How to use antflydb/gliner2-base-v1-q4_k with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for antflydb/gliner2-base-v1-q4_k to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for antflydb/gliner2-base-v1-q4_k to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for antflydb/gliner2-base-v1-q4_k to start chatting

Docker Model Runner
How to use antflydb/gliner2-base-v1-q4_k with Docker Model Runner:
```
docker model run hf.co/antflydb/gliner2-base-v1-q4_k
```

Lemonade

How to use antflydb/gliner2-base-v1-q4_k with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull antflydb/gliner2-base-v1-q4_k

Run and chat with the model

lemonade run user.gliner2-base-v1-q4_k-{{QUANT_TAG}}

List all available models

lemonade list

GLiNER2 Base v1 Q4_K - Zero-Shot Entity Recognition

GLiNER2 Base v1 Q4_K is a Termite split GGUF export of fastino/gliner2-base-v1 for zero-shot named entity recognition and relation-oriented extraction workflows.

Built by antflydb for use with Termite, a standalone ML inference service for embeddings, chunking, reranking, and recognition.

Architecture

Text + labels -> DeBERTa encoder GGUF -> GLiNER2 span head GGUF -> labeled spans

Encoder: DeBERTa-style encoder from fastino/gliner2-base-v1, exported as encoder.gguf.
Head: GLiNER2 span/head sidecar exported as gliner_head.gguf.
Quantization: eligible encoder, embedding, relative-position, and head tensors are stored as Q4_K; small normalization/bias tensors remain dense.
Bundle format: termite_bundle.json marks this as a gliner2_split_bundle/v1.
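
To give a rough sense of what Q4_K storage means for tensor size, the arithmetic can be sketched as below. This assumes the standard ggml Q4_K layout (super-blocks of 256 weights stored in 144 bytes, about 4.5 bits per weight); the 768-wide row is a hypothetical example and is not a dimension read from this bundle.

```python
# Assumption: standard ggml Q4_K block layout; nothing here is read from the bundle.
Q4_K_BLOCK_WEIGHTS = 256  # weights per super-block
Q4_K_BLOCK_BYTES = 144    # 4 B of fp16 scale/min + 12 B packed sub-block scales + 128 B of 4-bit values


def q4_k_row_bytes(row_width: int) -> int:
    """Bytes needed to store one row of `row_width` weights as Q4_K."""
    if row_width % Q4_K_BLOCK_WEIGHTS != 0:
        raise ValueError("Q4_K rows must be a multiple of 256 weights")
    return (row_width // Q4_K_BLOCK_WEIGHTS) * Q4_K_BLOCK_BYTES


def f16_row_bytes(row_width: int) -> int:
    """Bytes needed to store the same row as dense 16-bit floats."""
    return row_width * 2


hidden = 768  # hypothetical row width, for illustration only
print("Q4_K bytes per row:", q4_k_row_bytes(hidden))
print("F16 bytes per row: ", f16_row_bytes(hidden))
print("bits per weight:   ", 8 * Q4_K_BLOCK_BYTES / Q4_K_BLOCK_WEIGHTS)
```

The multiple-of-256 check also suggests why small normalization and bias tensors are left dense: they are a poor fit for block quantization at this granularity.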

Intended Uses

Zero-shot named entity recognition with caller-provided labels
Entity extraction for Antfly indexes and document pipelines
Lightweight local recognition experiments through Termite
Relation extraction workflows that build on GLiNER2 entity spans

How to Use with Termite

termite recognize ./gliner2-base-v1-q4_k \
  "John works at Google in California." \
  --label person \
  --label organization \
  --label location \
  --backend native \
  --graph-runtime partitioned

Example output from local validation:

{
  "entities": [[
    {"text": "John", "label": "person", "score": 0.99997926},
    {"text": "Google", "label": "organization", "score": 0.9999995},
    {"text": "California.", "label": "location", "score": 0.9748632}
  ]]
}
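
A caller will usually want to flatten this output and drop low-confidence spans. The sketch below works on the JSON shown above; it assumes the outer list under `entities` holds one inner list per input text, which is inferred from the nesting rather than documented here. `flatten_entities` and the `min_score` threshold are illustrative names, not part of Termite.

```python
import json

# Output copied from the local validation run above.
raw = """
{
  "entities": [[
    {"text": "John", "label": "person", "score": 0.99997926},
    {"text": "Google", "label": "organization", "score": 0.9999995},
    {"text": "California.", "label": "location", "score": 0.9748632}
  ]]
}
"""


def flatten_entities(payload, min_score=0.0):
    """Return (text, label, score) tuples with score >= min_score."""
    rows = []
    for per_text in payload["entities"]:  # assumed: one inner list per input text
        for ent in per_text:
            if ent["score"] >= min_score:
                rows.append((ent["text"], ent["label"], ent["score"]))
    return rows


payload = json.loads(raw)
for text, label, score in flatten_entities(payload, min_score=0.98):
    print(f"{label}: {text} ({score:.4f})")
```

In this run the location span includes the trailing period, so downstream code may want to trim punctuation before indexing the text.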

Export Details

This bundle was created from the local Termite model cache:

termite export /Users/timkaye/.termite/models/recognizers/fastino/gliner2-base-v1 \
  --target gguf \
  --format q4_k \
  --output /private/tmp/gliner2-q4k-export-full-bundle/encoder.gguf

This export uses the full q4_k pass, which also quantizes the token embeddings, the relative-position embeddings, and the GLiNER head position embedding. Running this bundle therefore requires a Termite build whose native recognition path supports rank-3 quantized embedding tables.

GGUF Files

| File | Description | Local size |
| --- | --- | --- |
| `encoder.gguf` | DeBERTa encoder with 74 `Q4_K` tensors, including token and relative-position embeddings | ~102 MB |
| `gliner_head.gguf` | GLiNER2 span/head sidecar with 12 `Q4_K` tensors, including `count_embed.pos_embedding.weight` | ~27 MB |
| `termite_bundle.json` | Termite split-bundle marker | <1 KB |

Additional files include tokenizer.json, tokenizer_config.json, special_tokens_map.json, added_tokens.json, config.json, gliner_config.json, and model_manifest.json.
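
Before pointing Termite at a downloaded copy, it can help to confirm that the files named above are present. This is a minimal sketch: the file names come from this section, but whether Termite strictly requires every one of them is an assumption, and `missing_bundle_files` is a hypothetical helper rather than a Termite API.

```python
from pathlib import Path

# File names taken from the GGUF Files section of this card.
EXPECTED_FILES = [
    "encoder.gguf",
    "gliner_head.gguf",
    "termite_bundle.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "added_tokens.json",
    "config.json",
    "gliner_config.json",
    "model_manifest.json",
]


def missing_bundle_files(bundle_dir):
    """Return the expected file names that are not present in bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_bundle_files("./gliner2-base-v1-q4_k")
print("missing:", missing if missing else "none")
```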

Validation

The exported bundle was inspected and run locally with Termite:

termite smoke /private/tmp/gliner2-q4k-export-full-bundle test --inspect-only

Inspection found a parseable DeBERTa GGUF with no unsupported tensor types and no missing required tensors.
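
As a much lighter sanity check than the Termite inspection above, the fixed-size GGUF header can be read directly. This sketch assumes GGUF version 2 or later in little-endian byte order (the magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata counts). `read_gguf_header` is an illustrative helper, not part of Termite, and it does not validate tensor types or required tensors.

```python
import struct
import sys


def read_gguf_header(path):
    """Read the GGUF magic, version, tensor count, and metadata key/value count."""
    with open(path, "rb") as f:
        head = f.read(24)
    if len(head) < 24 or head[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Assumed layout (GGUF v2+): uint32 version, uint64 tensor count, uint64 KV count.
    version, n_tensors, n_kv = struct.unpack("<IQQ", head[4:24])
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}


if __name__ == "__main__":
    # Usage: python gguf_header.py encoder.gguf gliner_head.gguf
    for path in sys.argv[1:]:
        print(path, read_gguf_header(path))
```

The tensor count in the header covers every tensor in the file, so it need not equal the Q4_K tensor counts listed under GGUF Files.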

termite recognize /private/tmp/gliner2-q4k-export-full-bundle \
  "John works at Google in California." \
  --label person \
  --label organization \
  --label location \
  --backend native \
  --graph-runtime partitioned
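
When the label set changes from call to call, the invocation above can be assembled programmatically. The sketch below uses only the flags shown in this card and assumes `termite` is on the PATH; `build_recognize_command` is a hypothetical helper name, and the command is printed rather than executed.

```python
import shlex


def build_recognize_command(bundle_dir, text, labels,
                            backend="native", graph_runtime="partitioned"):
    """Build the argv list for `termite recognize` with one --label flag per label."""
    argv = ["termite", "recognize", bundle_dir, text]
    for label in labels:
        argv += ["--label", label]
    argv += ["--backend", backend, "--graph-runtime", graph_runtime]
    return argv


argv = build_recognize_command(
    "./gliner2-base-v1-q4_k",
    "John works at Google in California.",
    ["person", "organization", "location"],
)
print(shlex.join(argv))
# To run it: subprocess.run(argv, check=True, capture_output=True, text=True)
```

Passing the arguments as a list avoids shell quoting problems when the input text contains quotes or other special characters.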

Limitations

This is a Termite split GGUF bundle, not a generic Transformers checkpoint.
The current package is intended for Termite native inference. Metal support requires a Termite build that includes GGUF quantized embedding lookup.
Small tensors such as normalization weights and biases remain dense where Q4_K is not appropriate.
Accuracy and label behavior inherit the limitations of fastino/gliner2-base-v1 and the caller-provided label set.

Citation

If you use this bundle, cite the upstream GLiNER2 model and the underlying DeBERTa backbone as appropriate for your work.

GGUF

Model size: 0.2B params
Architecture: deberta
