RetriCo-LM: Structured Information Extraction Models
Model Family
| Model | Parameters | Base Model |
|---|---|---|
| knowledgator/retrico-lm-0.8b | 0.8B | Qwen3.5-0.8B |
| knowledgator/retrico-lm-2b | 2B | Qwen3.5-2B |
| knowledgator/retrico-lm-4b | 4B | Qwen3.5-4B |
| knowledgator/retrico-lm-8b | 8B | Qwen3.5-8B |
Description
RetriCo-LM is a family of compact language models fine-tuned for structured information extraction. Given a text and a JSON schema, the model extracts the relevant information and returns it as a valid JSON object that conforms to the provided schema.
Key Features
- Schema-guided extraction: Provide any JSON schema and the model will populate it from the input text.
- Lightweight: Designed to run on consumer hardware — the 0.8B variant fits on a single GPU with minimal memory.
- Structured output: Outputs valid JSON, reducing the need for post-processing.
- Open-domain: Works across domains without task-specific fine-tuning.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import json

model_name = "knowledgator/retrico-lm-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16", device_map="auto")

# JSON template describing the fields to extract
schema = json.dumps({
    "entities": [{"entity": "string", "type": "string"}],
    "triplets": [{"head": "string", "relation": "string", "tail": "string"}]
}, indent=1)
text = "John Smith joined Google as a senior engineer in 2023."

prompt = (
    "Extract entities and relations from the following text according to the JSON template.\n\n"
    "Important rules:\n"
    "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
    "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
    "- For list fields with no values found, return [] not [null].\n"
    "- Entity text must be exact substrings from the input text.\n"
    "- Entity types must be one of: person, organization, role\n"
    "- Relation types must be one of: works at, has role\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{text}\n\n"
    "Return only the extracted JSON, nothing else."
)

# Wrap the prompt in the chat template with thinking mode disabled
formatted = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(formatted, return_tensors="pt", truncation=True, max_length=4096)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Greedy decoding; decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
Output:
{
  "entities": [
    {"entity": "John Smith", "type": "person"},
    {"entity": "Google", "type": "organization"},
    {"entity": "senior engineer", "type": "role"}
  ],
  "triplets": [
    {"head": "John Smith", "relation": "works at", "tail": "Google"},
    {"head": "John Smith", "relation": "has role", "tail": "senior engineer"}
  ]
}
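The generated text is a plain string, so downstream code typically parses it before use. The following is a minimal sketch rather than part of the model's API: `parse_extraction` is a hypothetical helper name, and the idea that stray text might surround the JSON object is a defensive assumption, not documented behaviour.

```python
import json


def parse_extraction(raw: str) -> dict:
    """Parse the span from the first '{' to the last '}' as a JSON object."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])


# Example with a response shaped like the output above
sample = '{"entities": [{"entity": "Google", "type": "organization"}], "triplets": []}'
parsed = parse_extraction(sample)
print(parsed["entities"][0]["entity"])  # prints: Google
```

In the usage example this would be called as `parse_extraction(result)`. Malformed output surfaces as a `ValueError` (`json.JSONDecodeError` is a subclass of it).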
Use Cases
Named Entity Recognition (NER)
schema = json.dumps({
    "entities": [{"text": "string", "type": "string"}]
}, indent=1)
text = "Elon Musk founded SpaceX in 2002 and Tesla in 2003, both headquartered in California."

prompt = (
    "Extract all named entities from the following text according to the JSON template.\n\n"
    "Important rules:\n"
    "- Entity text must be an exact substring from the input text.\n"
    "- Entity types must be one of: person, organization, location, date\n"
    "- For list fields with no values found, return [] not [null].\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{text}\n\n"
    "Return only the extracted JSON, nothing else."
)
Output:
{
  "entities": [
    {"text": "Elon Musk", "type": "person"},
    {"text": "SpaceX", "type": "organization"},
    {"text": "2002", "type": "date"},
    {"text": "Tesla", "type": "organization"},
    {"text": "2003", "type": "date"},
    {"text": "California", "type": "location"}
  ]
}
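Because the prompt asks for entity text that is an exact substring of the input, the output can be mapped back to character offsets with a plain string search. This is a sketch under stated assumptions: `locate_entities` is a hypothetical helper, and it reports only the first occurrence of each entity string.

```python
def locate_entities(text: str, entities: list[dict], key: str = "text"):
    """Split entities into those found verbatim (with offsets) and those missing."""
    located, missing = [], []
    for ent in entities:
        value = ent.get(key)
        idx = text.find(value) if isinstance(value, str) and value else -1
        if idx == -1:
            missing.append(ent)  # violates the exact-substring rule
        else:
            located.append({**ent, "start": idx, "end": idx + len(value)})
    return located, missing


text = "Elon Musk founded SpaceX in 2002 and Tesla in 2003, both headquartered in California."
entities = [
    {"text": "Elon Musk", "type": "person"},
    {"text": "SpaceX", "type": "organization"},
    {"text": "Mars", "type": "location"},  # not in the text, for illustration
]
located, missing = locate_entities(text, entities)
print(located[1]["start"], located[1]["end"])  # prints: 18 24
print(len(missing))                            # prints: 1
```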
Text Classification
schema = json.dumps({
    "label": "string",
    "confidence": "string",
    "reasoning": "string"
}, indent=1)
text = "The new iPhone 16 features a larger display, improved battery life, and a new camera system with 5x optical zoom."

prompt = (
    "Classify the following text according to the JSON template.\n\n"
    "Important rules:\n"
    "- label must be one of: technology, politics, sports, finance, health, entertainment\n"
    "- confidence must be one of: high, medium, low\n"
    "- reasoning should be a brief explanation grounded in the text.\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{text}\n\n"
    "Return only the extracted JSON, nothing else."
)
Output:
{
  "label": "technology",
  "confidence": "high",
  "reasoning": "The text describes the iPhone 16, a mobile device, and its technical specifications such as display size, battery life, and camera system."
}
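The label and confidence constraints live only in the prompt, so a caller may want to check them after parsing. A minimal sketch, assuming the parsed output is a Python dict; `validate_classification` and the two constant names are hypothetical.

```python
ALLOWED_LABELS = {"technology", "politics", "sports", "finance", "health", "entertainment"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def validate_classification(output: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the output passes."""
    problems = []
    label = output.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        problems.append(f"unexpected label: {label!r}")
    confidence = output.get("confidence")
    if not isinstance(confidence, str) or confidence not in ALLOWED_CONFIDENCE:
        problems.append(f"unexpected confidence: {confidence!r}")
    reasoning = output.get("reasoning")
    if not isinstance(reasoning, str) or not reasoning.strip():
        problems.append("reasoning is missing or empty")
    return problems


ok = {"label": "technology", "confidence": "high", "reasoning": "Describes a phone."}
bad = {"label": "gadgets", "confidence": "high", "reasoning": ""}
print(validate_classification(ok))        # prints: []
print(len(validate_classification(bad)))  # prints: 2
```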
Extraction from HTML
retrico-lm supports structured input formats including HTML and Markdown — not just plain text. Pass the raw markup directly as the input.
schema = json.dumps({
    "title": "string",
    "author": "string",
    "published_date": "string",
    "tags": ["string"],
    "summary": "string"
}, indent=1)
html = """
<article>
<h1>OpenAI releases GPT-5</h1>
<span class="author">Jane Doe</span>
<time datetime="2025-03-15">March 15, 2025</time>
<ul class="tags"><li>AI</li><li>LLM</li><li>OpenAI</li></ul>
<p>OpenAI has announced GPT-5, claiming significant improvements in reasoning and multimodal understanding over its predecessor.</p>
</article>
"""

prompt = (
    "Extract structured information from the following HTML according to the JSON template.\n\n"
    "Important rules:\n"
    "- If a field's value is not mentioned or cannot be found, set it to null.\n"
    "- Do not infer or hallucinate values not present in the markup.\n"
    "- For list fields with no values found, return [] not [null].\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{html}\n\n"
    "Return only the extracted JSON, nothing else."
)
Output:
{
  "title": "OpenAI releases GPT-5",
  "author": "Jane Doe",
  "published_date": "March 15, 2025",
  "tags": ["AI", "LLM", "OpenAI"],
  "summary": "OpenAI has announced GPT-5, claiming significant improvements in reasoning and multimodal understanding over its predecessor."
}
Extraction from Markdown
schema = json.dumps({
    "title": "string",
    "sections": [{"heading": "string", "content": "string"}],
    "code_languages": ["string"]
}, indent=1)
markdown = (
    "# Getting Started with FastAPI\n\n"
    "## Installation\n"
    "Install FastAPI and uvicorn using pip:\n"
    "    pip install fastapi uvicorn\n\n"
    "## Hello World\n"
    "Create a simple app with a single route:\n"
    "    from fastapi import FastAPI\n"
    "    app = FastAPI()\n\n"
    "    @app.get('/')\n"
    "    def read_root():\n"
    "        return {'Hello': 'World'}\n"
)

prompt = (
    "Extract structured information from the following Markdown document according to the JSON template.\n\n"
    "Important rules:\n"
    "- If a field's value is not mentioned or cannot be found, set it to null.\n"
    "- For list fields with no values found, return [] not [null].\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{markdown}\n\n"
    "Return only the extracted JSON, nothing else."
)
Output:
{
  "title": "Getting Started with FastAPI",
  "sections": [
    {"heading": "Installation", "content": "Install FastAPI and uvicorn using pip:"},
    {"heading": "Hello World", "content": "Create a simple app with a single route:"}
  ],
  "code_languages": ["python"]
}
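The prompts in the use cases above share one layout: an instruction, a list of rules, the JSON template, the input, and a closing line. A small builder can assemble them. This is a sketch only; `build_prompt` is a hypothetical helper, not a function shipped with the model.

```python
import json


def build_prompt(instruction: str, rules: list[str], template: dict, text: str) -> str:
    """Assemble a prompt in the layout used by the examples above."""
    schema = json.dumps(template, indent=1)
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        f"{instruction}\n\n"
        f"Important rules:\n{rule_lines}\n\n"
        f"Template:\n{schema}\n\n"
        f"Text:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )


prompt = build_prompt(
    instruction="Extract all named entities from the following text according to the JSON template.",
    rules=[
        "Entity text must be an exact substring from the input text.",
        "Entity types must be one of: person, organization, location, date",
        "For list fields with no values found, return [] not [null].",
    ],
    template={"entities": [{"text": "string", "type": "string"}]},
    text="Elon Musk founded SpaceX in 2002.",
)
print(prompt)
```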
Using with vLLM
Serving
vllm serve knowledgator/retrico-lm-4b --dtype bfloat16 --port 8000 --language-model-only
Note: The --language-model-only flag is required since retrico-lm is built on the Qwen3.5 architecture, which vLLM treats as a multimodal model by default.
Querying the server
from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

schema = json.dumps({
    "entities": [{"entity": "string", "type": "string"}],
    "triplets": [{"head": "string", "relation": "string", "tail": "string"}]
}, indent=1)
text = "John Smith joined Google as a senior engineer in 2023."

response = client.chat.completions.create(
    model="knowledgator/retrico-lm-4b",
    messages=[{"role": "user", "content": (
        "Extract entities and relations from the following text according to the JSON template.\n\n"
        "Important rules:\n"
        "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
        "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
        "- For list fields with no values found, return [] not [null].\n"
        "- Entity text must be exact substrings from the input text.\n"
        "- Entity types must be one of: person, organization, role\n"
        "- Relation types must be one of: works at, has role\n\n"
        f"Template:\n{schema}\n\n"
        f"Text:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )}],
    max_tokens=1024,
    temperature=0,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
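vLLM batches concurrent requests on the server side, so several documents can be submitted in parallel from the client. A minimal sketch under stated assumptions: `extract_many` and `call_model` are hypothetical names, and `call_model` stands for any function that sends one prompt to the server and returns the response text (for instance, a thin wrapper around the `client.chat.completions.create` call above). A stand-in function is used here so the example runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def extract_many(
    call_model: Callable[[str], str],
    prompts: Iterable[str],
    max_workers: int = 4,
) -> list[str]:
    """Send prompts concurrently; results come back in the order of the prompts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_model, prompts))


def fake_model(prompt: str) -> str:
    # Stand-in for a real request to the vLLM server (illustration only)
    return '{"length": %d}' % len(prompt)


results = extract_many(fake_model, ["first text", "second"])
print(results)  # prints: ['{"length": 10}', '{"length": 6}']
```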
Offline inference
from vllm import LLM, SamplingParams
import json

llm = LLM(
    model="knowledgator/retrico-lm-4b",
    dtype="bfloat16",
    language_model_only=True,
)

schema = json.dumps({
    "entities": [{"entity": "string", "type": "string"}],
    "triplets": [{"head": "string", "relation": "string", "tail": "string"}]
}, indent=1)
text = "John Smith joined Google as a senior engineer in 2023."

prompt = (
    "Extract entities and relations from the following text according to the JSON template.\n\n"
    "Important rules:\n"
    "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
    "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
    "- For list fields with no values found, return [] not [null].\n"
    "- Entity text must be exact substrings from the input text.\n"
    "- Entity types must be one of: person, organization, role\n"
    "- Relation types must be one of: works at, has role\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{text}\n\n"
    "Return only the extracted JSON, nothing else."
)

messages = [{"role": "user", "content": prompt}]
sampling = SamplingParams(max_tokens=1024, temperature=0)
outputs = llm.chat(messages, sampling_params=sampling,
                   chat_template_kwargs={"enable_thinking": False})
print(outputs[0].outputs[0].text)
Evaluation Metrics
We evaluate retrico-lm using three complementary metrics:
WL Graph Kernel — A graph-based metric that converts both predicted and ground-truth JSON objects into trees, computes semantic embeddings for each node using a sentence transformer, and propagates information via Weisfeiler-Leman message passing. The final score incorporates cross-graph node similarity with depth penalty and leaf weighting, yielding precision, recall, F1, as well as structural and semantic sub-scores. This metric captures both the structural correctness of the output JSON and the semantic similarity of extracted values.
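To make the idea concrete, here is a heavily simplified sketch of a Weisfeiler-Leman comparison between two JSON objects. It is not the evaluation code used for the reported scores: it replaces sentence-transformer embeddings with exact label matching and omits the depth penalty, leaf weighting, and the precision/recall breakdown. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def json_to_graph(obj):
    """Flatten a JSON value into node labels and an undirected adjacency list."""
    labels, adj = [], []

    def add(label, parent=None):
        idx = len(labels)
        labels.append(label)
        adj.append([])
        if parent is not None:
            adj[parent].append(idx)
            adj[idx].append(parent)
        return idx

    def walk(value, parent):
        if isinstance(value, dict):
            node = add("{}", parent)
            for key, child in value.items():
                walk(child, add(f"key:{key}", node))
        elif isinstance(value, list):
            node = add("[]", parent)
            for child in value:
                walk(child, node)
        else:
            add(f"leaf:{value!r}", parent)

    walk(obj, None)
    return labels, adj


def wl_features(obj, iterations=2):
    """Count node labels after each round of WL neighbourhood relabelling."""
    labels, adj = json_to_graph(obj)
    features = Counter(labels)
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels
        labels = [
            (labels[i], tuple(sorted(labels[j] for j in adj[i])))
            for i in range(len(labels))
        ]
        features.update(labels)
    return features


def wl_similarity(pred, gold, iterations=2):
    """Cosine-normalised WL subtree kernel between two JSON objects."""
    fp, fg = wl_features(pred, iterations), wl_features(gold, iterations)
    dot = sum(fp[k] * fg[k] for k in fp.keys() & fg.keys())
    norm = sqrt(sum(v * v for v in fp.values()) * sum(v * v for v in fg.values()))
    return dot / norm if norm else 0.0


gold = {"entities": [{"entity": "Google", "type": "organization"}]}
pred = {"entities": [{"entity": "Alphabet", "type": "organization"}]}
print(wl_similarity(gold, gold))  # identical objects score 1.0
print(wl_similarity(pred, gold))  # partial overlap scores between 0 and 1
```

With exact matching, "Alphabet" and "Google" share nothing; an embedding-based node similarity, as described above, would instead give partial credit for semantically close values.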
ROUGE-L — Measures the longest common subsequence between predicted and reference JSON strings, providing a surface-level text overlap score.
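A minimal sketch of a ROUGE-L F1 computation follows. It assumes whitespace tokenisation and equal weighting of precision and recall; the tokenisation and weighting used for the reported scores are not stated here, so treat both as assumptions. Function names are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (tokenisation is an assumption)."""
    pred_tokens, ref_tokens = prediction.split(), reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred_tokens), lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = '{"label": "technology", "confidence": "high"}'
prediction = '{"label": "technology", "confidence": "low"}'
print(round(rouge_l_f1(prediction, reference), 2))  # prints: 0.75
```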
Attribution Score — Measures how well the extracted values are grounded in the source text. Each non-null leaf value in the predicted JSON is checked against the input text; the score is the fraction of values that can be traced back to the source.
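The description above can be sketched in a few lines. This version assumes that "traced back to the source" means a case-insensitive substring match and that an output with no non-null leaves scores 0.0; the matching rule behind the reported numbers may differ. Function names are hypothetical.

```python
def leaf_values(obj):
    """Yield every non-null leaf value in a nested JSON structure."""
    if isinstance(obj, dict):
        for value in obj.values():
            yield from leaf_values(value)
    elif isinstance(obj, list):
        for value in obj:
            yield from leaf_values(value)
    elif obj is not None:
        yield obj


def attribution_score(prediction, source: str) -> float:
    """Fraction of non-null leaf values found verbatim in the source text."""
    leaves = list(leaf_values(prediction))
    if not leaves:
        return 0.0  # assumption: nothing extracted counts as ungrounded
    haystack = source.lower()
    grounded = sum(str(value).lower() in haystack for value in leaves)
    return grounded / len(leaves)


source = "John Smith joined Google as a senior engineer in 2023."
prediction = {"entities": [{"entity": "John Smith", "type": "person"}], "note": None}
print(attribution_score(prediction, source))  # prints: 0.5
```

In this example the type label "person" does not appear in the source, which is why schema-constrained fields can pull the score below 1.0 even for a correct extraction.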
Benchmarks
1. General Extraction
Evaluated on a held-out general-domain extraction benchmark covering diverse entity and relation schemas.
WL Graph Kernel
| Model | Precision | Recall | F1 | Structural | Semantic |
|---|---|---|---|---|---|
| knowledgator/retrico-lm-0.8b | 0.7300 | 0.7397 | 0.7264 | 0.7408 | 0.7185 |
| knowledgator/retrico-lm-2b | 0.7835 | 0.8163 | 0.7902 | 0.8020 | 0.7831 |
| knowledgator/retrico-lm-4b | 0.8269 | 0.8772 | 0.8404 | 0.8546 | 0.8328 |
| knowledgator/retrico-lm-8b | 0.8715 | 0.9190 | 0.8802 | 0.8956 | 0.8717 |
ROUGE-L
| Model | F1 |
|---|---|
| knowledgator/retrico-lm-0.8b | 0.4558 |
| knowledgator/retrico-lm-2b | 0.4796 |
| knowledgator/retrico-lm-4b | 0.4972 |
| knowledgator/retrico-lm-8b | 0.5241 |
Attribution
| Model | Score |
|---|---|
| knowledgator/retrico-lm-0.8b | 0.7150 |
| knowledgator/retrico-lm-2b | 0.7590 |
| knowledgator/retrico-lm-4b | 0.8091 |
| knowledgator/retrico-lm-8b | 0.8620 |
2. Markup Extraction
Evaluated on structured documents in HTML, XML, and Markdown formats. The model is prompted to extract information according to a schema derived from the document's markup structure.
WL Graph Kernel
| Model | Precision | Recall | F1 | Structural | Semantic |
|---|---|---|---|---|---|
| knowledgator/retrico-lm-0.8b | 0.9460 | 0.9359 | 0.9374 | 0.9426 | 0.9336 |
| knowledgator/retrico-lm-2b | 0.8990 | 0.9154 | 0.9047 | 0.9074 | 0.9027 |
| knowledgator/retrico-lm-4b | 0.9496 | 0.9482 | 0.9471 | 0.9520 | 0.9444 |
| knowledgator/retrico-lm-8b | 0.9911 | 0.9706 | 0.9768 | 0.9805 | 0.9748 |
ROUGE-L
| Model | F1 |
|---|---|
| knowledgator/retrico-lm-0.8b | 0.6987 |
| knowledgator/retrico-lm-2b | 0.7055 |
| knowledgator/retrico-lm-4b | 0.7172 |
| knowledgator/retrico-lm-8b | 0.7294 |
Attribution
| Model | Score |
|---|---|
| knowledgator/retrico-lm-0.8b | 0.9393 |
| knowledgator/retrico-lm-2b | 0.8952 |
| knowledgator/retrico-lm-4b | 0.9445 |
| knowledgator/retrico-lm-8b | 0.9701 |
3. Relation Extraction
Evaluated on standard relation extraction benchmarks. We report Micro-F1 and Macro-F1.
DocRED
| Model | Micro-F1 | Macro-F1 |
|---|---|---|
| numind/NuExtract-2.0-4B | 1.3 | 1.7 |
| fastino/gliner2-large-v1 | 13.8 | 6.9 |
| knowledgator/retrico-lm-0.8b | 0.6 | 1.7 |
| knowledgator/retrico-lm-2b | 8.0 | 4.9 |
| knowledgator/retrico-lm-4b | 12.1 | 6.2 |
| knowledgator/retrico-lm-8b | 18.6 | 10.4 |
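For readers unfamiliar with the two averages, the sketch below computes them over relation triplets. It assumes exact matching of (head, relation, tail) tuples and a macro average taken over relation types; the matching criteria of the benchmark's scorer may differ. Function names are hypothetical.

```python
from collections import defaultdict


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relation_f1(predicted, gold):
    """Return (micro_f1, macro_f1) for collections of (head, relation, tail)."""
    predicted, gold = set(predicted), set(gold)
    counts = defaultdict(lambda: [0, 0, 0])  # relation -> [tp, fp, fn]
    for triplet in predicted:
        counts[triplet[1]][0 if triplet in gold else 1] += 1
    for triplet in gold - predicted:
        counts[triplet[1]][2] += 1
    # Micro: pool counts across relation types, then compute one F1
    micro = f1(*(sum(c[i] for c in counts.values()) for i in range(3)))
    # Macro: compute F1 per relation type, then take the unweighted mean
    macro = sum(f1(*c) for c in counts.values()) / len(counts) if counts else 0.0
    return micro, macro


gold = {("A", "works at", "B"), ("E", "works at", "F"), ("A", "has role", "D")}
pred = {("A", "works at", "B"), ("E", "works at", "F"), ("A", "has role", "C")}
micro, macro = relation_f1(pred, gold)
print(round(micro, 3), round(macro, 3))  # prints: 0.667 0.5
```

Micro-F1 is dominated by frequent relation types, while Macro-F1 weights every type equally, which is why the two columns can diverge.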
Using with RetriCo Framework
retrico-lm integrates with the RetriCo framework for building end-to-end knowledge extraction pipelines — from raw text to a structured knowledge graph in a few lines of code.
from retrico import RetriCoBuilder

builder = RetriCoBuilder(name="demo")
builder.chunker(method="sentence")
builder.relex_llm(
    relation_labels=["CEO of", "headquartered in", "born in"],
    model="knowledgator/retrico-lm-4b",
    base_url="http://localhost:8000/v1",
    api_key="dummy",
)
builder.graph_writer()

executor = builder.build()
result = executor.run(texts=[
    "Tim Cook is the CEO of Apple. Apple is headquartered in Cupertino.",
])
Output:
┌──────────────────────────────────────────────┐
│ ENTITIES │
├────────────────────┬─────────────────────────┤
│ Tim Cook │ person │
│ Apple │ company │
│ Cupertino │ city │
└────────────────────┴─────────────────────────┘
┌──────────────────────────────────────────────┐
│ RELATIONS │
├──────────────────────────────────────────────┤
│ Tim Cook ──[CEO of]──▶ Apple │
│ Apple ──[headquartered in]──▶ Cupertino │
└──────────────────────────────────────────────┘
3 entities · 2 relations
Citation
@misc{knowledgator2025retrico,
title={retrico-lm: Schema-Guided Structured Information Extraction},
author={Knowledgator Engineering},
year={2026},
url={https://huggingface.co/knowledgator}
}