RetriCo-LM: Structured Information Extraction Models

Model Family

Model Parameters Base Model
knowledgator/retrico-lm-0.8b 0.8B Qwen3.5-0.8B
knowledgator/retrico-lm-2b 2B Qwen3.5-2B
knowledgator/retrico-lm-4b 4B Qwen3.5-4B
knowledgator/retrico-lm-8b 8B Qwen3.5-8B

Description

RetriCo-LM is a family of compact language models fine-tuned for structured information extraction. Given a text and a JSON schema (in the examples below, a template object whose keys and nesting define the output fields), the model extracts the relevant information and returns it as a JSON object that conforms to the provided schema.

Key Features

  • Schema-guided extraction: Provide any JSON schema and the model will populate it from the input text.
  • Lightweight: Designed to run on consumer hardware. In bfloat16, the 0.8B variant's weights need only about 1.6 GB of GPU memory.
  • Structured output: Trained to return valid JSON that follows the provided schema, reducing the need for post-processing.
  • Open-domain: Works across domains without task-specific fine-tuning.
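The memory side of the Lightweight claim can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter (2 bytes in bfloat16). The sketch below uses the nominal sizes from the Model Family table and ignores activations, the KV cache, and framework overhead, so real usage is higher. `weight_memory_gb` and `VARIANTS` are illustrative names, not part of any package.

```python
# Back-of-the-envelope estimate of GPU memory for model weights only.
# Assumes 2 bytes per parameter (bfloat16); activations, the KV cache,
# and framework overhead need additional memory on top of this.
BYTES_PER_PARAM_BF16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return the approximate weight footprint in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts taken from the Model Family table.
VARIANTS = {
    "retrico-lm-0.8b": 0.8e9,
    "retrico-lm-2b": 2e9,
    "retrico-lm-4b": 4e9,
    "retrico-lm-8b": 8e9,
}

if __name__ == "__main__":
    for name, params in VARIANTS.items():
        print(f"{name}: ~{weight_memory_gb(params):.1f} GB of bf16 weights")
```

Under these assumptions the estimates come out to about 1.6, 4, 8, and 16 GB for the four variants.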

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import json

model_name = "knowledgator/retrico-lm-4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16", device_map="auto")

schema = json.dumps({
    "entities": [{"entity": "string", "type": "string"}],
    "triplets": [{"head": "string", "relation": "string", "tail": "string"}]
}, indent=1)

text = "John Smith joined Google as a senior engineer in 2023."

prompt = (
    "Extract entities and relations from the following text according to the JSON template.\n\n"
    "Important rules:\n"
    "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
    "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
    "- For list fields with no values found, return [] not [null].\n"
    "- Entity text must be exact substrings from the input text.\n"
    "- Entity types must be one of: person, organization, role\n"
    "- Relation types must be one of: works at, has role\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{text}\n\n"
    "Return only the extracted JSON, nothing else."
)

formatted = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=False, add_generation_prompt=True, enable_thinking=False,
)

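# Keep prompts within max_length: with right-side truncation (the usual default),
# an over-long prompt loses its end, including the generation prompt added above.
inputs = tokenizer(formatted, return_tensors="pt", truncation=True, max_length=4096)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Greedy decoding, then decode only the newly generated tokens (skip the prompt).
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)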
print(result)

Output:

{
  "entities": [
    {"entity": "John Smith", "type": "person"},
    {"entity": "Google", "type": "organization"},
    {"entity": "senior engineer", "type": "role"}
  ],
  "triplets": [
    {"head": "John Smith", "relation": "works at", "tail": "Google"},
    {"head": "John Smith", "relation": "has role", "tail": "senior engineer"}
  ]
}
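Every example in this card repeats the same prompt layout: an instruction, an "Important rules:" list, the template, the text, and a closing line. The hypothetical helper below assembles that layout so the rules can be passed as a list. It reproduces the Usage prompt exactly, but it is a convenience sketch, not part of retrico-lm or the RetriCo framework.

```python
import json


def build_prompt(instruction: str, rules: list[str], template: dict, text: str) -> str:
    """Assemble a prompt in the layout used by the examples in this card.

    Hypothetical helper: the layout (instruction, "Important rules:" list,
    Template, Text, closing line) is copied from the examples; the function
    itself is not part of any released package.
    """
    schema = json.dumps(template, indent=1)
    # One "- rule" line per rule; the extra "\n" below leaves a blank line.
    rule_lines = "".join(f"- {rule}\n" for rule in rules)
    return (
        f"{instruction}\n\n"
        "Important rules:\n"
        f"{rule_lines}\n"
        f"Template:\n{schema}\n\n"
        f"Text:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )
```

For instance, the NER prompt in the next section corresponds to calling `build_prompt` with "Extract all named entities from the following text according to the JSON template.", its three rules, and its `{"entities": [...]}` template.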

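Use Cases

Each example below shows only the schema, input, and prompt; generate and decode the output with the same code as in Usage.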

Named Entity Recognition (NER)

schema = json.dumps({
    "entities": [{"text": "string", "type": "string"}]
}, indent=1)

text = "Elon Musk founded SpaceX in 2002 and Tesla in 2003, both headquartered in California."

prompt = (
    "Extract all named entities from the following text according to the JSON template.\n\n"
    "Important rules:\n"
    "- Entity text must be an exact substring from the input text.\n"
    "- Entity types must be one of: person, organization, location, date\n"
    "- For list fields with no values found, return [] not [null].\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{text}\n\n"
    "Return only the extracted JSON, nothing else."
)

Output:

{
  "entities": [
    {"text": "Elon Musk", "type": "person"},
    {"text": "SpaceX", "type": "organization"},
    {"text": "2002", "type": "date"},
    {"text": "Tesla", "type": "organization"},
    {"text": "2003", "type": "date"},
    {"text": "California", "type": "location"}
  ]
}
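The NER prompt asks for exact substrings and a fixed type vocabulary, but nothing guarantees the generated output follows those rules. A post-hoc check such as the sketch below can flag violations before the results are used. `check_entities` is a hypothetical helper that assumes the `{"text", "type"}` entity layout of this example.

```python
import json


def check_entities(raw_output: str, text: str, allowed_types: set[str]) -> list[str]:
    """Return human-readable problems found in an NER result; empty means OK.

    Mirrors the rules in the NER prompt: every entity's text must be an
    exact substring of the input, and its type must come from the allowed set.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems: list[str] = []
    for i, ent in enumerate(data.get("entities") or []):
        if not isinstance(ent, dict):
            problems.append(f"entities[{i}] is not an object")
            continue
        span, label = ent.get("text"), ent.get("type")
        if not isinstance(span, str) or span not in text:
            problems.append(f"entities[{i}]: {span!r} is not a substring of the input")
        if label not in allowed_types:
            problems.append(f"entities[{i}]: type {label!r} is not allowed")
    return problems
```

For the output above, `check_entities(result, text, {"person", "organization", "location", "date"})` should return an empty list.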

Text Classification

schema = json.dumps({
    "label": "string",
    "confidence": "string",
    "reasoning": "string"
}, indent=1)

text = "The new iPhone 16 features a larger display, improved battery life, and a new camera system with 5x optical zoom."

prompt = (
    "Classify the following text according to the JSON template.\n\n"
    "Important rules:\n"
    "- label must be one of: technology, politics, sports, finance, health, entertainment\n"
    "- confidence must be one of: high, medium, low\n"
    "- reasoning should be a brief explanation grounded in the text.\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{text}\n\n"
    "Return only the extracted JSON, nothing else."
)

Output:

{
  "label": "technology",
  "confidence": "high",
  "reasoning": "The text describes the iPhone 16, a mobile device, and its technical specifications such as display size, battery life, and camera system."
}
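Because `label` and `confidence` are free-form strings in the template, it can help to enforce the allowed vocabularies when parsing the response. A minimal sketch, assuming the response is bare JSON as in the output above; `parse_classification`, `LABELS`, and `CONFIDENCE_LEVELS` are hypothetical names.

```python
import json

# Allowed values, copied from the rules in the classification prompt.
LABELS = {"technology", "politics", "sports", "finance", "health", "entertainment"}
CONFIDENCE_LEVELS = {"high", "medium", "low"}


def parse_classification(raw_output: str) -> dict:
    """Parse the model's JSON answer and enforce the label/confidence vocabularies.

    Raises ValueError (json.JSONDecodeError is a subclass) when the output
    is not a JSON object or uses a value outside the allowed sets.
    """
    data = json.loads(raw_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("label") not in LABELS:
        raise ValueError(f"unexpected label: {data.get('label')!r}")
    if data.get("confidence") not in CONFIDENCE_LEVELS:
        raise ValueError(f"unexpected confidence: {data.get('confidence')!r}")
    return data
```

On a `ValueError`, a caller might retry generation or route the text for manual review.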

Extraction from HTML

In addition to plain text, retrico-lm accepts markup such as HTML and Markdown as input. Pass the raw markup directly in place of the text.

schema = json.dumps({
    "title": "string",
    "author": "string",
    "published_date": "string",
    "tags": ["string"],
    "summary": "string"
}, indent=1)

html = """
<article>
  <h1>OpenAI releases GPT-5</h1>
  <span class="author">Jane Doe</span>
  <time datetime="2025-03-15">March 15, 2025</time>
  <ul class="tags"><li>AI</li><li>LLM</li><li>OpenAI</li></ul>
  <p>OpenAI has announced GPT-5, claiming significant improvements in reasoning and multimodal understanding over its predecessor.</p>
</article>
"""

prompt = (
    "Extract structured information from the following HTML according to the JSON template.\n\n"
    "Important rules:\n"
    "- If a field's value is not mentioned or cannot be found, set it to null.\n"
    "- Do not infer or hallucinate values not present in the markup.\n"
    "- For list fields with no values found, return [] not [null].\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{html}\n\n"
    "Return only the extracted JSON, nothing else."
)

Output:

{
  "title": "OpenAI releases GPT-5",
  "author": "Jane Doe",
  "published_date": "March 15, 2025",
  "tags": ["AI", "LLM", "OpenAI"],
  "summary": "OpenAI has announced GPT-5, claiming significant improvements in reasoning and multimodal understanding over its predecessor."
}

Extraction from Markdown

schema = json.dumps({
    "title": "string",
    "sections": [{"heading": "string", "content": "string"}],
    "code_languages": ["string"]
}, indent=1)

markdown = (
    "# Getting Started with FastAPI\n\n"
    "## Installation\n"
    "Install FastAPI and uvicorn using pip:\n"
    "    pip install fastapi uvicorn\n\n"
    "## Hello World\n"
    "Create a simple app with a single route:\n"
    "    from fastapi import FastAPI\n"
    "    app = FastAPI()\n\n"
    "    @app.get('/')\n"
    "    def read_root():\n"
    "        return {'Hello': 'World'}\n"
)

prompt = (
    "Extract structured information from the following Markdown document according to the JSON template.\n\n"
    "Important rules:\n"
    "- If a field's value is not mentioned or cannot be found, set it to null.\n"
    "- For list fields with no values found, return [] not [null].\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{markdown}\n\n"
    "Return only the extracted JSON, nothing else."
)

Output:

{
  "title": "Getting Started with FastAPI",
  "sections": [
    {"heading": "Installation", "content": "Install FastAPI and uvicorn using pip:"},
    {"heading": "Hello World", "content": "Create a simple app with a single route:"}
  ],
  "code_languages": ["python"]
}

Using with vLLM

Serving

vllm serve knowledgator/retrico-lm-4b --dtype bfloat16 --port 8000 --language-model-only

Note: The --language-model-only flag is required since retrico-lm is built on the Qwen3.5 architecture, which vLLM treats as a multimodal model by default.

Querying the server

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

schema = json.dumps({
    "entities": [{"entity": "string", "type": "string"}],
    "triplets": [{"head": "string", "relation": "string", "tail": "string"}]
}, indent=1)

text = "John Smith joined Google as a senior engineer in 2023."

response = client.chat.completions.create(
    model="knowledgator/retrico-lm-4b",
    messages=[{"role": "user", "content": (
        "Extract entities and relations from the following text according to the JSON template.\n\n"
        "Important rules:\n"
        "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
        "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
        "- For list fields with no values found, return [] not [null].\n"
        "- Entity text must be exact substrings from the input text.\n"
        "- Entity types must be one of: person, organization, role\n"
        "- Relation types must be one of: works at, has role\n\n"
        f"Template:\n{schema}\n\n"
        f"Text:\n{text}\n\n"
        "Return only the extracted JSON, nothing else."
    )}],
    max_tokens=1024,
    temperature=0,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

print(response.choices[0].message.content)

Offline inference

from vllm import LLM, SamplingParams
import json

llm = LLM(
    model="knowledgator/retrico-lm-4b",
    dtype="bfloat16",
    language_model_only=True,
)

schema = json.dumps({
    "entities": [{"entity": "string", "type": "string"}],
    "triplets": [{"head": "string", "relation": "string", "tail": "string"}]
}, indent=1)

text = "John Smith joined Google as a senior engineer in 2023."

prompt = (
    "Extract entities and relations from the following text according to the JSON template.\n\n"
    "Important rules:\n"
    "- If a field's value is not mentioned or cannot be found in the text, set it to null.\n"
    "- Do not infer, guess, or hallucinate values that are not explicitly stated.\n"
    "- For list fields with no values found, return [] not [null].\n"
    "- Entity text must be exact substrings from the input text.\n"
    "- Entity types must be one of: person, organization, role\n"
    "- Relation types must be one of: works at, has role\n\n"
    f"Template:\n{schema}\n\n"
    f"Text:\n{text}\n\n"
    "Return only the extracted JSON, nothing else."
)

messages = [{"role": "user", "content": prompt}]
sampling = SamplingParams(max_tokens=1024, temperature=0)

outputs = llm.chat(messages, sampling_params=sampling,
                    chat_template_kwargs={"enable_thinking": False})
print(outputs[0].outputs[0].text)

Evaluation Metrics

We evaluate retrico-lm using three complementary metrics:

WL Graph Kernel — A graph-based metric that converts both predicted and ground-truth JSON objects into trees, computes semantic embeddings for each node using a sentence transformer, and propagates information via Weisfeiler-Leman message passing. The final score incorporates cross-graph node similarity with depth penalty and leaf weighting, yielding precision, recall, F1, as well as structural and semantic sub-scores. This metric captures both the structural correctness of the output JSON and the semantic similarity of extracted values.
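To make the idea behind the WL Graph Kernel concrete, the sketch below converts JSON into a tree and runs Weisfeiler-Leman relabeling with exact string labels. It is a simplified stand-in, not the authors' metric: it replaces sentence-transformer embeddings with exact label matching, aggregates only over child nodes, treats arrays as unordered, and omits the depth penalty, leaf weighting, and structural/semantic sub-scores. All function names are hypothetical.

```python
import json
from collections import Counter


def _new_node(label, kids, labels, children):
    """Register a node and return its index."""
    labels.append(label)
    children.append(kids)
    return len(labels) - 1


def _flatten(value, labels, children):
    """Convert a parsed JSON value into a tree; return the root's index.

    Objects become "{}" nodes with one "key:<name>" child per field, arrays
    become "[]" nodes, and scalars become "leaf:<json>" nodes.
    """
    if isinstance(value, dict):
        kids = [
            _new_node(f"key:{key}", [_flatten(sub, labels, children)], labels, children)
            for key, sub in value.items()
        ]
        return _new_node("{}", kids, labels, children)
    if isinstance(value, list):
        kids = [_flatten(item, labels, children) for item in value]
        return _new_node("[]", kids, labels, children)
    return _new_node("leaf:" + json.dumps(value), [], labels, children)


def wl_features(value, iterations=2):
    """Multiset of node labels collected from WL rounds 0..iterations."""
    labels, children = [], []
    _flatten(value, labels, children)
    features = Counter(labels)
    for _ in range(iterations):
        # New label = old label + sorted old labels of the children, so after
        # round i each label encodes the node's subtree down to depth i.
        labels = [
            labels[i] + "(" + ",".join(sorted(labels[c] for c in children[i])) + ")"
            for i in range(len(labels))
        ]
        features.update(labels)
    return features


def wl_scores(pred, gold, iterations=2):
    """Precision/recall/F1 over WL features shared by both trees (multiset intersection)."""
    p, g = wl_features(pred, iterations), wl_features(gold, iterations)
    overlap = sum((p & g).values())
    precision = overlap / sum(p.values())
    recall = overlap / sum(g.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Usage: `wl_scores(json.loads(model_output), json.loads(reference_json))`. Because child labels are sorted, field order in objects does not affect the score.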

ROUGE-L — Measures the longest common subsequence between predicted and reference JSON strings, providing a surface-level text overlap score.
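ROUGE-L's F1 follows directly from the LCS length: precision is the LCS length divided by the prediction length, and recall is the LCS length divided by the reference length. The sketch assumes whitespace tokenization, which the card does not specify; library implementations such as the `rouge-score` package apply their own tokenization and may give different numbers on JSON strings. Function names are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise keep the best neighbor.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (assumed tokenization)."""
    pred_tokens, ref_tokens = prediction.split(), reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred_tokens), lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, "a b c d" against "a c d e" shares the subsequence "a c d", giving precision and recall of 3/4 and an F1 of 0.75.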

Attribution Score — Measures how well the extracted values are grounded in the source text. Each non-null leaf value in the predicted JSON is checked against the input text; the score is the fraction of values that can be traced back to the source.
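The card does not say how a value is "traced back" to the source. The sketch below assumes a case-insensitive substring match and counts every non-null scalar leaf, including type labels such as "person"; the actual metric may treat such labels differently. Substring matching is also permissive: a short value like "AI" would match inside an unrelated word such as "said". `attribution_score` is a hypothetical name.

```python
import json


def _leaves(value):
    """Yield every non-null scalar in a parsed JSON value."""
    if isinstance(value, dict):
        for sub in value.values():
            yield from _leaves(sub)
    elif isinstance(value, list):
        for sub in value:
            yield from _leaves(sub)
    elif value is not None:
        yield value


def attribution_score(prediction, source: str) -> float:
    """Fraction of non-null leaf values found in the source (assumed case-insensitive substring match)."""
    source_lower = source.lower()
    leaves = list(_leaves(prediction))
    if not leaves:
        return 1.0  # assumed convention: no values means nothing ungrounded
    # Non-string scalars (numbers, booleans) are compared via their JSON text, e.g. 2023 -> "2023".
    grounded = sum(
        1
        for leaf in leaves
        if (leaf if isinstance(leaf, str) else json.dumps(leaf)).lower() in source_lower
    )
    return grounded / len(leaves)
```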

Benchmarks

1. General Extraction

Evaluated on a held-out general-domain extraction benchmark covering diverse entity and relation schemas.

WL Graph Kernel

Model Precision Recall F1 Structural Semantic
knowledgator/retrico-lm-0.8b 0.7300 0.7397 0.7264 0.7408 0.7185
knowledgator/retrico-lm-2b 0.7835 0.8163 0.7902 0.8020 0.7831
knowledgator/retrico-lm-4b 0.8269 0.8772 0.8404 0.8546 0.8328
knowledgator/retrico-lm-8b 0.8715 0.9190 0.8802 0.8956 0.8717

ROUGE-L

Model F1
knowledgator/retrico-lm-0.8b 0.4558
knowledgator/retrico-lm-2b 0.4796
knowledgator/retrico-lm-4b 0.4972
knowledgator/retrico-lm-8b 0.5241

Attribution

Model Score
knowledgator/retrico-lm-0.8b 0.7150
knowledgator/retrico-lm-2b 0.7590
knowledgator/retrico-lm-4b 0.8091
knowledgator/retrico-lm-8b 0.8620

2. Markup Extraction

Evaluated on structured documents in HTML, XML, and Markdown formats. The model is prompted to extract information according to a schema derived from the document's markup structure.

WL Graph Kernel

Model Precision Recall F1 Structural Semantic
knowledgator/retrico-lm-0.8b 0.9460 0.9359 0.9374 0.9426 0.9336
knowledgator/retrico-lm-2b 0.8990 0.9154 0.9047 0.9074 0.9027
knowledgator/retrico-lm-4b 0.9496 0.9482 0.9471 0.9520 0.9444
knowledgator/retrico-lm-8b 0.9911 0.9706 0.9768 0.9805 0.9748

ROUGE-L

Model F1
knowledgator/retrico-lm-0.8b 0.6987
knowledgator/retrico-lm-2b 0.7055
knowledgator/retrico-lm-4b 0.7172
knowledgator/retrico-lm-8b 0.7294

Attribution

Model Score
knowledgator/retrico-lm-0.8b 0.9393
knowledgator/retrico-lm-2b 0.8952
knowledgator/retrico-lm-4b 0.9445
knowledgator/retrico-lm-8b 0.9701

3. Relation Extraction

Evaluated on standard relation extraction benchmarks. We report Micro-F1 and Macro-F1 on a 0-100 scale.

DocRED

Model Micro-F1 Macro-F1
numind/NuExtract-2.0-4B 1.3 1.7
fastino/gliner2-large-v1 13.8 6.9
knowledgator/retrico-lm-0.8b 0.6 1.7
knowledgator/retrico-lm-2b 8.0 4.9
knowledgator/retrico-lm-4b 12.1 6.2
knowledgator/retrico-lm-8b 18.6 10.4
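Micro-F1 and Macro-F1 differ in how they aggregate over relation types: micro pools true positives, false positives, and false negatives across all types, while macro averages the per-type F1, so rare relation types weigh as much as frequent ones. A minimal sketch, assuming a prediction counts only when the whole (head, relation, tail) triple matches a gold triple exactly, and that macro-F1 is the unweighted mean over every relation type seen in gold or predictions; the card does not specify its matching or averaging details. Scores here are on a 0-1 scale, whereas the table uses 0-100. Function names are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (head, relation, tail)


def _f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(predicted: set[Triple], gold: set[Triple]) -> tuple[float, float]:
    """Return (micro-F1, macro-F1) under exact triple matching."""
    counts = defaultdict(lambda: [0, 0, 0])  # relation -> [tp, fp, fn]
    for triple in predicted:
        counts[triple[1]][0 if triple in gold else 1] += 1
    for triple in gold - predicted:
        counts[triple[1]][2] += 1

    # Micro: pool the counts over all relation types, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = _f1(tp, fp, fn)
    # Macro: compute F1 per relation type, then take the unweighted mean.
    macro = sum(_f1(*c) for c in counts.values()) / len(counts) if counts else 0.0
    return micro, macro
```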

Using with RetriCo Framework

retrico-lm integrates with the RetriCo framework for building end-to-end knowledge extraction pipelines — from raw text to a structured knowledge graph in a few lines of code.

from retrico import RetriCoBuilder

builder = RetriCoBuilder(name="demo")
builder.chunker(method="sentence")
builder.relex_llm(
    relation_labels=["CEO of", "headquartered in", "born in"],
    model="knowledgator/retrico-lm-4b",
    base_url="http://localhost:8000/v1",
    api_key="dummy",
)
builder.graph_writer()

executor = builder.build()
result = executor.run(texts=[
    "Tim Cook is the CEO of Apple. Apple is headquartered in Cupertino.",
])

Output:

┌──────────────────────────────────────────────┐
│  ENTITIES                                    │
├────────────────────┬─────────────────────────┤
│  Tim Cook          │  person                 │
│  Apple             │  company                │
│  Cupertino         │  city                   │
└────────────────────┴─────────────────────────┘

┌──────────────────────────────────────────────┐
│  RELATIONS                                   │
├──────────────────────────────────────────────┤
│  Tim Cook ──[CEO of]──▶ Apple                │
│  Apple ──[headquartered in]──▶ Cupertino     │
└──────────────────────────────────────────────┘

  3 entities · 2 relations

Citation

@misc{knowledgator2025retrico,
  title={retrico-lm: Schema-Guided Structured Information Extraction},
  author={Knowledgator Engineering},
  year={2026},
  url={https://huggingface.co/knowledgator}
}
