---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- information-extraction
- named-entity-recognition
- relation-extraction
- grpo
- reinforcement-learning
- qwen3
- scientific-text
- biomedical
---

# Agents-K1

The **knowledge-extraction model in Agents-K1** is a 4B-parameter language model fine-tuned from
[`Qwen/Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
with **GRPO** (Group Relative Policy Optimization) on English NER and RE data drawn
from the IEPile information-extraction corpus. It targets **Named Entity Recognition (NER)**
and **Relation Extraction (RE)** in English scientific and general-domain text.

The model produces structured JSON extractions with explicit step-by-step
reasoning, enabling its use as a building block in downstream knowledge-graph
construction, citation linking, and multi-hop QA pipelines.

## Highlights

- **+3.3 F1 points (absolute)** averaged over 10 benchmarks (9 NER, 1 RE) vs. the
  Qwen3-4B-Instruct base model (0.5317 → 0.5647), with **gains on every dataset
  evaluated** (including held-out CrossNER domains).
- Trained with rule-based rewards (format + JSON validity + entity/relation F1);
  no human preference data is required.
- Outputs follow a strict `<think>…</think><answer>…</answer>` schema, making
  reasoning auditable and JSON parsing reliable.

## Intended use

Designed as an extraction backbone for:

- Scientific-literature mining (entities/relations in biomedicine, chemistry,
  CS, etc.)
- Knowledge-graph construction
- Pre-processing for retrieval / multi-hop QA systems

Not intended for general-purpose chat — it has been specialized for structured
extraction.

## Usage

The model uses the same chat template as Qwen3-4B-Instruct and expects a
schema-driven user prompt. The reply will contain a `<think>` block followed by
an `<answer>` block with a JSON object.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "InternScience/Agents-K1"
tok   = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# System prompt: reasoning goes in <think>, the JSON result in <answer>.
system = (
    "You are an expert in information extraction. Given a task instruction "
    "with schema definitions and input text, extract the required information.\n\n"
    "You should think step by step about the extraction task, then provide "
    "your answer in JSON format.\n\n"
    "Format your response as:\n"
    "<think>\nYour step-by-step reasoning...\n</think>\n"
    "<answer>\nYour JSON extraction result here\n</answer>"
)

# User prompt: NER instruction + explicit entity-type schema + input text.
user = (
    "You are an expert in named entity recognition. Please extract entities "
    "that match the schema definition from the input. Return an empty list if "
    "the entity type does not exist. Please respond in the format of a JSON "
    "dictionary.\n\n"
    'Entity types to extract: ["person", "organization", "location"]\n\n'
    "Input text: Marie Curie worked at the University of Paris.\n\n"
    "Please think step by step and respond in the following format:\n"
    "<think>\nYour reasoning process...\n</think>\n"
    "<answer>\nYour JSON extraction result\n</answer>"
)

messages = [{"role": "system", "content": system},
            {"role": "user",   "content": user}]
# return_dict=True returns input_ids together with the attention_mask.
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_dict=True, return_tensors="pt").to(model.device)
# Greedy decoding keeps extractions deterministic.
out = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
```
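The decoded reply is plain text, so downstream code still has to pull the JSON out of the `<answer>` block. Below is a minimal parsing sketch, assuming the reply follows the `<think>…</think><answer>…</answer>` layout described above. The helper name `parse_extraction` is hypothetical (it is not part of the model or of `transformers`), and the sample reply is hand-written, not real model output.

```python
import json
import re

# Non-greedy, multi-line patterns for the two tagged blocks.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_extraction(reply: str):
    """Hypothetical helper: return (reasoning, answer_dict).

    answer_dict is None when the <answer> block is missing, is not valid JSON,
    or holds valid JSON that is not a dict.
    """
    think = _THINK_RE.search(reply)
    answer = _ANSWER_RE.search(reply)
    reasoning = think.group(1).strip() if think else ""
    if answer is None:
        return reasoning, None
    try:
        parsed = json.loads(answer.group(1).strip())
    except json.JSONDecodeError:
        return reasoning, None
    return reasoning, parsed if isinstance(parsed, dict) else None


# Hand-written reply in the documented layout (not actual model output).
reply = (
    "<think>\nMarie Curie is a person; the University of Paris is an organization.\n</think>\n"
    '<answer>\n{"person": ["Marie Curie"], "organization": ["University of Paris"], '
    '"location": []}\n</answer>'
)
reasoning, result = parse_extraction(reply)
print(result)
```

In the usage example, the decoded string can be passed to a helper like this instead of being printed; returning `None` rather than raising lets a caller decide whether to retry or skip the sample.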

For RE, replace the user template with `Relation types to extract: [...]`
and a relation-extraction instruction; the output schema is a JSON dict mapping
relation types to lists of `{head, tail}` pairs.
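Because every prompt must carry an explicit type list, it can help to build user prompts programmatically. The sketch below reuses the NER wording from the usage example verbatim. The RE instruction sentence is a hypothetical paraphrase, since this card does not publish the exact RE template used in training, and `build_user_prompt` is a hypothetical helper name.

```python
import json

# NER instruction copied from the usage example; the RE wording is a
# hypothetical paraphrase, not the published training template.
INSTRUCTIONS = {
    "ner": (
        "You are an expert in named entity recognition. Please extract entities "
        "that match the schema definition from the input. Return an empty list if "
        "the entity type does not exist. Please respond in the format of a JSON "
        "dictionary."
    ),
    "re": (
        "You are an expert in relation extraction. Please extract relation triples "
        "that match the schema definition from the input. Return an empty list for "
        "relation types that do not exist. Please respond in the format of a JSON "
        "dictionary."
    ),
}
TYPE_LABELS = {"ner": "Entity types to extract", "re": "Relation types to extract"}

# Shared tail requesting the <think>/<answer> layout (copied from the NER example).
FORMAT_TAIL = (
    "Please think step by step and respond in the following format:\n"
    "<think>\nYour reasoning process...\n</think>\n"
    "<answer>\nYour JSON extraction result\n</answer>"
)


def build_user_prompt(task: str, types: list[str], text: str) -> str:
    """Build a schema-driven user prompt for task "ner" or "re" (hypothetical helper)."""
    if task not in INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    if not types:
        raise ValueError("an explicit, non-empty type list is required")
    return (
        f"{INSTRUCTIONS[task]}\n\n"
        f"{TYPE_LABELS[task]}: {json.dumps(types)}\n\n"
        f"Input text: {text}\n\n"
        f"{FORMAT_TAIL}"
    )


prompt = build_user_prompt("re", ["work for", "located in"],
                           "Marie Curie worked at the University of Paris.")
print(prompt)
```

The relation labels here are illustrative; use those of your target schema. Under the output schema described above, an RE answer would look like `{"work for": [{"head": "Marie Curie", "tail": "University of Paris"}], "located in": []}`.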

## Training data

Training data comes from **IEPile**, restricted to:

- English NER and RE tasks
- 22 source datasets, mixing scientific (SciERC, GENIA_NER, BC5CDR, BC2GM,
  BC4CHEMD, AnatEM, NCBI) and general-domain (CoNLL2003, conll04, FabNER,
  MultiNERD, NYT11, kbp37, …) corpora

| Split      | Size   | Notes |
|-----------:|-------:|-------|
| Train      | 14,400 | 90/10 split, seed=42; each source capped to balance the mix |
| Validation | 1,600  | |

70% of samples have non-empty gold labels; 30% are empty-label cases (to prevent
the model from defaulting to non-empty outputs).
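The card does not publish the per-source cap or the sampling code. The sketch below only illustrates one way to realize the stated recipe: per-source capping, a 70/30 non-empty/empty mix, and a seeded 90/10 split. The `per_source_cap` value, the record fields `source` and `labels`, and the helper name `build_mix` are all hypothetical, and the synthetic records stand in for IEPile data.

```python
import random
from collections import defaultdict


def build_mix(records, total, per_source_cap, empty_frac=0.3, val_frac=0.1, seed=42):
    """Hypothetical sketch of the stated recipe; not the card's actual pipeline.

    records: dicts with a "source" name and "labels" ({type: [items]}).
    """
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec["source"]].append(rec)

    # 1. Cap each source so large corpora cannot dominate the mix.
    capped = []
    for source in sorted(by_source):
        items = list(by_source[source])
        rng.shuffle(items)
        capped.extend(items[:per_source_cap])

    # 2. Draw the requested share of empty-label samples (30% by default).
    empty = [r for r in capped if not any(r["labels"].values())]
    non_empty = [r for r in capped if any(r["labels"].values())]
    n_empty = round(total * empty_frac)
    n_non_empty = total - n_empty
    if n_empty > len(empty) or n_non_empty > len(non_empty):
        raise ValueError("not enough samples for the requested mix")
    mix = rng.sample(empty, n_empty) + rng.sample(non_empty, n_non_empty)

    # 3. Seeded shuffle, then the train/validation split (90/10 by default).
    rng.shuffle(mix)
    n_val = round(total * val_frac)
    return mix[n_val:], mix[:n_val]


# Synthetic records: three sources of very different sizes, one third empty-label.
records = [
    {"source": "big" if i < 2000 else "mid" if i < 2800 else "small",
     "labels": {"person": [f"p{i}"] if i % 3 else []}}
    for i in range(3000)
]
train, val = build_mix(records, total=1000, per_source_cap=600)
print(len(train), len(val))
```

With `total=16_000` the same fractions would give the 14,400 / 1,600 sizes in the table; whether capping happens before or after the empty/non-empty balancing in the real pipeline is not stated.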

## Training procedure

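- **Algorithm:** GRPO, a PPO variant that drops the learned critic (value model)
  and instead scores each sampled response relative to the other responses
  sampled for the same prompt; implemented in [veRL](https://github.com/volcengine/verl).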
- **Reward** ∈ \[0, 1\]:
  - format reward: `0.1 · 𝟙[has <think>] + 0.1 · 𝟙[has <answer>]`
  - JSON validity: `0.1 · 𝟙[valid JSON dict]` (or `0.05` for non-dict valid JSON)
  - task F1: `0.7 · F1(pred, gold)` — entity-set F1 for NER, triple-set F1 for RE
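The reward components listed above, and the group-relative advantage GRPO derives from them, can be sketched as follows. Several details are assumptions not stated in the card: "has `<think>`" requires both opening and closing tags; task F1 counts only when the answer is a valid JSON dict; NER items are compared as `(type, mention)` pairs and RE items as `(head, relation, tail)` triples; an empty prediction on an empty-label sample scores F1 = 1. All helper names are hypothetical, and this is not the veRL reward code used in training.

```python
import json
import re
import statistics

_INVALID = object()  # sentinel: no <answer> block, or its content is not JSON


def _parse_answer(text):
    m = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if m is None:
        return _INVALID
    try:
        return json.loads(m.group(1).strip())
    except json.JSONDecodeError:
        return _INVALID


def _items(answer, task):
    """Flatten {type: [...]} into (type, mention) pairs or (head, relation, tail) triples."""
    out = set()
    for label, values in answer.items():
        for v in values if isinstance(values, list) else []:
            if task == "ner" and isinstance(v, str):
                out.add((label, v))
            elif task == "re" and isinstance(v, dict):
                out.add((str(v.get("head")), label, str(v.get("tail"))))
    return out


def _f1(pred, gold):
    if not pred and not gold:
        return 1.0  # assumption: empty answer on an empty-label sample is fully correct
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def extraction_reward(text, gold, task):
    """Reward in [0, 1]: 0.1 + 0.1 format, 0.1 (or 0.05) JSON validity, 0.7 * F1."""
    reward = 0.1 * ("<think>" in text and "</think>" in text)
    reward += 0.1 * ("<answer>" in text and "</answer>" in text)
    answer = _parse_answer(text)
    if isinstance(answer, dict):
        reward += 0.1 + 0.7 * _f1(_items(answer, task), _items(gold, task))
    elif answer is not _INVALID:
        reward += 0.05  # valid JSON, but not a dict
    return reward


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward standardized within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std here; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


gold = {"person": ["Marie Curie"], "organization": ["University of Paris"], "location": []}
group = [  # several sampled responses for the same prompt
    '<think>...</think><answer>{"person": ["Marie Curie"], "organization": ["University of Paris"], "location": []}</answer>',
    '<think>...</think><answer>{"person": ["Marie Curie"], "organization": [], "location": []}</answer>',
    "<think>...</think><answer>not json</answer>",
]
rewards = [extraction_reward(t, gold, "ner") for t in group]
advantages = group_advantages(rewards)
print(rewards, advantages)
```

Responses that beat their group's mean get a positive advantage and are reinforced; this within-group comparison is what replaces the critic's value estimate.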


## Evaluation

Reported numbers are micro-F1 on each benchmark's official test split, using
the same prompt template as training. Δ is the absolute F1 change from the base
model to Agents-K1 (GRPO).
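Micro-F1 pools true positives, false positives, and false negatives over every example in a test split before computing precision and recall, so examples with many gold items weigh more than they would in a per-example average. A minimal sketch, assuming predictions and gold labels are already flattened into per-example sets of comparable items (for instance `(type, mention)` pairs for NER); the name `micro_f1` is hypothetical, and the actual evaluation script may differ in matching details such as normalization of mentions.

```python
def micro_f1(pairs):
    """pairs: iterable of (predicted_set, gold_set), one per test example."""
    tp = fp = fn = 0
    for pred, gold in pairs:
        tp += len(pred & gold)   # items predicted and correct
        fp += len(pred - gold)   # items predicted but wrong
        fn += len(gold - pred)   # gold items missed
    if tp == 0:
        return 0.0  # convention: no true positives (also avoids division by zero)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


examples = [
    ({("person", "Marie Curie")},
     {("person", "Marie Curie"), ("organization", "University of Paris")}),
    (set(), set()),                      # empty-label example, correctly left empty
    ({("location", "Paris")}, set()),    # spurious prediction
]
print(micro_f1(examples))
```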

| Dataset                           | Task | Test n | Base F1 | Agents-K1 F1 | Δ F1       |
|-----------------------------------|:----:|-------:|--------:|-------------:|-----------:|
| CoNLL2003                         | NER  |  3,184 |  0.6547 |   **0.7007** |     +0.046 |
| NCBI-Disease                      | NER  |    937 |  0.6737 |   **0.7340** |     +0.060 |
| BC5CDR                            | NER  |  4,788 |  0.7126 |   **0.7494** |     +0.037 |
| CrossNER: AI *(held-out)*         | NER  |    430 |  0.4862 |   **0.5400** |     +0.054 |
| CrossNER: Literature *(held-out)* | NER  |    416 |  0.5462 |   **0.5736** |     +0.027 |
| CrossNER: Music *(held-out)*      | NER  |    457 |  0.5791 |   **0.6050** |     +0.026 |
| CrossNER: Politics *(held-out)*   | NER  |    650 |  0.6611 |   **0.6855** |     +0.024 |
| CrossNER: Science *(held-out)*    | NER  |    532 |  0.5928 |   **0.6132** |     +0.020 |
| SciERC                            | NER  |    397 |  0.1166 |   **0.1270** |     +0.010 |
| conll04                           | RE   |    287 |  0.2933 |   **0.3181** |     +0.025 |
| **Average**                       |      |        |  0.5317 |   **0.5647** | **+0.033** |

All 10 benchmarks improve, including the 5 CrossNER domains that are
**not** in the training mix (+0.020 to +0.054 F1). This suggests the gains
generalize beyond the in-distribution training sources rather than reflecting
fitting to them alone.
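The Average row can be reproduced from the per-dataset scores. The check below uses only numbers copied from the table; it suggests the row is an unweighted mean over the 10 benchmarks (the test-set sizes do not enter it), matching to within one unit in the last digit, since the per-dataset scores are themselves rounded.

```python
# (base F1, Agents-K1 F1, test n), copied from the evaluation table.
scores = {
    "CoNLL2003": (0.6547, 0.7007, 3184),
    "NCBI-Disease": (0.6737, 0.7340, 937),
    "BC5CDR": (0.7126, 0.7494, 4788),
    "CrossNER AI": (0.4862, 0.5400, 430),
    "CrossNER Literature": (0.5462, 0.5736, 416),
    "CrossNER Music": (0.5791, 0.6050, 457),
    "CrossNER Politics": (0.6611, 0.6855, 650),
    "CrossNER Science": (0.5928, 0.6132, 532),
    "SciERC": (0.1166, 0.1270, 397),
    "conll04": (0.2933, 0.3181, 287),
}

# Unweighted mean over benchmarks: each dataset counts once.
base_avg = sum(b for b, _, _ in scores.values()) / len(scores)
k1_avg = sum(k for _, k, _ in scores.values()) / len(scores)

# For contrast: a mean weighted by test-set size.
total_n = sum(n for _, _, n in scores.values())
base_weighted = sum(b * n for b, _, n in scores.values()) / total_n

print(f"unweighted: base={base_avg:.4f}  Agents-K1={k1_avg:.4f}  delta={k1_avg - base_avg:+.3f}")
print(f"n-weighted base: {base_weighted:.4f}")
```

The size-weighted mean is dominated by BC5CDR and CoNLL2003 and lands far from 0.5317, which is why the unweighted reading is the plausible one.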

## Limitations

- **Schema-driven prompting required.** Free-form questions will likely
  return malformed JSON; always supply explicit entity / relation type lists.
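Since the model is only reliable with an explicit type list, a caller may want to check each parsed answer against the requested schema before using it. This is a sketch assuming the answer has already been parsed into a dict. `conform_to_schema` is a hypothetical helper, and treating missing types as empty lists while rejecting unrequested types is a design choice here, not documented model behavior.

```python
def conform_to_schema(answer: dict, types: list[str]) -> dict:
    """Reject unrequested types, fill missing requested types with [], require list values."""
    unexpected = set(answer) - set(types)
    if unexpected:
        raise ValueError(f"answer contains unrequested types: {sorted(unexpected)}")
    result = {}
    for t in types:
        values = answer.get(t, [])
        if not isinstance(values, list):
            raise ValueError(f"values for {t!r} must be a list, got {type(values).__name__}")
        result[t] = values
    return result


answer = {"person": ["Marie Curie"], "organization": ["University of Paris"]}
print(conform_to_schema(answer, ["person", "organization", "location"]))
```

Raising on unrequested types makes schema drift visible early; a more lenient caller could drop those keys instead.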

## License

Released under the **Apache-2.0** license, following the upstream
[Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
license. Users must also comply with the licenses of the IEPile component
datasets when using this model in derivative works.