---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- information-extraction
- named-entity-recognition
- relation-extraction
- grpo
- reinforcement-learning
- qwen3
- scientific-text
- biomedical
---

# Agents-K1

The **knowledge extraction model in Agents-K1** is a 4B-parameter language model fine-tuned from [`Qwen/Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) with **GRPO** (Group Relative Policy Optimization) on an information-extraction corpus, targeting **Named Entity Recognition (NER)** and **Relation Extraction (RE)** in English scientific and general-domain text.

The model produces structured JSON extractions with explicit step-by-step reasoning, so it can serve as a building block in downstream knowledge-graph construction, citation linking, and multi-hop QA pipelines.

## Highlights

- **+3.3 absolute F1 points** (0.5317 → 0.5647) averaged over 10 NER/RE benchmarks vs. the Qwen3-4B-Instruct-2507 base model, with **gains on every dataset evaluated** (including held-out CrossNER domains).
- Trained with rule-based rewards (format + JSON validity + entity/relation F1); no human preference data required.
- Outputs follow a strict reasoning-then-answer schema (a step-by-step reasoning block followed by a JSON answer block), making reasoning auditable and JSON parsing reliable.

## Intended use

Designed as an extraction backbone for:

- Scientific-literature mining (entities/relations in biomedicine, chemistry, CS, etc.)
- Knowledge-graph construction
- Pre-processing for retrieval / multi-hop QA systems

Not intended for general-purpose chat: it has been specialized for structured extraction.

## Usage

The model uses the same chat template as Qwen3-4B-Instruct-2507 and expects a schema-driven user prompt. The reply contains a reasoning block followed by an answer block holding a JSON object.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "InternScience/Agents-K1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# System prompt: describes the task and the expected response format.
system = (
    "You are an expert in information extraction. Given a task instruction "
    "with schema definitions and input text, extract the required information.\n\n"
    "You should think step by step about the extraction task, then provide "
    "your answer in JSON format.\n\n"
    "Format your response as:\n"
    "\nYour step-by-step reasoning...\n\n"
    "\nYour JSON extraction result here\n"
)

# User prompt: instruction, entity-type schema, and the input text.
user = (
    "You are an expert in named entity recognition. Please extract entities "
    "that match the schema definition from the input. Return an empty list if "
    "the entity type does not exist. Please respond in the format of a JSON "
    "dictionary.\n\n"
    'Entity types to extract: ["person", "organization", "location"]\n\n'
    "Input text: Marie Curie worked at the University of Paris.\n\n"
    "Please think step by step and respond in the following format:\n"
    "\nYour reasoning process...\n\n"
    "\nYour JSON extraction result\n"
)

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]

# Apply the chat template and move the token IDs to the model's device.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; print only the newly generated tokens.
out = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For RE, replace the user template with `Relation types to extract: [...]` and a relation-extraction instruction. The output schema is then a JSON dict mapping each relation type to a list of `{head, tail}` pairs.
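To illustrate the RE output schema described above, here is a minimal sketch that flattens such a JSON dict into `(head, relation, tail)` triples. The helper name `triples_from_json`, the relation types `work_for` and `located_in`, and the sample JSON are hypothetical; the sketch also assumes the JSON object has already been isolated from the rest of the reply.

```python
import json


def triples_from_json(json_text):
    """Flatten {relation: [{"head": ..., "tail": ...}, ...]} into a set of triples.

    Hypothetical helper: returns an empty set for invalid JSON or a non-dict payload.
    """
    try:
        data = json.loads(json_text)
    except json.JSONDecodeError:
        return set()
    if not isinstance(data, dict):
        return set()

    triples = set()
    for relation, pairs in data.items():
        if not isinstance(pairs, list):
            continue  # skip malformed entries instead of raising
        for pair in pairs:
            if isinstance(pair, dict) and "head" in pair and "tail" in pair:
                triples.add((str(pair["head"]), relation, str(pair["tail"])))
    return triples


# Made-up reply payload; relation names are illustrative only.
sample = (
    '{"work_for": [{"head": "Marie Curie", "tail": "University of Paris"}], '
    '"located_in": []}'
)
print(triples_from_json(sample))
```

Returning an empty set on malformed input keeps downstream code simple, though a production pipeline may prefer to log or surface such failures.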
## Training data

Training data comes from **IEPile**, restricted to:

- English NER and RE tasks
- 22 source datasets, mixing scientific (SciERC, GENIA_NER, BC5CDR, BC2GM, BC4CHEMD, AnatEM, NCBI) and general-domain (CoNLL2003, conll04, FabNER, MultiNERD, NYT11, kbp37, …) corpora

| Split      |   Size | Notes |
|:-----------|-------:|-------|
| Train      | 14,400 | 90/10 split, seed=42; each source capped to balance the mix |
| Validation |  1,600 | |

70% of samples have non-empty gold labels; 30% are empty-label cases (to prevent the model from defaulting to non-empty outputs).

## Training procedure

- **Algorithm:** GRPO (PPO without a critic), implemented in [veRL](https://github.com/volcengine/verl).
- **Reward** ∈ \[0, 1\]:
  - format reward: `0.1 · 𝟙[has reasoning block] + 0.1 · 𝟙[has answer block]`
  - JSON validity: `0.1 · 𝟙[valid JSON dict]` (or `0.05` for non-dict valid JSON)
  - task F1: `0.7 · F1(pred, gold)`, using entity-set F1 for NER and triple-set F1 for RE

## Evaluation

Reported numbers are micro-F1 on each benchmark's official test split, using the same prompt template as training. Gains are reported as **base → Agents-K1 (GRPO)**.
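For reference, micro-F1 over set-valued predictions can be sketched as follows. This is a minimal illustration that assumes exact-match comparison of hashable items, for example `(type, text)` pairs for NER or `(head, relation, tail)` triples for RE. The function name `micro_f1`, the matching rule, and the toy data are assumptions, not the evaluation script behind the reported numbers.

```python
def micro_f1(pred_sets, gold_sets):
    """Micro-averaged F1: pool true/false positives and false negatives over all samples."""
    tp = fp = fn = 0
    for pred, gold in zip(pred_sets, gold_sets):
        tp += len(pred & gold)  # items predicted and present in gold
        fp += len(pred - gold)  # items predicted but not in gold
        fn += len(gold - pred)  # gold items that were missed
    if tp == 0:
        # Assumed convention: no true positives yields 0.0 (also avoids division by zero).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: the second sample is an empty-label case on both sides.
preds = [{("person", "Marie Curie"), ("location", "Paris")}, set()]
golds = [{("person", "Marie Curie"), ("organization", "University of Paris")}, set()]
print(micro_f1(preds, golds))
```

Pooling counts before computing precision and recall means that samples with many entities weigh more than samples with few, which is what distinguishes micro-averaging from averaging per-sample F1 scores.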
| Dataset                          | Task |     n | Base F1 | Agents-K1 F1 |      Δ |
|----------------------------------|:----:|------:|--------:|-------------:|-------:|
| CoNLL2003                        | NER  | 3,184 |  0.6547 |   **0.7007** | +0.046 |
| NCBI-Disease                     | NER  |   937 |  0.6737 |   **0.7340** | +0.060 |
| BC5CDR                           | NER  | 4,788 |  0.7126 |   **0.7494** | +0.037 |
| CrossNER AI *(held-out)*         | NER  |   430 |  0.4862 |   **0.5400** | +0.054 |
| CrossNER Literature *(held-out)* | NER  |   416 |  0.5462 |   **0.5736** | +0.027 |
| CrossNER Music *(held-out)*      | NER  |   457 |  0.5791 |   **0.6050** | +0.026 |
| CrossNER Politics *(held-out)*   | NER  |   650 |  0.6611 |   **0.6855** | +0.024 |
| CrossNER Science *(held-out)*    | NER  |   532 |  0.5928 |   **0.6132** | +0.020 |
| SciERC                           | NER  |   397 |  0.1166 |   **0.1270** | +0.010 |
| conll04                          | RE   |   287 |  0.2933 |   **0.3181** | +0.025 |
| **Average**                      |      |       |  0.5317 |   **0.5647** | **+0.033** |

All 10 benchmarks improve, including the 5 CrossNER domains that are **not** in the training mix, which suggests generalization rather than mere fitting to in-distribution sources.

## Limitations

- **Schema-driven prompting required.** Free-form questions will likely return malformed JSON; always supply explicit entity / relation type lists.

## License

Released under the **Apache-2.0** license, following the upstream [Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) license. Users must also comply with the licenses of the IEPile component datasets when using this model in derivative works.