---
license: apache-2.0
base_model: microsoft/phi-4
tags:
- text-generation-inference
- transformers
- unsloth
- phi-4
- information-extraction
- ner
- relation-extraction
- knowledge-graph
- slm
model_creator: FinaPolat
language:
- en
---
# Phi-4-AdaptableIE: Efficient Adaptive Knowledge Graph Extraction
#### A GGUF version of this model is available at: https://huggingface.co/FinaPolat/phi4_adaptableIE_v2-gguf
Phi-4-AdaptableIE is a specialized **14.7B parameter Small Language Model (SLM)** optimized via **Supervised Fine-Tuning (SFT)** for high-precision, **Joint Named Entity Recognition (NER) and Relation Extraction (RE)**.
Unlike traditional multi-stage pipelines that are prone to cascading error propagation, this model performs entity identification and relational mapping in a single cohesive pass. It is designed to be **ontology-adaptive**, allowing it to conform to dynamic, unseen schemas at inference time through a specialized **Structured Prompt Architecture**.
## 🚀 Model Highlights
- **Joint Extraction:** Unified NER + RE reducing pipeline complexity.
- **Ontology-Adaptive:** Zero-shot adaptation to diverse domains (Astronomy, Music, Healthcare, etc.) via dynamic schema variables.
- **Local & Private:** Optimized for **local CPU-only inference** (via GGUF/Ollama: FinaPolat/phi4_adaptableIE_v2-gguf), ensuring data sovereignty without external API dependencies.
- **Instruction Aligned:** Fine-tuned to follow strict negative constraints, ensuring zero conversational filler in outputs.
## 🛠 Methodology
The model was fine-tuned using **QLoRA** on the **WebNLG** subset of the **Text2KGBench** benchmark. The training process focused on **Conversational Alignment**, ensuring the model treats extraction as a strict logical mapping:
`Prompt = f(task, schema, example, text)`
---
## 📝 Prompting Strategy
To achieve high-fidelity extraction, the model requires a specific prompt structure.
### 1. System Prompt
```json
{
"role": "system",
"content": "You are a helpful AI assistant specializing in Information Extraction tasks such as Named Entity Recognition and Relation Extraction. Follow the instructions given by the user."
}
```
### 2. User Prompt Template
```text
Information Extraction is the process of automatically identifying and extracting structured information from unstructured text data... [Context] ...
Always extract numbers, dates, and currency values regardless of the specific task.
The task at hand is {task}.
Here is an example of task execution:
{example}
Analyze the text and targets carefully, identify relevant information.
Extract the information in the following format: `{output_format}`.
If no matching entities are found, return an empty list: [].
Please provide only the extracted information without any explanations.
Schema: {schema}
Text: {inputs}
```
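The template above can be filled programmatically before being sent through the chat format from section 1. A minimal sketch, assuming the variable names used in the usage examples below; the `build_messages` helper itself is illustrative and not part of the released code:

```python
def build_messages(task, schema, example, inputs,
                   output_format="[('subject', 'predicate', 'object')]"):
    """Assemble chat messages following the system prompt (section 1)
    and user template (section 2). Illustrative helper, not released code."""
    system = (
        "You are a helpful AI assistant specializing in Information Extraction "
        "tasks such as Named Entity Recognition and Relation Extraction. "
        "Follow the instructions given by the user."
    )
    user = (
        f"The task at hand is {task}.\n"
        f"Here is an example of task execution:\n{example}\n"
        "Analyze the text and targets carefully, identify relevant information.\n"
        f"Extract the information in the following format: `{output_format}`.\n"
        "If no matching entities are found, return an empty list: [].\n"
        "Please provide only the extracted information without any explanations.\n"
        f"Schema: {schema}\n"
        f"Text: {inputs}"
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

messages = build_messages(
    task="Joint NER and RE",
    schema="['CelestialBody', 'apoapsis', 'averageSpeed']",
    example="[('(19255) 1994 VK8', 'averageSpeed', '4.56')]",
    inputs="(19255) 1994 VK8 has an average speed of 4.56 km per second.",
)
```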
### 3. 💻 Usage Examples
**Option 1: Transformers (Single GPU)**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FinaPolat/phi4_adaptableIE_v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True)

task = "Joint NER and RE"
schema = "['CelestialBody', 'apoapsis', 'averageSpeed']"
inputs = "(19255) 1994 VK8 has an average speed of 4.56 km per second."
output_format = "[('subject', 'predicate', 'object')]"

prompt = f"Task: {task}\nSchema: {schema}\nText: {inputs}\nExtract:"
# Move inputs to whichever device `device_map="auto"` placed the model on
input_ids = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding (deterministic); `generate` rejects temperature=0.0 when sampling
outputs = model.generate(**input_ids, do_sample=False, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
**Option 2: High-Throughput Batch Inference (vLLM)**
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="FinaPolat/phi4_adaptableIE_v2",
    dtype="bfloat16",
    trust_remote_code=True,
    gpu_memory_utilization=0.9,
    max_model_len=3000,
    enforce_eager=True,
    distributed_executor_backend="uni",
)

# Each batch item is a chat conversation: system prompt plus a user prompt
# built from the template in section 2
user_prompts = [
    "Task: Joint NER and RE\nSchema: ['CelestialBody', 'averageSpeed']\n"
    "Text: (19255) 1994 VK8 has an average speed of 4.56 km per second.\nExtract:",
]
batch_prompts = [
    [
        {"role": "system", "content": "You are a helpful AI assistant specializing in Information Extraction tasks such as Named Entity Recognition and Relation Extraction. Follow the instructions given by the user."},
        {"role": "user", "content": user_prompt},
    ]
    for user_prompt in user_prompts
]

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.chat(batch_prompts, sampling_params=sampling_params, use_tqdm=True)
```
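The model returns its extractions as a Python-style list of triples in the `{output_format}` shape shown above, so the raw generation can be parsed with `ast.literal_eval`. A minimal sketch; the `parse_triples` helper and the sample `raw_output` string are illustrative, not part of the released code:

```python
import ast

def parse_triples(raw: str):
    """Parse a `[('subject', 'predicate', 'object')]` string into a list of
    3-tuples. Returns [] for empty, malformed, or non-list output."""
    try:
        parsed = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only well-formed (subject, predicate, object) triples
    return [t for t in parsed if isinstance(t, tuple) and len(t) == 3]

raw_output = "[('(19255) 1994 VK8', 'averageSpeed', '4.56 km per second')]"
triples = parse_triples(raw_output)
# → [('(19255) 1994 VK8', 'averageSpeed', '4.56 km per second')]
```

Guarding with `literal_eval` (rather than `eval`) keeps parsing safe even if the model occasionally emits malformed output, in which case the helper falls back to the empty list the prompt specifies.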
### 4. 📦 Deployment & Hardware Requirements
| Deployment Mode | Quantization | Hardware Requirement | Target Latency |
|-----------------|--------------|------------------------------------------|----------------|
| Server-side | BF16 | 1× NVIDIA A100 / RTX 4090 (24GB+) | Ultra-Low |
| Local Consumer | 4-bit GGUF | 16GB RAM (Apple Silicon / PC CPU) | Moderate |
For CPU-only local execution, refer to the GGUF version: [phi4_adaptableIE_v2-gguf](https://huggingface.co/FinaPolat/phi4_adaptableIE_v2-gguf)
### 5. Citation & Credits
If you use this model in your research, please cite the Text2KGBench benchmark, the Microsoft Phi-4 technical report, and our work:
https://github.com/FinaPolat/ENEXA_adaptable_extraction
Video: https://www.youtube.com/watch?v=your-video-