Phi-3.5 Mini Instruct — Text-to-KG (UK Government Contracts)

Model Summary

This is a LoRA fine-tuned version of Phi-3.5 Mini Instruct, trained to extract structured RDF knowledge graph triples from raw UK government procurement contract text. The model was developed as part of a UEL–Depixen industrial placement research project focused on building trustworthy, hallucination-free, domain-specific small language models (SLMs).

Key Results

| Metric | Score |
|---|---|
| F1 | 0.9954 |
| BERTScore F1 | 0.9997 |
| Hallucination rate | 0.00% |
| Test set | 1,387 unseen contracts |
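The card does not state how the F1 score is computed. A common choice for triple extraction is micro-averaged precision, recall, and F1 over exact-match triples, and the sketch below assumes that definition, with each triple represented as a (subject, predicate, object) tuple of strings. The function name `triple_prf` and the toy triples are hypothetical.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

def triple_prf(predicted: Iterable[Iterable[Triple]],
               gold: Iterable[Iterable[Triple]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match triples.

    `predicted` and `gold` are parallel sequences (one collection of triples
    per contract). Duplicate triples within a contract are counted once.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        pred_set, ref_set = set(pred), set(ref)
        tp += len(pred_set & ref_set)   # extracted and correct
        fp += len(pred_set - ref_set)   # extracted but not in the reference
        fn += len(ref_set - pred_set)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example with hypothetical triples for two contracts
pred = [
    [("Contract_1", "hasBuyer", "Ministry of Defence"), ("Contract_1", "hasValue", "50000")],
    [("Contract_2", "hasSupplier", "Acme Ltd")],
]
gold = [
    [("Contract_1", "hasBuyer", "Ministry of Defence"), ("Contract_1", "hasValue", "50000")],
    [("Contract_2", "hasSupplier", "Acme Ltd"), ("Contract_2", "hasBuyer", "NHS England")],
]
p, r, f = triple_prf(pred, gold)
print(f"{p:.3f} {r:.3f} {f:.3f}")  # 1.000 0.750 0.857
```

Exact string matching is strict: a triple whose object differs only in formatting (for example "£50,000" versus "50000") counts as both a false positive and a false negative, so any normalisation applied before matching affects the reported score.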

Model Details

  • Base Model: microsoft/Phi-3.5-mini-instruct
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Task: Text-to-KG — extracting RDF triples from contract text
  • Domain: UK Government Procurement Contracts
  • Training Dataset: 9,244 verified UK government contracts
  • Hardware: NVIDIA A100
  • Framework: PyTorch, Hugging Face PEFT, TRL (SFTTrainer)
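LoRA freezes each base weight matrix W and trains a low-rank update B·A scaled by alpha / r. The repository name suggests the published checkpoint is a merged model, meaning that update has been folded back into the base weights so no adapter is needed at inference time. The NumPy sketch below illustrates that merge for a single linear layer; the rank `r`, scaling `alpha`, and matrix sizes are illustrative assumptions, not this model's actual hyperparameters, which the card does not state.

```python
import numpy as np

def lora_merge(W: np.ndarray, A: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    """Fold a LoRA update into a frozen weight matrix.

    W: (d_out, d_in) frozen base weight
    A: (r, d_in)     down-projection learned during fine-tuning
    B: (d_out, r)    up-projection learned during fine-tuning
    Returns W + (alpha / r) * B @ A, the standard LoRA scaling.
    """
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

# Illustrative sizes only (hypothetical, not Phi-3.5's real layer dimensions or LoRA rank)
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 6, 2, 16.0
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))
x = rng.standard_normal(d_in)

# Adapter-style forward pass: base output plus the scaled low-rank path
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
# Merged forward pass: one matmul with the folded weight
y_merged = lora_merge(W, A, B, alpha) @ x

print(np.allclose(y_adapter, y_merged))  # True
```

Both forward passes give the same output, which is why a merged checkpoint can be loaded with plain `AutoModelForCausalLM` instead of through PEFT.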

Training Data

  • Source: UK Government procurement contracts
  • Size: 9,244 training samples; 1,387 test samples
  • Format: Contract text → RDF triple extraction
  • Dataset: BSVGK/uk-contracts-text-to-kg
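A sketch of turning one dataset record into a prompt/completion pair that reuses the prompt shown under Usage. The field names `text` and `triples`, and the one-triple-per-line `(subject, predicate, object)` output serialisation, are assumptions; check the BSVGK/uk-contracts-text-to-kg dataset card for the actual schema and triple format.

```python
from typing import Dict

# Same wording as the inference prompt in the Usage section
PROMPT_TEMPLATE = (
    "Extract RDF triples from the following UK government contract text:\n\n"
    "Contract: {contract}\n\n"
    "RDF Triples:"
)

def to_prompt_completion(record: Dict) -> Dict[str, str]:
    """Build a prompt/completion pair from one dataset record.

    Hypothetical schema: record["text"] holds the contract text and
    record["triples"] holds a list of (subject, predicate, object) triples.
    """
    prompt = PROMPT_TEMPLATE.format(contract=record["text"])
    # Serialise one triple per line (an assumed output format)
    lines = [f"({s}, {p}, {o})" for s, p, o in record["triples"]]
    completion = "\n" + "\n".join(lines)
    return {"prompt": prompt, "completion": completion}

example = {
    "text": "The Home Office awarded a cleaning services contract to Acme Ltd.",
    "triples": [("Home Office", "hasSupplier", "Acme Ltd")],
}
print(to_prompt_completion(example)["completion"])
```

Recent TRL releases document a prompt/completion dataset format for `SFTTrainer`, so records mapped this way could be passed to it directly; older releases expect a single text column instead.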

Hallucination Evaluation Framework

This model was evaluated using a novel dual-level hallucination evaluation framework:

  • L1 (Relation Validity): checks whether each extracted relation exists in the target ontology
  • L2 (Entity Grounding): verifies that each extracted entity is grounded in, i.e. actually appears in, the source contract text

Evaluation with this framework showed that training loss alone is not a reliable indicator of output quality for KG extraction tasks.
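The two checks can be sketched as follows, assuming the extracted triples have already been parsed into (subject, predicate, object) string tuples, the ontology is available as a set of relation names, and an entity counts as grounded when its lower-cased, whitespace-normalised surface form occurs verbatim in the contract text. Treating a triple as hallucinated when it fails either check is also an assumption; the framework's own matching rules and rate definition may differ. All names and data below are hypothetical.

```python
import re
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

def normalise(s: str) -> str:
    """Lower-case and collapse whitespace so formatting differences do not block a match."""
    return re.sub(r"\s+", " ", s).strip().lower()

def triple_flags(triple: Triple, source_text: str,
                 ontology_relations: Set[str]) -> Tuple[bool, bool]:
    """Return (passes_l1, passes_l2) for one (subject, predicate, object) triple."""
    subj, pred, obj = triple
    source = normalise(source_text)
    passes_l1 = pred in ontology_relations                                # L1: relation validity
    passes_l2 = normalise(subj) in source and normalise(obj) in source  # L2: entity grounding
    return passes_l1, passes_l2

def hallucination_rate(triples: List[Triple], source_text: str,
                       ontology_relations: Set[str]) -> float:
    """Fraction of triples that fail L1 or L2 (an assumed definition)."""
    if not triples:
        return 0.0
    failed = sum(1 for t in triples
                 if not all(triple_flags(t, source_text, ontology_relations)))
    return failed / len(triples)

# Toy example: the contract text, ontology and triples are all hypothetical
text = "The Department for Education awarded the contract to Acme Ltd."
ontology = {"hasBuyer", "hasSupplier", "hasValue"}
triples = [
    ("Department for Education", "hasSupplier", "Acme Ltd"),    # passes both checks
    ("Acme Ltd", "suppliesTo", "Department for Education"),     # fails L1: relation not in ontology
    ("Department for Education", "hasSupplier", "Globex plc"),  # fails L2: object not in text
]
print(round(hallucination_rate(triples, text, ontology), 3))  # 0.667
```

Verbatim substring matching is a deliberately crude grounding test: it rejects legitimate paraphrases and abbreviations, and it can accept an entity that appears in the text but in an unrelated role, so a production check would likely need entity normalisation or alignment against the source spans.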

Usage

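```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BSVGK/phi35-mini-lora-text2kg-merged"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # F16 matches the stored weights on GPU; fall back to F32 on CPU
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = """Extract RDF triples from the following UK government contract text:

Contract: [paste your contract text here]

RDF Triples:"""

inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding

# Print only the generated triples, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```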
Model Files

  • Format: Safetensors
  • Model size: 4B parameters
  • Tensor type: F16