---
base_model: Qwen/Qwen3.5-4B
license: apache-2.0
library_name: peft
tags:
  - base_model:adapter:Qwen/Qwen3.5-4B
  - lora
  - sft
  - transformers
  - knowledge-graph
  - fine-tuning
  - medical
  - financial
pipeline_tag: text-generation
datasets:
  - likhithv/knowledgemesh-benchmark-eval
---

# KnowledgeMesh Full Model — LoRA Adapter

LoRA adapter for `Qwen/Qwen3.5-4B` fine-tuned on **4,361 knowledge graph-guided training samples** generated by the KnowledgeMesh pipeline from financial (Apple 10-K) and medical (PubMed abstracts) documents.

This is the **KM (full)** model from the paper *"Knowledge Graph-Guided Fine-Tuning Data Generation: A Rigorous Benchmark"*.

## Benchmark Results

Evaluated by a Gemini 2.5 Flash pointwise judge (1–5 scale, 4 dimensions):

| Eval Set | Base | Meta SDK | **This Model** | Δ vs Meta SDK |
|---|---|---|---|---|
| Primary (n=473, KM-generated) | 1.79 | 1.93 | **2.47** | **+0.54** |
| Independent (n=955, Gemini-generated) | 1.96 | 2.17 | **2.90** | **+0.72** |

The independent eval set is the headline result (+0.72 over the Meta SDK baseline, p < 0.0001, Cohen's d = 0.57): its questions were generated by a different model (Gemini) with no access to the KG structure, which rules out question-style bias as an explanation for the gain.
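
For reference, Cohen's d here is the standardized mean difference between the fine-tuned and baseline judge scores. A minimal sketch using the pooled-standard-deviation form (the score lists below are illustrative placeholders, not the actual judge outputs):

```python
from statistics import mean, stdev

def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d: mean difference divided by the pooled sample std dev."""
    na, nb = len(a), len(b)
    pooled = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
              / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled

# Illustrative 1-5 judge scores only (not the real eval data):
tuned = [3, 3, 2, 4, 3, 2, 3, 4]
base = [2, 2, 1, 3, 2, 2, 2, 2]
print(round(cohens_d(tuned, base), 2))  # → 1.53
```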

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen3.5-4B"
adapter_id = "likhithv/km-full-model"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "What are the main risk factors for type 2 diabetes?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs.to(model.device), max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Training Details

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3.5-4B (4-bit quantized via bitsandbytes) |
| Fine-tuning method | LoRA (rank=16, alpha=16) |
| Training samples | 4,361 (KG-guided: atomic, aggregated, multihop, chain-of-thought) |
| Epochs | 3 |
| Learning rate | 2e-4 |
| Effective batch size | 8 |
| Hardware | Kaggle T4 GPU (16 GB) |
| Domains | Financial (Apple 10-K 2023), Medical (PubMed abstracts) |
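
The hyperparameters above map roughly to the following `peft`/`bitsandbytes` configuration. This is a sketch, not the actual training script: the quantization type, dropout, and `target_modules` are assumptions not stated in this card (the module list is a common choice for Qwen-style attention projections).

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit base-model quantization via bitsandbytes, per the table above.
# nf4 + bf16 compute is an assumed (typical) QLoRA setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# rank=16, alpha=16 from the table; dropout and target_modules are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```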

## Eval Datasets

- [`likhithv/knowledgemesh-benchmark-eval`](https://huggingface.co/datasets/likhithv/knowledgemesh-benchmark-eval) — both primary (n=473) and independent (n=955) eval sets

## Compared Models

- This model: trained on 4,361 KG-guided samples
- [`likhithv/meta-sdk-baseline`](https://huggingface.co/likhithv/meta-sdk-baseline) — trained on 1,209 chunk-based samples (Meta Synthetic Data Kit)

## Citation

```bibtex
@misc{knowledgemesh2026,
  title={Knowledge Graph-Guided Fine-Tuning Data Generation: A Rigorous Benchmark},
  author={Likhith V},
  year={2026},
  howpublished={https://huggingface.co/likhithv/km-full-model}
}
```