	---
	base_model: Qwen/Qwen3.5-4B
	license: apache-2.0
	library_name: peft
	tags:
	- base_model:adapter:Qwen/Qwen3.5-4B
	- lora
	- sft
	- transformers
	- knowledge-graph
	- fine-tuning
	- medical
	- financial
	pipeline_tag: text-generation
	datasets:
	- likhithv/knowledgemesh-benchmark-eval
	---

	# KnowledgeMesh Full Model — LoRA Adapter

	LoRA adapter for `Qwen/Qwen3.5-4B` fine-tuned on 4,361 knowledge graph-guided training samples generated by the KnowledgeMesh pipeline from financial (Apple 10-K) and medical (PubMed abstracts) documents.

	This is the KM (full) model from the paper "Knowledge Graph-Guided Fine-Tuning Data Generation: A Rigorous Benchmark".

	## Benchmark Results

	Evaluated by Gemini 2.5 Flash pointwise judge (1–5 scale, 4 dimensions):

	| Eval Set | Base | Meta SDK | This Model | Delta vs. Meta SDK |
	|---|---|---|---|---|
	| Primary (n=473, KM-generated) | 1.79 | 1.93 | 2.47 | +0.54 |
	| Independent (n=955, Gemini-generated) | 1.96 | 2.17 | 2.90 | +0.72 |

	The result on the independent eval set (+0.72 over the Meta SDK baseline, p < 0.0001, Cohen's d = 0.57) is the primary claim. Its questions were generated by a different model (Gemini) with no access to the KG structure, which rules out question-style bias as an explanation for the gain.
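
For readers less familiar with the reported statistics, the sketch below shows how a mean delta and Cohen's d could be computed from two lists of per-question judge scores. The scores are made-up placeholders, not benchmark data, and the pooled-standard-deviation form of Cohen's d is an assumption: this card does not state which variant of the effect size or which significance test was used.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical per-question judge scores on the 1-5 scale (illustrative only).
baseline_scores = [2, 1, 3, 2, 2, 3, 1, 2]
adapter_scores = [3, 2, 3, 3, 2, 4, 2, 3]

delta = mean(adapter_scores) - mean(baseline_scores)
d = cohens_d(adapter_scores, baseline_scores)
print(f"delta = {delta:+.2f}, Cohen's d = {d:.2f}")
```

If the same questions are scored for both models, a paired analysis (effect size computed on per-question differences) may be the more appropriate choice.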

	## Usage

	```python
	import torch
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer

	base_model_id = "Qwen/Qwen3.5-4B"
	adapter_id = "likhithv/km-full-model"

	# Load the base model, then attach the LoRA adapter weights on top of it.
	tokenizer = AutoTokenizer.from_pretrained(base_model_id)
	base_model = AutoModelForCausalLM.from_pretrained(
	    base_model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)
	model = PeftModel.from_pretrained(base_model, adapter_id)

	messages = [{"role": "user", "content": "What are the main risk factors for type 2 diabetes?"}]
	# return_dict=True returns input_ids and attention_mask together,
	# so generate() receives the attention mask explicitly.
	inputs = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_tensors="pt",
	    return_dict=True,
	).to(model.device)
	prompt_length = inputs["input_ids"].shape[-1]

	outputs = model.generate(**inputs, max_new_tokens=256)
	# Decode only the newly generated tokens, skipping the prompt.
	print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
	```

	## Training Details

	| Parameter | Value |
	|---|---|
	| Base model | Qwen/Qwen3.5-4B (4-bit quantized via bitsandbytes) |
	| Fine-tuning method | LoRA (rank=16, alpha=16) |
	| Training samples | 4,361 (KG-guided: atomic, aggregated, multihop, chain-of-thought) |
	| Epochs | 3 |
	| Learning rate | 2e-4 |
	| Effective batch size | 8 |
	| Hardware | Kaggle T4 GPU (16 GB) |
	| Domains | Financial (Apple 10-K 2023), Medical (PubMed abstracts) |
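
As a rough sanity check on these settings, the arithmetic below estimates the number of optimizer steps and the LoRA scaling factor they imply. It is a sketch under two assumptions not stated in this card: that the final, partially filled batch of each epoch still produces an optimizer step, and that the standard LoRA scaling of alpha / rank applies (rank-stabilized variants scale differently).

```python
import math

# Values taken from the training details table.
num_samples = 4361
epochs = 3
effective_batch_size = 8  # typically per-device batch size x gradient accumulation steps
lora_rank = 16
lora_alpha = 16

# Assumption: the last partial batch of each epoch is kept, hence the ceiling.
steps_per_epoch = math.ceil(num_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

# Standard LoRA multiplies the low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank

print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"LoRA scaling factor: {lora_scaling}")
```

With rank and alpha both set to 16, the low-rank update is applied at its unscaled magnitude under this convention.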

	## Eval Datasets

	- [`likhithv/knowledgemesh-benchmark-eval`](https://huggingface.co/datasets/likhithv/knowledgemesh-benchmark-eval) — both primary (n=473) and independent (n=955) eval sets

	## Compared Models

	- This model: trained on 4,361 KG-guided samples
	- [`likhithv/meta-sdk-baseline`](https://huggingface.co/likhithv/meta-sdk-baseline) — trained on 1,209 chunk-based samples (Meta Synthetic Data Kit)

	## Citation

	```bibtex
	@misc{knowledgemesh2026,
	  title={Knowledge Graph-Guided Fine-Tuning Data Generation: A Rigorous Benchmark},
	  author={Likhith V},
	  year={2026},
	  howpublished={https://huggingface.co/likhithv/km-full-model}
	}
	```