AtalGun
/

nl2cypher-phi3

Model card Files Files and versions

nl2cypher-phi3 / README.md

AtalGun's picture

Add model card

95409e7 verified 5 days ago

|

history blame contribute delete

2.59 kB

	---
	language: en
	tags:
	- cypher
	- neo4j
	- graph-rag
	- text2cypher
	- phi-3
	- fine-tuned
	- nlp
	license: mit
	base_model: microsoft/Phi-3-mini-4k-instruct
	---

	# NL → Cypher · Graph RAG (Phi-3-mini)

	Fine-tuned [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
	to convert natural language questions into Neo4j Cypher queries for Graph RAG pipelines.

	## Example

	\| Input \| Output \|
	\|-------\|--------\|
	\| Who acted in Inception? \| `MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name` \|
	\| Top 3 highest rated movies? \| `MATCH (m:Movie) RETURN m.title, m.rating ORDER BY m.rating DESC LIMIT 3` \|
	\| People older than 30 in Chennai? \| `MATCH (p:Person) WHERE p.age > 30 AND p.city = 'Chennai' RETURN p.name, p.age` \|

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model = AutoModelForCausalLM.from_pretrained("AtalGun/nl2cypher-phi3")
	tokenizer = AutoTokenizer.from_pretrained("AtalGun/nl2cypher-phi3", use_fast=False)

	SCHEMA = """
	Node types:
	- Person { name, age, email, city }
	- Movie { title, year, genre, rating }
	- Company { name, industry, country }
	Relationships:
	- (Person)-[:ACTED_IN]->(Movie)
	- (Person)-[:DIRECTED]->(Movie)
	- (Person)-[:WORKS_AT]->(Company)
	- (Person)-[:KNOWS]->(Person)
	"""

	def ask(question: str) -> str:
	prompt = (
	f"<\|system\|>\nYou are a Cypher query generator.\n"
	f"Schema:\n{SCHEMA}<\|end\|>\n"
	f"<\|user\|>\n{question}<\|end\|>\n"
	f"<\|assistant\|>\n"
	)
	inputs = tokenizer(prompt, return_tensors="pt")
	inputs.pop("token_type_ids", None)
	out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
	return tokenizer.decode(
	out[0][inputs["input_ids"].shape[1]:],
	skip_special_tokens=True
	).strip()

	print(ask("Who acted in Inception?"))
	# MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name
	```

	## Training Details

	\| \| \|
	\|---\|---\|
	\| Base model \| microsoft/Phi-3-mini-4k-instruct \|
	\| Method \| QLoRA (r=16, alpha=32) \|
	\| Framework \| Unsloth + TRL SFTTrainer \|
	\| Dataset \| neo4j/text2cypher-2024v1 + custom seed examples \|
	\| Hardware \| Google Colab T4 GPU \|
	\| Epochs \| 3 \|
	\| Precision \| fp16 \|

	## Graph Schema

	The model was fine-tuned on a Person / Movie / Company knowledge graph.
	Inject your own schema into the system prompt to adapt it to any Neo4j graph.

	## Limitations

	- Best results when graph schema is explicitly provided in the system prompt
	- Designed for Neo4j Cypher — not tested on other graph query languages