---
language: en
tags:
- cypher
- neo4j
- graph-rag
- text2cypher
- phi-3
- fine-tuned
- nlp
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
---
NL → Cypher · Graph RAG (Phi-3-mini)
Phi-3-mini-4k-instruct fine-tuned to translate natural-language questions into Neo4j Cypher queries for Graph RAG pipelines.
Example
| Input | Output |
|---|---|
| Who acted in Inception? | MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name |
| Top 3 highest rated movies? | MATCH (m:Movie) RETURN m.title, m.rating ORDER BY m.rating DESC LIMIT 3 |
| People older than 30 in Chennai? | MATCH (p:Person) WHERE p.age > 30 AND p.city = 'Chennai' RETURN p.name, p.age |
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "AtalGun/nl2cypher-phi3"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)

# Graph schema injected into the system prompt; replace it with your own graph's schema.
SCHEMA = """
Node types:
- Person { name, age, email, city }
- Movie { title, year, genre, rating }
- Company { name, industry, country }
Relationships:
- (Person)-[:ACTED_IN]->(Movie)
- (Person)-[:DIRECTED]->(Movie)
- (Person)-[:WORKS_AT]->(Company)
- (Person)-[:KNOWS]->(Person)
"""

def ask(question: str) -> str:
    # Phi-3 chat format: system turn (role + schema), user turn (question), open assistant turn.
    prompt = (
        "<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{SCHEMA}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        "<|assistant|>\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Drop token_type_ids if the tokenizer returns them; the causal LM does not take them.
    inputs.pop("token_type_ids", None)
    # Greedy decoding keeps the generated query deterministic.
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    ).strip()

print(ask("Who acted in Inception?"))
# Example output:
# MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name
```
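Generated queries come from a language model, so in a Graph RAG pipeline it can be worth checking them before they reach the database. The sketch below is a hypothetical keyword-level guard (not part of this model or of any Neo4j library) that rejects queries containing write-capable clauses. It is a heuristic, not a Cypher parser, and it conservatively treats every `CALL` as unsafe.

```python
import re

# Hypothetical guard, not part of this model or of any Neo4j library:
# a keyword heuristic that flags Cypher clauses able to modify the graph.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|SET|REMOVE|DROP|FOREACH|CALL|LOAD\s+CSV)\b",
    re.IGNORECASE,
)
# Single- or double-quoted string literals, allowing backslash escapes.
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")


def is_read_only(cypher: str) -> bool:
    """Return True if no write-capable keyword appears outside string literals."""
    # Blank out literals first so a title such as 'Set It Up' is not flagged.
    code_only = STRING_LITERAL.sub("''", cypher)
    return WRITE_CLAUSES.search(code_only) is None


# In a pipeline this string would be the output of ask(...).
query = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name"
if is_read_only(query):
    print("safe to run:", query)
```

A keyword blocklist can misfire on unusual property names (for example `m.set`), so it complements, rather than replaces, database-side access controls.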
Training Details
| Setting | Value |
|---|---|
| Base model | microsoft/Phi-3-mini-4k-instruct |
| Method | QLoRA (r=16, alpha=32) |
| Framework | Unsloth + TRL SFTTrainer |
| Dataset | neo4j/text2cypher-2024v1 + custom seed examples |
| Hardware | Google Colab T4 GPU |
| Epochs | 3 |
| Precision | fp16 |
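For reference, the standard LoRA formulation freezes each adapted weight $W_0$ (stored in 4-bit form under QLoRA) and trains only the low-rank factors $A$ and $B$, scaling their product by $\alpha / r$. Assuming the default (non-rank-stabilized) scaling was used, the values in the table give a scaling factor of 2:

$$
h = W_0 x + \frac{\alpha}{r}\, B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k}
$$

$$
\frac{\alpha}{r} = \frac{32}{16} = 2
$$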
Graph Schema
The model was fine-tuned on a Person / Movie / Company knowledge graph. To adapt it to another Neo4j graph, inject that graph's node labels, properties, and relationship types into the system prompt in place of the example schema.
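A replacement schema only has to follow the same plain-text layout as `SCHEMA` in the Usage example. Below is a minimal sketch, assuming you hold the schema as a dict mapping labels to property names plus a list of `(start, type, end)` relationship triples; `format_schema` is a hypothetical helper, not part of this repository.

```python
# Hypothetical helper: render a schema in the same plain-text layout
# as the SCHEMA string used in the Usage example.
def format_schema(
    nodes: dict[str, list[str]],
    relationships: list[tuple[str, str, str]],
) -> str:
    lines = ["Node types:"]
    for label, props in nodes.items():
        lines.append(f"- {label} {{ {', '.join(props)} }}")
    lines.append("Relationships:")
    for start, rel_type, end in relationships:
        lines.append(f"- ({start})-[:{rel_type}]->({end})")
    # Leading/trailing newlines match the triple-quoted SCHEMA string.
    return "\n" + "\n".join(lines) + "\n"


print(format_schema(
    {"Person": ["name", "age", "email", "city"], "Company": ["name", "industry", "country"]},
    [("Person", "WORKS_AT", "Company")],
))
```

Its return value can be assigned to `SCHEMA` before calling `ask`.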
Limitations
- Works best when the graph schema is explicitly provided in the system prompt.
- Designed for Neo4j Cypher; not tested on other graph query languages.
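Because output quality depends on the schema in the prompt, a cheap post-check is to compare the node labels and relationship types a generated query uses against that schema. The sketch below is a hypothetical regex heuristic: it only captures the first label in patterns like `(p:Person)` and the first type in `[:ACTED_IN]`, and it does not understand string literals or full Cypher syntax.

```python
import re
from typing import Iterable

# Hypothetical post-check: regex heuristics for the first label in a node
# pattern such as (p:Person) and the first type in a relationship such as [:ACTED_IN].
NODE_LABEL = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
REL_TYPE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def unknown_schema_terms(
    cypher: str, labels: Iterable[str], rel_types: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Return (labels, relationship types) used in `cypher` but absent from the schema."""
    used_labels = set(NODE_LABEL.findall(cypher))
    used_rels = set(REL_TYPE.findall(cypher))
    return sorted(used_labels - set(labels)), sorted(used_rels - set(rel_types))


SCHEMA_LABELS = ["Person", "Movie", "Company"]
SCHEMA_RELS = ["ACTED_IN", "DIRECTED", "WORKS_AT", "KNOWS"]
print(unknown_schema_terms(
    "MATCH (p:Actor)-[:STARRED_IN]->(m:Movie) RETURN p.name", SCHEMA_LABELS, SCHEMA_RELS
))
# → (['Actor'], ['STARRED_IN'])
```

A non-empty result suggests the model invented a label or relationship type, which is a reasonable signal to regenerate or reject the query.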