NL → Cypher · Graph RAG (Phi-3-mini)

A fine-tuned Phi-3-mini-4k-instruct model that converts natural-language questions into Neo4j Cypher queries for Graph RAG pipelines.

Example

Input | Output
Who acted in Inception? | MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name
Top 3 highest rated movies? | MATCH (m:Movie) RETURN m.title, m.rating ORDER BY m.rating DESC LIMIT 3
People older than 30 in Chennai? | MATCH (p:Person) WHERE p.age > 30 AND p.city = 'Chennai' RETURN p.name, p.age

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model     = AutoModelForCausalLM.from_pretrained("AtalGun/nl2cypher-phi3")
tokenizer = AutoTokenizer.from_pretrained("AtalGun/nl2cypher-phi3", use_fast=False)

SCHEMA = """
Node types:
  - Person   { name, age, email, city }
  - Movie    { title, year, genre, rating }
  - Company  { name, industry, country }
Relationships:
  - (Person)-[:ACTED_IN]->(Movie)
  - (Person)-[:DIRECTED]->(Movie)
  - (Person)-[:WORKS_AT]->(Company)
  - (Person)-[:KNOWS]->(Person)
"""

def ask(question: str) -> str:
    prompt = (
        f"<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{SCHEMA}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )
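    inputs = tokenizer(prompt, return_tensors="pt")
    # Remove token_type_ids, if the tokenizer returns them, before calling generate().
    inputs.pop("token_type_ids", None)
    # Greedy decoding (do_sample=False) so the same question yields the same query.
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    ).strip()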

print(ask("Who acted in Inception?"))
# MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name

Training Details

Base model: microsoft/Phi-3-mini-4k-instruct
Method: QLoRA (r=16, alpha=32)
Framework: Unsloth + TRL SFTTrainer
Dataset: neo4j/text2cypher-2024v1 + custom seed examples
Hardware: Google Colab T4 GPU
Epochs: 3
Precision: fp16
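
To make the r=16, alpha=32 setting concrete, the sketch below works through the standard LoRA arithmetic: the low-rank update is scaled by alpha / r, and a rank-r adapter on a d_in x d_out weight adds r * (d_in + d_out) trainable parameters. The `LoraSetting` helper is hypothetical, and the 3072 x 3072 projection is an illustrative assumption, not a statement about which modules were adapted in this model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSetting:
    """Hypothetical helper holding the LoRA hyperparameters listed above."""
    r: int = 16
    alpha: int = 32

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.alpha / self.r

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # A is (r x d_in) and B is (d_out x r), so r * (d_in + d_out) weights.
        return self.r * (d_in + d_out)


setting = LoraSetting()
print(setting.scaling)                     # 2.0
print(setting.adapter_params(3072, 3072))  # 98304 (assumed 3072-wide square layer)
```

With these values the adapter output is multiplied by 2 before being added to the frozen base weight's output.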

Graph Schema

The model was fine-tuned on a Person / Movie / Company knowledge graph. Inject your own schema into the system prompt to adapt it to any Neo4j graph.
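
A minimal sketch of swapping in a different schema, assuming the prompt layout from the Usage section is what the model expects. `format_schema` and `build_prompt` are hypothetical helper names, and the Author / Book schema is invented purely for illustration.

```python
def format_schema(nodes: dict[str, list[str]],
                  relationships: list[tuple[str, str, str]]) -> str:
    """Render a schema in the same text layout as the SCHEMA string in Usage."""
    lines = ["Node types:"]
    for label, props in nodes.items():
        lines.append(f"  - {label} {{ {', '.join(props)} }}")
    lines.append("Relationships:")
    for start, rel_type, end in relationships:
        lines.append(f"  - ({start})-[:{rel_type}]->({end})")
    # Leading and trailing newlines mirror the triple-quoted SCHEMA string.
    return "\n" + "\n".join(lines) + "\n"


def build_prompt(question: str, schema: str) -> str:
    """Assemble the Phi-3 chat prompt with the schema in the system turn."""
    return (
        f"<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{schema}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )


# Illustrative schema (an assumption, not part of the training data).
schema = format_schema(
    nodes={"Author": ["name", "country"], "Book": ["title", "year"]},
    relationships=[("Author", "WROTE", "Book")],
)
prompt = build_prompt("Which books did Jane Austen write?", schema)
print(prompt)
```

The resulting `prompt` can be tokenized and passed to `model.generate` exactly as in the `ask` function. How well the model generalizes to schemas far from the Person / Movie / Company graph is not documented here.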

Limitations

  • Works best when the graph schema is provided explicitly in the system prompt
  • Designed for Neo4j Cypher; not tested on other graph query languages
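
Because output quality depends on the schema supplied in the prompt, one cheap safeguard is to check the node labels and relationship types in a generated query against that schema before running it. The sketch below is regex-based and is not a Cypher parser: it ignores multi-label nodes and `|`-separated relationship types, and it can be fooled by string literals. The function name `unknown_schema_terms` is hypothetical.

```python
import re

# Labels and relationship types from the schema shown in Usage.
KNOWN_LABELS = {"Person", "Movie", "Company"}
KNOWN_REL_TYPES = {"ACTED_IN", "DIRECTED", "WORKS_AT", "KNOWS"}


def unknown_schema_terms(cypher: str) -> set[str]:
    """Return labels and relationship types that are not in the schema."""
    # Node labels, e.g. "(p:Person" or "(:Person".
    labels = set(re.findall(r"\(\s*\w*\s*:\s*(\w+)", cypher))
    # Relationship types, e.g. "[:ACTED_IN" or "[r:ACTED_IN".
    rel_types = set(re.findall(r"\[\s*\w*\s*:\s*(\w+)", cypher))
    return (labels - KNOWN_LABELS) | (rel_types - KNOWN_REL_TYPES)


good = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name"
bad = "MATCH (p:Person)-[:STARRED_IN]->(f:Film) RETURN p.name"

print(unknown_schema_terms(good))         # set()
print(sorted(unknown_schema_terms(bad)))  # ['Film', 'STARRED_IN']
```

A non-empty result suggests the query refers to something outside the schema and is worth rejecting or regenerating before it reaches the database.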

Model size: 4B params · Tensor type: BF16 · Format: Safetensors