---
language: en
tags:
- cypher
- neo4j
- graph-rag
- text2cypher
- phi-3
- fine-tuned
- nlp
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
---
# NL → Cypher · Graph RAG (Phi-3-mini)
A fine-tune of **[Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)**
that converts natural-language questions into Neo4j Cypher queries for Graph RAG pipelines.
## Example
| Input | Output |
|-------|--------|
| Who acted in Inception? | `MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name` |
| Top 3 highest rated movies? | `MATCH (m:Movie) RETURN m.title, m.rating ORDER BY m.rating DESC LIMIT 3` |
| People older than 30 in Chennai? | `MATCH (p:Person) WHERE p.age > 30 AND p.city = 'Chennai' RETURN p.name, p.age` |
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("AtalGun/nl2cypher-phi3")
tokenizer = AutoTokenizer.from_pretrained("AtalGun/nl2cypher-phi3", use_fast=False)
SCHEMA = """
Node types:
- Person { name, age, email, city }
- Movie { title, year, genre, rating }
- Company { name, industry, country }
Relationships:
- (Person)-[:ACTED_IN]->(Movie)
- (Person)-[:DIRECTED]->(Movie)
- (Person)-[:WORKS_AT]->(Company)
- (Person)-[:KNOWS]->(Person)
"""
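def ask(question: str) -> str:
    # Phi-3 chat format: system turn (with schema), user turn, then an open assistant turn
    prompt = (
        f"<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{SCHEMA}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    # Drop token_type_ids if the tokenizer returns them; generate() does not need them
    inputs.pop("token_type_ids", None)
    # Greedy decoding keeps the generated query deterministic
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    ).strip()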
print(ask("Who acted in Inception?"))
# MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name
```
## Training Details
| Setting | Value |
|---|---|
| Base model | microsoft/Phi-3-mini-4k-instruct |
| Method | QLoRA (r=16, alpha=32) |
| Framework | Unsloth + TRL SFTTrainer |
| Dataset | neo4j/text2cypher-2024v1 + custom seed examples |
| Hardware | Google Colab T4 GPU |
| Epochs | 3 |
| Precision | fp16 |
## Graph Schema
The model was fine-tuned on a Person / Movie / Company knowledge graph.
To target a different Neo4j graph, replace the schema in the system prompt with your own, keeping the same layout as the `SCHEMA` string in Usage.
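
As a minimal sketch of schema injection, the helpers below render a schema description in the same layout as the `SCHEMA` string from Usage and wrap it in the Phi-3 chat prompt. The names `render_schema` and `build_prompt` and the Customer / Product graph are hypothetical illustrations, not part of this repository.

```python
def render_schema(nodes: dict[str, list[str]],
                  relationships: list[tuple[str, str, str]]) -> str:
    """Render labels and relationships in the layout of the SCHEMA string (hypothetical helper)."""
    lines = ["Node types:"]
    for label, props in nodes.items():
        lines.append(f"- {label} {{ {', '.join(props)} }}")
    lines.append("Relationships:")
    for start, rel_type, end in relationships:
        lines.append(f"- ({start})-[:{rel_type}]->({end})")
    return "\n" + "\n".join(lines) + "\n"


def build_prompt(schema: str, question: str) -> str:
    """Build the same prompt as ask() in Usage, with a caller-supplied schema."""
    return (
        f"<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{schema}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )


# Hypothetical graph, used only for illustration
nodes = {"Customer": ["name", "country"], "Product": ["name", "price"]}
relationships = [("Customer", "PURCHASED", "Product")]

prompt = build_prompt(render_schema(nodes, relationships),
                      "Which customers bought a laptop?")
print(prompt)
```

The resulting `prompt` can be tokenized and passed to `model.generate` exactly as in `ask()`. How well the output holds up on schemas far from the training graph is not documented here, so treat generated queries as candidates to review.
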
## Limitations
- Works best when the graph schema is provided explicitly in the system prompt
- Designed for Neo4j Cypher; not tested on other graph query languages
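
Because output quality depends on the schema in the prompt, one possible guard is to check a generated query for labels or relationship types that the schema does not define before running it. The sketch below is a regex heuristic, not a Cypher parser, and `unknown_schema_terms` is a hypothetical helper rather than anything shipped with the model.

```python
import re

# Heuristic patterns: "(var:Label" for nodes and "[var:TYPE" for relationships
NODE_LABEL = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
REL_TYPE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def unknown_schema_terms(query: str, labels: set[str], rel_types: set[str]) -> set[str]:
    """Return labels and relationship types used in the query but missing from the schema."""
    used_labels = set(NODE_LABEL.findall(query))
    used_rels = set(REL_TYPE.findall(query))
    return (used_labels - labels) | (used_rels - rel_types)


labels = {"Person", "Movie", "Company"}
rel_types = {"ACTED_IN", "DIRECTED", "WORKS_AT", "KNOWS"}

query = "MATCH (a:Actor)-[:STARRED_IN]->(m:Movie) RETURN a.name"
print(unknown_schema_terms(query, labels, rel_types))
```

An empty result does not prove the query is valid Cypher; it only means no unknown label or relationship type was detected by these patterns.
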