---
language: en
tags:
- cypher
- neo4j
- graph-rag
- text2cypher
- phi-3
- fine-tuned
- nlp
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
---

# NL → Cypher · Graph RAG (Phi-3-mini)

A fine-tune of **[Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)** that converts natural-language questions into Neo4j Cypher queries for Graph RAG pipelines.

## Example

| Input | Output |
|-------|--------|
| Who acted in Inception? | `MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name` |
| Top 3 highest rated movies? | `MATCH (m:Movie) RETURN m.title, m.rating ORDER BY m.rating DESC LIMIT 3` |
| People older than 30 in Chennai? | `MATCH (p:Person) WHERE p.age > 30 AND p.city = 'Chennai' RETURN p.name, p.age` |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("AtalGun/nl2cypher-phi3")
tokenizer = AutoTokenizer.from_pretrained("AtalGun/nl2cypher-phi3", use_fast=False)

# Graph schema injected into the system prompt.
SCHEMA = """
Node types:
- Person { name, age, email, city }
- Movie { title, year, genre, rating }
- Company { name, industry, country }

Relationships:
- (Person)-[:ACTED_IN]->(Movie)
- (Person)-[:DIRECTED]->(Movie)
- (Person)-[:WORKS_AT]->(Company)
- (Person)-[:KNOWS]->(Person)
"""


def ask(question: str) -> str:
    # Phi-3 chat layout: system turn (with schema), user turn, open assistant turn.
    prompt = (
        f"<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{SCHEMA}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    # Drop token_type_ids if the tokenizer returns them; generate() does not use them.
    inputs.pop("token_type_ids", None)
    # Greedy decoding for deterministic output.
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip()


print(ask("Who acted in Inception?"))
# MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name
```

## Training Details

| Setting | Value |
|---|---|
| Base model | microsoft/Phi-3-mini-4k-instruct |
| Method | QLoRA (r=16, alpha=32) |
| Framework | Unsloth + TRL SFTTrainer |
| Dataset | neo4j/text2cypher-2024v1 + custom seed examples |
| Hardware | Google Colab T4 GPU |
| Epochs | 3 |
| Precision | fp16 |

## Graph Schema

The model was fine-tuned on a Person / Movie / Company knowledge graph. To adapt it to another Neo4j graph, inject your own schema into the system prompt.

## Limitations

- Works best when the graph schema is provided explicitly in the system prompt
- Designed for Neo4j Cypher; not tested on other graph query languages
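
Because output quality depends on the schema being present in the system prompt, it can help to separate prompt construction from generation so that the schema is easy to swap. The snippet below is a minimal sketch that mirrors the prompt layout of `ask()` in the Usage section. The `build_prompt` helper and the Author / Book schema are hypothetical names introduced only for illustration; the model has not been evaluated on this schema here.

```python
# Hypothetical schema for illustration only; replace it with your own
# graph's labels, properties, and relationship types.
LIBRARY_SCHEMA = """
Node types:
- Author { name, country }
- Book { title, year, genre }

Relationships:
- (Author)-[:WROTE]->(Book)
"""


def build_prompt(schema: str, question: str) -> str:
    """Build a prompt with the same layout as ask() in the Usage section,
    but with the schema passed in as an argument (hypothetical helper)."""
    return (
        f"<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{schema}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )


prompt = build_prompt(LIBRARY_SCHEMA, "Which books did Jane Austen write?")
print(prompt)
```

The resulting string can be tokenized and passed to `model.generate` exactly as in `ask()`. Treat the generated Cypher as a draft and review it before running it against a production database.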