---
language: en
tags:
- cypher
- neo4j
- graph-rag
- text2cypher
- phi-3
- fine-tuned
- nlp
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
---

# NL → Cypher · Graph RAG (Phi-3-mini)

A fine-tune of **[Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)**
that converts natural-language questions into Neo4j Cypher queries for Graph RAG pipelines.

## Example

| Input | Output |
|-------|--------|
| Who acted in Inception? | `MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name` |
| Top 3 highest rated movies? | `MATCH (m:Movie) RETURN m.title, m.rating ORDER BY m.rating DESC LIMIT 3` |
| People older than 30 in Chennai? | `MATCH (p:Person) WHERE p.age > 30 AND p.city = 'Chennai' RETURN p.name, p.age` |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("AtalGun/nl2cypher-phi3")
tokenizer = AutoTokenizer.from_pretrained("AtalGun/nl2cypher-phi3", use_fast=False)

# Graph schema injected into the system prompt.
SCHEMA = """
Node types:
- Person { name, age, email, city }
- Movie { title, year, genre, rating }
- Company { name, industry, country }
Relationships:
- (Person)-[:ACTED_IN]->(Movie)
- (Person)-[:DIRECTED]->(Movie)
- (Person)-[:WORKS_AT]->(Company)
- (Person)-[:KNOWS]->(Person)
"""

def ask(question: str) -> str:
    """Generate a Cypher query for a natural-language question."""
    # Phi-3 chat format: system and user turns, then an open assistant turn.
    prompt = (
        f"<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{SCHEMA}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    # Drop token_type_ids if the tokenizer returns them; generate() does not need them.
    inputs.pop("token_type_ids", None)
    # Greedy decoding (do_sample=False) keeps the output deterministic.
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    ).strip()

print(ask("Who acted in Inception?"))
# MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name
```

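In a Graph RAG pipeline, the string returned by `ask` is usually sent to the database next. The sketch below shows one possible pre-flight check that rejects generated queries containing write clauses. It is an assumption for illustration, not part of this model or its training code: `is_read_only` and the keyword list are hypothetical, and the check is a coarse keyword match rather than a Cypher parser. It does not, for example, inspect procedures invoked through `CALL`.

```python
import re

# Single- or double-quoted string literals, so that a value such as
# 'The Set' is not mistaken for a SET clause.
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'" + "|" + r'"(?:[^"\\]|\\.)*"')

# Clauses that modify the graph (hypothetical, non-exhaustive list).
_WRITE_KEYWORD = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)

def is_read_only(cypher: str) -> bool:
    """Return True if no write clause keyword appears outside string literals."""
    without_literals = _STRING_LITERAL.sub("''", cypher)
    return _WRITE_KEYWORD.search(without_literals) is None

print(is_read_only("MATCH (m:Movie {title: 'The Set'}) RETURN m.title"))  # True
print(is_read_only("MATCH (p:Person) DETACH DELETE p"))                   # False
```

The check errs on the side of rejection: a property literally named `set` would also be flagged. Running generated queries under a database user with read-only permissions is a sturdier safeguard where that is available.
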
## Training Details

| Setting | Value |
|---|---|
| Base model | microsoft/Phi-3-mini-4k-instruct |
| Method | QLoRA (r=16, alpha=32) |
| Framework | Unsloth + TRL SFTTrainer |
| Dataset | neo4j/text2cypher-2024v1 + custom seed examples |
| Hardware | Google Colab T4 GPU |
| Epochs | 3 |
| Precision | fp16 |

## Graph Schema

The model was fine-tuned on a Person / Movie / Company knowledge graph.
To use it with a different Neo4j graph, inject that graph's schema into the system prompt.

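
As a minimal sketch of injecting a custom schema, the helpers below render node labels and relationships in the same layout as the `SCHEMA` string in the Usage section, then assemble the prompt in the same chat format. `format_schema`, `build_prompt`, and the Author / Paper graph are hypothetical names chosen for illustration. They are not part of the released model.

```python
from typing import Dict, List, Tuple

def format_schema(nodes: Dict[str, List[str]],
                  rels: List[Tuple[str, str, str]]) -> str:
    """Render a schema in the same layout as the SCHEMA string under Usage."""
    lines = ["Node types:"]
    for label, props in nodes.items():
        lines.append(f"- {label} {{ {', '.join(props)} }}")
    lines.append("Relationships:")
    for start, rel_type, end in rels:
        lines.append(f"- ({start})-[:{rel_type}]->({end})")
    # Leading and trailing newlines mirror the triple-quoted SCHEMA string.
    return "\n" + "\n".join(lines) + "\n"

def build_prompt(schema: str, question: str) -> str:
    """Assemble the system/user/assistant prompt used in the Usage example."""
    return (
        "<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{schema}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        "<|assistant|>\n"
    )

# Hypothetical schema for a small publications graph.
schema = format_schema(
    nodes={"Author": ["name"], "Paper": ["title", "year"]},
    rels=[("Author", "WROTE", "Paper")],
)
print(build_prompt(schema, "Which papers were published after 2020?"))
```

The resulting prompt can be tokenized and passed to `model.generate` exactly as in `ask`. How well the output holds up on schemas far from the training graph is not documented here, so spot-check generated queries on your own data.
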
## Limitations

- Works best when the graph schema is provided explicitly in the system prompt
- Designed for Neo4j Cypher; not tested on other graph query languages