---
language: en
tags:
  - cypher
  - neo4j
  - graph-rag
  - text2cypher
  - phi-3
  - fine-tuned
  - nlp
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
---

# NL → Cypher · Graph RAG (Phi-3-mini)

Fine-tuned **[Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)**
to convert natural language questions into Neo4j Cypher queries for Graph RAG pipelines.

## Example

| Input | Output |
|-------|--------|
| Who acted in Inception? | `MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name` |
| Top 3 highest rated movies? | `MATCH (m:Movie) RETURN m.title, m.rating ORDER BY m.rating DESC LIMIT 3` |
| People older than 30 in Chennai? | `MATCH (p:Person) WHERE p.age > 30 AND p.city = 'Chennai' RETURN p.name, p.age` |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model     = AutoModelForCausalLM.from_pretrained("AtalGun/nl2cypher-phi3")
tokenizer = AutoTokenizer.from_pretrained("AtalGun/nl2cypher-phi3", use_fast=False)

SCHEMA = """
Node types:
  - Person   { name, age, email, city }
  - Movie    { title, year, genre, rating }
  - Company  { name, industry, country }
Relationships:
  - (Person)-[:ACTED_IN]->(Movie)
  - (Person)-[:DIRECTED]->(Movie)
  - (Person)-[:WORKS_AT]->(Company)
  - (Person)-[:KNOWS]->(Person)
"""

def ask(question: str) -> str:
    prompt = (
        f"<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{SCHEMA}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )
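    inputs = tokenizer(prompt, return_tensors="pt")
    # Drop token_type_ids if the tokenizer returns them; generate() does not use them here.
    inputs.pop("token_type_ids", None)
    # Greedy decoding (do_sample=False) keeps the generated query deterministic.
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True
    ).strip()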

print(ask("Who acted in Inception?"))
# MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name
```

## Training Details

| Setting | Value |
|---|---|
| Base model | microsoft/Phi-3-mini-4k-instruct |
| Method | QLoRA (r=16, alpha=32) |
| Framework | Unsloth + TRL SFTTrainer |
| Dataset | neo4j/text2cypher-2024v1 + custom seed examples |
| Hardware | Google Colab T4 GPU |
| Epochs | 3 |
| Precision | fp16 |

## Graph Schema

The custom seed examples used in fine-tuning target a Person / Movie / Company knowledge graph.
To query a different Neo4j graph, replace the schema in the system prompt with your own.
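
As a rough illustration of swapping in your own schema, the sketch below renders node and relationship definitions in approximately the layout used in the Usage section and places them in the same prompt template. The helper names `format_schema` and `build_prompt`, and the Author / Paper schema, are hypothetical and only serve as an example; they are not part of the model repository.

```python
from typing import Dict, List, Tuple


def format_schema(
    nodes: Dict[str, List[str]],
    relationships: List[Tuple[str, str, str]],
) -> str:
    """Render a schema as text, mirroring the layout shown in the Usage section."""
    lines = ["Node types:"]
    for label, props in nodes.items():
        lines.append(f"  - {label} {{ {', '.join(props)} }}")
    lines.append("Relationships:")
    for start, rel_type, end in relationships:
        lines.append(f"  - ({start})-[:{rel_type}]->({end})")
    return "\n".join(lines) + "\n"


def build_prompt(schema: str, question: str) -> str:
    """Fill the prompt template from the Usage section with a schema and a question."""
    return (
        f"<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{schema}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        f"<|assistant|>\n"
    )


# Hypothetical schema for a citation graph (illustration only).
nodes = {"Author": ["name"], "Paper": ["title", "year"]}
rels = [("Author", "WROTE", "Paper")]

prompt = build_prompt(format_schema(nodes, rels), "Which papers did Ada Lovelace write?")
print(prompt)
```

The resulting `prompt` string can be tokenized and passed to `model.generate` in the same way as in `ask` above.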

## Limitations

- Works best when the graph schema is provided explicitly in the system prompt
- Designed for Neo4j Cypher; not tested on other graph query languages
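
Because output quality depends on the schema in the prompt, it may help to check a generated query against that schema before running it. The following is a minimal, regex-based sketch of such a check; `unknown_names` is a hypothetical helper, not part of this model or of any Neo4j library, and it is a heuristic only: it inspects just the first label of each node pattern and can be fooled by string literals that resemble patterns.

```python
import re
from typing import Set


def unknown_names(cypher: str, labels: Set[str], rel_types: Set[str]) -> Set[str]:
    """Return node labels and relationship types in `cypher` that the schema lacks."""
    # Node patterns look like (p:Person) or (:Movie); capture the first label.
    found_labels = set(re.findall(r"\(\s*\w*\s*:\s*(\w+)", cypher))
    # Relationship patterns look like [:ACTED_IN] or [r:KNOWS]; capture the type.
    found_rels = set(re.findall(r"\[\s*\w*\s*:\s*(\w+)", cypher))
    return (found_labels - labels) | (found_rels - rel_types)


# Labels and relationship types from the schema in the Usage section.
LABELS = {"Person", "Movie", "Company"}
REL_TYPES = {"ACTED_IN", "DIRECTED", "WORKS_AT", "KNOWS"}

ok_query = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name"
bad_query = "MATCH (p:Person)-[:LIKES]->(s:Song) RETURN s.title"

print(unknown_names(ok_query, LABELS, REL_TYPES))
print(unknown_names(bad_query, LABELS, REL_TYPES))
```

An empty result does not mean the query is correct; it only indicates that the query uses names present in the schema.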