AtalGun committed on
Commit
95409e7
·
verified ·
1 Parent(s): 361aa7f

Add model card

Files changed (1)
  1. README.md +87 -0
README.md ADDED
@@ -0,0 +1,87 @@
+ ---
+ language: en
+ tags:
+ - cypher
+ - neo4j
+ - graph-rag
+ - text2cypher
+ - phi-3
+ - fine-tuned
+ - nlp
+ license: mit
+ base_model: microsoft/Phi-3-mini-4k-instruct
+ ---
+
+ # NL → Cypher · Graph RAG (Phi-3-mini)
+
+ A fine-tune of **[Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)**
+ that converts natural-language questions into Neo4j Cypher queries for Graph RAG pipelines.
+
+ ## Example
+
+ | Input | Output |
+ |-------|--------|
+ | Who acted in Inception? | `MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name` |
+ | Top 3 highest rated movies? | `MATCH (m:Movie) RETURN m.title, m.rating ORDER BY m.rating DESC LIMIT 3` |
+ | People older than 30 in Chennai? | `MATCH (p:Person) WHERE p.age > 30 AND p.city = 'Chennai' RETURN p.name, p.age` |
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("AtalGun/nl2cypher-phi3")
+ tokenizer = AutoTokenizer.from_pretrained("AtalGun/nl2cypher-phi3", use_fast=False)
+
+ # Graph schema injected into the system prompt.
+ SCHEMA = """
+ Node types:
+ - Person { name, age, email, city }
+ - Movie { title, year, genre, rating }
+ - Company { name, industry, country }
+ Relationships:
+ - (Person)-[:ACTED_IN]->(Movie)
+ - (Person)-[:DIRECTED]->(Movie)
+ - (Person)-[:WORKS_AT]->(Company)
+ - (Person)-[:KNOWS]->(Person)
+ """
+
+ def ask(question: str) -> str:
+     # Phi-3 chat layout: system turn (with schema), user turn, then the assistant turn to complete.
+     prompt = (
+         f"<|system|>\nYou are a Cypher query generator.\n"
+         f"Schema:\n{SCHEMA}<|end|>\n"
+         f"<|user|>\n{question}<|end|>\n"
+         f"<|assistant|>\n"
+     )
+     inputs = tokenizer(prompt, return_tensors="pt")
+     # Drop token_type_ids if the tokenizer returned them; generate() can reject unused inputs.
+     inputs.pop("token_type_ids", None)
+     out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
+     # Decode only the newly generated tokens, skipping the prompt.
+     return tokenizer.decode(
+         out[0][inputs["input_ids"].shape[1]:],
+         skip_special_tokens=True
+     ).strip()
+
+ print(ask("Who acted in Inception?"))
+ # MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name
+ ```
+
+ ## Training Details
+
+ | Setting | Value |
+ |---|---|
+ | Base model | microsoft/Phi-3-mini-4k-instruct |
+ | Method | QLoRA (r=16, alpha=32) |
+ | Framework | Unsloth + TRL SFTTrainer |
+ | Dataset | neo4j/text2cypher-2024v1 + custom seed examples |
+ | Hardware | Google Colab T4 GPU |
+ | Epochs | 3 |
+ | Precision | fp16 |
+
+ ## Graph Schema
+
+ The model was fine-tuned on a Person / Movie / Company knowledge graph.
+ To target a different Neo4j graph, replace the schema in the system prompt with your own.
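
As a minimal sketch of swapping in a different schema, the helpers below render node and relationship definitions in the same text layout as the `SCHEMA` string from the usage example and wrap them in the same prompt tokens. The function names `format_schema` and `build_prompt`, and the Author/Paper schema, are hypothetical and only for illustration; the card does not report how well the model handles schemas outside its training graph.

```python
from typing import Dict, List, Tuple


def format_schema(nodes: Dict[str, List[str]],
                  relationships: List[Tuple[str, str, str]]) -> str:
    """Render a schema in the text layout of the card's SCHEMA string."""
    lines = ["Node types:"]
    for label, props in nodes.items():
        lines.append(f"- {label} {{ {', '.join(props)} }}")
    lines.append("Relationships:")
    for start, rel_type, end in relationships:
        lines.append(f"- ({start})-[:{rel_type}]->({end})")
    # Leading and trailing newlines mirror the triple-quoted SCHEMA literal.
    return "\n" + "\n".join(lines) + "\n"


def build_prompt(schema: str, question: str) -> str:
    """Assemble the prompt with the same special tokens as the usage example."""
    return (
        "<|system|>\nYou are a Cypher query generator.\n"
        f"Schema:\n{schema}<|end|>\n"
        f"<|user|>\n{question}<|end|>\n"
        "<|assistant|>\n"
    )


# Hypothetical schema, not part of the training data described in this card.
schema = format_schema(
    nodes={"Author": ["name"], "Paper": ["title", "year"]},
    relationships=[("Author", "WROTE", "Paper")],
)
prompt = build_prompt(schema, "Which papers did Ada Lovelace write?")
```

The resulting `prompt` can be tokenized and passed to `model.generate` exactly as in the `ask` function, in place of the hard-coded prompt.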
+
+ ## Limitations
+
+ - Works best when the graph schema is stated explicitly in the system prompt
+ - Designed for Neo4j Cypher; it has not been tested on other graph query languages
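
Since output quality depends on the schema given in the prompt, a pipeline may want to check a generated query against that schema before running it. The sketch below is one rough way to do so and is not part of the model or this commit. The name `unknown_names` and the regular expressions are assumptions, and they only cover simple patterns such as `(p:Person)` and `[:ACTED_IN]`. They do not handle multiple labels, `|` alternatives, or pattern-like text inside string literals.

```python
import re
from typing import Set

# Simplified patterns (assumption): one label per node, one type per relationship.
NODE_LABEL = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
REL_TYPE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def unknown_names(cypher: str, labels: Set[str], rel_types: Set[str]) -> Set[str]:
    """Return labels and relationship types used in the query but absent from the schema."""
    used_labels = set(NODE_LABEL.findall(cypher))
    used_types = set(REL_TYPE.findall(cypher))
    return (used_labels - labels) | (used_types - rel_types)


# Names taken from the schema in the usage example.
LABELS = {"Person", "Movie", "Company"}
REL_TYPES = {"ACTED_IN", "DIRECTED", "WORKS_AT", "KNOWS"}

query = "MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Inception'}) RETURN p.name"
print(unknown_names(query, LABELS, REL_TYPES))  # prints: set()
```

An empty result means every label and relationship type in the query appears in the schema; it says nothing about whether the query is otherwise valid Cypher.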