import ollama
SYSTEM_PROMPT = """
You are an expert Python Data Analyst.
You are given:
- A pandas DataFrame named df
- Dataset metadata
- Conversation history
- A user question
Generate ONLY executable Python code.
Rules:
1. Use only pandas and numpy.
2. Store final answer in variable named result.
3. Do not print().
4. Do not explain.
5. Do not use markdown.
6. Return code only.
Examples:
Question:
How many rows are in the dataset?
Code:
result = len(df)
Question:
How many columns are there?
Code:
result = len(df.columns)
Question:
List all columns
Code:
result = list(df.columns)
Question:
What is the average sales?
Code:
result = df["Sales"].mean()
Question:
What is the maximum sales?
Code:
result = df["Sales"].max()
Question:
What is the minimum sales?
Code:
result = df["Sales"].min()
Question:
Show first 5 rows
Code:
result = df.head()
Question:
What percentage of rows have missing values?
Code:
result = (df.isnull().any(axis=1).mean()) * 100
Question:
What percentage of missing values does each column have?
Code:
result = (df.isnull().sum() / len(df)) * 100
Question:
Which category has the highest average sales?
Code:
result = (
    df.groupby("Category")["Sales"]
    .mean()
    .sort_values(ascending=False)
    .head(1)
)
Question:
What factors affect price the most?
Code:
numeric_df = df.select_dtypes(include='number')
result = (
    numeric_df.corr()['price']
    .sort_values(ascending=False)
)
"""
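The rules in SYSTEM_PROMPT imply a contract: the generated code reads a variable named df and leaves its answer in a variable named result. The file does not show how that code is run, so the following is only a minimal sketch of one way a caller might honour that contract. The helper name run_generated_code is hypothetical, and plain exec is not a sandbox, so this assumes the model output has been reviewed or is otherwise trusted.

```python
def run_generated_code(code, df):
    """Run generated code with `df` in scope and return its `result`.

    Hypothetical helper, not part of the original file. exec() gives the
    code full interpreter access, so this is not a security boundary.
    """
    # The namespace is the only place the generated code can see `df`
    # and the only place we look for `result` afterwards.
    namespace = {"df": df}
    exec(code, namespace)
    if "result" not in namespace:
        raise KeyError("generated code did not define `result`")
    return namespace["result"]


if __name__ == "__main__":
    # A plain list stands in for the DataFrame so the sketch runs
    # without pandas installed; len() works on both.
    rows = [{"Sales": 10}, {"Sales": 30}]
    print(run_generated_code("result = len(df)", rows))  # prints 2
```

Raising when result is missing makes a rule violation by the model visible to the caller instead of silently returning nothing.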
def generate_code(
    question,
    metadata,
    memory=None
):
    # Generate pandas code from the question, dataset metadata,
    # and (optionally) the recent conversation history.
    memory_text = ""
    if memory:
        # Only the last 10 turns are included to keep the prompt short.
        for item in memory[-10:]:
            if item["role"] == "user":
                memory_text += f"User: {item['content']}\n"
            elif item["role"] == "assistant":
                memory_text += f"Assistant: {item['content']}\n"
    prompt = f"""
Dataset Metadata:
Columns:
{metadata['columns']}
Numeric Columns:
{metadata['numeric_columns']}
Categorical Columns:
{metadata['categorical_columns']}
Conversation History:
{memory_text}
Current Question:
{question}
Generate ONLY Python code.
Requirements:
- Use dataframe named df
- Save final output into variable result
- Return code only
"""
    response = ollama.chat(
        model="qwen2.5:3b",
        messages=[
            {
                "role": "system",
                "content": SYSTEM_PROMPT
            },
            {
                "role": "user",
                "content": prompt
            }
        ]
    )
    code = response["message"]["content"]
    # Remove markdown fences if Qwen generates them despite the rules
    code = code.replace("```python", "")
    code = code.replace("```", "")
    code = code.strip()
    return code
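The fence cleanup in generate_code deletes the fence markers but keeps everything else, so a reply such as "Here is the code:" followed by a fenced block would still carry the prose into the returned string. A possible alternative, sketched below under the assumption that the model wraps its code in at most one fenced block, is to extract the body of the first fence and fall back to the original replace-based cleanup otherwise. The name extract_code is hypothetical and not part of the original file.

```python
import re

# Build the fence marker programmatically so this snippet contains no
# literal triple backticks of its own.
FENCE = "`" * 3

# An opening fence, an optional language tag, a newline, then the
# shortest possible body up to the next fence.
_FENCED_BLOCK = re.compile(
    FENCE + r"[A-Za-z0-9_+-]*[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_code(text):
    """Return the code inside the first fenced block, if there is one.

    Hypothetical helper. When no well-formed fence is found it mirrors
    the original cleanup: strip the fence markers and whitespace.
    """
    match = _FENCED_BLOCK.search(text)
    if match:
        return match.group(1).strip()
    return text.replace(FENCE + "python", "").replace(FENCE, "").strip()


if __name__ == "__main__":
    reply = "Here is the code:\n" + FENCE + "python\nresult = len(df)\n" + FENCE
    print(extract_code(reply))  # prints: result = len(df)
```

Replies that already follow the "Return code only" rule pass through unchanged apart from whitespace stripping.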