Valtry committed on
Commit
6fd963b
·
verified ·
1 Parent(s): 256c1de

Update app.py

Files changed (1)
  1. app.py +105 -60
app.py CHANGED
@@ -35,58 +35,58 @@ print(f"✅ Loaded on {device.upper()}")
35
  # SYSTEM PROMPT
36
  # =========================
37
 
38
- SYSTEM_PROMPT = """You are a memory compression engine. Compress and merge facts into one short dense paragraph.
39
 
40
  EXAMPLE 1:
41
  EXISTING MEMORY: (none)
42
  USER SAID: I am building a weather app using React and OpenWeatherMap API.
43
  ASSISTANT REPLIED: Fetch data with axios. Store API key in .env via process.env.
44
- UPDATED MEMORY: User building React weather app using OpenWeatherMap API. Data fetched via axios. API key stored in .env.
45
 
46
  EXAMPLE 2:
47
- EXISTING MEMORY: User building React weather app using OpenWeatherMap API. Data fetched via axios. API key stored in .env.
48
  USER SAID: How do I cache the weather data so I do not hit the API limit?
49
  ASSISTANT REPLIED: Use localStorage to cache responses with a timestamp. If cache is under 10 minutes old, return it instead of calling the API.
50
- UPDATED MEMORY: User building React weather app using OpenWeatherMap API. Data fetched via axios, cached in localStorage with 10-minute expiry to avoid API limit.
51
 
52
  EXAMPLE 3:
53
  EXISTING MEMORY: User building job board with Django, React, PostgreSQL. JWT auth via djangorestframework-simplejwt. Custom user model with company and jobseeker roles. Job model has title, description, skills, salary range, location.
54
  USER SAID: How do job seekers apply for a job?
55
- ASSISTANT REPLIED: Create Application model with ForeignKey to Job and User, status field, resume FileField in S3.
56
- UPDATED MEMORY: User building job board with Django, React, PostgreSQL, JWT auth. Custom user model with company/jobseeker roles. Job model has title, description, skills, salary range, location. Application model has ForeignKey to Job and User, status field, resume stored in S3.
57
 
58
  EXAMPLE 4:
59
- EXISTING MEMORY: User building job board with Django, React, PostgreSQL, JWT auth. Custom user model with company/jobseeker roles. Job model has title, description, skills, salary range, location. Application model has ForeignKey to Job and User, status field, resume stored in S3.
60
  USER SAID: I want to add search and filters for title, location, and salary range.
61
- ASSISTANT REPLIED: Use Django Q objects and django-filter. Add query params to job list endpoint.
62
- UPDATED MEMORY: User building job board with Django, React, PostgreSQL, JWT auth. Company/jobseeker roles. Job and Application models complete with S3 resumes. Job search via django-filter and Q objects on title, location, salary range.
63
 
64
  EXAMPLE 5:
65
- EXISTING MEMORY: User building job board with Django, React, PostgreSQL, JWT auth. Company/jobseeker roles. Job and Application models complete with S3 resumes. Job search via django-filter and Q objects on title, location, salary range.
66
- USER SAID: How do I notify applicants when status changes?
67
- ASSISTANT REPLIED: Use Django signals on Application post_save. Trigger SendGrid email via Celery async task.
68
- UPDATED MEMORY: User building job board with Django, React, PostgreSQL, JWT auth, Celery, SendGrid. Company/jobseeker roles. Job and Application models with S3 resumes and django-filter search. Status change notifications via Django signals and Celery tasks.
69
 
70
  EXAMPLE 6:
71
- EXISTING MEMORY: User building job board with Django, React, PostgreSQL, JWT auth, Celery, SendGrid. Company/jobseeker roles. Job and Application models with S3 resumes and django-filter search. Status change notifications via Django signals and Celery tasks.
72
  USER SAID: How do I deploy this on a VPS?
73
- ASSISTANT REPLIED: Docker Compose with Django, React, PostgreSQL, Redis, Celery services. Gunicorn behind nginx. Certbot for SSL.
74
- UPDATED MEMORY: User building job board with Django, React, PostgreSQL, JWT auth, Celery, SendGrid, Redis. Company/jobseeker roles. Job and Application models with S3 resumes and django-filter search. Status notifications via Django signals. Deployed via Docker Compose with Gunicorn, nginx, Certbot SSL.
75
 
76
  EXAMPLE 7:
77
- EXISTING MEMORY: User building job board with Django, React, PostgreSQL, JWT auth, Celery, SendGrid, Redis. Company/jobseeker roles. Job and Application models with S3 resumes and django-filter search. Status notifications via Django signals. Deployed via Docker Compose with Gunicorn, nginx, Certbot SSL.
78
  USER SAID: What is still left to build?
79
- ASSISTANT REPLIED: Admin panel, pagination, rate limiting, frontend loading states and error handling.
80
- UPDATED MEMORY: User building job board with Django, React, PostgreSQL, JWT auth, Celery, SendGrid, Redis. Company/jobseeker roles. Job, Application models with S3 resumes, django-filter search, Docker Compose deployment. Pending: admin panel, pagination, rate limiting, frontend loading states and error handling.
81
 
82
  STRICT RULES:
83
  - Output ONLY the updated memory. No labels. No preamble. No explanation.
84
- - COMPRESS the existing memory. Do not copy it verbatim. Rewrite it shorter.
85
- - Keep ALL technical facts. Remove only filler words.
86
- - Add new facts merged in, not appended as separate sentences.
87
- - No filler: no "ensuring", "enhances", "this setup", "this approach", "in order to".
88
  - No questions. No advice. No "you". No "I".
89
- - One short dense paragraph. Maximum 3 sentences."""
90
 
91
  # =========================
92
  # FILLER PATTERNS
@@ -101,63 +101,84 @@ FILLER_PATTERNS = [
101
  r"for (better|improved|efficient|effective|optimal)\s[^.]*\.",
102
  r"in order to\s[^.]*\.",
103
  r"To (enhance|improve|ensure|enable)\s[^.]*\.",
 
 
104
  ]
105
 
106
  # =========================
107
  # HELPERS
108
  # =========================
109
 
110
  def clean_assistant_message(text: str) -> str:
111
  """
112
- Strip code blocks from assistant responses.
113
- Extract function/class names and key terms before removing.
114
- Keep only prose explanation, cap at 500 chars.
115
  """
116
- # Extract key identifiers from code before removing
117
- code_blocks = re.findall(r"```[\w]*\n?(.*?)```", text, re.DOTALL)
118
- extracted_terms = []
119
-
120
- for block in code_blocks:
121
- # Grab function/class/variable names
122
- names = re.findall(
123
- r"(?:def|class|const|let|var|function)\s+(\w+)", block
124
- )
125
- extracted_terms.extend(names)
126
-
127
- # Remove code blocks
128
  text = re.sub(r"```[\w]*\n?.*?```", "", text, flags=re.DOTALL)
129
 
130
- # Remove inline code but keep the text
131
  text = re.sub(r"`([^`]+)`", r"\1", text)
132
 
133
- # Append extracted key names if any
134
- if extracted_terms:
135
- text += " Key identifiers: " + ", ".join(extracted_terms) + "."
136
-
137
  # Collapse whitespace
138
  text = re.sub(r"\s{2,}", " ", text).strip()
139
 
140
- return text[:500]
141
 
142
 
143
- def enforce_memory_limit(text: str, max_chars: int = 600) -> str:
144
  """
145
- Hard cap on memory length.
146
- If over limit, keep complete sentences up to the limit.
147
  """
148
- if len(text) <= max_chars:
 
149
  return text
150
 
151
- sentences = re.split(r"(?<=[.!?])\s+", text)
152
- result = ""
153
 
154
- for sentence in sentences:
155
- if len(result) + len(sentence) + 1 <= max_chars:
156
- result += ("" if not result else " ") + sentence
157
- else:
158
- break
159
 
160
- return result.strip()
 
 
161
 
162
  # =========================
163
  # REQUEST MODEL
@@ -189,6 +210,10 @@ UPDATED MEMORY:"""
189
  {"role": "user", "content": user_content},
190
  ]
191
 
192
  text = tokenizer.apply_chat_template(
193
  messages,
194
  tokenize=False,
@@ -200,14 +225,22 @@ UPDATED MEMORY:"""
200
  return_tensors="pt"
201
  ).to(model.device)
202
 
203
  output = model.generate(
204
  **inputs,
205
- max_new_tokens=220,
206
  do_sample=False,
207
  repetition_penalty=1.15,
208
  eos_token_id=tokenizer.eos_token_id,
209
  )
210
 
211
  result = tokenizer.decode(
212
  output[0][inputs.input_ids.shape[1]:],
213
  skip_special_tokens=True
@@ -236,6 +269,12 @@ UPDATED MEMORY:"""
236
  for pattern in FILLER_PATTERNS:
237
  result = re.sub(pattern, "", result, flags=re.IGNORECASE)
238
 
239
  # =========================
240
  # CLEAN — deduplicate lines
241
  # =========================
@@ -255,7 +294,7 @@ UPDATED MEMORY:"""
255
  # HARD MEMORY LENGTH CAP
256
  # =========================
257
 
258
- result = enforce_memory_limit(result, max_chars=600)
259
 
260
  return {"memory": result}
261
 
@@ -268,8 +307,14 @@ def root():
268
  return {
269
  "status": "Memory Summarizer Running 🚀",
270
  "model": MODEL_ID,
271
- "device": device.upper()
 
 
272
  }
273
 
274
  if __name__ == "__main__":
275
  uvicorn.run("app:app", host="0.0.0.0", port=7860)
 
35
  # SYSTEM PROMPT
36
  # =========================
37
 
38
+ SYSTEM_PROMPT = """You are a memory compression engine. Compress and merge facts into dense paragraphs.
39
 
40
  EXAMPLE 1:
41
  EXISTING MEMORY: (none)
42
  USER SAID: I am building a weather app using React and OpenWeatherMap API.
43
  ASSISTANT REPLIED: Fetch data with axios. Store API key in .env via process.env.
44
+ UPDATED MEMORY: User building React weather app using OpenWeatherMap API. Data fetched via axios. API key stored in .env via process.env.
45
 
46
  EXAMPLE 2:
47
+ EXISTING MEMORY: User building React weather app using OpenWeatherMap API. Data fetched via axios. API key stored in .env via process.env.
48
  USER SAID: How do I cache the weather data so I do not hit the API limit?
49
  ASSISTANT REPLIED: Use localStorage to cache responses with a timestamp. If cache is under 10 minutes old, return it instead of calling the API.
50
+ UPDATED MEMORY: User building React weather app using OpenWeatherMap API. Data fetched via axios. API key in .env. Responses cached in localStorage with 10-minute timestamp expiry to avoid API rate limit.
51
 
52
  EXAMPLE 3:
53
  EXISTING MEMORY: User building job board with Django, React, PostgreSQL. JWT auth via djangorestframework-simplejwt. Custom user model with company and jobseeker roles. Job model has title, description, skills, salary range, location.
54
  USER SAID: How do job seekers apply for a job?
55
+ ASSISTANT REPLIED: Create Application model with ForeignKey to Job and User, status field (applied, reviewed, rejected, accepted), resume FileField stored in S3.
56
+ UPDATED MEMORY: User building job board with Django, React, PostgreSQL. JWT auth via djangorestframework-simplejwt. Custom user model with company and jobseeker roles. Job model has title, description, skills, salary range, location. Application model has ForeignKey to Job and User, status field (applied/reviewed/rejected/accepted), resume FileField stored in S3.
57
 
58
  EXAMPLE 4:
59
+ EXISTING MEMORY: User building job board with Django, React, PostgreSQL. JWT auth via djangorestframework-simplejwt. Custom user model with company and jobseeker roles. Job model has title, description, skills, salary range, location. Application model has ForeignKey to Job and User, status field (applied/reviewed/rejected/accepted), resume FileField stored in S3.
60
  USER SAID: I want to add search and filters for title, location, and salary range.
61
+ ASSISTANT REPLIED: Use Django Q objects and django-filter. Add query params title, location, salary_min, salary_max to job list endpoint.
62
+ UPDATED MEMORY: User building job board with Django, React, PostgreSQL. JWT auth via djangorestframework-simplejwt. Company and jobseeker roles. Job model has title, description, skills, salary range, location. Application model has status field and S3 resume. Job search and filtering via django-filter and Q objects on title, location, salary_min, salary_max query params.
63
 
64
  EXAMPLE 5:
65
+ EXISTING MEMORY: User building job board with Django, React, PostgreSQL. JWT auth via djangorestframework-simplejwt. Company and jobseeker roles. Job model has title, description, skills, salary range, location. Application model has status field and S3 resume. Job search and filtering via django-filter and Q objects on title, location, salary_min, salary_max query params.
66
+ USER SAID: How do I notify applicants when their application status changes?
67
+ ASSISTANT REPLIED: Use Django signals. On Application post_save, detect status change and trigger email via Celery async task using SendGrid.
68
+ UPDATED MEMORY: User building job board with Django, React, PostgreSQL. JWT auth via djangorestframework-simplejwt. Company and jobseeker roles. Job model has title, description, skills, salary range, location. Application model has status field (applied/reviewed/rejected/accepted) and S3 resume. Job search via django-filter and Q objects. Status change notifications triggered via Django signals on Application post_save, sending emails via Celery tasks and SendGrid.
69
 
70
  EXAMPLE 6:
71
+ EXISTING MEMORY: User building job board with Django, React, PostgreSQL. JWT auth via djangorestframework-simplejwt. Company and jobseeker roles. Job model has title, description, skills, salary range, location. Application model has status field (applied/reviewed/rejected/accepted) and S3 resume. Job search via django-filter and Q objects. Status change notifications triggered via Django signals on Application post_save, sending emails via Celery tasks and SendGrid.
72
  USER SAID: How do I deploy this on a VPS?
73
+ ASSISTANT REPLIED: Docker Compose with services for Django, React, PostgreSQL, Redis, Celery. Serve Django via Gunicorn behind nginx. Certbot for SSL. Secrets in .env file.
74
+ UPDATED MEMORY: User building job board with Django, React, PostgreSQL, Redis, Celery, SendGrid. JWT auth via djangorestframework-simplejwt. Company and jobseeker roles. Job model has title, description, skills, salary range, location. Application model has status field and S3 resume. Job search via django-filter and Q objects. Status change emails via Django signals and Celery tasks. Deployed via Docker Compose with Gunicorn, nginx reverse proxy, Certbot SSL, secrets in .env.
75
 
76
  EXAMPLE 7:
77
+ EXISTING MEMORY: User building job board with Django, React, PostgreSQL, Redis, Celery, SendGrid. JWT auth via djangorestframework-simplejwt. Company and jobseeker roles. Job model has title, description, skills, salary range, location. Application model has status field and S3 resume. Job search via django-filter and Q objects. Status change emails via Django signals and Celery tasks. Deployed via Docker Compose with Gunicorn, nginx reverse proxy, Certbot SSL, secrets in .env.
78
  USER SAID: What is still left to build?
79
+ ASSISTANT REPLIED: Admin panel for moderating posts, pagination on job listings, rate limiting on API, frontend loading states and error handling.
80
+ UPDATED MEMORY: User building job board with Django, React, PostgreSQL, Redis, Celery, SendGrid. JWT auth via djangorestframework-simplejwt. Company and jobseeker roles. Job model has title, description, skills, salary range, location. Application model has status field and S3 resume. Job search via django-filter and Q objects. Status change emails via Django signals and Celery tasks. Deployed via Docker Compose with Gunicorn, nginx, Certbot SSL. Pending: admin moderation panel, pagination, API rate limiting, frontend loading states and error handling.
81
 
82
  STRICT RULES:
83
  - Output ONLY the updated memory. No labels. No preamble. No explanation.
84
+ - COMPRESS the existing memory. Do not copy it verbatim. Rewrite it shorter and denser.
85
+ - Keep ALL technical facts: stack, frameworks, APIs, models, field names, architecture decisions, unfinished tasks, user preferences.
86
+ - Add new facts merged in naturally, not appended as separate sentences.
87
+ - No filler: no "ensuring", "enhances", "this setup", "this approach", "in order to", "it is worth noting".
88
  - No questions. No advice. No "you". No "I".
89
+ - Dense technical paragraph. Maximum 8 sentences."""
90
 
91
  # =========================
92
  # FILLER PATTERNS
 
101
  r"for (better|improved|efficient|effective|optimal)\s[^.]*\.",
102
  r"in order to\s[^.]*\.",
103
  r"To (enhance|improve|ensure|enable)\s[^.]*\.",
104
+ r"It is worth noting that\s[^.]*\.",
105
+ r"Additionally,\s*(it|this)\s[^.]*\.",
106
  ]
107
 
108
+ # =========================
109
+ # MEMORY LIMIT CONFIG
110
+ # =========================
111
+
112
+ MEMORY_SOFT_LIMIT = 1600 # ~400 tokens — compress aggressively beyond this
113
+ MEMORY_HARD_LIMIT = 2000 # ~500 tokens — absolute cap, never exceed
114
+
115
+
116
  # =========================
117
  # HELPERS
118
  # =========================
119
 
120
  def clean_assistant_message(text: str) -> str:
121
  """
122
+ Strip code blocks and inline code backticks from assistant responses.
123
+ Model picks up function/class names from prose naturally.
124
+ Cap at 800 chars to give model more context from long responses.
125
  """
126
+ # Remove full code blocks entirely
127
  text = re.sub(r"```[\w]*\n?.*?```", "", text, flags=re.DOTALL)
128
 
129
+ # Remove inline code backticks but keep the text inside
130
  text = re.sub(r"`([^`]+)`", r"\1", text)
131
 
132
  # Collapse whitespace
133
  text = re.sub(r"\s{2,}", " ", text).strip()
134
 
135
+ return text[:800]
136
 
137
 
138
+ def enforce_memory_limit(text: str) -> str:
139
  """
140
+ Three-stage memory length enforcement.
141
+
142
+ Stage 1 — Under 1600 chars (~400 tokens):
143
+ Memory is healthy. Return as-is.
144
+
145
+ Stage 2 — Between 1600 and 2000 chars (soft limit):
146
+ Memory is getting long. Keep complete sentences
147
+ that fit within 2000 chars. Oldest appended facts
148
+ may be trimmed; core stack in early sentences is preserved.
149
+
150
+ Stage 3 — Over 2000 chars (hard limit):
151
+ Force trim to last complete sentence before 2000 chars.
152
+ Never cuts mid-sentence.
153
  """
154
+ # Stage 1 — healthy
155
+ if len(text) <= MEMORY_SOFT_LIMIT:
156
  return text
157
 
158
+ # Stage 2 — soft limit: trim to complete sentences within hard limit
159
+ if len(text) <= MEMORY_HARD_LIMIT:
160
+ sentences = re.split(r"(?<=[.!?])\s+", text)
161
+ result = ""
162
+ for sentence in sentences:
163
+ candidate = (result + " " + sentence).strip()
164
+ if len(candidate) <= MEMORY_HARD_LIMIT:
165
+ result = candidate
166
+ else:
167
+ break
168
+ return result.strip()
169
+
170
+ # Stage 3 — hard limit: force trim at last period before 2000 chars
171
+ trimmed = text[:MEMORY_HARD_LIMIT]
172
+ last_period = trimmed.rfind(".")
173
+ if last_period != -1:
174
+ trimmed = trimmed[:last_period + 1]
175
+
176
+ return trimmed.strip()
177
 
178
 
179
+ def strip_backticks(text: str) -> str:
180
+ """Remove any backtick formatting that leaks into memory output."""
181
+ return re.sub(r"`([^`]+)`", r"\1", text)
182
 
183
  # =========================
184
  # REQUEST MODEL
 
210
  {"role": "user", "content": user_content},
211
  ]
212
 
213
+ # =========================
214
+ # FORMAT CHAT
215
+ # =========================
216
+
217
  text = tokenizer.apply_chat_template(
218
  messages,
219
  tokenize=False,
 
225
  return_tensors="pt"
226
  ).to(model.device)
227
 
228
+ # =========================
229
+ # GENERATE
230
+ # =========================
231
+
232
  output = model.generate(
233
  **inputs,
234
+ max_new_tokens=400,
235
  do_sample=False,
236
  repetition_penalty=1.15,
237
  eos_token_id=tokenizer.eos_token_id,
238
  )
239
 
240
+ # =========================
241
+ # DECODE
242
+ # =========================
243
+
244
  result = tokenizer.decode(
245
  output[0][inputs.input_ids.shape[1]:],
246
  skip_special_tokens=True
 
269
  for pattern in FILLER_PATTERNS:
270
  result = re.sub(pattern, "", result, flags=re.IGNORECASE)
271
 
272
+ # =========================
273
+ # CLEAN — strip backticks
274
+ # =========================
275
+
276
+ result = strip_backticks(result)
277
+
278
  # =========================
279
  # CLEAN — deduplicate lines
280
  # =========================
 
294
  # HARD MEMORY LENGTH CAP
295
  # =========================
296
 
297
+ result = enforce_memory_limit(result)
298
 
299
  return {"memory": result}
300
 
 
307
  return {
308
  "status": "Memory Summarizer Running 🚀",
309
  "model": MODEL_ID,
310
+ "device": device.upper(),
311
+ "memory_soft_limit": f"{MEMORY_SOFT_LIMIT} chars (~400 tokens)",
312
+ "memory_hard_limit": f"{MEMORY_HARD_LIMIT} chars (~500 tokens)",
313
  }
314
 
315
+ # =========================
316
+ # RUN
317
+ # =========================
318
+
319
  if __name__ == "__main__":
320
  uvicorn.run("app:app", host="0.0.0.0", port=7860)
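
Note on the new three-stage `enforce_memory_limit`: the sketch below copies the post-commit helper so it runs standalone and probes each stage with synthetic text. The `make_memory` helper, the sample sentences, and the variable names are hypothetical, introduced only for illustration. It is meant as a quick check of the behavior described in the docstring, not as a replacement implementation.

```python
import re

MEMORY_SOFT_LIMIT = 1600  # values taken from the commit
MEMORY_HARD_LIMIT = 2000


def enforce_memory_limit(text: str) -> str:
    """Copy of the post-commit helper, reproduced so this sketch is self-contained."""
    # Stage 1: under the soft limit, return unchanged
    if len(text) <= MEMORY_SOFT_LIMIT:
        return text

    # Stage 2: between soft and hard limit, rebuild from whole sentences
    if len(text) <= MEMORY_HARD_LIMIT:
        sentences = re.split(r"(?<=[.!?])\s+", text)
        result = ""
        for sentence in sentences:
            candidate = (result + " " + sentence).strip()
            if len(candidate) <= MEMORY_HARD_LIMIT:
                result = candidate
            else:
                break
        return result.strip()

    # Stage 3: over the hard limit, cut at the last "." before 2000 chars
    trimmed = text[:MEMORY_HARD_LIMIT]
    last_period = trimmed.rfind(".")
    if last_period != -1:
        trimmed = trimmed[:last_period + 1]
    return trimmed.strip()


def make_memory(n_sentences: int) -> str:
    """Hypothetical test data: n sentences of 48 chars each, joined by single spaces."""
    return " ".join(
        f"Fact {i:03d} about the project stack is stored here."
        for i in range(n_sentences)
    )


short = make_memory(10)      # well under the soft limit
mid = make_memory(35)        # between the soft and hard limits
long_text = make_memory(60)  # over the hard limit
# A long memory whose 2000-char window ends inside a sentence mentioning ".env"
dotted = make_memory(40) + " Secrets live in .env files that stay out of version control entirely."

for name, text in [("short", short), ("mid", mid), ("long", long_text), ("dotted", dotted)]:
    out = enforce_memory_limit(text)
    print(f"{name}: {len(text)} -> {len(out)} chars, tail={out[-20:]!r}")
```

With this synthetic input, Stage 2 returns the text unchanged: any input at or below the hard limit already fits, so the sentence loop never reaches its `else` branch and the soft limit does not trigger any trimming on its own. Stage 3 cuts at the last `.` character, which in the `dotted` case is the dot in `.env` rather than a sentence end, so the "never cuts mid-sentence" guarantee in the docstring may not hold for memories containing filenames, versions, or abbreviations.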