digifreely committed on
Commit 7485eda · verified · 1 Parent(s): 6c99236

Upload 3 files

Files changed (3)
  1. README.md +29 -6
  2. app.py +570 -0
  3. requirements.txt +34 -0
README.md CHANGED
@@ -1,13 +1,36 @@
  ---
- title: Chatmessenger
- emoji: 🐨
- colorFrom: green
- colorTo: pink
+ title: Maria Learning Service
+ emoji: 📚
+ colorFrom: blue
+ colorTo: green
  sdk: gradio
- sdk_version: 6.11.0
+ sdk_version: 5.9.1
  app_file: app.py
  pinned: false
  license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Maria Learning Service
+
+ A FastAPI-based AI tutoring service powered by Qwen3.5-2B (4-bit quantized) with ZeroGPU.
+
+ ## Endpoint
+
+ `POST /chat`: Main tutoring endpoint
+ `GET /health`: Health check
+
+ ## Authentication
+
+ Pass **one** of these headers per request:
+
+ | Header | Description |
+ |--------|-------------|
+ | `auth_code` | Raw value whose SHA-256 must match `HASH_VALUE` secret |
+ | `cf-turnstile-token` | Cloudflare Turnstile token verified against `CF_SECRET_KEY` secret |
+
+ ## Secrets Required
+
+ Set these in your Space → Settings → Secrets:
+
+ - `HASH_VALUE`: SHA-256 hex digest of your auth code
+ - `CF_SECRET_KEY`: Cloudflare Turnstile secret key
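The `HASH_VALUE` secret described in the README is the SHA-256 hex digest of the raw auth code. A minimal sketch of producing that digest locally, assuming the code is encoded as UTF-8 (the default for Python's `str.encode()`, which `app.py` relies on). The helper name `auth_code_digest` and the placeholder code are illustrative and not part of the commit.

```python
import hashlib


def auth_code_digest(auth_code: str) -> str:
    """Return the SHA-256 hex digest to store as the HASH_VALUE secret."""
    return hashlib.sha256(auth_code.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder value: replace with the real auth code before running.
    print(auth_code_digest("example-auth-code"))
```

The printed 64-character string is what would go into the Space secret; the raw code is what a client sends in the `auth_code` header.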
app.py ADDED
@@ -0,0 +1,570 @@
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Maria Learning Service | app.py
+ # FastAPI + ZeroGPU (Qwen3.5-2B int4) + FAISS RAG + gTTS
+ # ─────────────────────────────────────────────────────────────────────────────
+
+ import os
+ import gc
+ import json
+ import base64
+ import hashlib
+ import logging
+ import copy
+ from io import BytesIO
+ from typing import List, Any
+
+ import httpx
+ import numpy as np
+ import pandas as pd
+ import faiss
+ import gradio as gr
+ from fastapi import FastAPI, HTTPException, Request
+ from fastapi.responses import JSONResponse
+ from pydantic import BaseModel
+ from huggingface_hub import hf_hub_download
+ from gtts import gTTS
+
+ # ── ZeroGPU: import spaces only when running inside HF Spaces ─────────────────
+ try:
+     import spaces as _spaces
+     _ZEROGPU = True
+ except ImportError:
+     # Running locally - provide a no-op decorator so the rest of the code
+     # works unchanged without modifying anything.
+     import types
+
+     class _spaces:  # noqa: N801
+         @staticmethod
+         def GPU(fn):
+             return fn
+
+     _ZEROGPU = False
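The local fallback in this hunk supports only the bare `_spaces.GPU(fn)` form. The real `spaces.GPU` decorator can also be called with keyword arguments such as `duration`. A hedged sketch of a fallback that tolerates both forms follows; `gpu_noop` is a hypothetical name that the commit does not define.

```python
import functools


def gpu_noop(fn=None, **_options):
    """No-op stand-in for a GPU decorator.

    Works both as `@gpu_noop` and as `@gpu_noop(duration=60)`. The options
    are accepted and ignored because nothing is scheduled locally.
    """
    if fn is None:
        # Called with keyword arguments: return a decorator for the function.
        return lambda real_fn: gpu_noop(real_fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


@gpu_noop
def double(x):
    return 2 * x


@gpu_noop(duration=60)
def triple(x):
    return 3 * x
```

This only matters if the code later switches to the parameterised form; the commit as written uses the bare call.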
+
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s %(levelname)-8s %(message)s",
+ )
+ log = logging.getLogger(__name__)
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Config / Secrets
+ # ─────────────────────────────────────────────────────────────────────────────
+ HASH_VALUE = os.environ.get("HASH_VALUE", "")
+ CF_SECRET_KEY = os.environ.get("CF_SECRET_KEY", "")
+ HF_REPO_ID = "digifreely/Maria"
+ LLM_MODEL_ID = "Qwen/Qwen3.5-2B"
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Embedding model (CPU, loaded once per container lifetime)
+ # ─────────────────────────────────────────────────────────────────────────────
+ _emb_model = None
+
+ def _get_emb_model(name: str = "sentence-transformers/all-MiniLM-L6-v2"):
+     global _emb_model
+     if _emb_model is None:
+         from sentence_transformers import SentenceTransformer
+         log.info("Loading embedding model: %s", name)
+         _emb_model = SentenceTransformer(name)
+     return _emb_model
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Security helpers
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _check_auth_code(code: str) -> bool:
+     if not HASH_VALUE:
+         return False
+     return hashlib.sha256(code.encode()).hexdigest() == HASH_VALUE
+
+
+ async def _check_turnstile(token: str) -> bool:
+     if not CF_SECRET_KEY:
+         return False
+     try:
+         async with httpx.AsyncClient(timeout=8.0) as client:
+             resp = await client.post(
+                 "https://challenges.cloudflare.com/turnstile/v0/siteverify",
+                 data={"secret": CF_SECRET_KEY, "response": token},
+             )
+             return resp.json().get("success", False)
+     except Exception as exc:
+         log.error("Turnstile verification error: %s", exc)
+         return False
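`_check_auth_code` compares the two digests with `==`. One possible hardening, shown here only as a sketch, is a constant-time comparison with the standard library's `hmac.compare_digest`. The function name and the explicit `expected_hex` parameter are assumptions made so the example is self-contained; the commit reads the expected value from the `HASH_VALUE` global instead.

```python
import hashlib
import hmac


def check_auth_code(code: str, expected_hex: str) -> bool:
    """Compare SHA-256(code) with the expected hex digest in constant time."""
    if not expected_hex:
        # Mirror the original behaviour: an unset secret rejects everything.
        return False
    actual_hex = hashlib.sha256(code.encode()).hexdigest()
    # Compare as bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(actual_hex.encode(), expected_hex.encode())
```

The return values are the same as with `==`; only the timing behaviour of the comparison differs.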
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Dataset loading (called per request - no pre-loading)
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _load_dataset(board: str, cls: str, subject: str):
+     """Download config / FAISS index / metadata from HF Hub and return them."""
+     prefix = f"knowledgebase/{board}/{cls}/{subject}"
+     log.info("Fetching dataset: %s", prefix)
+
+     config_path = hf_hub_download(
+         repo_id=HF_REPO_ID,
+         filename=f"{prefix}/config.json",
+         repo_type="dataset",
+     )
+     faiss_path = hf_hub_download(
+         repo_id=HF_REPO_ID,
+         filename=f"{prefix}/faiss_index.bin",
+         repo_type="dataset",
+     )
+     meta_path = hf_hub_download(
+         repo_id=HF_REPO_ID,
+         filename=f"{prefix}/metadata.parquet",
+         repo_type="dataset",
+     )
+
+     with open(config_path) as fh:
+         config = json.load(fh)
+
+     index = faiss.read_index(faiss_path)
+     metadata = pd.read_parquet(meta_path)
+     return config, index, metadata
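`_load_dataset` interpolates `board`, `cls`, and `subject`, which come from the request body, directly into the Hub file path. A sketch of validating those segments first is given below. The allowed character set and the `dataset_prefix` helper are assumptions, since the commit does not say which names the knowledge base actually uses.

```python
import re

# Assumed whitelist: letters, digits, underscore, hyphen, and space.
_SEGMENT_RE = re.compile(r"[A-Za-z0-9_\- ]+")


def dataset_prefix(board: str, cls: str, subject: str) -> str:
    """Build 'knowledgebase/<board>/<cls>/<subject>' from validated segments."""
    for name, value in (("board", board), ("class", cls), ("subject", subject)):
        if not _SEGMENT_RE.fullmatch(value):
            raise ValueError(f"invalid {name!r} segment: {value!r}")
    return f"knowledgebase/{board}/{cls}/{subject}"
```

A handler using something like this could map the `ValueError` to a 422 response before any download is attempted.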
+
+
+ def _rag_search(
+     query: str,
+     config: dict,
+     index,
+     metadata: pd.DataFrame,
+     k: int = 3,
+ ) -> List[str]:
+     """Embed query, search FAISS, return top-k text chunks."""
+     emb_model_name = config.get(
+         "embedding_model", "sentence-transformers/all-MiniLM-L6-v2"
+     )
+     emb = _get_emb_model(emb_model_name)
+     vec = emb.encode([query], normalize_embeddings=True).astype(np.float32)
+     _, idxs = index.search(vec, k)
+
+     # Try common column names used when building the index
+     text_cols = ["text", "content", "chunk", "passage", "answer", "description"]
+     chunks: List[str] = []
+     for i in idxs[0]:
+         if 0 <= i < len(metadata):
+             row = metadata.iloc[i]
+             for col in text_cols:
+                 if col in metadata.columns and pd.notna(row[col]):
+                     chunks.append(str(row[col])[:800])
+                     break
+     return chunks
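The retrieval step in `_rag_search` can be pictured without FAISS. The sketch below is a pure-Python stand-in and not the FAISS API: it scores unit-length vectors by inner product, pads missing neighbours with `-1` (the case the `0 <= i < len(metadata)` bounds check guards against), and picks the first non-empty text column per row. All names and sample rows are illustrative.

```python
from typing import Dict, List, Optional, Sequence

TEXT_COLS = ["text", "content", "chunk", "passage", "answer", "description"]


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k highest inner products, padded with -1."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    order = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return (order + [-1] * k)[:k]


def rows_to_chunks(idxs: List[int], rows: List[Dict[str, Optional[str]]]) -> List[str]:
    """Map result indices to text, skipping padding and rows without text."""
    chunks = []
    for i in idxs:
        if 0 <= i < len(rows):
            for col in TEXT_COLS:
                if rows[i].get(col) is not None:
                    chunks.append(str(rows[i][col])[:800])
                    break
    return chunks


vectors = [(1.0, 0.0), (0.0, 1.0)]
rows = [{"text": "Plants need sunlight."}, {"content": "Fish live in water."}]
hits = top_k((0.0, 1.0), vectors, k=3)
chunks = rows_to_chunks(hits, rows)
```

With only two stored vectors and `k=3`, the third slot is padding and is skipped when mapping indices back to rows.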
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # LLM inference - decorated with @spaces.GPU so GPU is only held during call
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _model_generate(system_prompt: str, user_prompt: str) -> str:
+     """
+     Loads Qwen3.5-2B (NF4 4-bit), runs generation, unloads model, returns text.
+     Kept as a plain function so the spaces.GPU decorator can wrap it cleanly.
+     """
+     import torch
+     from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+
+     log.info("Loading %s (4-bit NF4)…", LLM_MODEL_ID)
+     quant = BitsAndBytesConfig(
+         load_in_4bit=True,
+         bnb_4bit_compute_dtype=torch.float16,
+         bnb_4bit_use_double_quant=True,
+         bnb_4bit_quant_type="nf4",
+     )
+     tok = AutoTokenizer.from_pretrained(LLM_MODEL_ID, trust_remote_code=True)
+     model = AutoModelForCausalLM.from_pretrained(
+         LLM_MODEL_ID,
+         quantization_config=quant,
+         device_map="auto",
+         trust_remote_code=True,
+     )
+     model.eval()
+
+     messages = [
+         {"role": "system", "content": system_prompt},
+         {"role": "user", "content": user_prompt},
+     ]
+
+     # Qwen3.5-2B is non-thinking by default; enable_thinking=False is explicit.
+     text = tok.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True,
+         enable_thinking=False,
+     )
+     inputs = tok([text], return_tensors="pt").to(model.device)
+
+     with torch.no_grad():
+         out_ids = model.generate(
+             **inputs,
+             max_new_tokens=600,
+             temperature=0.7,
+             top_p=0.9,
+             do_sample=True,
+             repetition_penalty=1.1,
+             pad_token_id=tok.eos_token_id,
+         )
+
+     new_tokens = out_ids[0][inputs.input_ids.shape[1]:]
+     result = tok.decode(new_tokens, skip_special_tokens=True).strip()
+
+     # Release GPU memory before returning
+     del model, tok
+     gc.collect()
+     torch.cuda.empty_cache()
+
+     log.info("Inference complete. Output length: %d chars", len(result))
+     return result
+
+
+ # Apply ZeroGPU decorator
+ run_inference = _spaces.GPU(_model_generate)
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Text-to-Speech
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _tts_to_b64(text: str) -> str:
+     try:
+         tts = gTTS(text=text[:3000], lang="en", tld="co.uk", slow=False)
+         buf = BytesIO()
+         tts.write_to_fp(buf)
+         buf.seek(0)
+         return base64.b64encode(buf.read()).decode("utf-8")
+     except Exception as exc:
+         log.error("TTS error: %s", exc)
+         return ""
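On the client side, the base64 string produced by `_tts_to_b64` has to be decoded back to bytes before playback. A minimal sketch, assuming the payload is MP3 data (gTTS writes MP3) and treating the empty string as "no audio", which is what the function returns on failure. `decode_audio` is a hypothetical helper, not part of the commit.

```python
import base64
from typing import Optional


def decode_audio(audio_b64: str) -> Optional[bytes]:
    """Decode the audio_output field; return None when no audio was produced."""
    if not audio_b64:
        return None
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(audio_b64, validate=True)
```

A caller could write the returned bytes to a `.mp3` file or hand them to an audio player, and fall back to text only when the result is `None`.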
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Prompt builder
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _build_system_prompt(lp: dict, rag_chunks: List[str]) -> str:
+     persona = lp.get("teacher_persona", "A friendly and patient teacher")
+     student = lp.get("student_name", "Student")
+     chat_history = lp.get("chat_history", [])[-6:]  # last 6 turns
+     scratchpad = lp.get("scratchpad", [])[-3:]  # last 3 entries
+     current_learning = lp.get("assessment_stages", {}).get("current_learning", [])
+
+     history_block = "\n".join(
+         f'Student: {h.get("user_input","")}\nTeacher: {h.get("system_output","")}'
+         for h in chat_history
+     ) or "No conversation history yet."
+
+     scratch_block = "\n".join(
+         f'[id={s.get("chat_id","")}] Thought: {s.get("thought","")} | '
+         f'Action: {s.get("action","")} | Obs: {s.get("observation","")}'
+         for s in scratchpad
+     ) or "Empty."
+
+     rag_block = "\n---\n".join(rag_chunks) if rag_chunks else "No relevant content found in the knowledge base."
+     cl_block = json.dumps(current_learning, indent=2) if current_learning else "[]"
+
+     return f"""You are {persona}. You are teaching {student}, a child aged 6 to 12 years old.
+ Always use simple and clear English. Do not use emojis. Be warm, patient, and encouraging.
+
+ STUDENT NAME: {student}
+
+ CURRENT LEARNING OBJECTIVES:
+ {cl_block}
+
+ KNOWLEDGE BASE (use this to teach or answer questions):
+ {rag_block}
+
+ RECENT CONVERSATION:
+ {history_block}
+
+ INTERNAL NOTES (scratchpad):
+ {scratch_block}
+
+ YOUR TASK:
+ Step 1 - Decide the intent of the student message: block, questions, curriculum, or chitchat.
+ Step 2 - Respond to the student following the rules for that intent.
+ Step 3 - Return ONLY a valid JSON object. Nothing before or after the JSON.
+
+ INTENT RULES:
+
+ "block"
+ The student said something rude, disrespectful, or inappropriate for a child aged 6 to 12.
+ Check the recent conversation to decide if this is a repeated pattern.
+ First occurrence: politely discourage and redirect to current learning.
+ Repeated pattern: gently but firmly end the conversation.
+ Never use harsh or unkind language.
+
+ "questions"
+ The student asked a general question that is not about the current learning topic.
+ Search the knowledge base for an answer.
+ If found: answer briefly in simple language, then redirect to current learning.
+ If not found: say you do not know and redirect to current learning.
+
+ "curriculum"
+ The student is engaging with the current learning topic.
+ For each goal in current_learning, follow these stages IN ORDER:
+ 1. teach - Explain the goal using the knowledge base. Mark teach=complete.
+ 2. re_teach - Ask one question to check understanding.
+ If the answer is wrong, re-explain clearly. Mark re_teach=complete.
+ 3. show_and_tell - Ask a similar but different question. Mark show_and_tell=complete.
+ 4. assess - Decide pass or fail.
+ Pass: mark assess=complete and congratulate.
+ Fail: explain the mistake kindly and set assess=Not_Complete so it retries next turn.
+ Only advance to the next stage when the current one is complete.
+
+ "chitchat"
+ Casual conversation such as greetings, sharing something personal, or general chat.
+ Respond warmly and naturally, then gently bring up the current learning topic.
+
+ RESPONSE FORMAT - return ONLY this JSON object, nothing else:
+ {{
+   "intent": "<block|questions|curriculum|chitchat>",
+   "response": "<your response to the student in plain English>",
+   "stage_updates": [
+     {{
+       "topic": "<exact topic string from current_learning>",
+       "goal": "<exact goal string from learning_objectives>",
+       "teach": "<complete|Not_Complete>",
+       "re_teach": "<complete|Not_Complete>",
+       "show_and_tell": "<complete|Not_Complete>",
+       "assess": "<complete|Not_Complete>"
+     }}
+   ],
+   "thought": "<your internal reasoning>",
+   "action": "<teach|re_teach|show_and_tell|assess|answer|redirect|discourage|end|chitchat>",
+   "observation": "<what you observed about the student>"
+ }}"""
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # JSON parser (robust - handles markdown fences, partial JSON, etc.)
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _parse_llm_output(raw: str) -> dict:
+     text = raw.strip()
+
+     # Strip markdown code fences if present
+     if "```" in text:
+         for part in text.split("```"):
+             part = part.strip()
+             if part.startswith("json"):
+                 part = part[4:].strip()
+             try:
+                 return json.loads(part)
+             except json.JSONDecodeError:
+                 continue
+
+     # Direct parse
+     try:
+         return json.loads(text)
+     except json.JSONDecodeError:
+         pass
+
+     # Locate first { ... } block
+     start = text.find("{")
+     end = text.rfind("}") + 1
+     if start != -1 and end > start:
+         try:
+             return json.loads(text[start:end])
+         except json.JSONDecodeError:
+             pass
+
+     log.warning("Could not parse JSON from model output. Using raw text as response.")
+     return {
+         "intent": "questions",
+         "response": raw,
+         "stage_updates": [],
+         "thought": "",
+         "action": "answer",
+         "observation": "json_parse_failed",
+     }
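The last fallback in `_parse_llm_output` slices from the first `{` to the last `}`, which fails when the model appends text containing a brace after the JSON. One alternative, sketched here and not what the commit does, is to let `json.JSONDecoder.raw_decode` try each `{` position and stop at the first object that parses. `first_json_object` is an illustrative name.

```python
import json
from typing import Optional


def first_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores the rest.
            obj, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        pos = text.find("{", pos + 1)
    return None
```

Because `raw_decode` stops at the end of the first complete value, trailing chatter after the object no longer affects the result.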
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # State updater
+ # ─────────────────────────────────────────────────────────────────────────────
+ def _apply_state_updates(
+     lp: dict,
+     parsed: dict,
+     user_msg: str,
+     ai_msg: str,
+ ) -> dict:
+     lp = copy.deepcopy(lp)
+
+     # Chat history - append new turn
+     history = lp.setdefault("chat_history", [])
+     new_id = (history[-1]["chat_id"] + 1) if history else 1
+     history.append({
+         "chat_id": new_id,
+         "user_input": user_msg,
+         "system_output": ai_msg,
+     })
+
+     # Scratchpad - append new entry
+     scratch = lp.setdefault("scratchpad", [])
+     scratch.append({
+         "chat_id": new_id,
+         "thought": parsed.get("thought", ""),
+         "action": parsed.get("action", ""),
+         "action_input": user_msg,
+         "observation": parsed.get("observation", ""),
+     })
+
+     # Assessment stages - apply stage_updates from model
+     current_learning = lp.get("assessment_stages", {}).get("current_learning", [])
+     valid_statuses = {"complete", "Not_Complete"}
+
+     for upd in parsed.get("stage_updates", []):
+         for item in current_learning:
+             if item.get("topic") == upd.get("topic"):
+                 for obj in item.get("learning_objectives", []):
+                     if obj.get("goal") == upd.get("goal"):
+                         for stage in ("teach", "re_teach", "show_and_tell", "assess"):
+                             val = upd.get(stage)
+                             if val in valid_statuses:
+                                 obj[stage] = val
+
+     lp.setdefault("assessment_stages", {})["current_learning"] = current_learning
+     return lp
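To make the matching logic concrete, here is a small sketch of the data shape `_apply_state_updates` appears to expect, inferred from the keys it reads (`topic`, `learning_objectives`, `goal`, and the four stage names). The sample values and the `apply_stage_update` helper are invented for illustration.

```python
import copy

STAGES = ("teach", "re_teach", "show_and_tell", "assess")
VALID = {"complete", "Not_Complete"}


def apply_stage_update(current_learning: list, update: dict) -> list:
    """Return a copy with one model-proposed update applied to the matching goal."""
    result = copy.deepcopy(current_learning)
    for item in result:
        if item.get("topic") != update.get("topic"):
            continue
        for obj in item.get("learning_objectives", []):
            if obj.get("goal") != update.get("goal"):
                continue
            for stage in STAGES:
                # Values outside VALID are ignored, as in the committed code.
                if update.get(stage) in VALID:
                    obj[stage] = update[stage]
    return result


current = [{
    "topic": "Fractions",
    "learning_objectives": [{
        "goal": "Identify halves",
        "teach": "Not_Complete",
        "re_teach": "Not_Complete",
        "show_and_tell": "Not_Complete",
        "assess": "Not_Complete",
    }],
}]
update = {"topic": "Fractions", "goal": "Identify halves",
          "teach": "complete", "assess": "done"}
updated = apply_stage_update(current, update)
```

Here `teach` flips to `complete`, while the invalid `assess` value `"done"` leaves the stored status untouched. Matching is by exact string equality, so a model that paraphrases the topic or goal produces no update.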
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # FastAPI application
+ # ─────────────────────────────────────────────────────────────────────────────
+ _fastapi = FastAPI(
+     title="Maria Learning Service",
+     description="AI tutoring API powered by Qwen3.5-2B with ZeroGPU.",
+     version="1.0.0",
+     docs_url="/docs",
+     redoc_url="/redoc",
+ )
+
+
+ class ChatRequest(BaseModel):
+     learning_path: dict[str, Any]
+     query: dict[str, Any]
+
+
+ @_fastapi.get("/health", tags=["Utility"])
+ async def health():
+     return {"status": "ok", "model": LLM_MODEL_ID, "zerogpu": _ZEROGPU}
+
+
+ @_fastapi.post("/chat", tags=["Tutor"])
+ async def chat(request: Request, body: ChatRequest):
+     # ── 1. Authentication ───────────────────────────────────────────────────
+     auth_code = request.headers.get("auth_code")
+     cf_token = request.headers.get("cf-turnstile-token")
+
+     authenticated = False
+     if auth_code:
+         authenticated = _check_auth_code(auth_code)
+     elif cf_token:
+         authenticated = await _check_turnstile(cf_token)
+
+     if not authenticated:
+         raise HTTPException(status_code=403, detail="Forbidden")
+
+     # ── 2. Validate request body ────────────────────────────────────────────
+     lp = body.learning_path
+     msg = body.query.get("request_message", "").strip()
+     if not msg:
+         raise HTTPException(status_code=422, detail="request_message must not be empty")
+
+     board = lp.get("board", "").strip()
+     cls = lp.get("class", "").strip()
+     subject = lp.get("subject", "").strip()
+
+     if not all([board, cls, subject]):
+         raise HTTPException(
+             status_code=422,
+             detail="learning_path must contain board, class, and subject",
+         )
+
+     # ── 3. Load dataset files from HF Hub ───────────────────────────────────
+     try:
+         config, faiss_index, metadata = _load_dataset(board, cls, subject)
+     except Exception as exc:
+         log.error("Dataset load error: %s", exc)
+         raise HTTPException(
+             status_code=500,
+             detail=f"Could not load dataset for {board}/{cls}/{subject}: {exc}",
+         )
+
+     # ── 4. RAG retrieval ────────────────────────────────────────────────────
+     try:
+         rag_chunks = _rag_search(msg, config, faiss_index, metadata)
+     except Exception as exc:
+         log.warning("RAG search failed (%s) - continuing without context", exc)
+         rag_chunks = []
+
+     # ── 5. Build prompt and run LLM ─────────────────────────────────────────
+     system_prompt = _build_system_prompt(lp, rag_chunks)
+     user_prompt = f"Student message: {msg}"
+
+     try:
+         raw_output = run_inference(system_prompt, user_prompt)
+     except Exception as exc:
+         log.error("Inference error: %s", exc)
+         raise HTTPException(status_code=500, detail=f"Inference failed: {exc}")
+
+     # ── 6. Parse structured output ──────────────────────────────────────────
+     parsed = _parse_llm_output(raw_output)
+     ai_text = parsed.get("response", raw_output).strip()
+
+     # ── 7. Text-to-speech ───────────────────────────────────────────────────
+     audio_b64 = _tts_to_b64(ai_text)
+
+     # ── 8. Update learning path state ───────────────────────────────────────
+     updated_lp = _apply_state_updates(lp, parsed, msg, ai_text)
+
+     # ── 9. Return response ──────────────────────────────────────────────────
+     return JSONResponse({
+         "learning_path": updated_lp,
+         "query": {
+             "response_message": {
+                 "text": ai_text,
+                 "visual": "No",
+                 "visual_content": "",
+                 "audio_output": audio_b64,
+             }
+         },
+     })
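A request to `/chat` can be assembled as follows. This is a sketch based only on the fields the handler reads (`learning_path.board`, `class`, `subject`, and `query.request_message`). The sample values are placeholders, and `build_chat_request` is a hypothetical helper that builds the headers and JSON body without sending anything.

```python
import json
from typing import Dict, Tuple


def build_chat_request(auth_code: str, board: str, cls: str, subject: str,
                       message: str) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for POST /chat using the auth_code header."""
    headers = {"Content-Type": "application/json", "auth_code": auth_code}
    payload = {
        # "class" is sent as a string because the handler calls .strip() on it.
        "learning_path": {"board": board, "class": cls, "subject": subject},
        "query": {"request_message": message},
    }
    return headers, json.dumps(payload).encode("utf-8")


headers, body = build_chat_request("example-auth-code", "CBSE", "5", "Maths", "Hi")
```

Since the handler returns the updated `learning_path` and keeps no per-student state of its own in this file, a client would presumably send that returned object back on the next turn.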
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Gradio shim
+ # Required so the HF Spaces Gradio SDK runner detects a live Gradio app and
+ # ZeroGPU's @spaces.GPU decorator registers correctly.
+ # All actual functionality is in the FastAPI routes above.
+ # ─────────────────────────────────────────────────────────────────────────────
+ with gr.Blocks(title="Maria Learning Service") as _gradio_ui:
+     gr.Markdown(
+         """
+         ## Maria Learning Service
+         This Space exposes a **REST API** - it is not a chat UI.
+
+         | Endpoint | Method | Description |
+         |---|---|---|
+         | `/chat` | POST | Main tutoring endpoint |
+         | `/health` | GET | Health check |
+         | `/docs` | GET | Swagger UI |
+
+         Authenticate via `auth_code` header or `cf-turnstile-token` header.
+         """
+     )
+
+ # Mount Gradio UI at /ui - keeps FastAPI routes at root level
+ app = gr.mount_gradio_app(_fastapi, _gradio_ui, path="/ui")
+
+
+ # ─────────────────────────────────────────────────────────────────────────────
+ # Entry point
+ # HF Spaces runs `python app.py` which triggers this block.
+ # uvicorn starts on 0.0.0.0:7860 (the port HF Spaces expects).
+ # ─────────────────────────────────────────────────────────────────────────────
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(
+         "app:app",
+         host="0.0.0.0",
+         port=7860,
+         log_level="info",
+         workers=1,  # Single worker - ZeroGPU requires this
+     )
requirements.txt ADDED
@@ -0,0 +1,34 @@
+ # ── Web framework ─────────────────────────────────────────────────────────────
+ fastapi==0.115.6
+ uvicorn[standard]==0.32.1
+ pydantic==2.10.3
+ python-multipart==0.0.20
+ httpx==0.28.1
+
+ # ── HuggingFace ecosystem ─────────────────────────────────────────────────────
+ # transformers: unpinned upper bound so Qwen3.5-2B tokenizer is always supported
+ huggingface-hub>=0.27.0
+ transformers>=4.50.0
+ tokenizers>=0.21.0
+ safetensors>=0.5.0
+ accelerate>=1.3.0
+
+ # ── Quantisation ──────────────────────────────────────────────────────────────
+ bitsandbytes>=0.45.0
+
+ # ── Embeddings ────────────────────────────────────────────────────────────────
+ sentence-transformers>=3.3.0
+
+ # ── Vector search ─────────────────────────────────────────────────────────────
+ faiss-cpu>=1.9.0
+
+ # ── Data ──────────────────────────────────────────────────────────────────────
+ pandas>=2.2.0
+ pyarrow>=14.0.0
+ numpy>=1.26.0,<2.0.0
+
+ # ── Audio ─────────────────────────────────────────────────────────────────────
+ gTTS>=2.5.0
+
+ # ── ZeroGPU (pre-installed in HF Spaces; listed for local dev) ────────────────
+ # spaces  # auto-installed by HF Spaces runner - do NOT pin; omit if local