Spaces: CrazyMonkey0/APi_English

CrazyMonkey0 committed on Dec 15, 2025

Commit bf1dc5f (1 parent: fc8b522)

fix: resolve model loading and state management issues

- Fix load_model_nlp() to return only model (not tuple)
- Update startup_event to assign single model value
- Replace direct llm() call with create_chat_completion()
- Add proper error handling and logging
- Comment out unimplemented model loaders (TTS, ASR, Translation)
- Add health check endpoint to verify model loading status
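
The health check endpoint falls outside the visible part of the diff. As a rough sketch of the idea, assuming the only signal needed is whether `app.state.model_nlp` was set during startup, the check could reduce to a small helper. The name `model_health` and the response keys are hypothetical, and `SimpleNamespace` stands in for `app.state`.

```python
from types import SimpleNamespace


def model_health(state) -> dict:
    """Report whether the NLP model was attached to the app state at startup."""
    # getattr with a default covers the case where startup failed before assignment
    loaded = getattr(state, "model_nlp", None) is not None
    return {"status": "ok" if loaded else "degraded", "model_nlp_loaded": loaded}


# Stand-ins for app.state before and after a successful startup (assumption)
empty_state = SimpleNamespace()
ready_state = SimpleNamespace(model_nlp=object())

print(model_health(empty_state))
print(model_health(ready_state))
```

In a route handler, the same helper would presumably be called with `request.app.state`.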

Files changed (2)

app/main.py +1 -1
app/routes/nlp.py +98 -11

app/main.py CHANGED

@@ -12,7 +12,7 @@ app = FastAPI(debug=False)
 async def startup_event():
     print("[INFO] Loading all models...")
     try:
-        app.state.model_nlp, app.state.tokenizer_nlp = load_model_nlp()
         app.state.model_trans, app.state.tokenizer_trans = load_model_translation()
         app.state.model_tts = load_model_tts()
         app.state.processor_asr, app.state.model_asr = load_model_asr()

 async def startup_event():
     print("[INFO] Loading all models...")
     try:
+        app.state.model_nlp = load_model_nlp()
         app.state.model_trans, app.state.tokenizer_trans = load_model_translation()
         app.state.model_tts = load_model_tts()
         app.state.processor_asr, app.state.model_asr = load_model_asr()
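
Why this one-line change matters can be illustrated with a stand-in. Since `load_model_nlp()` returns a single object, the old two-target assignment fails during unpacking. The sketch below uses a hypothetical `FakeLlama` class and assumes the real model object is likewise not iterable.

```python
class FakeLlama:
    """Hypothetical stand-in for the single object load_model_nlp() returns."""


def load_model_nlp():
    # Returns one object, not a (model, tokenizer) tuple
    return FakeLlama()


startup_error = None
try:
    # Old startup code: expects two values
    model_nlp, tokenizer_nlp = load_model_nlp()
except TypeError as exc:
    startup_error = exc

print(f"old assignment fails: {startup_error}")

# New startup code: a single assignment
model_nlp = load_model_nlp()
print(type(model_nlp).__name__)
```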

app/routes/nlp.py CHANGED

@@ -14,26 +14,113 @@ def load_model_nlp():
         repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
         filename="qwen2.5-3b-instruct-q5_0.gguf",
         n_ctx=2048,
     )
     return llm
 @router.post("/chat")
-async def chat(request: Request, message: ChatRequest):
-    text = message.message
     llm = request.app.state.model_nlp
-    # Optionally add a system message
-    prompt = f"You are Qwen, created by Alibaba Cloud. You help users learn English.\nUser: {text}\nAssistant:"
-    # Generate the response
-    output = llm(prompt, max_tokens=128, temperature=0.7, top_p=0.9, top_k=50)
-    response_text = output['choices'][0]['text'].strip()
-    # Generate audio response (optional)
-    # url_path = save_audio(request, response_text)
     return {
         "response": response_text,
-        "audio": 'url_path'  # placeholder
-    }

         repo_id="Qwen/Qwen2.5-3B-Instruct-GGUF",
         filename="qwen2.5-3b-instruct-q5_0.gguf",
         n_ctx=2048,
+        verbose=False,  # disable verbose logging
     )
+    print("[INFO] NLP model loaded.")
     return llm
 @router.post("/chat")
+async def chat(request: Request, chat_request: ChatRequest):
+    """Endpoint for chatting with the model"""
+    text = chat_request.message
+    # Get the model from app state
     llm = request.app.state.model_nlp
+    # Prepare the messages
+    messages = [
+        {"role": "system", "content": """
+            You are Emma — a friendly, patient, encouraging native speaker of American English and an experienced English teacher. Assume every user is learning English.
+            Top priorities (in order):
+            First: Reply NATURALLY and CONVERSATIONALLY to the user’s most recent (last) message. The reply should sound like a warm, helpful human: concise (2–4 sentences), encouraging, and easy to understand.
+            Second: Immediately after that natural reply, analyze only that same most recent message for language errors and apply the correction rules below. Do not analyze earlier messages.
+            What to detect (error categories):
+            Grammar (tenses, word order, auxiliary duplication like “what’s is”, subject-verb agreement)
+            Vocabulary (word choice, false friends, awkward collocations)
+            Spelling
+            Punctuation
+            Register (formal vs. informal mismatch)
+            Typical learner errors (missing articles, capitalization mistakes, double auxiliaries, common typos)
+            Correction rules:
+            If any errors are found, append exactly one correction block at the end of your reply. If no errors are found, append nothing.
+            Corrections must be concise, clear, encouraging, and not overwhelming.
+            Explanations must be one sentence and simple.
+            Provide an example only if helpful, and keep it short (one sentence).
+            If multiple possible fixes exist, show the single most natural and simple correction for the learner (you may include a second only if it’s essential).
+            Exact correction block format (use this format verbatim):
+            CORRECTION:
+            Error: [short label — e.g. “Grammar” / “Spelling” / “Vocabulary”]
+            Original: “...original text fragment...”
+            Correction: “...suggested correction...”
+            Explanation: [one-sentence, simple explanation]
+            (If helpful) Example: “...full correct sentence...”
+            Behavior & style constraints:
+            Always prioritize the conversational reply above the correction. The correction is an add-on, never the primary content.
+            Tone: friendly, supportive, patient, non-judgmental.
+            Keep everything short, organized, and easy to scan.
+            Never invent facts. If you don’t know something, say “I don’t know” or ask a clarifying question.
+            Assume the user is an English learner and tailor explanations accordingly.
+            No long grammar essays; keep corrections short and actionable.
+            Execution notes for the model (internal-use guidance you should follow):
+            Analyze only the last user message text (no earlier context).
+            If the last message contains more than one error, include up to two prioritized corrections inside the single correction block (choose the two most important).
+            Use natural, learner-friendly wording in explanations.
+            Keep the correction block compact and visually distinct from the conversational reply.
+            Use your prompt-optimization and code-writing strengths to keep instructions minimal but robust — be decisive and pick the clearest fix.
+            Final instruction: Reply to the user’s most recent message now, following these rules exactly.
+            """},
+        {"role": "user", "content": text}
+    ]
+    # Generate response
+    output = llm.create_chat_completion(
+        messages=messages,
+        max_tokens=128,
+        temperature=0.7,
+        top_p=0.9,
+        top_k=50
+    )
+    # Extract response text
+    response_text = output['choices'][0]['message']['content'].strip()
     return {
         "response": response_text,
+        "audio": None  # placeholder for TTS audio
+    }
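
The switch from the direct `llm()` call to `create_chat_completion()` also changes where the generated text sits in the result, as the two extraction lines in the diff show. A minimal sketch of both shapes, using hand-built dictionaries rather than real model output (the helper names and sample values are hypothetical):

```python
def extract_completion_text(output: dict) -> str:
    """Old path: plain completion, text sits directly on the choice."""
    return output["choices"][0]["text"].strip()


def extract_chat_text(output: dict) -> str:
    """New path: chat completion, text sits under message -> content."""
    return output["choices"][0]["message"]["content"].strip()


# Hand-built samples mirroring only the keys the diff accesses (assumption)
completion_output = {"choices": [{"text": " Hello! "}]}
chat_output = {"choices": [{"message": {"role": "assistant", "content": " Hello! "}}]}

print(extract_completion_text(completion_output))
print(extract_chat_text(chat_output))
```

Using the old key on a chat result would raise a `KeyError`, which is why the extraction line had to change together with the call.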