st192011/Bitnet-Socratic-1-Bit

st192011 committed 3 days ago

Commit dbdd36b (verified) · 1 parent: 6f0c92d

Update app.py

Files changed (1)

app.py +6 -7

@@ -14,7 +14,7 @@ MODEL_PATH = "models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf"
 DEFAULT_SYSTEM_PROMPT = (
     "You are a Socratic assistant. Do not answer questions directly. "
     "Instead, respond exclusively with 3 deep, reflective questions. "
-    "Then generate stop token"
 )
 # ==============================================================================
@@ -51,9 +51,8 @@ def streaming_chat(user_query, system_prompt):
     # These are the markers our Python function uses to slice the text
     stop_markers = [
-        "Stop token", "stop token",
-        "Stop.", "stop.",
-        "Response:", "Response",
         "Assistant:"
     ]
@@ -128,7 +127,7 @@ We deployed `microsoft/bitnet-b1.58-2B-4T-gguf`. While this preserved its founda
 #### The Stop-Token Anchor Hack
 To enforce structure, we modified the System Prompt to force the model to declare its own stopping point:
-> *"You are a Socratic assistant... Respond exclusively with 3 deep, reflective questions. Then generate stop token"*
 This instruction forces the text-prediction engine to anchor itself on a predictable phrase. While the model still experiences trailing hallucinations, it prints a recognizable marker *immediately after* providing the high-quality questions.
@@ -158,9 +157,9 @@ with gr.Blocks(theme=gr.themes.Soft()) as demo:
                     gr.Markdown("### 🛠️ The \"Stop Token\" Hack")
                     gr.Markdown(
                         "**Base models don't know how to stop talking!**\n\n"
-                        "To prevent infinite loops, our system prompt instructs the model to literally type the words `Stop token` when it is finished. "
                         "Our Python backend uses a **Lookahead Buffer** to watch for those words. If it sees them, it instantly slices them out and kills the engine.\n\n"
-                        "*🧪 Try deleting the words `'Then generate stop token'` from the prompt below and see what happens!*"
                     )
                 with gr.Column(scale=2):

 DEFAULT_SYSTEM_PROMPT = (
     "You are a Socratic assistant. Do not answer questions directly. "
     "Instead, respond exclusively with 3 deep, reflective questions. "
+    "Then generate %^%^%^"
 )
 # ==============================================================================
     # These are the markers our Python function uses to slice the text
     stop_markers = [
+        "%^%^%^",
+        "User:",
         "Assistant:"
     ]
 #### The Stop-Token Anchor Hack
 To enforce structure, we modified the System Prompt to force the model to declare its own stopping point:
+> *"You are a Socratic assistant... Respond exclusively with 3 deep, reflective questions. Then generate %^%^%^"*
 This instruction forces the text-prediction engine to anchor itself on a predictable phrase. While the model still experiences trailing hallucinations, it prints a recognizable marker *immediately after* providing the high-quality questions.
                     gr.Markdown("### 🛠️ The \"Stop Token\" Hack")
                     gr.Markdown(
                         "**Base models don't know how to stop talking!**\n\n"
+                        "To prevent infinite loops, our system prompt instructs the model to literally type the words `%^%^%^` when it is finished. "
                         "Our Python backend uses a **Lookahead Buffer** to watch for those words. If it sees them, it instantly slices them out and kills the engine.\n\n"
+                        "*🧪 Try deleting the words `'Then generate %^%^%^'` from the prompt below and see what happens!*"
                     )
                 with gr.Column(scale=2):
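
The diff names a "Lookahead Buffer" that watches the stream for the stop markers, but the hunks shown do not include its implementation. The following is a minimal sketch of one way such a buffer could work, not the code in `app.py`. The function name `stream_until_marker` and the constant `STOP_MARKERS` are hypothetical; only the marker strings themselves come from the commit. The idea is to hold back the last few characters of the stream so that a marker split across two chunks is never shown to the user.

```python
from typing import Iterable, Iterator, Sequence

# Marker strings taken from the updated stop_markers list in this commit.
STOP_MARKERS = ["%^%^%^", "User:", "Assistant:"]


def stream_until_marker(
    chunks: Iterable[str],
    stop_markers: Sequence[str] = STOP_MARKERS,
) -> Iterator[str]:
    """Yield text from `chunks` until any stop marker appears (hypothetical helper).

    Text before the earliest marker is emitted; the marker and everything
    after it are dropped, and the generator stops consuming `chunks`.
    """
    # A partial marker can occupy at most (longest marker - 1) trailing chars,
    # so that many characters are held back until more text arrives.
    holdback = max(len(m) for m in stop_markers) - 1
    buffer = ""

    for chunk in chunks:
        buffer += chunk

        # Find the earliest position of any complete marker in the buffer.
        hits = [i for i in (buffer.find(m) for m in stop_markers) if i != -1]
        if hits:
            head = buffer[: min(hits)]
            if head:
                yield head
            return  # Assumption: the caller terminates the model process here.

        # No complete marker yet: release everything except the held-back tail.
        if len(buffer) > holdback:
            cut = len(buffer) - holdback
            yield buffer[:cut]
            buffer = buffer[cut:]

    # The stream ended without a marker, so flush the held-back tail.
    if buffer:
        yield buffer


if __name__ == "__main__":
    # The marker "%^%^%^" is deliberately split across three chunks.
    demo = ["What is ", "truth? %^", "%^%", "^ User: blah"]
    print("".join(stream_until_marker(demo)))
```

In a Gradio streaming handler, the caller would typically accumulate the yielded pieces and yield the running text to the UI. A symbol sequence such as `%^%^%^` is less likely than plain words like "Stop." or "Response" to occur inside a legitimate question, which is presumably why the commit swaps it in.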