Spaces:

KiWA001
/

kai-api-gateway

Sleeping

App Files Files Community

kai-api-gateway / test_regex.py

KiWA001

feat: add Gemini browser provider

d999280 2 months ago

raw

history blame contribute delete

4.33 kB


	import re

	# User's exact example (Step 1116)
	# Note: JSON string had \n escaped. Python string formatting:
	user_example = """Thought Process
	Considering the user's simple greeting, I need to respond with an appropriate friendly opening. A warm welcome is essential, so I'll start with "Hello!" as it sets a positive tone. Given my role as a GLM language model trained by Z.ai, I should introduce my purpose: being a helpful assistant capable of answering diverse questions. Including a prompt for their needs next would encourage engagement, like asking "How can I assist you today?" This matches my identity and keeps the interaction open-ended for their response. I'll maintain a cheerful and professional demeanor throughout to build rapport.
	Hello! I'm GLM, a large language model trained by Z.ai. I'm here to assist with information, answer questions, or just chat about topics that interest you.
	How can I help you today? Whether you need information, creative ideas, or just want to have a conversation, I'm happy to engage."""

	# My previous example (double newline)
	double_newline_example = """Thought Process

	The user said "hi there".
	This is a simple greeting.

	Hello! How can I help you today?"""

	def test_clean(text, name):
	print(f"--- {name} ---")
	# Regex Strategy:
	# Match "Thought Process" at start.
	# Match any content (non-greedy) including newlines.
	# Stop at the last newline before the Answer?

	# User said: "remove all the way to the forward slash n"
	# This implies there is a newline separating Thoughts and Answer.

	# Attempt 1: Remove `Thought Process` + `\s+` + `Content` + `\n`
	# How do we know what is content vs answer?
	# Heuristic: Answer usually starts with Hello, Yes, The, etc.
	# Thoughts are usually "Considering...", "Analysis:".

	# What if we just split by first newline that matches a Lookahead?
	# No.

	# If the text starts with "Thought Process", we assume the FIRST chunk is thoughts.
	# If there is a newline, we split on it.

	# Let's try splitting by `\n`.
	parts = text.split('\n')
	# Use index 0: "Thought Process".
	# Index 1: "Considering..." (The thought body).
	# Index 2: "Hello! ..." (The answer).

	# So for User Example:
	# we want to drop parts[0] and parts[1].
	# Return parts[2:].

	# For Double Newline Example:
	# "Thought Process"
	# ""
	# "The user said..."
	# "This is..."
	# ""
	# "Hello!..."

	# This looks like we want to drop up to the last empty string?

	# Revised Logic based on User Feedback:
	# "Remove the whole thought process all the way to the forward slash n"
	# This assumes the thought process is ONE block followed by \n.

	clean = text.strip()

	if clean.startswith("Thought Process"):
	# Find the split point.
	# If we assume Thought Process is the Header.
	# And the Next line is the Body.
	# And the Next line is the Answer.

	# Regex: `^Thought Process\s\n.?\n` matches Header + Body + Newline.
	# Result should be Answer.

	# NOTE: `.` does not match `\n` unless DOTALL.
	# But `.?\n` finds the first* newline after matching start?
	# "Considering ... rapport.\n" -> Match!

	# Let's try this specific regex.
	# Matches "Thought Process" + optional whitespace/newlines + content + newline.
	# Non-greedy `.*?` ensures we stop at the FIRST newline after the content begins?
	# Wait, "Thinking contents" might be multiline.

	# Let's try: Remove everything up to the first occurrence of `\n(?=[A-Z])`?

	# Simple Logic:
	# If single-newline separator (User example):
	# We want to remove the first 2 lines?
	# "Thought Process" (1)
	# "Considering..." (2)
	# Answer (3)

	# Let's try Regex: `r"^Thought Process.*?(\n\n\|\n)(?=[A-Z])"`?
	# Using DOTALL for content.

	cleaned = re.sub(r"^Thought Process.*?\n(?=[A-Z])", "", clean, flags=re.DOTALL \| re.MULTILINE).strip()
	print(f"REGEX OUTPUT:\n{cleaned}\n")
	return cleaned

	return clean

	test_clean(user_example, "User Example")
	test_clean(double_newline_example, "Double Line Example")