Spaces:

Luigi
/

tiny-scribe

Running

Luigi Claude Sonnet 4.5 commited on Jan 30

Commit

62499af

1 Parent(s): fc5ac33

Fix: Support both <think> and <thinking> tag formats in parser

- Update regex pattern to match both <think> and <thinking> tags
- Fixes issue where models using <think> tags had all output in Thinking field
- Summary field now correctly displays content outside thinking blocks
- Applied to both app.py (Gradio) and summarize_transcript.py (CLI)
- Updated CLAUDE.md documentation

Resolves: Summary output remaining empty when model uses <think> tags

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Files changed (3) hide show

CLAUDE.md +6 -4
app.py +3 -1
summarize_transcript.py +5 -3

CLAUDE.md CHANGED Viewed

@@ -135,14 +135,16 @@ def summarize_streaming(...) -> Generator[Tuple[str, str], None, None]:
 ### Thinking Block Parsing
-Models may wrap reasoning in special tags that should be separated from final output:
-- **CLI version** expects: `<think>reasoning</think>`
-- **Gradio version** expects: `<thinking>reasoning</thinking>`
 Regex pattern:
 ```python
-pattern = r'<thinking>(.*?)</thinking>'
 matches = re.findall(pattern, content, re.DOTALL)
 thinking = '\n\n'.join(match.strip() for match in matches)
 summary = re.sub(pattern, '', content, flags=re.DOTALL).strip()

 ### Thinking Block Parsing
+Models may wrap reasoning in special tags that should be separated from final output.
+**Both versions now support both tag formats:**
+- `<think>reasoning</think>` (common with Qwen models)
+- `<thinking>reasoning</thinking>` (Claude-style)
 Regex pattern:
 ```python
+# Matches both <think> and <thinking> tags
+pattern = r'<think(?:ing)?>(.*?)</think(?:ing)?>'
 matches = re.findall(pattern, content, re.DOTALL)
 thinking = '\n\n'.join(match.strip() for match in matches)
 summary = re.sub(pattern, '', content, flags=re.DOTALL).strip()

app.py CHANGED Viewed

@@ -61,6 +61,7 @@ def load_model():
 def parse_thinking_blocks(content: str) -> Tuple[str, str]:
     """
     Parse thinking blocks from model output.
     Args:
         content: Full model response
@@ -68,7 +69,8 @@ def parse_thinking_blocks(content: str) -> Tuple[str, str]:
     Returns:
         Tuple of (thinking_content, summary_content)
     """
-    pattern = r'<thinking>(.*?)</thinking>'
     matches = re.findall(pattern, content, re.DOTALL)
     if not matches:

 def parse_thinking_blocks(content: str) -> Tuple[str, str]:
     """
     Parse thinking blocks from model output.
+    Supports both <think> and <thinking> tags.
     Args:
         content: Full model response
     Returns:
         Tuple of (thinking_content, summary_content)
     """
+    # Match both <think> and <thinking> tags
+    pattern = r'<think(?:ing)?>(.*?)</think(?:ing)?>'
     matches = re.findall(pattern, content, re.DOTALL)
     if not matches:

summarize_transcript.py CHANGED Viewed

@@ -36,17 +36,19 @@ def read_transcript(file_path):
 def parse_thinking_blocks(content: str) -> Tuple[str, str]:
     """
-    Parse thinking blocks from Qwen3 model output.
     Args:
         content: Full model response containing thinking blocks and summary
     Returns:
         Tuple of (thinking_content, summary_content)
-        - thinking_content: All text between <think> tags (or empty string)
         - summary_content: All text outside thinking blocks (or full content if no tags)
     """
-    pattern = r'<think>(.*?)</think>'
     matches = re.findall(pattern, content, re.DOTALL)
     if not matches:

 def parse_thinking_blocks(content: str) -> Tuple[str, str]:
     """
+    Parse thinking blocks from model output.
+    Supports both <think> and <thinking> tags.
     Args:
         content: Full model response containing thinking blocks and summary
     Returns:
         Tuple of (thinking_content, summary_content)
+        - thinking_content: All text between <think>/<thinking> tags (or empty string)
         - summary_content: All text outside thinking blocks (or full content if no tags)
     """
+    # Match both <think> and <thinking> tags
+    pattern = r'<think(?:ing)?>(.*?)</think(?:ing)?>'
     matches = re.findall(pattern, content, re.DOTALL)
     if not matches: