Space: ltg/fluency-annotation

davda54 committed on Aug 12, 2025

Commit a0f4c64 (verified) · 1 parent: f3cd202

Update app.py

Files changed (1)

app.py +109 -28

@@ -5,6 +5,101 @@ from datetime import datetime
 from typing import Dict, List, Tuple
 import hashlib
 # Dummy dataset - replace with actual HuggingFace dataset loading
 DUMMY_DATASET = [
     {
@@ -227,42 +322,28 @@ custom_css = """
 """
 # Create Gradio interface
-with gr.Blocks(theme=gr.themes.Soft(), title="Dataset Annotation Tool", css=custom_css) as app:
     gr.Markdown("# Norwegian Fluency Annotation")
     with gr.Accordion("▶ Click here to see the full annotation guidelines:", open=False):
-            gr.Markdown("""
-            ## Detailed Information
-            This content is hidden by default and can be expanded.
-            - Point 1
-            - Point 2
-            - Point 3
-            You can put any Gradio components here, including:
-            - Markdown
-            - Code blocks
-            - Images
-            - Interactive components
-            """, padding=True)
     user_state = gr.State("")
     # Login Interface
-    with gr.Group(visible=True, elem_id="login-group") as login_interface:
-        gr.Markdown("## Login", padding=True)
-        user_id_input = gr.Textbox(
-            label="Enter your unique annotator ID to begin",
-            placeholder="Annotator ID"
-        )
-    with gr.Row():
-        login_btn = gr.Button("Login", variant="primary", scale=0.2, min_width=100)
-        gr.HTML("")
-    login_status = gr.Markdown("", padding=True)
     # Annotation Interface
-    with gr.Group(visible=False, elem_id="annotation-group") as annotation_interface:
         progress_label = gr.Markdown("")
         # Row 1: Prompt

 from typing import Dict, List, Tuple
 import hashlib
+guideline = """
+## Overview
+This document provides guidelines for evaluating the fluency of Norwegian responses generated by language models. Annotators will compare pairs of responses (Response A and Response B) and determine which response demonstrates better fluency, or if they are equally fluent. The evaluation focuses exclusively on language quality, naturalness, and grammaticality.
+## Key principle
+**Fluency evaluation is strictly limited to linguistic quality.** Do NOT consider:
+- Factual accuracy or correctness
+- Completeness of information
+- Creativity or originality
+- Formatting or structure (unless it affects readability)
+- Length or conciseness
+## Definitions
+### What is fluency?
+Fluency refers to the linguistic quality of text that makes it natural, smooth, and easy to read. A fluent response is:
+- **Grammatically correct**: Follows standard grammar rules with proper syntax
+- **Natural-sounding**: Reads like something a native speaker would write
+- **Coherent**: Maintains logical flow between sentences and paragraphs
+- **Well-formed**: Uses appropriate vocabulary, punctuation, and sentence structure
+- **Smooth**: Flows naturally without awkward phrasing or jarring transitions
+- **Norwegian**: The models respond to Norwegian prompts, so their responses should always be in either Norwegian Bokmål or Norwegian Nynorsk
+### Fluency issues to look for
+When evaluating fluency, pay attention to:
+1. **Grammar errors**: Subject-verb disagreement, incorrect tense, wrong word forms
+2. **Awkward phrasing**: Unnatural word order, stilted expressions, robotic language
+3. **Punctuation problems**: Missing or incorrect punctuation that affects readability
+4. **Word choice issues**: Inappropriate vocabulary, incorrect word usage, repetitive language
+5. **Sentence structure problems**: Run-on sentences, fragments, unclear pronoun references
+6. **Flow disruptions**: Abrupt transitions, disconnected ideas within sentences
+7. **Spelling errors**: Typos and misspellings that affect readability
+8. **Translationese**: A common problem with language models is that they base their output on English, the majority language in their training corpus. This can result in unnatural language patterns that look like literal translations from English, such as: TODO
+## Annotation procedure
+### Step-by-step process
+1. **Read both responses completely** without making immediate judgments
+2. **Focus solely on language quality** - ignore content accuracy and relevance
+3. **Identify fluency issues** in each response using the criteria above
+4. **Compare the severity and frequency** of fluency issues between responses
+5. **Make your decision** based on overall fluency
+### Decision options
+You must select one of three options:
+- **A is more fluent**: Response A has better overall language quality than Response B
+- **B is more fluent**: Response B has better overall language quality than Response A
+- **Equal fluency**: Both responses have similar language quality (minor differences that don't clearly favor either response)
+### Important guidelines
+- **Minor differences matter**: Even small improvements in fluency should influence your decision
+- **Consider overall impression**: Multiple minor issues may outweigh a single major issue
+- **Be consistent**: Apply the same standards across all evaluations
+- **When in doubt about equality**: If you cannot decisively determine which is better after careful analysis, select "Equal fluency"
+## Examples
+### Example 1: Clear fluency difference
+TODO
+### Example 2: Equal fluency
+TODO
+### Example 3: Subtle fluency difference
+TODO
+### Example 4: Content vs. fluency
+TODO
+## Edge cases and special considerations
+TODO
+**Technical or specialized language**: Technical terminology and domain-specific language should be considered fluent if used correctly and consistently, even if it might seem less natural to a general audience.
+**Formatting issues**: Ignore formatting differences (bold, italics, bullet points) unless they directly impact readability or sentence structure.
+**Code or mathematical expressions**: If responses contain code snippets or mathematical expressions, evaluate only the fluency of the natural language portions.
+"""
 # Dummy dataset - replace with actual HuggingFace dataset loading
 DUMMY_DATASET = [
     {
 """
 # Create Gradio interface
+with gr.Blocks(theme=gr.themes.Soft(font=[gr.themes.GoogleFont("Source Sans Pro"), "Arial"]), title="Dataset Annotation Tool", css=custom_css) as app:
     gr.Markdown("# Norwegian Fluency Annotation")
     with gr.Accordion("▶ Click here to see the full annotation guidelines:", open=False):
+        gr.Markdown(guideline, padding=True)
     user_state = gr.State("")
     # Login Interface
+    with gr.Column(visible=True, elem_id="login-group") as login_interface:
+        with gr.Group():
+            gr.Markdown("## Login", padding=True)
+            user_id_input = gr.Textbox(
+                label="Enter your unique annotator ID to begin",
+                placeholder="Annotator ID"
+            )
+        with gr.Row():
+            login_btn = gr.Button("Login", variant="primary", scale=0.2, min_width=100)
+            gr.HTML("")
+        login_status = gr.Markdown("", padding=True)
     # Annotation Interface
+    with gr.Column(visible=False, elem_id="annotation-group") as annotation_interface:
         progress_label = gr.Markdown("")
         # Row 1: Prompt
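
Note on the "Decision options" section of the new guideline: it allows exactly three labels per response pair. The diff does not show how the app stores or validates a decision, so the following is only a minimal sketch of one way to do it, not the code in app.py. `Judgment`, `tally`, and the field names `annotator_id` and `item_id` are hypothetical; only the three label strings are taken from the guideline.

```python
from collections import Counter
from dataclasses import dataclass

# The three labels come from the "Decision options" section of the guideline.
LABELS = ("A is more fluent", "B is more fluent", "Equal fluency")


@dataclass(frozen=True)
class Judgment:
    """One annotator's decision for one response pair (hypothetical record shape)."""
    annotator_id: str
    item_id: int
    label: str

    def __post_init__(self):
        # Reject anything outside the three options the guideline allows.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def tally(judgments):
    """Count how often each label was chosen, including labels never chosen."""
    counts = Counter(j.label for j in judgments)
    return {label: counts.get(label, 0) for label in LABELS}


# Made-up example data, for illustration only.
judgments = [
    Judgment("ann-1", 0, "A is more fluent"),
    Judgment("ann-2", 0, "A is more fluent"),
    Judgment("ann-3", 0, "Equal fluency"),
]
print(tally(judgments))
# {'A is more fluent': 2, 'B is more fluent': 0, 'Equal fluency': 1}
```

Validating the label at construction time keeps a typo in a button label or callback from silently creating a fourth category in the collected annotations.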