Space: ltg/fluency-annotation

davda54 committed on Aug 18, 2025

Commit ddffe2a (verified) · 1 parent: c5e760f

read guidelines from a file

Files changed (1): app.py +1 -91

@@ -129,97 +129,7 @@ class Soft(Base):
         )
-guideline = """
-## Overview
-This document provides guidelines for evaluating the fluency of Norwegian responses generated by language models. Annotators will compare pairs of responses (Response A and Response B) and determine which response demonstrates better fluency, or if they are equally fluent. The evaluation focuses exclusively on language quality, naturalness, and grammaticality.
-## Key principle
-**Fluency evaluation is strictly limited to linguistic quality.** Do NOT consider:
-- Factual accuracy or correctness
-- Completeness of information
-- Creativity or originality
-- Formatting or structure (unless it affects readability)
-- Length or conciseness
-## Definitions
-### What is fluency?
-Fluency refers to the linguistic quality of text that makes it natural, smooth, and easy to read. A fluent response is:
-- **Grammatically correct**: Follows standard grammar rules with proper syntax
-- **Natural-sounding**: Reads like something a native speaker would write
-- **Coherent**: Maintains logical flow between sentences and paragraphs
-- **Well-formed**: Uses appropriate vocabulary, punctuation, and sentence structure
-- **Smooth**: Flows naturally without awkward phrasing or jarring transitions
-- **Norwegian**: Because the models respond to Norwegian prompts, responses should always be written in either Norwegian Bokmål or Norwegian Nynorsk
-### Fluency issues to look for
-When evaluating fluency, pay attention to:
-1. **Grammar errors**: Subject-verb disagreement, incorrect tense, wrong word forms
-2. **Awkward phrasing**: Unnatural word order, stilted expressions, robotic language
-3. **Punctuation problems**: Missing or incorrect punctuation that affects readability
-4. **Word choice issues**: Inappropriate vocabulary, incorrect word usage, repetitive language
-5. **Flow disruptions**: Abrupt transitions, disconnected ideas within sentences
-6. **Spelling errors**: Typos and misspellings that affect readability
-7. **Translationese**: A common problem with language models is that they base their output on English, the majority language in their training data. This can result in unnatural language patterns that look like literal translations from English, such as: TODO
-## Annotation procedure
-### Step-by-step process
-1. **Read both responses completely** without making immediate judgments
-2. **Focus solely on language quality**: ignore content accuracy and relevance
-3. **Identify fluency issues** in each response using the criteria above
-4. **Compare the severity and frequency** of fluency issues between responses
-5. **Make your decision** based on overall fluency
-### Decision options
-You must select one of three options:
-- **A is more fluent**: Response A has better overall language quality than Response B
-- **B is more fluent**: Response B has better overall language quality than Response A
-- **Equal fluency**: Both responses have similar language quality (minor differences that don't clearly favor either response)
-### Important guidelines
-- **Minor differences matter**: Even small improvements in fluency should influence your decision
-- **Be consistent**: Apply the same standards across all evaluations
-- **When in doubt about equality**: If you cannot decisively determine which is better after careful analysis, select "Equal fluency"
-## Examples
-### Example 1: Clear fluency difference
-TODO
-### Example 2: Equal fluency
-TODO
-### Example 3: Subtle fluency difference
-TODO
-### Example 4: Content vs. fluency
-TODO
-## Edge cases and special considerations
-TODO
-**Technical or specialized language**: Technical terminology and domain-specific language should be considered fluent if used correctly and consistently, even if it might seem less natural to a general audience.
-**Formatting issues**: Ignore formatting differences (bold, italics, bullet points) unless they directly impact readability or sentence structure.
-**Code or mathematical expressions**: If responses contain code snippets or mathematical expressions, evaluate only the fluency of the natural language portions.
-"""
+guideline = open("guidelines.md").read().strip()
 # Configuration for the output dataset
 ANNOTATIONS_REPO = "ltg/fluency-annotations"  # Change to your repo name
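The added line `open("guidelines.md").read().strip()` resolves the path against the current working directory, decodes the file with the locale's default encoding, and does not close the file explicitly. The sketch below is one hedged alternative, not the Space's actual code. `load_guideline` and `default_guideline_path` are hypothetical helpers, and the sketch assumes that `guidelines.md` sits next to `app.py` and is UTF-8 encoded. The removed guideline text contains non-ASCII characters such as the "å" in "Bokmål", which is why the encoding is worth pinning down.

```python
from pathlib import Path


def default_guideline_path(filename: str = "guidelines.md") -> Path:
    """Hypothetical helper: locate the guideline file next to this script.

    Unlike open("guidelines.md"), this does not depend on the directory
    the app happens to be launched from.
    """
    return Path(__file__).resolve().parent / filename


def load_guideline(path: Path) -> str:
    """Hypothetical helper: read the annotation guidelines as Markdown text.

    UTF-8 is assumed explicitly so that characters such as "å", "ø" and
    "æ" decode the same way on every host, regardless of its locale.
    """
    # read_text() opens the file, reads it, and closes it before returning;
    # strip() mirrors the .strip() call in the committed one-liner.
    return path.read_text(encoding="utf-8").strip()
```

In `app.py`, the one-liner would then become `guideline = load_guideline(default_guideline_path())`. A missing file still raises `FileNotFoundError` at startup, which is arguably preferable to launching the annotation app with empty guidelines.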