read guidelines from a file
app.py
CHANGED
@@ -129,97 +129,7 @@ class Soft(Base):
 )
 
 
-guideline = """
-## Overview
-
-This document provides guidelines for evaluating the fluency of Norwegian responses generated by language models. Annotators will compare pairs of responses (Response A and Response B) and determine which response demonstrates better fluency, or if they are equally fluent. The evaluation focuses exclusively on language quality, naturalness, and grammaticality.
-
-## Key principle
-
-**Fluency evaluation is strictly limited to linguistic quality.** Do NOT consider:
-- Factual accuracy or correctness
-- Completeness of information
-- Creativity or originality
-- Formatting or structure (unless it affects readability)
-- Length or conciseness
-
-## Definitions
-
-### What is fluency?
-
-Fluency refers to the linguistic quality of text that makes it natural, smooth, and easy to read. A fluent response:
-
-- **Grammatically correct**: Follows standard grammar rules with proper syntax
-- **Natural-sounding**: Reads like something a native speaker would write
-- **Coherent**: Maintains logical flow between sentences and paragraphs
-- **Well-formed**: Uses appropriate vocabulary, punctuation, and sentence structure
-- **Smooth**: Flows naturally without awkward phrasing or jarring transitions
-- **Norwegian**: The models respond to Norwegian prompts and so they should always be either in Norwegian Bokmål or Norwegian Nynorsk
-
-### Fluency issues to look for
-
-When evaluating fluency, pay attention to:
-
-1. **Grammar errors**: Subject-verb disagreement, incorrect tense, wrong word forms
-2. **Awkward phrasing**: Unnatural word order, stilted expressions, robotic language
-3. **Punctuation problems**: Missing or incorrect punctuation that affects readability
-4. **Word choice issues**: Inappropriate vocabulary, incorrect word usage, repetitive language
-5. **Flow disruptions**: Abrupt transitions, disconnected ideas within sentences
-6. **Spelling errors**: Typos and misspellings that affect readability
-7. **Translationese**: A common problem of language models is that they base their output on English -- the majority language in the language corpus. This can result in unnatural language patterns that look like literal translations from English, such as: TODO
-
-## Annotation procedure
-
-### Step-by-Step process
-
-1. **Read both responses completely** without making immediate judgments
-2. **Focus solely on language quality** - ignore content accuracy and relevance
-3. **Identify fluency issues** in each response using the criteria above
-4. **Compare the severity and frequency** of fluency issues between responses
-5. **Make your decision** based on overall fluency
-
-### Decision options
-
-You must select one of three options:
-
-- **A is more fluent**: Response A has better overall language quality than Response B
-- **B is more fluent**: Response B has better overall language quality than Response A
-- **Equal fluency**: Both responses have similar language quality (minor differences that don't clearly favor either response)
-
-### Important guidelines
-
-- **Minor differences matter**: Even small improvements in fluency should influence your decision
-- **Be consistent**: Apply the same standards across all evaluations
-- **When in doubt about equality**: If you cannot decisively determine which is better after careful analysis, select "Equal fluency"
-
-## Examples
-
-### Example 1: Clear fluency difference
-
-TODO
-
-### Example 2: Equal fluency
-
-TODO
-
-### Example 3: Subtle fluency difference
-
-TODO
-
-### Example 4: Content vs. fluency
-
-TODO
-
-## Edge cases and special considerations
-
-TODO
-
-**Technical or specialized language**: Technical terminology and domain-specific language should be considered fluent if used correctly and consistently, even if it might seem less natural to a general audience.
-
-**Formatting issues**: Ignore formatting differences (bold, italics, bullet points) unless they directly impact readability or sentence structure.
-
-**Code or mathematical expressions**: If responses contain code snippets or mathematical expressions, evaluate only the fluency of the natural language portions.
-"""
+guideline = open("guidelines.md").read().strip()
 
 # Configuration for the output dataset
 ANNOTATIONS_REPO = "ltg/fluency-annotations"  # Change to your repo name
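Note on the added line: `open("guidelines.md").read()` resolves the path against the current working directory, uses the platform default encoding, and never explicitly closes the file handle. Since the guidelines mention Bokmål and Nynorsk and may contain characters such as æ, ø and å, an explicit UTF-8 read is likely safer. The following is a minimal sketch of one alternative, not the change made in this commit. The helper name `load_guideline` is hypothetical, and both the UTF-8 encoding and the file's location next to `app.py` are assumptions.

```python
from pathlib import Path


def load_guideline(path: Path) -> str:
    """Read the guideline text as UTF-8 and trim surrounding whitespace.

    Hypothetical helper; assumes the file is UTF-8 encoded.
    """
    # Path.read_text opens and closes the file itself, so no handle is left open.
    return path.read_text(encoding="utf-8").strip()
```

Under those assumptions, the call in `app.py` could look like `guideline = load_guideline(Path(__file__).parent / "guidelines.md")`, which makes the lookup independent of the directory the app is started from.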