davda54 committed
Commit ddffe2a · verified · 1 Parent(s): c5e760f

read guidelines from a file

Files changed (1)
  1. app.py +1 -91
app.py CHANGED
@@ -129,97 +129,7 @@ class Soft(Base):
  )
 
 
- guideline = """
- ## Overview
-
- This document provides guidelines for evaluating the fluency of Norwegian responses generated by language models. Annotators will compare pairs of responses (Response A and Response B) and determine which response demonstrates better fluency, or if they are equally fluent. The evaluation focuses exclusively on language quality, naturalness, and grammaticality.
-
- ## Key principle
-
- **Fluency evaluation is strictly limited to linguistic quality.** Do NOT consider:
- - Factual accuracy or correctness
- - Completeness of information
- - Creativity or originality
- - Formatting or structure (unless it affects readability)
- - Length or conciseness
-
- ## Definitions
-
- ### What is fluency?
-
- Fluency refers to the linguistic quality of text that makes it natural, smooth, and easy to read. A fluent response:
-
- - **Grammatically correct**: Follows standard grammar rules with proper syntax
- - **Natural-sounding**: Reads like something a native speaker would write
- - **Coherent**: Maintains logical flow between sentences and paragraphs
- - **Well-formed**: Uses appropriate vocabulary, punctuation, and sentence structure
- - **Smooth**: Flows naturally without awkward phrasing or jarring transitions
- - **Norwegian**: The models respond to Norwegian prompts and so they should always be either in Norwegian Bokmål or Norwegian Nynorsk
-
- ### Fluency issues to look for
-
- When evaluating fluency, pay attention to:
-
- 1. **Grammar errors**: Subject-verb disagreement, incorrect tense, wrong word forms
- 2. **Awkward phrasing**: Unnatural word order, stilted expressions, robotic language
- 3. **Punctuation problems**: Missing or incorrect punctuation that affects readability
- 4. **Word choice issues**: Inappropriate vocabulary, incorrect word usage, repetitive language
- 5. **Flow disruptions**: Abrupt transitions, disconnected ideas within sentences
- 6. **Spelling errors**: Typos and misspellings that affect readability
- 7. **Translationese**: A common problem of language models is that they base their output on English -- the majority language in the language corpus. This can result in unnatural language patterns that look like literal translations from English, such as: TODO
-
- ## Annotation procedure
-
- ### Step-by-Step process
-
- 1. **Read both responses completely** without making immediate judgments
- 2. **Focus solely on language quality** - ignore content accuracy and relevance
- 3. **Identify fluency issues** in each response using the criteria above
- 4. **Compare the severity and frequency** of fluency issues between responses
- 5. **Make your decision** based on overall fluency
-
- ### Decision options
-
- You must select one of three options:
-
- - **A is more fluent**: Response A has better overall language quality than Response B
- - **B is more fluent**: Response B has better overall language quality than Response A
- - **Equal fluency**: Both responses have similar language quality (minor differences that don't clearly favor either response)
-
- ### Important guidelines
-
- - **Minor differences matter**: Even small improvements in fluency should influence your decision
- - **Be consistent**: Apply the same standards across all evaluations
- - **When in doubt about equality**: If you cannot decisively determine which is better after careful analysis, select "Equal fluency"
-
- ## Examples
-
- ### Example 1: Clear fluency difference
-
- TODO
-
- ### Example 2: Equal fluency
-
- TODO
-
- ### Example 3: Subtle fluency difference
-
- TODO
-
- ### Example 4: Content vs. fluency
-
- TODO
-
- ## Edge cases and special considerations
-
- TODO
-
- **Technical or specialized language**: Technical terminology and domain-specific language should be considered fluent if used correctly and consistently, even if it might seem less natural to a general audience.
-
- **Formatting issues**: Ignore formatting differences (bold, italics, bullet points) unless they directly impact readability or sentence structure.
-
- **Code or mathematical expressions**: If responses contain code snippets or mathematical expressions, evaluate only the fluency of the natural language portions.
- """
+ guideline = open("guidelines.md").read().strip()
 
  # Configuration for the output dataset
  ANNOTATIONS_REPO = "ltg/fluency-annotations" # Change to your repo name
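
The added line resolves `guidelines.md` against the current working directory, relies on the platform's default text encoding, and never closes the file handle explicitly. A minimal sketch of a more defensive variant follows. It assumes the file is UTF-8 encoded (the guidelines contain Norwegian characters such as "å"); the helper name `load_guideline` is hypothetical and not part of this commit.

```python
from pathlib import Path


def load_guideline(path: Path) -> str:
    """Return the Markdown guideline text with surrounding whitespace removed."""
    # Path.read_text opens, reads, and closes the file in one call. The explicit
    # encoding avoids depending on the platform default, which matters for
    # non-ASCII characters such as "å".
    return path.read_text(encoding="utf-8").strip()
```

In `app.py` this could be called as `guideline = load_guideline(Path(__file__).parent / "guidelines.md")`, under the further assumption that `guidelines.md` is stored next to the script, so the app would not depend on the directory it is launched from.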