davda54 committed on
Commit eaf95d3 · verified · 1 Parent(s): 8df895a

Update guidelines.md

Files changed (1)
  1. guidelines.md +12 -24
guidelines.md CHANGED
@@ -1,28 +1,14 @@
1
  ## Overview
2
 
3
- This document provides guidelines for evaluating the fluency of Norwegian responses generated by language models. Annotators will compare pairs of responses (Response A and Response B) and determine which response demonstrates better fluency, or if they are equally fluent. The evaluation focuses exclusively on language quality, naturalness, and grammaticality.
4
 
5
- ## Key principle
6
-
7
- **Fluency evaluation is strictly limited to linguistic quality.** Do NOT consider:
8
- - Factual accuracy or correctness
9
- - Completeness of information
10
- - Creativity or originality
11
- - Formatting or structure (unless it affects readability)
12
- - Length or conciseness
13
 
14
  ## Definitions
15
 
16
  ### What is fluency?
17
 
18
- Fluency refers to the linguistic quality of text that makes it natural, smooth, and easy to read. A fluent response:
19
-
20
- - **Grammatically correct**: Follows standard grammar rules with proper syntax
21
- - **Natural-sounding**: Reads like something a native speaker would write
22
- - **Coherent**: Maintains logical flow between sentences and paragraphs
23
- - **Well-formed**: Uses appropriate vocabulary, punctuation, and sentence structure
24
- - **Smooth**: Flows naturally without awkward phrasing or jarring transitions
25
- - **Norwegian**: The models respond to Norwegian prompts and so they should always be either in Norwegian Bokmål or Norwegian Nynorsk
26
 
27
  ### Fluency issues to look for
28
 
@@ -31,18 +17,18 @@ When evaluating fluency, pay attention to:
31
  1. **Grammar errors**: Agreement errors (e.g. adjective-noun or determiner-noun disagreement), incorrect verb tense, incorrect word order (violating the V2 requirement), wrong word forms
32
  2. **Awkward phrasing**: Unnatural word order, stilted expressions, robotic language
33
  3. **Punctuation problems**: Missing or incorrect punctuation that affects readability
34
- 4. **Word choice issues**: Inappropriate vocabulary, incorrect word usage, repetitive language
35
  5. **Flow disruptions**: Abrupt transitions, disconnected ideas within sentences
36
- 6. **Spelling errors**: Typos and misspellings that affect readability
37
  7. **Translationese**: A common problem with language models is that they base their output on English, the majority language in their training corpus. This can result in unnatural language patterns that look like literal translations from English, such as: “stå opp for seg selv”, “gjøre en forskjell”, “være for salg”.
38
 
39
  ## Annotation procedure
40
 
41
  ### Step-by-step process
42
 
43
- 1. **Read both responses completely** without making immediate judgments
44
- 2. **Focus solely on language quality** - ignore content accuracy and relevance
45
- 3. **Identify fluency issues** in each response using the criteria above
46
  4. **Compare the severity and frequency** of fluency issues between responses
47
  5. **Make your decision** based on overall fluency
48
 
@@ -52,13 +38,13 @@ You must select one of three options:
52
 
53
  - **A is more fluent**: Response A has better overall language quality than Response B
54
  - **B is more fluent**: Response B has better overall language quality than Response A
55
- - **Equal fluency**: Both responses have similar language quality (minor differences that don't clearly favor either response)
56
 
57
  ### Important guidelines
58
 
59
  - **Minor differences matter**: Even small improvements in fluency should influence your decision
60
  - **Be consistent**: Apply the same standards across all evaluations
61
- - **When in doubt about equality**: If you cannot decisively determine which is better after careful analysis, select "Equal fluency"
62
 
63
  ## Examples
64
 
@@ -82,6 +68,8 @@ TODO
82
 
83
  TODO
84
 
 
 
85
  **Technical or specialized language**: Technical terminology and domain-specific language should be considered fluent if used correctly and consistently, even if it might seem less natural to a general audience.
86
 
87
  **Formatting issues**: Ignore formatting differences (bold, italics, bullet points) unless they directly impact readability or sentence structure.
 
1
  ## Overview
2
 
3
+ This document provides guidelines for evaluating the fluency of responses generated by Norwegian language models. Annotators will compare pairs of responses (Response A and Response B) and determine which response demonstrates better fluency, or if they are equally fluent.
4
 
5
+ The evaluation focuses exclusively on language quality, naturalness, and grammaticality. Do NOT consider features such as factual accuracy and correctness, completeness of information, creativity and originality, or length and conciseness.
6
 
7
  ## Definitions
8
 
9
  ### What is fluency?
10
 
11
+ Fluency refers to the linguistic quality of text that makes it natural, smooth, and easy to read. It should look like a text written by a native speaker. A fluent text should consistently use either Bokmål or Nynorsk (depending on the prompt), and should sound genuinely Norwegian rather than as if it were translated from another language.
12
 
13
  ### Fluency issues to look for
14
 
 
17
  1. **Grammar errors**: Agreement errors (e.g. adjective-noun or determiner-noun disagreement), incorrect verb tense, incorrect word order (violating the V2 requirement), wrong word forms
18
  2. **Awkward phrasing**: Unnatural word order, stilted expressions, robotic language
19
  3. **Punctuation problems**: Missing or incorrect punctuation that affects readability
20
+ 4. **Word choice issues**: Inappropriate vocabulary, incorrect word usage, repetitive language, wrong use of idioms or phrases, incorrect spacing or formation of compound words ("kaffe kopp" vs "kaffekopp"), preposition errors ("på" vs "i")
21
  5. **Flow disruptions**: Abrupt transitions, disconnected ideas within sentences
22
+ 6. **Spelling errors**: Typos and misspellings, wrong capitalization, incorrect use of diacritics (e.g. "å" vs "a", "ø" vs "o")
23
  7. **Translationese**: A common problem with language models is that they base their output on English, the majority language in their training corpus. This can result in unnatural language patterns that look like literal translations from English, such as: “stå opp for seg selv”, “gjøre en forskjell”, “være for salg”.
24
 
25
  ## Annotation procedure
26
 
27
  ### Step-by-step process
28
 
29
+ 1. **Read the prompt**: Do not analyze the fluency of the prompt, but look at it to understand the context and language style.
30
+ 2. **Read both responses completely** without making immediate judgments
31
+ 3. **Identify fluency issues** in each response using the criteria above; ignore content accuracy and relevance
32
  4. **Compare the severity and frequency** of fluency issues between responses
33
  5. **Make your decision** based on overall fluency
34
 
 
38
 
39
  - **A is more fluent**: Response A has better overall language quality than Response B
40
  - **B is more fluent**: Response B has better overall language quality than Response A
41
+ - **Equally fluent**: Both responses have similar language quality (minor differences that don't clearly favor either response)
42
 
43
  ### Important guidelines
44
 
45
  - **Minor differences matter**: Even small improvements in fluency should influence your decision
46
  - **Be consistent**: Apply the same standards across all evaluations
47
+ - **When in doubt about equality**: If you cannot decisively determine which is better after careful analysis, select "Equally fluent"
48
 
49
  ## Examples
50
 
 
68
 
69
  TODO
70
 
71
+ **Languages other than Norwegian**: If one of the responses is written, even partly, in a different language (e.g. English), it should be considered less fluent than the Norwegian response, regardless of its quality.
72
+
73
  **Technical or specialized language**: Technical terminology and domain-specific language should be considered fluent if used correctly and consistently, even if it might seem less natural to a general audience.
74
 
75
  **Formatting issues**: Ignore formatting differences (bold, italics, bullet points) unless they directly impact readability or sentence structure.
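
As a rough illustration of the three decision options in the updated guidelines, the following Python sketch shows one way an annotation tool might store and validate the labels. The names `FluencyLabel`, `PairwiseAnnotation`, `parse_label`, and `tally` are hypothetical and do not come from guidelines.md; only the three label strings are taken from the text.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FluencyLabel(Enum):
    """The three options an annotator can select (strings from the guidelines)."""
    A_MORE_FLUENT = "A is more fluent"
    B_MORE_FLUENT = "B is more fluent"
    EQUALLY_FLUENT = "Equally fluent"


@dataclass(frozen=True)
class PairwiseAnnotation:
    """One annotator's judgment on one response pair (hypothetical record layout)."""
    prompt_id: str
    annotator_id: str
    label: FluencyLabel


def parse_label(text: str) -> FluencyLabel:
    """Map free-text input to a label, ignoring case and surrounding whitespace."""
    normalized = text.strip().lower()
    for label in FluencyLabel:
        if label.value.lower() == normalized:
            return label
    raise ValueError(f"Unknown fluency label: {text!r}")


def tally(annotations):
    """Count how often each label was chosen."""
    return Counter(a.label for a in annotations)
```

With this sketch, the pre-commit label "Equal fluency" is rejected by `parse_label`, since this commit renames the option to "Equally fluent".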