rafmacalaba committed f6b1e29 (1 parent: 69b2704)

docs: add annotator guide for the Data Use Annotation Tool

Files changed (1)
  1. ANNOTATOR_GUIDE.md +162 -0
ANNOTATOR_GUIDE.md ADDED
# 📖 Data Use Annotation Tool: Annotator Guide

Welcome! This guide explains how to use the **Data Use Annotation Tool** to review and annotate data/dataset mentions in documents.

---

## 1. Getting Started

### Signing In
1. Open the tool; a login screen appears
2. Click **🤗 Sign in with HuggingFace**
3. Authorize the tool with your HuggingFace account
4. You'll be redirected to the tool, which lists your assigned documents

> **Note:** Only accounts listed in the annotator configuration are assigned documents. If you see no documents after logging in, contact the admin.

---

## 2. Interface Overview

The tool has two main panels:

| Panel | Purpose |
|-------|---------|
| **Left: PDF Viewer** | Shows the original PDF for the current page |
| **Right: Markdown Annotation** | Shows the extracted text with data mentions highlighted |

### Top Bar
- **Title**: "Data Use Annotation Tool"
- **Progress Bar**: overall annotation progress across all corpora
- **User Badge**: your HuggingFace username
- **📊 Leaderboard**: annotation stats for all annotators

### Document Selector
- A dropdown at the top left lists your assigned documents
- Documents are labeled by corpus: **[World Bank]**, **[UNHCR]**, etc.
- Label format: `[Corpus] Doc N (X pages)`

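For illustration, the label format can be read as a small grammar. The sketch below is hypothetical and is not the tool's code. `parse_doc_label` and `DocLabel` are invented names, and accepting a singular `(1 page)` is an assumption.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch, not the tool's implementation.
# Matches labels such as "[World Bank] Doc 3 (11 pages)"; "pages?" also
# accepts a singular "page", which is an assumption.
LABEL_RE = re.compile(r"^\[(?P<corpus>[^\]]+)\] Doc (?P<doc>\d+) \((?P<pages>\d+) pages?\)$")


@dataclass(frozen=True)
class DocLabel:
    corpus: str
    doc_number: int
    page_count: int


def parse_doc_label(label: str) -> DocLabel:
    """Split a document-selector label into corpus, document number and page count."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    return DocLabel(
        corpus=match["corpus"],
        doc_number=int(match["doc"]),
        page_count=int(match["pages"]),
    )


print(parse_doc_label("[World Bank] Doc 3 (11 pages)"))
# DocLabel(corpus='World Bank', doc_number=3, page_count=11)
```
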
---

## 3. Page Navigation

At the bottom of the screen you'll find the page navigator:

```
⏮ ← Prev | Page 3 ● (3 / 11) | Next → ⏭
```

| Button | Action |
|--------|--------|
| **← Prev / Next →** | Move one page at a time |
| **⏮ / ⏭** | Jump to the previous/next page that has data mentions |
| **● (green dot)** | Indicates that the current page has AI-detected data mentions |

All pages are shown, including those without mentions. Use the jump buttons to navigate quickly to pages of interest.

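Conceptually, the ⏮/⏭ buttons scan the per-page mention counts in one direction and stop at the first page that has any. This is a minimal sketch under several assumptions:

- `find_jump_target` is a hypothetical name.
- Pages are indexed from 0, although the navigator itself counts from 1.
- Returning `None` when no such page exists is a guess at the edge-case behavior.

```python
from typing import Optional, Sequence


# Hypothetical sketch, not the tool's implementation.
def find_jump_target(mention_counts: Sequence[int], current: int, step: int) -> Optional[int]:
    """Return the nearest page index in direction `step` (+1 for ⏭, -1 for ⏮) that has mentions.

    Returns None when no such page exists, i.e. there is nowhere to jump.
    """
    if step not in (1, -1):
        raise ValueError("step must be +1 or -1")
    page = current + step
    while 0 <= page < len(mention_counts):
        if mention_counts[page] > 0:
            return page
        page += step
    return None


# Illustrative mention counts for an 11-page document (made-up numbers).
counts = [0, 2, 0, 1, 0, 0, 0, 3, 0, 0, 0]
print(find_jump_target(counts, current=3, step=1))   # 7
print(find_jump_target(counts, current=3, step=-1))  # 1
print(find_jump_target(counts, current=7, step=1))   # None
```
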
---

## 4. Understanding Data Mentions

The AI model pre-detects potential dataset mentions in the text. Each mention is highlighted in a color based on its **tag**:

| Color | Tag | Meaning |
|-------|-----|---------|
| 🟢 Green | **Named** | A specific, named dataset (e.g. "2022 National Census") |
| 🟡 Amber | **Descriptive** | A dataset that is described but not formally named (e.g. "a household survey") |
| 🟣 Purple | **Vague** | An unclear or ambiguous reference to data |
| ⚪ Gray | **Non-Dataset** | Flagged by the model but not actually a dataset |

A **legend** above the text shows how many mentions of each type are on the current page.

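The legend is essentially a per-tag count of the mentions on the current page. In this hypothetical sketch, the lowercase tag identifiers and the `legend_counts` function are assumptions. Only the display names come from the table above.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical tag identifiers; the guide only gives their display names.
TAGS = ("named", "descriptive", "vague", "non-dataset")


def legend_counts(mention_tags: Iterable[str]) -> Dict[str, int]:
    """Count mentions per tag, listing every tag (0 if absent) in legend order."""
    counts = Counter(mention_tags)
    unknown = set(counts) - set(TAGS)
    if unknown:
        raise ValueError(f"unknown tag(s): {sorted(unknown)}")
    return {tag: counts[tag] for tag in TAGS}


page_tags = ["named", "vague", "named", "descriptive"]
print(legend_counts(page_tags))
# {'named': 2, 'descriptive': 1, 'vague': 1, 'non-dataset': 0}
```
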
---

## 5. Reviewing Existing Mentions (Validation)

Click the **toggle button (‹)** on the right edge to open the **Data Mentions** side panel. For each AI-detected mention you can:

### Validate
1. Click **Validate** on a mention
2. Optionally add notes explaining your decision
3. Choose one of:
   - ✅ **Correct**: the mention is a real dataset
   - ❌ **Wrong**: the mention is not a dataset (false positive)

### Change Tag
- Click the **tag badge** (e.g. "Named") to edit it
- Select the correct tag from the dropdown
- Click **Save** to update

### Delete
- Click **🗑 Delete** to remove a false mention
- Click it again within 3 seconds to confirm; otherwise the delete is cancelled automatically

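The two-click delete behaves like a small state machine with a 3-second window. This sketch is hypothetical, and `DeleteButton` and its `click` method are invented names. It takes the current time in seconds as an argument instead of reading a clock, which keeps the logic easy to follow and test.

```python
from typing import Optional

CONFIRM_WINDOW_S = 3.0  # from the guide: a pending delete auto-cancels after 3 seconds


class DeleteButton:
    """Hypothetical two-click delete: the first click arms it, a second click within the window deletes."""

    def __init__(self) -> None:
        self.armed_at: Optional[float] = None

    def click(self, now: float) -> bool:
        """Handle a click at time `now` (seconds); return True only when the delete is confirmed."""
        if self.armed_at is not None and now - self.armed_at <= CONFIRM_WINDOW_S:
            self.armed_at = None
            return True
        # First click, or the earlier click has expired: (re)arm and wait for confirmation.
        self.armed_at = now
        return False


button = DeleteButton()
print(button.click(now=0.0))   # False: armed, waiting for confirmation
print(button.click(now=1.5))   # True: confirmed within 3 seconds
print(button.click(now=10.0))  # False: armed again
print(button.click(now=14.0))  # False: the click at 10.0 expired, so this re-arms
```
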
### Status Indicators
- **"Needs review"**: not yet validated by you
- **"✓ verified"** / **"✗ rejected"**: your validation result
- A checkmark appears next to validated mentions

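The status labels correspond to a three-state verdict per mention. This minimal sketch assumes the verdict is stored as `None` (not yet validated), `True` (marked Correct) or `False` (marked Wrong). `status_label` is a hypothetical helper. Under this model, re-validating (see the FAQ) simply overwrites the stored verdict.

```python
from typing import Optional


# Hypothetical sketch, not the tool's implementation.
def status_label(verdict: Optional[bool]) -> str:
    """Map a stored verdict to the label shown in the Data Mentions panel.

    None  -> not yet validated
    True  -> marked ✅ Correct
    False -> marked ❌ Wrong
    """
    if verdict is None:
        return "Needs review"
    return "✓ verified" if verdict else "✗ rejected"


verdicts = {"2022 National Census": True, "a household survey": None, "Table 3": False}
for mention, verdict in verdicts.items():
    print(f"{mention}: {status_label(verdict)}")
```
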
---

## 6. Adding New Annotations

If you spot a dataset mention that the AI missed:

1. **Select the text**: click and drag to highlight the dataset name in the markdown preview
2. **Click "✏️ Annotate Selection"**: the annotation modal appears
3. **Choose a Dataset Tag**:
   - **Named Dataset**: a specific, named dataset
   - **Descriptive**: a described but unnamed dataset
   - **Vague**: an ambiguous reference
4. **Click "Save Annotation"** to save it

> **Tip:** If no text is selected when you click the button, it shakes to remind you to select text first.

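Conceptually, a new annotation pairs the selected span of the page text with a tag. The sketch below is hypothetical and rests on three assumptions:

- The selection arrives as character offsets into the page's markdown text.
- The `Annotation` fields are illustrative rather than the tool's schema.
- Trimming whitespace at the edges of the selection is an added assumption, in the spirit of selecting just the dataset name.

```python
from dataclasses import dataclass

# Hypothetical sketch; tag identifiers for the modal's three choices are assumptions.
ALLOWED_TAGS = {"named", "descriptive", "vague"}


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first selected character
    end: int    # offset one past the last selected character
    text: str
    tag: str


def annotate_selection(page_text: str, start: int, end: int, tag: str) -> Annotation:
    """Build an annotation from a selection, trimming whitespace picked up at its edges."""
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"tag must be one of {sorted(ALLOWED_TAGS)}")
    if not 0 <= start <= end <= len(page_text):
        raise ValueError("selection is out of range")
    # Shrink the span so it neither starts nor ends on whitespace.
    while start < end and page_text[start].isspace():
        start += 1
    while end > start and page_text[end - 1].isspace():
        end -= 1
    if start == end:
        # Analogous to the button shaking when nothing is selected.
        raise ValueError("no text selected")
    return Annotation(start, end, page_text[start:end], tag)


text = "Estimates draw on the 2022 National Census and a household survey."
ann = annotate_selection(text, 21, 43, "named")
print(ann.text)  # 2022 National Census
```
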
---

## 7. Page Workflow

For each page, the recommended workflow is:

1. **Read** the markdown text on the right while referring to the PDF on the left
2. **Review** each highlighted mention: validate or reject it in the side panel
3. **Add** any missed mentions by selecting text
4. **Move** to the next page (a warning appears if you have unvalidated mentions)

### Unvalidated Mentions Warning
When you move to the next page while some mentions are still unvalidated, you'll see:
> ⚠️ You have N unverified data mention(s) on this page. Do you want to proceed?

You can proceed, or go back and finish validating.

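The warning amounts to counting the mentions on the page that have no verdict yet. This minimal sketch assumes each mention's verdict is `None` until it is validated. `unvalidated_warning` is a hypothetical name, and the message text is copied from the guide.

```python
from typing import Optional, Sequence


# Hypothetical sketch, not the tool's implementation.
def unvalidated_warning(verdicts: Sequence[Optional[bool]]) -> Optional[str]:
    """Return the warning text if any mention on the page is unvalidated, else None.

    Each entry is one mention's verdict; None means it has not been validated yet.
    """
    pending = sum(1 for verdict in verdicts if verdict is None)
    if pending == 0:
        return None  # nothing to warn about; move straight to the next page
    return (
        f"⚠️ You have {pending} unverified data mention(s) on this page. "
        "Do you want to proceed?"
    )


print(unvalidated_warning([True, None, False, None]))
print(unvalidated_warning([True, False]))  # None
```
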
---

## 8. Tips & Best Practices

- **Use the PDF** for context: the markdown is extracted text and may have formatting issues
- **Jump buttons (⏮/⏭)** let you skip pages without mentions quickly
- **Pages without mentions** may still contain datasets the AI missed; browse them when possible
- **Validate everything** on a page before moving on; this avoids the unvalidated-mentions warning and saves revisiting the page
- **Be precise** when selecting text for new annotations: select just the dataset name, not the surrounding context

---

## 9. FAQ

**Q: Can I undo a validation?**
A: Click **Validate** again and choose a different verdict.

**Q: What if the markdown text doesn't match the PDF?**
A: This can happen with complex layouts (tables, figures). Annotate the markdown text as best you can; the PDF is the source of truth.

**Q: Why are some pages empty?**
A: Some pages (such as cover pages or blank pages) may have no extracted text. Use the jump buttons to skip them.

**Q: Who sees my annotations?**
A: Annotations are stored centrally. Admins and other annotators with access to the same documents may see your work.

---

## Need Help?

Contact the project admin if you encounter issues or have questions about specific annotation decisions.