# Data Use Annotation Tool: Annotator Guide

Welcome! This guide explains how to use the **Data Use Annotation Tool** to review and annotate dataset mentions in documents.

---
## 1. Getting Started

### Signing In

1. Open the tool. You'll see a login screen.
2. Click **🤗 Sign in with HuggingFace**.
3. Authorize with your HuggingFace account.
4. You'll be redirected to the tool, which shows your assigned documents.

> **Note:** Only accounts listed in the annotator configuration will see documents. If you see no documents after logging in, contact the admin.

---
## 2. Interface Overview

The tool has two main panels:

| Panel | Purpose |
|-------|---------|
| **Left: PDF Viewer** | Shows the original PDF for the current page |
| **Right: Markdown Annotation** | Shows extracted text with highlighted data mentions |

### Top Bar

- **Title**: "Data Use Annotation Tool"
- **Progress Bar**: Overall annotation progress across all corpora
- **User Badge**: Your HuggingFace username
- **Leaderboard**: Annotation stats for all annotators

### Document Selector

- A dropdown at the top left lists your assigned documents
- Documents are labeled by corpus: **[World Bank]**, **[UNHCR]**, etc.
- Label format: `[Corpus] Doc N (X pages)`

---
## 3. Page Navigation

At the bottom of the screen you'll find the page navigator:

```
⏮ ← Prev | Page 3 ● (3 / 11) | Next → ⏭
```

| Button | Action |
|--------|--------|
| **← Prev / Next →** | Move one page at a time |
| **⏮ / ⏭** | Jump to the previous/next page that has data mentions |
| **● (green dot)** | Indicates the current page has AI-detected data mentions |

All pages are shown, including those without mentions. Use the jump buttons to quickly navigate to pages of interest.
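
The jump behavior described above can be sketched as a search over per-page mention counts. This is a minimal illustration, not the tool's actual code: the function names and the `mention_counts` list (one count per page, 0-based) are hypothetical.

```python
from typing import Optional, Sequence


def next_page_with_mentions(mention_counts: Sequence[int], current: int) -> Optional[int]:
    """Return the index of the first page after `current` that has mentions, or None."""
    for page in range(current + 1, len(mention_counts)):
        if mention_counts[page] > 0:
            return page
    return None


def prev_page_with_mentions(mention_counts: Sequence[int], current: int) -> Optional[int]:
    """Return the index of the closest page before `current` that has mentions, or None."""
    for page in range(current - 1, -1, -1):
        if mention_counts[page] > 0:
            return page
    return None


# Hypothetical 5-page document: only pages 1 and 4 (0-based) have mentions.
counts = [0, 2, 0, 0, 1]
jump_forward = next_page_with_mentions(counts, 1)
jump_back = prev_page_with_mentions(counts, 4)
```

Returning `None` when there is no further page with mentions lets a caller leave the current page unchanged rather than wrapping around; whether the real tool wraps is not stated in this guide.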

---

## 4. Understanding Data Mentions

The AI model pre-detects potential dataset mentions in the text. Each mention is highlighted with a color based on its **tag**:

| Color | Tag | Meaning |
|-------|-----|---------|
| 🟢 Green | **Named** | A specific, named dataset (e.g. "2022 National Census") |
| 🟡 Amber | **Descriptive** | A described but not formally named dataset (e.g. "a household survey") |
| 🟣 Purple | **Vague** | An unclear or ambiguous data reference |
| ⚪ Gray | **Non-Dataset** | Flagged by the model but not actually a dataset |

A **legend** above the text shows the count of each type on the current page.
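
To make the legend concrete, here is a small sketch of how per-tag counts for one page could be computed. It is an assumption-laden illustration: the mention dictionaries, the lowercase tag strings, and `legend_counts` are hypothetical and do not reflect the tool's real data model.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# The four tags from the table above; these lowercase spellings are an assumption.
TAGS = ("named", "descriptive", "vague", "non-dataset")


def legend_counts(mentions: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count the mentions on one page per tag, including tags with zero mentions."""
    counts = Counter(m["tag"] for m in mentions)
    return {tag: counts.get(tag, 0) for tag in TAGS}


# Hypothetical mentions on a single page.
page_mentions = [
    {"text": "2022 National Census", "tag": "named"},
    {"text": "a household survey", "tag": "descriptive"},
    {"text": "Demographic and Health Survey", "tag": "named"},
]
legend = legend_counts(page_mentions)
```

Listing every tag, even those with a zero count, keeps the legend layout stable from page to page.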

---

## 5. Reviewing Existing Mentions (Validation)

Click the **toggle button (‹)** on the right edge to open the **Data Mentions** side panel. For each AI-detected mention you can:

### Validate

1. Click **Validate** on a mention
2. Optionally add notes explaining your decision
3. Choose one of:
   - ✅ **Correct**: The mention is a real dataset
   - ❌ **Wrong**: The mention is not a dataset (false positive)

### Change Tag

- Click the **tag badge** (e.g. "Named") to edit it
- Select the correct tag from the dropdown
- Click **Save** to update

### Delete

- Click **Delete** to remove a false mention
- Click again to confirm (the confirmation auto-cancels after 3 seconds)

### Status Indicators

- **"Needs review"**: Not yet validated by you
- **"✓ verified"** / **"✗ rejected"**: Your validation result
- A checkmark appears next to validated mentions

---
## 6. Adding New Annotations

If you spot a dataset mention that the AI missed:

1. **Select the text**: Click and drag to highlight the dataset name in the markdown preview
2. **Click "Annotate Selection"**: The annotation modal will appear
3. **Choose a Dataset Tag**:
   - **Named Dataset**: A specific named dataset
   - **Descriptive**: A described but unnamed dataset
   - **Vague**: An ambiguous reference
4. **Click "Save Annotation"**: Your annotation is saved

> **Tip:** If no text is selected when you click the button, it will shake to remind you to select text first.

---
## 7. Page Workflow

For each page, the recommended workflow is:

1. **Read** the markdown text on the right while referencing the PDF on the left
2. **Review** each highlighted mention, then validate or reject it in the side panel
3. **Add** any missed mentions using text selection
4. **Move** to the next page (a warning appears if you have unvalidated mentions)

### Unvalidated Mentions Warning

When moving to the next page with unvalidated mentions, you'll see:

> ⚠️ You have N unverified data mention(s) on this page. Do you want to proceed?

You can proceed or go back to finish validating.
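
The check behind this warning can be sketched as counting the mentions on the page that have no verdict yet. This is only an illustration under stated assumptions: the `verdict` field and the function name are hypothetical, and the tool's real implementation may differ.

```python
from typing import Iterable, Mapping, Optional


def unvalidated_warning(mentions: Iterable[Mapping[str, Optional[str]]]) -> Optional[str]:
    """Return the warning text if any mention lacks a verdict, otherwise None."""
    # A mention counts as unvalidated when its (hypothetical) "verdict" field is missing or None.
    pending = sum(1 for m in mentions if m.get("verdict") is None)
    if pending == 0:
        return None
    return (
        f"You have {pending} unverified data mention(s) on this page. "
        "Do you want to proceed?"
    )


# Hypothetical page state: one mention validated, two still pending.
page_mentions = [
    {"text": "2022 National Census", "verdict": "correct"},
    {"text": "a household survey", "verdict": None},
    {"text": "survey data"},
]
message = unvalidated_warning(page_mentions)
```

Returning `None` when everything is validated lets the caller move to the next page without showing a dialog.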

---

## 8. Tips & Best Practices

- **Use the PDF** for context: the markdown is extracted text and may have formatting issues
- **Jump buttons (⏮/⏭)** let you skip pages without mentions quickly
- **Pages without mentions** may still contain datasets the AI missed, so browse them when possible
- **Validate everything** on a page before moving on for the most efficient workflow
- **Be precise** when selecting text for new annotations: select just the dataset name, not the surrounding context

---
## 9. FAQ

**Q: Can I undo a validation?**
A: Yes. Click "Validate" again to re-validate with a different verdict.

**Q: What if the markdown text doesn't match the PDF?**
A: This can happen with complex layouts (tables, figures). Annotate based on what you can read in the markdown, and defer to the PDF when the two disagree: the PDF is the source of truth.

**Q: Why are some pages empty?**
A: Some pages (like cover pages or blank pages) may have no extracted text. Use the jump buttons to skip them.

**Q: Who sees my annotations?**
A: Annotations are stored centrally. Admins and other annotators with access to the same documents may see your work.

---

## Need Help?

Contact the project admin if you encounter issues or have questions about specific annotation decisions.