---
title: Distractor Annotation Tool
emoji: 🎯
colorFrom: purple
colorTo: indigo
sdk: streamlit
sdk_version: 1.36.0
app_file: app.py
pinned: false
---
# 🎯 Distractor Annotation Tool
Collaborative annotation GUI for the MSc NLP research project **"Keeping LLMs on Track in Task-Oriented Dialogue"**.
## One-time Setup
### 1. Create a private HF Dataset repo for shared annotations
Go to [huggingface.co/new-dataset](https://huggingface.co/new-dataset), make it **private**, and note the repo ID (e.g. `yourgroup/distractor-annotations`).
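If you prefer to script this step, a minimal sketch using `huggingface_hub.create_repo` is shown below. It assumes `huggingface_hub` is installed and that `HF_TOKEN` holds a token with write access. The helper name `create_annotations_repo` is hypothetical and not part of this app.

```python
import os


def create_annotations_repo(repo_id: str) -> str:
    """Create (or reuse) a private dataset repo and return its URL.

    Hypothetical helper for illustration; not part of the app itself.
    """
    # Imported here so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import create_repo

    url = create_repo(
        repo_id,
        repo_type="dataset",  # a dataset repo, not a model or Space
        private=True,  # keep shared annotations out of public view
        exist_ok=True,  # do not fail if the repo already exists
        token=os.environ.get("HF_TOKEN"),
    )
    return str(url)
```

For example, `create_annotations_repo("yourgroup/distractor-annotations")` should return the repo URL; re-running it is harmless because of `exist_ok=True`.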
### 2. Set secrets in your HF Space
In your Space → Settings → Repository secrets, add:
| Secret | Value |
|---|---|
| `HF_TOKEN` | Your HF token with **write** access |
| `ANNOTATIONS_REPO_ID` | e.g. `yourgroup/distractor-annotations` |
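Space secrets are exposed to the running app as environment variables. The sketch below shows one way a config check could confirm both are set; the function names and the rough `namespace/name` rule are illustrative assumptions, not the app's actual code.

```python
import os
from collections.abc import Mapping
from typing import Optional

REQUIRED_SECRETS = ("HF_TOKEN", "ANNOTATIONS_REPO_ID")


def missing_secrets(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


def looks_like_repo_id(value: str) -> bool:
    """Rough check for the `namespace/name` shape, e.g. `yourgroup/distractor-annotations`."""
    namespace, sep, name = value.partition("/")
    return bool(sep and namespace and name and "/" not in name)
```

In a Streamlit page, a non-empty `missing_secrets()` result could be surfaced with `st.error(...)` before any call to the annotations repo is attempted.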
### 3. Set secrets in your GitHub repo
In GitHub → Settings → Secrets and variables → Actions, add the same `HF_TOKEN`.
### 4. Update the sync workflow
In `.github/workflows/sync_to_hf.yml`, replace `YOUR_HF_USERNAME` and `YOUR_SPACE_NAME` with your actual values.
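Before pushing, a quick check can confirm that no placeholder is left in the workflow file. This is a hypothetical helper that only looks for the two literal placeholder strings named above; it does not validate the rest of the workflow.

```python
from pathlib import Path

PLACEHOLDERS = ("YOUR_HF_USERNAME", "YOUR_SPACE_NAME")


def unreplaced_placeholders(workflow_text: str) -> list[str]:
    """Return the placeholders that still appear verbatim in the workflow text."""
    return [p for p in PLACEHOLDERS if p in workflow_text]


def check_workflow(path: str = ".github/workflows/sync_to_hf.yml") -> list[str]:
    """Read the sync workflow from disk and report any leftover placeholders."""
    return unreplaced_placeholders(Path(path).read_text(encoding="utf-8"))
```

Running `check_workflow()` from the repository root should return an empty list once both values are filled in.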
### 5. Import seed data
On first run, go to the **Dashboard** and click **Import Seed Data** to populate the shared repo with the group's initial entries.
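The button handles this for you. For reference, a minimal sketch of pushing seed entries to the shared repo is below. The JSON Lines format, the `annotations/seed.jsonl` path, and both helper names are assumptions, since this README does not document the repo layout.

```python
import json
import os


def seed_to_jsonl(entries: list[dict]) -> bytes:
    """Serialize entries as JSON Lines: one JSON object per line, UTF-8 encoded."""
    lines = (json.dumps(entry, ensure_ascii=False) + "\n" for entry in entries)
    return "".join(lines).encode("utf-8")


def push_seed(entries: list[dict], repo_id: str, path_in_repo: str = "annotations/seed.jsonl") -> None:
    """Upload seed entries to the shared dataset repo (hypothetical file layout)."""
    # Imported here so seed_to_jsonl can be used without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ.get("HF_TOKEN"))
    api.upload_file(
        path_or_fileobj=seed_to_jsonl(entries),  # upload_file accepts raw bytes
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
        commit_message="Import seed data",
    )
```

A call such as `push_seed(entries, os.environ["ANNOTATIONS_REPO_ID"])` would create a single commit containing all seed entries.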
## Workflow
| Page | Purpose |
|---|---|
| 🏠 Dashboard | Stats overview, seed import, config check |
| 📚 Browse | Explore the base NVIDIA dataset and seed entries |
| ✏️ Annotate | Create multi-turn distractor entries |
| 👥 Annotations | View, edit, and review all group work |
| 💬 Test LLM | Send distractors to a live LLM and judge whether it gets distracted |
## Annotation Schema
Each annotation follows the `nvidia/CantTalkAboutThis` schema, extended with:
- `distractors_multiturn`: rich multi-turn distractor sequences
- `_id`, `_annotator`, `_review_status`, `_created_at`, `_updated_at`: bookkeeping metadata (entry ID, annotator, review status, creation and last-update timestamps)
- `_llm_test_results`: logged results from the Test LLM page
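To make the extension fields concrete, here is a hypothetical sketch of building a new annotation from a base entry and logging a Test LLM result. The base entry is copied unchanged; the `"pending"` initial review status, ISO 8601 UTC timestamps, hex UUID IDs, and the keys inside each `_llm_test_results` item are assumptions, not the app's documented format.

```python
import uuid
from copy import deepcopy
from datetime import datetime, timezone


def _utc_now() -> str:
    """Current time as an ISO 8601 string in UTC (assumed timestamp format)."""
    return datetime.now(timezone.utc).isoformat()


def new_annotation(base_entry: dict, annotator: str, distractors_multiturn: list) -> dict:
    """Extend a copy of a base CantTalkAboutThis entry with annotation fields."""
    record = deepcopy(base_entry)  # leave the caller's base entry unmodified
    now = _utc_now()
    record.update(
        {
            "distractors_multiturn": distractors_multiturn,
            "_id": uuid.uuid4().hex,
            "_annotator": annotator,
            "_review_status": "pending",  # hypothetical initial status
            "_created_at": now,
            "_updated_at": now,
            "_llm_test_results": [],
        }
    )
    return record


def log_llm_test(record: dict, model: str, distracted: bool, response: str) -> dict:
    """Append one Test LLM outcome (hypothetical keys) and bump the update timestamp."""
    record.setdefault("_llm_test_results", []).append(
        {"model": model, "distracted": distracted, "response": response, "tested_at": _utc_now()}
    )
    record["_updated_at"] = _utc_now()
    return record
```

Copying the base entry rather than mutating it keeps the original dataset rows reusable across several annotators' entries.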
## Base Dataset
[nvidia/CantTalkAboutThis-Topic-Control-Dataset](https://huggingface.co/datasets/nvidia/CantTalkAboutThis-Topic-Control-Dataset)
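The base dataset can be loaded with the `datasets` library. A minimal sketch, assuming `datasets` is installed; it loads every available split rather than assuming their names, and the helper name is hypothetical.

```python
BASE_DATASET_ID = "nvidia/CantTalkAboutThis-Topic-Control-Dataset"


def load_base_dataset():
    """Load all splits of the base dataset as a DatasetDict."""
    # Imported here so this module can be loaded without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(BASE_DATASET_ID)
```

Printing the result of `load_base_dataset()` should list each split with its column names and row count.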
## Related Papers
- [2024.findings-emnlp.713](https://aclanthology.org/2024.findings-emnlp.713)
- [arXiv:2511.05018](https://arxiv.org/abs/2511.05018)