--- title: QALoop Annotation Platform emoji: 📝 colorFrom: blue colorTo: green sdk: docker pinned: false --- # QA Annotation System The **Expert Validation Platform** in [QALoop](https://github.com/JackKuo666/QALoop) — a human-in-the-loop framework for large-scale agricultural QA construction and evaluation (ICDM). This web application supports **collaborative QA annotation and quality management**: multi-user task workflows, configurable annotation schemas, statistics and export, plus optional **LLM-assisted review of annotation notes**. It receives candidate QA pairs from upstream synthesis pipelines and converts expert judgments into structured feedback for pipeline iteration. ## Demo Usage This Space is a demo environment — **works out of the box with no manual Secrets configuration**. ### Login When you open the Space page, the login form is **pre-filled** with the default admin credentials. Click "Login" to start exploring. | Item | Default | |------|---------| | Admin username | `admin` | | Admin password | `123456` | ### Quick Demo Flow After login, sample projects and QA data are pre-loaded. You can: 1. **Browse projects / datasets** — View the pre-loaded "Test" project and QA pairs in the admin dashboard 2. **View annotation configs** — Explore existing rating, single-select, and other annotation dimensions 3. **Claim and annotate** — Switch to the user center to complete tasks 4. **View analysis / export** — Check annotation progress, LLM analysis reports, and export results To build from scratch, you can also create projects, configure schemas, and import JSON / CSV data yourself. ### Demo Sample Data (`seed/demo.sql`) On first HF Space startup with an empty database, `scripts/space_init.py` automatically imports `seed/demo.sql` (SQLite text dump, git-friendly; HF does not accept binary `.db` files). | Item | Details | |------|---------| | Pre-loaded content | 1 project, 3 datasets, 50 QA pairs, annotation configs, and sample annotation results | | LLM config | Pre-filled Base URL and `gpt-5.1`; API Key injected via `LLM_API_KEY` env var, **not written to sql** | | Runtime data | After import, written to `/data/annotations.db` inside the container; web UI changes only affect the runtime DB | | Relation to sql | **Web UI actions do not write back to `demo.sql`**; updating Demo data requires manual export and push | Example for updating Demo data: ```bash # Export from local runtime DB (remember to redact llm_api_key) sqlite3 ../data/annotations.db ".dump" > seed/demo.sql git add seed/demo.sql && git commit -m "Update demo seed" && git push space main ``` To disable auto-import: set environment variable `SEED_DEMO_DATA=false`. ### Registering Regular Users Currently in `production` mode: new users can self-register, but must be **manually enabled** by an admin in "User Management" before they can annotate. ### Demo Default Configuration | Setting | Default | Description | |---------|---------|-------------| | `SECRET_KEY` | `qaloop-demo-jwt-secret-key-32bytes` | JWT signing key (Demo only, do not use in production) | | `ADMIN_USERNAME` | `admin` | Admin auto-created on first startup | | `ADMIN_PASSWORD` | `123456` | Admin password | | `DB_DIR` | `/data` | Database storage directory | | `ENVIRONMENT` | `production` | New registrations require admin enablement | > **Security note**: The above are public Demo settings. Do not store real sensitive data. For production deployment, override `SECRET_KEY` and `ADMIN_PASSWORD` via HF Secrets. ## Deploy to Hugging Face Spaces 1. Create a Space on [Hugging Face](https://huggingface.co/new-space), SDK type **Docker** 2. Push the contents of `platform/` to the Space repository (you can upload only the platform subdirectory as the repo root) 3. **No Secrets needed for Demo** — build and run directly 4. For production deployment, override in Space **Settings → Repository secrets**: | Secret | Demo Default | Description | |--------|--------------|-------------| | `SECRET_KEY` | `qaloop-demo-jwt-secret-key-32bytes` | JWT key; use a random string in production | | `ADMIN_USERNAME` | `admin` | Admin username auto-created on first startup | | `ADMIN_PASSWORD` | `123456` | Admin password (at least 6 characters) | 5. Optional **Variables**: | Variable | Default | Description | |----------|---------|-------------| | `DB_DIR` | `/data` | SQLite data directory (Dockerfile defaults to `/data`) | | `ENVIRONMENT` | `production` | New registrations require admin enablement | | `LLM_BASE_URL` | `http://43.159.131.233:3001/v1` | LLM API Base URL (OpenAI-compatible) | | `LLM_MODEL_NAME` | `gpt-5.1` | LLM model name | 6. **LLM Analysis (optional)**: Set in Space **Settings → Repository secrets**: | Secret | Description | |--------|-------------| | `LLM_API_KEY` | LLM API Token (auto-written to system config on startup) | Demo pre-fills Base URL and model name for OpenAI-compatible calls: ```python from openai import OpenAI client = OpenAI( base_url="http://43.159.131.233:3001/v1", api_key="your-token", # Set LLM_API_KEY in HF Secrets ) # model: gpt-5.1 ``` 7. After the Space build completes, visit the page and log in with `admin` / `123456` > **Note**: On free Spaces, annotation data may be lost after sleep/restart if persistent storage is not enabled. Fine for Demo purposes; for production collaborative annotation, deploy on your own server. ## Highlights - **Projects and datasets** — Project-level organization with JSON / CSV import and export - **Flexible annotation configs** — Rating, category, text, single/multi-select, binary; optional reason and confidence fields - **Collaboration and permissions** — Superuser admin dashboard; regular users claim tasks and annotate; JWT auth - **Analysis and export** — Annotation progress and statistics; configurable simplified export - **Optional LLM** — OpenAI-compatible Chat API, default `http://43.159.131.233:3001/v1` + `gpt-5.1`; Token injected via `LLM_API_KEY` env var - **Seed questions** — Predefined question templates organized by type/subtype - **UI language** — Chinese and English (i18n) ## System Overview ![QALoop System Architecture](https://raw.githubusercontent.com/JackKuo666/QALoop/main/platform/seed/figure1.png) The diagram above shows the full QALoop workflow. This platform corresponds to the **Expert Validation Platform** stage — receiving candidate QA pairs from upstream synthesis pipelines and supporting human quality review, feedback, and export. This repository covers **data import → schema configuration → multi-user annotation → statistics/analysis → export**. It does not include built-in "automatic QA generation" or "automatic model scoring/evaluation" modules; these can be integrated via upstream/downstream systems. ```mermaid flowchart LR subgraph external [External Optional] Gen[QA Generation] Eval[Model Evaluation] end subgraph app [This System] Import[Import Dataset] Config[Annotation Config] Ann[Human Annotation] Stats[Statistics & Analysis] Export[Export Results] end Gen -.-> Import Import --> Config --> Ann --> Stats --> Export Export -.-> Eval ``` ## Configurable Schema Example Annotation dimensions and field names are defined by admins in **Annotation Config**, not hardcoded in the product. If your research uses multi-dimensional quality scales (e.g., factuality, completeness), implement them via multiple `score` / `category` configs. Below is an **illustrative JSON** (actual export structure depends on your config): ```json { "question": "What is drought resistance?", "answer": "Drought resistance refers to a plant's ability to maintain growth and yield under drought conditions.", "annotations": { "factuality_score": 5, "completeness_score": 4, "notes": "Correct explanation but could add mechanistic details" } } ``` ## Tech Stack - **Backend**: Python 3.12+ / FastAPI / SQLAlchemy / SQLite - **Frontend**: Vanilla HTML + JS + CSS (no framework) - **Auth**: JWT (SHA-256 client-side hashing) - **Package manager**: uv ## Requirements - Python >= 3.12 - [uv](https://docs.astral.sh/uv/) (Python package manager) ## Quick Start ### 1. Clone the Repository ```bash git clone cd qa-annotation ``` ### 2. Install Dependencies ```bash # Install uv (if not already installed) curl -LsSf https://astral.sh/uv/install.sh | sh # Create virtual environment and install all dependencies uv sync # For dev tools (pre-commit, etc.) uv sync --group dev ``` ### 3. Configure Environment Variables Create a `.env` file: ```bash cp .env.example .env ``` Then edit `.env` and **must update `SECRET_KEY`**: ```ini # Required SECRET_KEY=your-random-secret-key-here # Optional (defaults below) ENVIRONMENT=development HOST=0.0.0.0 PORT=8000 RELOAD=false TOKEN_EXPIRE_DAYS=7 ``` | Variable | Required | Description | Default | |----------|----------|-------------|---------| | `SECRET_KEY` | Yes | JWT key; must be a random string in production | - | | `ENVIRONMENT` | No | `development` or `production` | `development` | | `HOST` | No | Listen address | `0.0.0.0` | | `PORT` | No | Listen port | `8000` | | `RELOAD` | No | Enable hot reload (for development) | `false` | | `TOKEN_EXPIRE_DAYS` | No | Token expiration in days | `7` | > In production, newly registered users are disabled by default and require manual admin enablement. ### 4. Create Superuser ```bash # Interactive creation (recommended) python scripts/create_superuser.py # Create via command-line arguments python scripts/create_superuser.py --username admin --password yourpassword ``` ### 5. Start the Server ```bash # Option 1: Direct run uvicorn qa_annotate.main:app --reload --host 0.0.0.0 --port 8000 # Option 2: Via entry point (requires uv sync first) qa ``` After startup, visit `http://localhost:8000` and log in with the superuser account. API docs: `http://localhost:8000/docs`. ## User Guide ### Admin Workflow 1. **Login** — Log in with superuser account, enter admin dashboard 2. **Create project** — Set name and description 3. **Create annotation config** — Define task type (rating, category, text, single-select, multi-select, binary), and whether reason/confidence fields are required 4. **Link config to project** — Associate annotation configs with projects, with ordering support 5. **Import dataset** — Upload QA data in JSON or CSV format 6. **Manage users** — Create/enable/disable annotator accounts 7. **Configure LLM (optional)** — Fill in the following keys in "System Config" (stored in database, **not** `.env`): | Config Key | Description | |------------|-------------| | `llm_api_key` | LLM API Key | | `llm_base_url` | Base URL (e.g., `https://api.openai.com/v1`) | | `llm_model_name` | Model name (e.g., `gpt-4o`) | Use "Test Connection" to verify. Current implementation is mainly for **summarizing and analyzing annotation notes**; core annotation works without LLM config. 8. **View analysis** — Annotation progress, statistics, and (if configured) LLM analysis reports ### Annotator Workflow 1. **Register/Login** — In development, registered users can usually annotate immediately; in production, admin must enable the account 2. **Claim task** — Claim a dataset from "Available Tasks" 3. **Annotate** — Fill in fields according to config 4. **My tasks** — View and track progress ### Permissions (Summary) | Capability | Superuser | Regular User | |------------|-----------|--------------| | Admin dashboard (projects, datasets, users, system config, etc.) | Yes | No | | Claim tasks and submit annotations | Yes | Yes (must be enabled) | Superusers also have an active regular user identity and can participate in annotation when needed. ### Data Format **Import QA dataset (JSON)**: ```json [ { "question": "Question content", "answer": "Answer content" } ] ``` Extra fields are supported; specify which extra fields to display in project config. ## Project Structure ``` qa_annotate/ ├── api/ # FastAPI router modules │ ├── analysis.py # Annotation result analysis and LLM endpoints │ ├── annotation.py # Annotation configs and results │ ├── auth.py # Authentication and permissions │ ├── dataset.py # Dataset management │ ├── project.py # Project management │ ├── seed_question.py # Seed questions │ ├── system_config.py # System configuration │ └── user.py # User management ├── database/ # Database layer │ ├── base.py # Engine, session, initialization │ ├── models.py # SQLAlchemy ORM models │ └── crud.py # CRUD operations ├── schema/ # Pydantic request/response models ├── services/ # Business services (e.g., LLM calls) ├── utils/ # Utilities (password hashing, etc.) ├── config.py # Global config (pydantic-settings) ├── html/ # HTML pages ├── static/ │ ├── js/ # JavaScript │ ├── css/ # Stylesheets │ └── locales/ # i18n translation files └── main.py # Application entry point ``` ## Database Backup ```bash # Manual backup python scripts/backup_db.py # Scheduled backup (every 12 hours) python scripts/backup_db.py --schedule --interval 12 ``` ## Development ```bash # Install dev dependencies uv sync --group dev # Install pre-commit hooks pre-commit install # Lint and format ruff check --fix . ruff format . ``` ## Research and Citation This platform is the **Expert Validation Platform** component of QALoop. If you use it in a publication, please cite the QALoop ICDM paper and describe **annotation dimensions and guideline version**, **export format and field meanings**. See the [root README](https://github.com/JackKuo666/QALoop#citation) for BibTeX. ## Roadmap - Inter-annotator agreement (IAA) metrics and adjudication tools - Active learning or priority queue (integrated with task claiming) - Richer chain-of-thought / multi-segment annotation display (extend Schema and UI as needed) - Multimodal field display (if datasets include image URLs, etc.) ## License MIT