Fredrik Sitje
Enhance README.md with new indexing columns for categories and subcategories. Removed deprecated category order configuration file and updated Streamlit app to sort categories and subcategories based on their respective indices. This improves the display order and maintains consistency in data presentation.
c3069c3
metadata
title: Grading Answers
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
  - streamlit
pinned: false
short_description: A space for grading generated answers

Grading Answers App

A Streamlit application for grading AI-generated legal answers across multiple jurisdictions. The app connects to a private Hugging Face dataset repository to store user credentials and grading data.

Private Repository Usage

This app connects to the existing private Hugging Face dataset repository: TransLegal/grading-answers

Repository Structure

The app expects the following structure (jurisdictions are discovered automatically):

TransLegal/grading-answers/
├── en-us/
│   ├── grading_template.parquet
│   └── users/
├── hr-hr/
│   ├── grading_template.parquet
│   └── users/
└── [jurisdiction-code]/
    ├── grading_template.parquet
    └── users/

How It Works:

  • The app automatically discovers jurisdictions by scanning for subdirectories containing grading_template.parquet
  • Each jurisdiction has isolated user accounts and data
  • The users/ subdirectory is created automatically when the first user registers in that jurisdiction
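The discovery rule above can be sketched as a pure function over the repository's file listing. This is a minimal sketch, not the app's code: it assumes the paths come from something like `huggingface_hub.list_repo_files(repo_id, repo_type="dataset")`, and `discover_jurisdictions`, `display_name`, and the two lookup tables are hypothetical. The tables only cover the example codes used in this README; the real app may use a full ISO lookup instead.

```python
from pathlib import PurePosixPath

TEMPLATE_NAME = "grading_template.parquet"

# Hypothetical lookup tables covering only the example codes in this README.
LANGUAGES = {"en": "English", "hr": "Croatian", "sv": "Swedish"}
COUNTRIES = {"us": "United States", "hr": "Croatia", "se": "Sweden"}


def discover_jurisdictions(repo_files: list[str]) -> list[str]:
    """Return sorted codes of top-level directories that contain the template."""
    found = set()
    for path in repo_files:
        parts = PurePosixPath(path).parts
        # Only "<jurisdiction>/grading_template.parquet" counts; nested files are ignored.
        if len(parts) == 2 and parts[1] == TEMPLATE_NAME:
            found.add(parts[0])
    return sorted(found)


def display_name(code: str) -> str:
    """Turn e.g. "hr-hr" into "Croatian-Croatia"; fall back to the raw code."""
    language, _, country = code.partition("-")
    if language in LANGUAGES and country in COUNTRIES:
        return f"{LANGUAGES[language]}-{COUNTRIES[country]}"
    return code


if __name__ == "__main__":
    files = [
        "README.md",
        "en-us/grading_template.parquet",
        "en-us/users/users.json",
        "hr-hr/grading_template.parquet",
        "drafts/notes.txt",  # no template, so not a jurisdiction
    ]
    for code in discover_jurisdictions(files):
        print(code, "->", display_name(code))
```

Under this sketch a jurisdiction becomes visible only once its grading_template.parquet exists; a users/ folder on its own is not enough.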

Adding New Jurisdictions

To add a new jurisdiction to the repository:

  1. Create a jurisdiction subdirectory in the TransLegal/grading-answers repository:

    • Format: {language-code}-{country-code} following ISO standards:
      • Language code: ISO 639-1 (2-letter lowercase, e.g., hr, en, sv)
      • Country code: ISO 3166-1 alpha-2 (2-letter lowercase, e.g., hr, us, se)
      • Combined format: lowercase language code, hyphen, lowercase country code (e.g., hr-hr, en-us, sv-se)
    • The app automatically converts these codes to display names (e.g., "Croatian-Croatia", "English-United States", "Swedish-Sweden")
    • Example: Create sv-se/ directory for Swedish-Sweden
  2. Add the grading template file:

    • Upload grading_template.parquet to {jurisdiction}/grading_template.parquet
    • Required Structure: The parquet file must contain the following columns:
      • term (string) - The legal term being assessed
      • category (string) - Category within the term
      • category_index (integer) - Display order for categories (lower numbers appear first)
      • subcategory (string) - Subcategory within the category
      • subcategory_index (integer) - Display order for subcategories within each category (lower numbers appear first)
      • question (string) - The question being asked
      • answer (string) - The AI-generated answer to be graded
    • Special Values: Answers can be "Unknown." or "Unknown" to indicate unknown/unavailable information (these are automatically scored as "Irrelevant / NA")
    • Display Order: The category_index and subcategory_index columns control the order in which categories and subcategories are displayed in the app. Items with lower index values appear first.
  3. Create the users directory:

    • Create {jurisdiction}/users/ directory with an empty .gitkeep file (so the directory is tracked in Git)
    • The users.json file will be created automatically on first user registration
  4. Verify:

    • The new jurisdiction appears automatically in the Space's jurisdiction selector
    • No code changes or redeployment are needed; discovery is dynamic
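Before uploading a template (step 2), it can help to check it against the required columns and preview the display order. The following is a minimal standard-library sketch, assuming the rows have already been loaded as plain dicts, for example with `pandas.read_parquet(path).to_dict("records")`. `validate_rows`, `prefill_score`, and `display_order` are hypothetical helpers, not the app's actual implementation.

```python
from numbers import Integral
from typing import Optional

# Required columns and their expected Python types (see step 2).
# Integral also accepts NumPy integer types, which pandas may return.
REQUIRED_COLUMNS = {
    "term": str,
    "category": str,
    "category_index": Integral,
    "subcategory": str,
    "subcategory_index": Integral,
    "question": str,
    "answer": str,
}
UNKNOWN_ANSWERS = {"Unknown.", "Unknown"}
NA_SCORE = "Irrelevant / NA"


def validate_rows(rows: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the rows fit the schema."""
    problems = []
    for i, row in enumerate(rows):
        for column, expected in REQUIRED_COLUMNS.items():
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
            # bool is a subclass of int, so reject it explicitly for index columns.
            elif isinstance(row[column], bool) or not isinstance(row[column], expected):
                problems.append(f"row {i}: '{column}' has type {type(row[column]).__name__}")
    return problems


def prefill_score(answer: str) -> Optional[str]:
    """Pre-score answers that are exactly "Unknown." or "Unknown" as N/A."""
    return NA_SCORE if answer in UNKNOWN_ANSWERS else None


def display_order(rows: list[dict]) -> list[dict]:
    """Sort by category_index, then subcategory_index (lower values first)."""
    # sorted() is stable, so rows sharing both indices keep their file order.
    return sorted(rows, key=lambda r: (r["category_index"], r["subcategory_index"]))
```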

File Structure Per Jurisdiction:

  • {jurisdiction}/grading_template.parquet - Required (grading questions/answers template)
  • {jurisdiction}/users/ - Created automatically (stores user data)
  • {jurisdiction}/users/users.json - Created on first registration (user credentials)
  • {jurisdiction}/users/{username}_answers.parquet - Created per user (grading data)
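The per-jurisdiction layout above maps directly to path strings. This is a minimal sketch, assuming the code format from step 1 is checked with a regular expression; that checks the shape only, not whether the codes are real ISO entries. `jurisdiction_paths` is a hypothetical helper, and the username is used as-is without any sanitising.

```python
import re

# Lowercase ISO 639-1 language code, a hyphen, lowercase ISO 3166-1 alpha-2 country code.
JURISDICTION_RE = re.compile(r"[a-z]{2}-[a-z]{2}")


def jurisdiction_paths(jurisdiction: str, username: str) -> dict[str, str]:
    """Return the repository paths used for one jurisdiction and one user."""
    if not JURISDICTION_RE.fullmatch(jurisdiction):
        raise ValueError(f"expected a code like 'sv-se', got {jurisdiction!r}")
    users_dir = f"{jurisdiction}/users"
    return {
        "template": f"{jurisdiction}/grading_template.parquet",
        "users_dir": f"{users_dir}/",
        "users_json": f"{users_dir}/users.json",
        "answers": f"{users_dir}/{username}_answers.parquet",
    }


if __name__ == "__main__":
    print(jurisdiction_paths("sv-se", "alice")["answers"])
```

Because every path starts with the jurisdiction code, the same username under en-us and hr-hr resolves to different files, which is what keeps accounts separate per jurisdiction.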

Configuration (Hugging Face Spaces)

The following settings are already configured in the Hugging Face Space. If you change them, make sure the new values are correct:

Variables

  • HF_DATASET_REPO: The name of your private dataset repository
    • Currently set to: TransLegal/grading-answers
    • Location: Settings → Variables and secrets → Variables in the TransLegal/grading-answers Space
    • Default: TransLegal/grading-answers (if not set)

Secrets

  • HF_TOKEN: A Hugging Face access token with read/write permissions to the private dataset repository
    • Location: Settings → Variables and secrets → Secrets in the TransLegal/grading-answers Space
    • Required Permission: Enable "Write access to contents/settings of selected repos" when generating the token
    • Generate at: https://huggingface.co/settings/tokens
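Hugging Face exposes a Space's variables and secrets to the running container as environment variables, so the app can read both settings from the environment. This is a minimal sketch, assuming an empty HF_DATASET_REPO should also fall back to the default; `load_settings` is a hypothetical helper, not part of the app or of `huggingface_hub`.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_REPO = "TransLegal/grading-answers"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (repo_id, token) from the environment, applying the documented default."""
    env = os.environ if env is None else env
    # Variable: falls back to the default when unset or empty (empty is an assumption).
    repo_id = env.get("HF_DATASET_REPO") or DEFAULT_REPO
    # Secret: the private dataset cannot be read or written without it.
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it under the Space's Secrets")
    return repo_id, token
```

The pair could then be passed to `huggingface_hub` calls such as `hf_hub_download(repo_id=repo_id, filename="en-us/grading_template.parquet", repo_type="dataset", token=token)`.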

How It Works

  1. Jurisdiction Discovery: The app automatically discovers available jurisdictions by scanning the repository for subdirectories containing grading_template.parquet
  2. User Accounts: Each jurisdiction has separate user accounts (the same username can exist in different jurisdictions)
  3. Data Storage: All user data is stored in the private Hugging Face dataset repository, organized by jurisdiction

Deployment

This app is designed to run on Hugging Face Spaces using Docker. After configuring the variables and secrets above, push this repository to the Space with git push; the Space then builds and deploys automatically.