---
title: Grading Answers
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: A space for grading generated answers
---

# Grading Answers App

A Streamlit application for grading AI-generated legal answers across multiple jurisdictions. The app connects to a private Hugging Face dataset repository to store user credentials and grading data.

## Private Repository Usage

This app connects to the existing private Hugging Face dataset repository: [TransLegal/grading-answers](https://huggingface.co/datasets/TransLegal/grading-answers/tree/main)

### Repository Structure

The app expects the following structure (jurisdictions are discovered automatically):

```
TransLegal/grading-answers/
├── en-us/
│   ├── grading_template.parquet
│   └── users/
├── hr-hr/
│   ├── grading_template.parquet
│   └── users/
└── [jurisdiction-code]/
    ├── grading_template.parquet
    └── users/
```

**How It Works:**

- The app automatically discovers jurisdictions by scanning for subdirectories containing `grading_template.parquet`
- Each jurisdiction has isolated user accounts and data
- The `users/` subdirectory is created automatically when the first user registers in that jurisdiction

### Adding New Jurisdictions

To add a new jurisdiction to the repository:

1. **Create jurisdiction subdirectory** in the [TransLegal/grading-answers](https://huggingface.co/datasets/TransLegal/grading-answers) repository:
   - **Format:** `{language-code}-{country-code}`, based on ISO standards:
     - Language code: ISO 639-1 (2 letters, lowercase, e.g., `hr`, `en`, `sv`)
     - Country code: ISO 3166-1 alpha-2, written in lowercase (2 letters, e.g., `hr`, `us`, `se`)
     - Combined format: lowercase language code, hyphen, lowercase country code (e.g., `hr-hr`, `en-us`, `sv-se`)
   - The app automatically converts these codes to display names (e.g., "Croatian-Croatia", "English-United States", "Swedish-Sweden")
   - Example: Create the `sv-se/` directory for Swedish-Sweden

2. **Add grading template file:**
   - Upload `grading_template.parquet` to `{jurisdiction}/grading_template.parquet`
   - **Required Structure:** The parquet file must contain the following columns:
     - `term` (string) - The legal term being assessed
     - `category` (string) - Category within the term
     - `category_index` (integer) - Display order for categories (lower numbers appear first)
     - `subcategory` (string) - Subcategory within the category
     - `subcategory_index` (integer) - Display order for subcategories within each category (lower numbers appear first)
     - `question` (string) - The question being asked
     - `answer` (string) - The AI-generated answer to be graded
   - **Special Values:** An answer of `"Unknown."` or `"Unknown"` indicates unknown or unavailable information (these answers are automatically scored as "Irrelevant / NA")
   - **Display Order:** The `category_index` and `subcategory_index` columns control the order in which categories and subcategories are displayed in the app. Items with lower index values appear first.

3. **Create users directory:**
   - Create the `{jurisdiction}/users/` directory with an empty `.gitkeep` file (so the directory is tracked in Git)
   - The `users.json` file will be created automatically on first user registration

4. **Verify:**
   - The new jurisdiction will appear automatically in the Space's jurisdiction selector
   - No code changes or redeployment are needed; discovery is dynamic

**File Structure Per Jurisdiction:**

- `{jurisdiction}/grading_template.parquet` - Required (grading questions/answers template)
- `{jurisdiction}/users/` - Created automatically (stores user data)
- `{jurisdiction}/users/users.json` - Created on first registration (user credentials)
- `{jurisdiction}/users/{username}_answers.parquet` - Created per user (grading data)

## Configuration (Hugging Face Spaces)

The following is already configured in the Hugging Face Space settings. If you need to change these settings, make sure the values below are set correctly:

### Variables

- **`HF_DATASET_REPO`**: The name of your private dataset repository
  - Currently set to: `TransLegal/grading-answers` ([link to dataset repo](https://huggingface.co/datasets/TransLegal/grading-answers))
  - Location: TransLegal/grading-answers (Space) Settings → Variables and secrets → Variables ([link](https://huggingface.co/spaces/TransLegal/grading-answers/settings))
  - Default: `TransLegal/grading-answers` (if not set)

### Secrets

- **`HF_TOKEN`**: A Hugging Face access token with read/write permissions to the private dataset repository
  - Location: TransLegal/grading-answers (Space) Settings → Variables and secrets → Secrets ([link](https://huggingface.co/spaces/TransLegal/grading-answers/settings))
  - **Required Permission:** Enable "Write access to contents/settings of selected repos" when generating the token
  - Generate at: https://huggingface.co/settings/tokens

## How It Works

1. **Jurisdiction Discovery:** The app automatically discovers available jurisdictions by scanning the repository for subdirectories containing `grading_template.parquet`
2. **User Accounts:** Each jurisdiction has separate user accounts (the same username can exist in different jurisdictions)
3. **Data Storage:** All user data is stored in the private Hugging Face dataset repository, organized by jurisdiction

## Deployment

This app is designed to run on Hugging Face Spaces using Docker. After configuring the variables and secrets above, push this repository with `git push` and the Space will deploy automatically.
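
## Example: Checking a Jurisdiction Before Upload

The naming convention, the discovery rule, and the required template columns described in this README can be checked locally before uploading a new jurisdiction. The sketch below is illustrative only and is not the app's actual code: the function names (`is_valid_jurisdiction_code`, `discover_jurisdictions`, `missing_template_columns`) are hypothetical, and it assumes discovery only considers top-level directories that directly contain `grading_template.parquet`.

```python
import re
from pathlib import PurePosixPath

TEMPLATE_NAME = "grading_template.parquet"

# Columns listed under "Required Structure" in this README.
REQUIRED_COLUMNS = {
    "term", "category", "category_index",
    "subcategory", "subcategory_index", "question", "answer",
}

# Lowercase 2-letter language code, hyphen, lowercase 2-letter country code.
JURISDICTION_RE = re.compile(r"[a-z]{2}-[a-z]{2}")


def is_valid_jurisdiction_code(code: str) -> bool:
    """Return True if the code looks like 'en-us' or 'sv-se'."""
    return JURISDICTION_RE.fullmatch(code) is not None


def discover_jurisdictions(repo_files):
    """Return top-level directories that directly contain the template file.

    Assumption: this mirrors the discovery rule described in the README;
    the real app may apply additional checks.
    """
    found = set()
    for name in repo_files:
        parts = PurePosixPath(name).parts
        if len(parts) == 2 and parts[1] == TEMPLATE_NAME:
            found.add(parts[0])
    return sorted(found)


def missing_template_columns(columns):
    """Return the required columns that are absent, sorted by name."""
    return sorted(REQUIRED_COLUMNS - set(columns))


if __name__ == "__main__":
    files = [
        "README.md",
        "en-us/grading_template.parquet",
        "hr-hr/grading_template.parquet",
        "hr-hr/users/users.json",
        "drafts/notes.txt",
    ]
    print(discover_jurisdictions(files))
    print(is_valid_jurisdiction_code("sv-se"))
    print(missing_template_columns(["term", "category", "question", "answer"]))
```

In practice the file list could come from `huggingface_hub.list_repo_files("TransLegal/grading-answers", repo_type="dataset", token=...)`, and the column names from `pandas.read_parquet(path).columns`. Both are left out of the sketch so that it runs without network access or a token.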