---
title: Grading Answers
emoji: 🚀
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: A space for grading generated answers
---

# Grading Answers App

A Streamlit application for grading AI-generated legal answers across multiple jurisdictions. The app connects to a private Hugging Face dataset repository to store user credentials and grading data.

## Private Repository Usage

This app connects to the existing private Hugging Face dataset repository: [TransLegal/grading-answers](https://huggingface.co/datasets/TransLegal/grading-answers/tree/main)

### Repository Structure

The app expects the following structure (jurisdictions are discovered automatically):

```
TransLegal/grading-answers/
├── en-us/
│   ├── grading_template.parquet
│   └── users/
├── hr-hr/
│   ├── grading_template.parquet
│   └── users/
└── [jurisdiction-code]/
    ├── grading_template.parquet
    └── users/
```

**How It Works:**
- The app automatically discovers jurisdictions by scanning for subdirectories containing `grading_template.parquet`
- Each jurisdiction has isolated user accounts and data
- The `users/` subdirectory is created automatically when the first user registers in that jurisdiction
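
The discovery rule above can be sketched as a pure function over the repository's file listing. This is a minimal sketch, not the app's code: `discover_jurisdictions` is a hypothetical name, and it assumes the template sits directly under a top-level jurisdiction directory, as in the tree shown earlier.

```python
from pathlib import PurePosixPath
from typing import List

TEMPLATE_NAME = "grading_template.parquet"


def discover_jurisdictions(repo_files: List[str]) -> List[str]:
    """Return sorted codes of top-level directories that directly contain the template."""
    found = set()
    for path in repo_files:
        parts = PurePosixPath(path).parts
        # Only "<jurisdiction>/grading_template.parquet" counts; deeper copies are ignored.
        if len(parts) == 2 and parts[1] == TEMPLATE_NAME:
            found.add(parts[0])
    return sorted(found)


if __name__ == "__main__":
    files = [
        ".gitattributes",
        "en-us/grading_template.parquet",
        "en-us/users/users.json",
        "hr-hr/grading_template.parquet",
        "hr-hr/users/.gitkeep",
        "drafts/notes.txt",  # no template, so not a jurisdiction
    ]
    print(discover_jurisdictions(files))  # ['en-us', 'hr-hr']
```

In the running app, the listing could come from `huggingface_hub.HfApi(token=...).list_repo_files(repo_id, repo_type="dataset")`; whether the app uses that exact call is an assumption.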

### Adding New Jurisdictions

To add a new jurisdiction to the repository:

1. **Create jurisdiction subdirectory** in the [TransLegal/grading-answers](https://huggingface.co/datasets/TransLegal/grading-answers) repository:
   - **Format:** `{language-code}-{country-code}` following ISO standards:
     - Language code: ISO 639-1 (2-letter lowercase, e.g., `hr`, `en`, `sv`)
     - Country code: ISO 3166-1 alpha-2 (2-letter lowercase, e.g., `hr`, `us`, `se`)
     - Combined format: lowercase language code, hyphen, lowercase country code (e.g., `hr-hr`, `en-us`, `sv-se`)
   - The app automatically converts these codes to display names (e.g., "Croatian-Croatia", "English-United States", "Swedish-Sweden")
   - Example: Create `sv-se/` directory for Swedish-Sweden

2. **Add grading template file:**
   - Upload `grading_template.parquet` to `{jurisdiction}/grading_template.parquet`
   - **Required Structure:** The parquet file must contain the following columns:
     - `term` (string) - The legal term being assessed
     - `category` (string) - Category within the term
     - `category_index` (integer) - Display order for categories (lower numbers appear first)
     - `subcategory` (string) - Subcategory within the category
     - `subcategory_index` (integer) - Display order for subcategories within each category (lower numbers appear first)
     - `question` (string) - The question being asked
     - `answer` (string) - The AI-generated answer to be graded
   - **Special Values:** Answers can be `"Unknown."` or `"Unknown"` to indicate unknown/unavailable information (these are automatically scored as "Irrelevant / NA")
   - **Display Order:** The `category_index` and `subcategory_index` columns control the order in which categories and subcategories are displayed in the app. Items with lower index values appear first.

3. **Create users directory:**
   - Create `{jurisdiction}/users/` directory with an empty `.gitkeep` file (so the directory is tracked in Git)
   - The `users.json` file will be created automatically on first user registration

4. **Verify:**
   - The new jurisdiction will appear automatically in the Space's jurisdiction selector
   - No code changes or redeployment are needed, because jurisdictions are discovered dynamically
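
The code format from step 1 can be checked, and turned into a display name, with a regular expression and two lookup tables. This is a hedged sketch: `display_name` and the `LANGUAGES`/`COUNTRIES` tables are hypothetical and cover only the examples in this README; how the app actually builds names such as "Croatian-Croatia" is not documented here.

```python
import re

# Lowercase ISO 639-1 language code, a hyphen, lowercase ISO 3166-1 alpha-2 country code.
JURISDICTION_RE = re.compile(r"([a-z]{2})-([a-z]{2})")

# Hypothetical lookup tables covering only the README's examples.
LANGUAGES = {"en": "English", "hr": "Croatian", "sv": "Swedish"}
COUNTRIES = {"us": "United States", "hr": "Croatia", "se": "Sweden"}


def display_name(code: str) -> str:
    """Convert e.g. 'sv-se' to 'Swedish-Sweden'; raise ValueError for malformed codes."""
    match = JURISDICTION_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"{code!r} is not a lowercase '<language>-<country>' code")
    lang, country = match.groups()
    # Fall back to the uppercased raw code when a name is missing from the tables.
    return f"{LANGUAGES.get(lang, lang.upper())}-{COUNTRIES.get(country, country.upper())}"


if __name__ == "__main__":
    for code in ("hr-hr", "en-us", "sv-se"):
        print(code, "->", display_name(code))
```

Directory names such as `sv_SE` or `SV-SE` are rejected rather than silently normalized, which keeps the repository layout consistent with the format rule above.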

**File Structure Per Jurisdiction:**
- `{jurisdiction}/grading_template.parquet` - Required (grading questions/answers template)
- `{jurisdiction}/users/` - Created automatically (stores user data)
- `{jurisdiction}/users/users.json` - Created on first registration (user credentials)
- `{jurisdiction}/users/{username}_answers.parquet` - Created per user (grading data)
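
Before uploading a `grading_template.parquet` (step 2 above), a quick local check can catch missing columns, non-integer index columns, and answers that will be auto-scored as "Irrelevant / NA". This sketch assumes pandas; the helper names and sample rows are hypothetical and do not come from the app.

```python
from typing import List

import pandas as pd

# Required columns from the "Required Structure" list in step 2.
REQUIRED_COLUMNS = [
    "term", "category", "category_index",
    "subcategory", "subcategory_index", "question", "answer",
]
INTEGER_COLUMNS = ["category_index", "subcategory_index"]
UNKNOWN_ANSWERS = {"Unknown.", "Unknown"}


def validate_template(df: pd.DataFrame) -> List[str]:
    """Return a list of problems; an empty list means the template looks usable."""
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in df.columns]
    for col in INTEGER_COLUMNS:
        if col in df.columns and not pd.api.types.is_integer_dtype(df[col]):
            problems.append(f"{col} must be an integer column")
    return problems


def display_order(df: pd.DataFrame) -> pd.DataFrame:
    """Sort rows so lower category and subcategory indices come first."""
    return df.sort_values(["category_index", "subcategory_index"])


def unknown_answer_mask(df: pd.DataFrame) -> pd.Series:
    """True for rows whose answer is one of the special 'Unknown' values."""
    return df["answer"].isin(UNKNOWN_ANSWERS)


if __name__ == "__main__":
    template = pd.DataFrame({
        "term": ["contract", "contract"],
        "category": ["Formation", "Definition"],
        "category_index": [2, 1],
        "subcategory": ["Offer", "General"],
        "subcategory_index": [1, 1],
        "question": ["What is an offer?", "What is a contract?"],
        "answer": ["Unknown.", "A legally binding agreement."],
    })
    print(validate_template(template))                   # []
    print(display_order(template)["category"].tolist())  # ['Definition', 'Formation']
    print(unknown_answer_mask(template).tolist())        # [True, False]
    # template.to_parquet("sv-se/grading_template.parquet")  # needs pyarrow or fastparquet
```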

## Configuration (Hugging Face Spaces)

The following settings are already configured in the Hugging Face Space. If you change them, keep the exact variable and secret names below and make sure the values meet the stated requirements:

### Variables
- **`HF_DATASET_REPO`**: The ID (`owner/name`) of the private dataset repository
  - Currently set to: [`TransLegal/grading-answers`](https://huggingface.co/datasets/TransLegal/grading-answers)
  - Location: [`TransLegal/grading-answers` Space settings](https://huggingface.co/spaces/TransLegal/grading-answers/settings) → Variables and secrets → Variables
  - Default: `TransLegal/grading-answers` (if not set)

### Secrets
- **`HF_TOKEN`**: A Hugging Face access token with read/write permissions to the private dataset repository
  - Location: [`TransLegal/grading-answers` Space settings](https://huggingface.co/spaces/TransLegal/grading-answers/settings) → Variables and secrets → Secrets
  - **Required Permission:** Enable "Write access to contents/settings of selected repos" when generating the token
  - Generate at: https://huggingface.co/settings/tokens
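
Spaces expose variables and secrets to the running container as environment variables, so the settings above can be read with `os.environ`. A minimal sketch: `load_settings` is a hypothetical helper, and treating an empty `HF_DATASET_REPO` the same as an unset one is an assumption.

```python
import os
from typing import Mapping, Tuple

DEFAULT_DATASET_REPO = "TransLegal/grading-answers"


def load_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (dataset_repo, token) from the Space's variables and secrets."""
    # HF_DATASET_REPO is optional: unset (or empty) falls back to the default.
    repo = env.get("HF_DATASET_REPO") or DEFAULT_DATASET_REPO
    # HF_TOKEN is required: without it the private dataset cannot be read or written.
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it as a secret in the Space settings")
    return repo, token
```

Failing at startup with a clear message is usually easier to diagnose than a later permission error from the Hub. The returned token could then be passed to `huggingface_hub.HfApi(token=token)`.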

## How It Works

1. **Jurisdiction Discovery:** The app automatically discovers available jurisdictions by scanning the repository for subdirectories containing `grading_template.parquet`
2. **User Accounts:** Each jurisdiction has separate user accounts (same username can exist in different jurisdictions)
3. **Data Storage:** All user data is stored in the private Hugging Face dataset repository, organized by jurisdiction
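
Per-jurisdiction accounts follow from the file layout: each jurisdiction has its own `users/users.json`, so a username only has to be unique within that file. This sketch is hypothetical throughout. The README does not describe the credential schema, so the salted PBKDF2 hash (from Python's `hashlib`), the iteration count, and the helper names are assumptions, and an in-memory dict stands in for the dataset repository.

```python
import hashlib
import json
import secrets
from typing import Optional


def users_path(jurisdiction: str) -> str:
    """Path of one jurisdiction's credential file inside the dataset repo."""
    return f"{jurisdiction}/users/users.json"


def answers_path(jurisdiction: str, username: str) -> str:
    """Path of one user's grading data inside the dataset repo."""
    return f"{jurisdiction}/users/{username}_answers.parquet"


def register(users_json: Optional[str], username: str, password: str) -> str:
    """Add a user to one jurisdiction's users.json text and return the updated text.

    users_json is None before the first registration, when the file does not exist yet.
    """
    users = json.loads(users_json) if users_json else {}
    if username in users:
        raise ValueError(f"username {username!r} is already taken in this jurisdiction")
    # Hypothetical schema: store a salted PBKDF2 hash, never the plain password.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    users[username] = {"salt": salt.hex(), "hash": digest.hex()}
    return json.dumps(users, indent=2)


if __name__ == "__main__":
    repo = {}  # in-memory stand-in for the dataset repo: path -> file text
    for jurisdiction in ("en-us", "hr-hr"):
        path = users_path(jurisdiction)
        # The same username succeeds in both, because each jurisdiction has its own file.
        repo[path] = register(repo.get(path), "alice", "correct horse")
    print(sorted(repo))  # ['en-us/users/users.json', 'hr-hr/users/users.json']
```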

## Deployment

This app is designed to run on Hugging Face Spaces using Docker. After configuring the variables and secrets above, push this repository to the Space with `git push`; the Space then rebuilds and deploys automatically.