constantinSch committed on
Commit
e4154b5
·
1 Parent(s): 50fd52c

Enhance README

Files changed (2)
  1. .gitignore +1 -0
  2. README.md +83 -1
.gitignore CHANGED
@@ -4,3 +4,4 @@ __pycache__/
  annotations.db
  .env
  *_evaluation_dataset.jsonl
+ *.local.md
README.md CHANGED
@@ -4,8 +4,90 @@ emoji: ⚡
  colorFrom: gray
  colorTo: yellow
  sdk: docker
+ app_port: 7860
  pinned: false
  short_description: Evaluation Zusammenfassung Rundfunktranskripte
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Evaluation Summarization Space
+
+ This repository contains a Docker-based Hugging Face Space for evaluating generated summaries against source transcripts. The app serves a lightweight annotation interface backed by Flask and stores submitted judgements in SQLite.
+
+ ## What the Space does
+
+ - Presents evaluation items from a JSONL dataset.
+ - Lets annotators score summary quality across multiple criteria.
+ - Persists annotations in a SQLite database.
+ - Exports collected annotations as JSONL.
+ - Supports optional password protection through a Space secret.
+
+ ## Runtime model
+
+ This Space uses the `docker` SDK and starts the Flask app defined in [app.py](app.py). The container exposes port `7860`, which is declared both in the YAML front matter above (`app_port`) and in [Dockerfile](Dockerfile).
+
+ At runtime, the app reads these environment variables:
+
+ - `DATASET_PATH` for the evaluation dataset JSONL
+ - `DB_PATH` for the SQLite annotations database
+ - `APP_PASSWORD` for optional login protection
+ - `SECRET_KEY` for stable HMAC token generation
+
+ By default, the Docker image is configured for Hugging Face persistent storage mounted at `/data`:
+
+ - dataset: `/data/2026-04-20_evaluation_dataset.jsonl`
+ - annotations DB: `/data/annotations.db`
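
To make the configuration surface concrete, here is a minimal sketch of how the four variables could be read in Python. It is not taken from `app.py`: the `Settings` class, the `load_settings` helper, and the choice to fall back to the `/data` paths in code (rather than through `ENV` lines in the Dockerfile) are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four variables named in this README."""
    dataset_path: str
    db_path: str
    app_password: Optional[str]  # None means login protection is off
    secret_key: Optional[str]    # None means no stable key was configured


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from a mapping, with assumed fallbacks."""
    return Settings(
        # Assumption: fall back to the /data defaults listed above.
        dataset_path=env.get("DATASET_PATH", "/data/2026-04-20_evaluation_dataset.jsonl"),
        db_path=env.get("DB_PATH", "/data/annotations.db"),
        # Treat an empty string the same as an unset variable.
        app_password=env.get("APP_PASSWORD") or None,
        secret_key=env.get("SECRET_KEY") or None,
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests, for example `load_settings({"DB_PATH": "annotations.db"})`.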
+
+ ## Deploying on Hugging Face Spaces
+
+ 1. Create a new Space using the `Docker` SDK.
+ 2. Add persistent storage mounted at `/data`.
+ 3. Set the secret `APP_PASSWORD` if the UI should require a login.
+ 4. Set the secret `SECRET_KEY` if you want authentication tokens to remain valid across container restarts.
+ 5. Push this repository to the Space.
+ 6. Upload the dataset JSONL into `/data/` in the Space storage browser.
+ 7. Confirm that the uploaded dataset filename matches `DATASET_PATH` in [Dockerfile](Dockerfile).
+
+ Without persistent storage, annotations stored in SQLite will be lost when the container filesystem is replaced.
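
Step 4 matters because an HMAC token is only reproducible while the signing key stays the same. The sketch below shows why, assuming the app falls back to a random key when `SECRET_KEY` is unset. The function names and the token layout are hypothetical and may differ from what `app.py` actually does.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def resolve_key(secret_key: Optional[str]) -> bytes:
    """Use the configured key, or a random one that changes on every start."""
    if secret_key:
        return secret_key.encode("utf-8")
    # Assumed fallback: a fresh key per process, so old tokens stop verifying.
    return secrets.token_bytes(32)


def make_token(key: bytes, password: str) -> str:
    """Derive a hex token from the password with HMAC-SHA256."""
    return hmac.new(key, password.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_token(key: bytes, password: str, token: str) -> bool:
    """Compare in constant time against the token expected for this key."""
    return hmac.compare_digest(make_token(key, password), token)
```

With a fixed key, a token issued before a restart still verifies afterwards. With the random fallback, every restart produces a new key and all earlier tokens are rejected.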
+
+ ## Local development
+
+ The app can also run locally. In local development, it defaults to the dataset file in the project root when `DATASET_PATH` is unset.
+
+ Example environment variables:
+
+ ```powershell
+ $env:DATASET_PATH = "2026-04-20_evaluation_dataset.jsonl"
+ $env:DB_PATH = "annotations.db"
+ $env:APP_PASSWORD = "your-password"
+ python app.py
+ ```
+
+ Python dependencies are declared in [pyproject.toml](pyproject.toml). The container build uses `uv` as configured in [Dockerfile](Dockerfile).
+
+ ## Data expectations
+
+ The dataset is expected to be a JSONL file with one evaluation item per line. The application relies on stable item IDs and the text fields required to render the transcript, summary, and metadata shown in the UI.
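
As an illustration of the "one item per line, stable IDs" expectation, the following sketch parses JSONL lines into a dictionary and rejects missing or duplicate IDs. It assumes the ID field in the dataset is called `eval_id`, matching the key used for annotations. The `load_items` helper is hypothetical, and the other field names in the usage line are placeholders.

```python
import json
from typing import Dict, Iterable


def load_items(lines: Iterable[str]) -> Dict[str, dict]:
    """Parse JSONL lines into a dict keyed by eval_id (assumed field name)."""
    items: Dict[str, dict] = {}
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        item = json.loads(line)
        if "eval_id" not in item:
            raise ValueError(f"line {line_no}: missing eval_id")
        eval_id = str(item["eval_id"])
        if eval_id in items:
            raise ValueError(f"line {line_no}: duplicate eval_id {eval_id!r}")
        items[eval_id] = item
    return items
```

Because an open file iterates line by line, the helper can be called as `load_items(open(path, encoding="utf-8"))` or with a list of strings such as `load_items(['{"eval_id": "a1", "summary": "..."}'])`.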
+
+ Submitted annotations are stored by `eval_id` and include these fields:
+
+ - `bewertung`
+ - `korrekt`
+ - `relevant`
+ - `vollstaendig`
+ - `kohaerenz`
+ - `anmerkungen`
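
The field names are German: `bewertung` (rating), `korrekt` (correct), `relevant` (relevant), `vollstaendig` (complete), `kohaerenz` (coherence), `anmerkungen` (remarks). A minimal sketch of storing and exporting such rows with Python's `sqlite3` module follows. The table name, the column types, and the one-row-per-`eval_id` design are assumptions, since the README only names the fields.

```python
import json
import sqlite3

FIELDS = ("bewertung", "korrekt", "relevant", "vollstaendig", "kohaerenz", "anmerkungen")

# Assumed schema: integer scores, free-text remarks, one row per eval_id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotations (
    eval_id TEXT PRIMARY KEY,
    bewertung INTEGER,
    korrekt INTEGER,
    relevant INTEGER,
    vollstaendig INTEGER,
    kohaerenz INTEGER,
    anmerkungen TEXT
)
"""


def save_annotation(conn, eval_id, values):
    """Insert the row for eval_id, overwriting an earlier submission."""
    row = [eval_id] + [values.get(name) for name in FIELDS]
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT OR REPLACE INTO annotations (eval_id, {', '.join(FIELDS)}) "
        f"VALUES ({placeholders})",
        row,
    )
    conn.commit()


def export_jsonl(conn):
    """Return all annotations as JSONL text, one JSON object per line."""
    cur = conn.execute(
        f"SELECT eval_id, {', '.join(FIELDS)} FROM annotations ORDER BY eval_id"
    )
    names = [col[0] for col in cur.description]
    return "\n".join(json.dumps(dict(zip(names, r)), ensure_ascii=False) for r in cur)
```

To try it without touching a real database, open `sqlite3.connect(":memory:")`, run `conn.execute(SCHEMA)`, then call `save_annotation` and `export_jsonl`. Fields that were not submitted are stored as `NULL` and exported as `null`.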
+
+ If you replace the dataset with a new file that uses different `eval_id` values, existing annotation rows in the SQLite database will no longer line up with the new items.
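
One way to catch this mismatch before annotators notice it is to compare the two ID sets after swapping the dataset. The helper below is a hypothetical sketch, not part of the app. It only assumes that both sides expose their `eval_id` values as strings.

```python
from typing import Iterable, List


def find_orphaned_ids(dataset_ids: Iterable[str], annotated_ids: Iterable[str]) -> List[str]:
    """Return annotated eval_ids that no longer exist in the dataset, sorted."""
    known = set(dataset_ids)
    return sorted(set(annotated_ids) - known)
```

An empty result means every stored annotation still points at an item in the current dataset. Any returned IDs belong to rows that should be exported or archived before they become unreachable in the UI.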
+
+ ## Notes for duplication
+
+ If someone duplicates this Space, the most important setup steps are:
+
+ 1. mount persistent storage at `/data`
+ 2. upload a dataset JSONL file to `/data`
+ 3. keep `DATASET_PATH` aligned with the uploaded filename
+ 4. configure `APP_PASSWORD` if access should be restricted
+
+ For Hugging Face Space metadata options, see the Spaces configuration reference:
+
+ https://huggingface.co/docs/hub/spaces-config-reference