---
title: Automated CV Parser
emoji: 🧩
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
short_description: Resume NER - extracts Job Titles, Skills & Education
---

# CV Parser Dashboard

A Streamlit app on top of the WQF7007 resume-NER model. Deployed as a Hugging
Face Space; the model is loaded from the Hub so it can be updated without a
redeploy.

- **🔎 Live Parser**: paste/upload one CV and watch it get tokenized and
  classified: sub-word token chips coloured by predicted label, the original text
  with highlighted entities, and a structured summary.
- **📊 Analytics**: batch-upload CVs (PDF / DOCX / TXT) for a skills word cloud
  and top-entity charts across the set.
- **πŸ” Manage Model** β€” password-gated; teammates upload a new exported model and
  it's pushed to the Hub repo the app reads from.

## Models: how swapping works

The app loads its model **from a Hugging Face Hub repo** (`PRIMARY_MODEL_ID` in
`config.py`, default `Zeqhx/cv-parser-ner`). A Space's own disk is wiped on
restart, so the Hub repo is the durable store. To update the live model:

- **Easiest:** open **Manage Model**, enter the page password, upload a `.zip` of
  your exported model folder (the `exported_models/…` folder the training notebooks
  produce). It's validated against the 7-tag scheme and pushed to the Hub repo.
- **Or:** push files straight to the Hub model repo (web UI / CLI).
- Then click **🔄 Reload model** in the sidebar to pick up the new weights.

The sidebar picker also has a **Custom HF model ID** box to load any repo live.
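
The validate-then-push flow could look roughly like the sketch below. It is not the app's actual code: it assumes the "7-tag scheme" check amounts to counting the `id2label` entries in the exported `config.json`, and the function names `validate_exported_model` and `push_to_hub` are hypothetical. The upload itself uses `HfApi.upload_folder` from `huggingface_hub`.

```python
import json
from pathlib import Path

# Assumption: the 7-tag scheme is checked by counting labels in config.json.
EXPECTED_NUM_TAGS = 7


def validate_exported_model(model_dir):
    """Return a list of problems found in an exported model folder (empty if OK)."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return ["config.json is missing"]
    config = json.loads(config_path.read_text(encoding="utf-8"))
    id2label = config.get("id2label", {})
    problems = []
    if len(id2label) != EXPECTED_NUM_TAGS:
        problems.append(
            f"expected {EXPECTED_NUM_TAGS} labels, found {len(id2label)}"
        )
    return problems


def push_to_hub(model_dir, repo_id, token):
    """Upload the folder to a Hub model repo after validation (hypothetical helper)."""
    problems = validate_exported_model(model_dir)
    if problems:
        raise ValueError("; ".join(problems))
    # Imported lazily so validation works without huggingface_hub installed.
    from huggingface_hub import HfApi

    HfApi(token=token).upload_folder(
        folder_path=str(model_dir),
        repo_id=repo_id,
        repo_type="model",
        commit_message="Update exported model",
    )
```

Validating before the upload keeps a malformed export from replacing the model the live app reads.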

## Secrets (Space → Settings → Variables and secrets)

| Name | Purpose |
|------|---------|
| `HF_TOKEN` | A Hugging Face **write** token, so Manage Model can push to the Hub. |
| `MANAGE_PASSWORD` | Password protecting the Manage Model page (page is disabled if unset). |
| `DASHBOARD_MODEL_ID` | *(optional)* Override the model repo the app loads. |
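
As a rough illustration of how these settings might be consumed, the sketch below reads them from environment variables, which is how a Space exposes secrets to the running app. The helper names (`resolve_model_id`, `manage_page_enabled`, `password_ok`) are hypothetical and the real logic in `config.py` may differ.

```python
import hmac
import os

PRIMARY_MODEL_ID = "Zeqhx/cv-parser-ner"  # default named in this README


def resolve_model_id(env=None):
    """Use DASHBOARD_MODEL_ID when set and non-empty, else the default repo."""
    env = os.environ if env is None else env
    override = (env.get("DASHBOARD_MODEL_ID") or "").strip()
    return override or PRIMARY_MODEL_ID


def manage_page_enabled(env=None):
    """The Manage Model page is disabled when MANAGE_PASSWORD is unset or empty."""
    env = os.environ if env is None else env
    return bool(env.get("MANAGE_PASSWORD"))


def password_ok(candidate, env=None):
    """Compare the entered password in constant time (assumed approach)."""
    env = os.environ if env is None else env
    expected = env.get("MANAGE_PASSWORD")
    if not expected:
        return False
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))
```

Passing a plain dict as `env` makes the helpers easy to exercise without touching the real environment.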

## Run locally

```bash
pip install -r requirements.txt
streamlit run app.py
```

Local exported models in `../exported_models/…` are auto-offered in the picker
when present (see `config.MODEL_REGISTRY`).

## Layout

```
app.py                  # landing page + model-status banner
config.py               # model registry/resolution, labels, colours, sample CV
pages/
  1_Live_Parser.py
  2_Analytics.py
  3_Manage_Model.py     # password-gated model uploader -> Hub
lib/
  model.py              # load (local/Hub/fallback) + sliding-window inference
  extract.py            # PDF / DOCX / TXT -> text (adaptive spacing fix)
  ui.py                 # shared sidebar model picker
  viz.py                # entity HTML, token chips, word cloud, bar charts
```
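
The "sliding-window inference" noted for `lib/model.py` presumably splits a long CV into overlapping token windows so each fits the model's input limit. The sketch below shows only the span arithmetic; the function name `sliding_windows` and the `window` and `stride` defaults are assumptions, not values taken from the app.

```python
def sliding_windows(n_tokens, window=512, stride=384):
    """Return (start, end) token spans covering range(n_tokens) with overlap."""
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    if n_tokens <= 0:
        return []
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        # Consecutive windows overlap by (window - stride) tokens.
        start += stride
    return spans
```

For example, `sliding_windows(1000)` yields three spans, the last one shorter than `window`; how predictions in the overlapping regions are merged is left to the caller.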