Christianvedel committed · Commit 085551f · verified · 1 Parent(s): f95590f

Update README.md

Files changed (1)
  1. README.md +44 -3
README.md CHANGED
@@ -1,3 +1,44 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+ # OccCANINE: I-CeM Occupational Classification (OCCICEM)
+
+ ## Overview
+
+ OccCANINE_OCCICEM is a version of [OccCANINE](https://github.com/christianvedels/OccCANINE) fine-tuned to automatically convert English occupational descriptions into [I-CeM](https://www.essex.ac.uk/research-projects/integrated-census-microdata) (Integrated Census Microdata) occupational codes. It uses a CANINE encoder with a sequential decoder trained using a mixed loss, fine-tuned from the [OccCANINE_s2s_mix](https://huggingface.co/Christianvedel/OccCANINE_s2s_mix) base model on IPUMS UK census data.
+
+ See more on: [GitHub.com/christianvedels/OccCANINE](https://github.com/christianvedels/OccCANINE)
+
+ Read the paper on arXiv: [https://arxiv.org/abs/2402.13604](https://arxiv.org/abs/2402.13604)
+
+ ## Key Features
+
+ - **English**: Trained and evaluated on English occupational descriptions.
+ - **Sequential decoding**: Outputs I-CeM codes digit-by-digit.
+ - **Mixed loss training**: Combines sequence-level and flat classification losses.
+ - **Fine-tuned**: Initialized from OccCANINE_s2s_mix and fine-tuned on IPUMS UK I-CeM data.
+
+ ## Usage
+
+ ```python
+ from histocc import OccCANINE
+
+ model = OccCANINE(name="OccCANINE_OCCICEM", system="OCCICEM", hf=True)
+
+ result = model.predict("blacksmith", lang="en")
+ ```
+
+ ## Contribution and Support
+
+ Developed at the University of Southern Denmark by Christian Møller Dahl, Torben Johansen and Christian Vedel.
+
+ ---
+
+ **Model Details:**
+ - **Task**: Text Classification / Sequence Generation
+ - **Base Model**: CANINE (fine-tuned from OccCANINE_s2s_mix)
+ - **Target system**: I-CeM (OCCICEM)
+ - **Language**: English
+ - **Framework**: Transformers / PyTorch
+ - **License**: Apache 2.0
+ - **Paper**: arXiv 2402.13604
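
Note on the "Mixed loss training" bullet in the README added by this commit: the README only says that sequence-level and flat classification losses are combined, without giving the formula. The sketch below is one plausible reading, not the authors' implementation. It assumes a convex combination of a per-character cross-entropy (sequence term) and a single-label cross-entropy over whole codes (flat term). The function names, the `alpha` weight, and the toy probabilities are all hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def sequence_loss(step_probs, target_tokens):
    """Mean cross-entropy over the decoding steps (one step per code character)."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, target_tokens)]
    return sum(losses) / len(losses)


def mixed_loss(step_probs, target_tokens, flat_probs, target_class, alpha=0.5):
    """Weighted sum of the sequence term and the flat term.

    alpha is a hypothetical weight: alpha=1.0 keeps only the sequence term,
    alpha=0.0 keeps only the flat term.
    """
    seq = sequence_loss(step_probs, target_tokens)
    flat = cross_entropy(flat_probs, target_class)
    return alpha * seq + (1 - alpha) * flat


# Toy inputs: a 2-character code over a 3-symbol vocabulary, and 4 flat classes.
step_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
target_tokens = [0, 1]
flat_probs = [0.1, 0.6, 0.2, 0.1]
target_class = 1

loss = mixed_loss(step_probs, target_tokens, flat_probs, target_class)
print(f"mixed loss: {loss:.4f}")
```

Under this reading, the flat term pushes the model toward the right whole code while the sequence term supervises each decoded character. The actual weighting and loss functions used in training should be checked against the OccCANINE repository and paper.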