Paper: Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE (arXiv:2402.13604)
OccCANINE_OCCICEM is a version of OccCANINE fine-tuned to convert English occupational descriptions into I-CeM (Integrated Census Microdata) occupational codes automatically. It pairs a CANINE encoder with a sequential decoder trained with a mixed loss, and was fine-tuned from the OccCANINE_s2s_mix base model on IPUMS UK census data.
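CANINE is a character-level encoder: rather than relying on a learned subword vocabulary, it reads text as a sequence of Unicode code points. The sketch below only illustrates that input representation. The helper `to_codepoints` is a hypothetical name introduced here, and the preprocessing actually performed inside OccCANINE (special tokens, padding, language prefixes) may differ.

```python
def to_codepoints(text: str) -> list[int]:
    """Map each character to its Unicode code point.

    Illustrative only: this mirrors the idea of character-level input,
    not the exact preprocessing inside OccCANINE.
    """
    return [ord(ch) for ch in text]


# A misspelled description still maps to valid IDs, because every
# character has a code point; there is no out-of-vocabulary token.
clean = to_codepoints("blacksmith")
noisy = to_codepoints("blaksmith")

print(clean)
print(noisy)
```

Because every character is representable, spelling variants common in historical census records stay close to their standard forms at the input level.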
See more on: GitHub.com/christianvedels/OccCANINE
Read the paper on arXiv: https://arxiv.org/abs/2402.13604
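Usage:
from histocc import OccCANINE
# Load the fine-tuned I-CeM model (hf=True presumably fetches the weights from the Hugging Face Hub)
model = OccCANINE(name="OccCANINE_OCCICEM", system="OCCICEM", hf=True)
# Predict the I-CeM code for an English occupational description
result = model.predict("blacksmith", lang="en")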
Developed at the University of Southern Denmark by Christian Møller Dahl, Torben Johansen and Christian Vedel.
Model Details: