Paper: Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE (arXiv:2402.13604)
OccCANINE_ISCO68 is a version of OccCANINE fine-tuned to automatically convert English occupational descriptions into ISCO-68A codes (International Standard Classification of Occupations, 1968 revision, as used in IPUMS UK). The model pairs a CANINE encoder with a sequential decoder trained using a mixed loss, and it was fine-tuned from the OccCANINE_s2s_mix base model on IPUMS UK census data.
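CANINE encoders work on Unicode code points rather than a subword vocabulary, which is one reason the architecture suits noisy, misspelled occupational strings. The snippet below is a minimal illustration of that character-level input representation only. The helper `to_code_points` is hypothetical and is not part of histocc, and any special tokens or language prefixes the model may add are omitted.

```python
def to_code_points(text: str) -> list[int]:
    """Map each character to its Unicode code point (illustrative helper)."""
    return [ord(ch) for ch in text]


# Each character becomes one input ID; no vocabulary lookup is needed.
ids = to_code_points("blacksmith")
print(ids)
```

Because every character maps to an ID, a variant spelling such as "blaksmith" still yields a valid input sequence instead of an out-of-vocabulary token.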
See more on: GitHub.com/christianvedels/OccCANINE
Read the paper on arXiv: https://arxiv.org/abs/2402.13604
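from histocc import OccCANINE
# Load the ISCO-68A fine-tuned model (hf=True requests the Hugging Face copy)
model = OccCANINE(name="OccCANINE_ISCO68", system="ISCO68A", hf=True)
# Predict the ISCO-68A code for an English occupational description
result = model.predict("blacksmith", lang="en")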
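Census extracts typically repeat the same occupational string many times, so it can pay to predict each distinct string once and map the results back. The following is a minimal sketch of that idea, not part of histocc: `predict_unique` and its `predict_fn` parameter are hypothetical, and the sketch assumes that `predict_fn` returns one result per input string, in order, and that case and surrounding whitespace do not affect the prediction.

```python
from typing import Callable, Sequence


def predict_unique(
    descriptions: Sequence[str],
    predict_fn: Callable[[list[str]], list],
) -> list:
    """Run predict_fn once per distinct description and expand the results."""
    # Assumption: stripping and lowercasing does not change the prediction.
    normalized = [d.strip().lower() for d in descriptions]
    # dict.fromkeys keeps the first-seen order while dropping duplicates.
    unique = list(dict.fromkeys(normalized))
    results = dict(zip(unique, predict_fn(unique)))
    return [results[d] for d in normalized]


# Stand-in predictor for demonstration; a real call would wrap the model.
def fake_predict(items: list[str]) -> list[str]:
    return [f"code-for-{item}" for item in items]


print(predict_unique([" Blacksmith", "farmer", "blacksmith "], fake_predict))
```

In practice `predict_fn` would be a thin wrapper around the model's prediction call that returns its per-row output as a list.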
Developed at the University of Southern Denmark by Christian Møller Dahl, Torben Johansen and Christian Vedel.
Model Details: