New Methods for Job and Occupation Classification

Research question/goal: 

Most surveys currently collect information on occupation with open-ended questions. The verbatim responses are coded afterwards into a classification with hundreds of categories and thousands of jobs, which is an error-prone, time-consuming and costly task. When textual answers lack detail, exact coding may be impossible. The project investigates how to improve this process by asking response-dependent questions during the interview. A machine-learning algorithm predicts candidate job categories from the respondent's answer, and the most relevant categories are shown to the interviewer. Using this list, the interviewer can ask the respondent for more detailed information about the job. The proposed method is tested in a telephone survey conducted by the Institute for Employment Research (IAB). Administrative data are used to compare the quality of traditional coding with that of interview coding. This project is carried out in cooperation with Arne Bethmann (IAB, University of Mannheim), Manfred Antoni (IAB), Markus Zielonka (LIfBi), Daniel Bela (LIfBi), and Knut Wenzig (DIW).
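The project description does not name the algorithm used to predict candidate categories. Purely as an illustration of the general idea (predict, then rank, then show a short list to the interviewer), the following Python sketch uses a simple multinomial naive Bayes classifier. The training answers, the category codes (NURSE, TEACHER, DEVELOPER), the function names `train` and `rank_candidates`, and the list length `k` are all hypothetical assumptions; they are not taken from the project, its auxiliary classification, or its prototype.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data: (verbatim answer, category code).
# These texts and codes are invented for illustration only; they are
# not entries of any official occupational classification.
TRAINING = [
    ("nurse in a hospital", "NURSE"),
    ("geriatric nurse in a care home", "NURSE"),
    ("teacher at a primary school", "TEACHER"),
    ("secondary school teacher for maths", "TEACHER"),
    ("software developer", "DEVELOPER"),
    ("programmer in an it company", "DEVELOPER"),
]


def tokenize(text: str) -> list[str]:
    """Lower-case the answer and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(data):
    """Count tokens per category for a multinomial naive Bayes model."""
    word_counts: dict[str, Counter] = {}
    class_counts: Counter = Counter()
    vocab: set[str] = set()
    for text, category in data:
        tokens = tokenize(text)
        class_counts[category] += 1
        word_counts.setdefault(category, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def rank_candidates(answer: str, model, k: int = 3) -> list[tuple[str, float]]:
    """Return the k most probable categories with normalised probabilities."""
    word_counts, class_counts, vocab = model
    n_answers = sum(class_counts.values())
    log_scores = {}
    for category, n_cat in class_counts.items():
        total = sum(word_counts[category].values())
        score = math.log(n_cat / n_answers)  # log prior of the category
        for token in tokenize(answer):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothed log likelihood of the token
            score += math.log((word_counts[category][token] + 1) / (total + len(vocab)))
        log_scores[category] = score
    # Normalise the log scores into probabilities (shifted for numerical stability)
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(weights.values())
    ranked = sorted(
        ((c, w / norm) for c, w in weights.items()), key=lambda cw: cw[1], reverse=True
    )
    return ranked[:k]


if __name__ == "__main__":
    model = train(TRAINING)
    # In the hypothetical interview tool, this short list would be shown
    # to the interviewer, who then asks a targeted follow-up question.
    for code, prob in rank_candidates("nurse", model, k=2):
        print(f"{code}: {prob:.2f}")
```

In this sketch, an answer that lacks detail tends to spread probability over several categories instead of concentrating it on one, which is the situation where showing the interviewer a short candidate list is most useful. A real system would be trained on large volumes of coded survey answers and would rank categories of the actual occupational classification.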

Current stage: 

Building on the promising results of a pilot study, we developed a revised instrument for interview coding. In particular, we developed and published an auxiliary classification that describes work activities. In addition, we compared several machine-learning algorithms. The results were presented at international conferences and prepared for publication. Based on the auxiliary classification and the comparison of algorithms, we developed a new prototype for further testing. Coding experts from survey institutes are currently evaluating this prototype for strengths and weaknesses.

Fact sheet

Project period: 
2014 to 2020
Data Sources: 
ALWA and NEPS survey data, additional sources
Geographic Space: 
