The merits of Python for social scientists become tangible when working on a concrete use case. In this follow-up event in our Social Science Data Lab workshop series on Python, we use Jupyter Notebooks in the Google Colab environment to implement a simple machine learning routine for prediction. To do that, we first take a step-by-step look at the particulars of working with Python, such as data wrangling and basic visualization techniques. With that knowledge, we delve into the basics of applied machine learning by implementing the pipeline for both a logistic regression and a random forest model using the Python package scikit-learn. We conclude the workshop with a brief outlook on more advanced possibilities with Python to lay the foundation for your own research.
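To give a flavor of the kind of pipeline described above, here is a minimal sketch and not the workshop's actual notebook. It assumes synthetic data generated with scikit-learn's `make_classification` in place of a real social science dataset. The variable names, the feature scaling step, and the hyperparameters are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in data with 500 rows, 8 numeric features,
# and a binary outcome. A real project would load its own dataset here.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)

# Hold out 25% of the rows to evaluate predictions on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Two candidate models: a scaled logistic regression and a random forest.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {results[name]:.3f}")
```

Both models share the same `fit` and `predict` interface, so comparing them only requires swapping the estimator. Accuracy is used here for simplicity. Other metrics may suit an imbalanced outcome better.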
Andreas Küpfer is a doctoral researcher at the University of Darmstadt. His interdisciplinary research interests include text as data, the application of machine learning technologies, and substantial inference in the fields of political communication and political competition.
Ruben Bach is a postdoctoral researcher at the MZES, University of Mannheim, focusing on quantitative research methods in the social sciences. His interests include big data in the social sciences, machine learning, causal inference, and survey research.