Brian Kim, Christoph Kern, Jonathan Morgan, Clayton Hunter, Avishek Kumar

Pp. 333-340 in: Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane (Eds.): Big data and social science: Data science methods and tools for research and practice. 2nd edition 2021. Boca Raton: Chapman and Hall/CRC

This chapter provides an overview of the Python workbooks that accompany this book. The workbooks are implemented as Jupyter notebooks: interactive documents that mix formatted text with Python code samples, which can be edited and run in real time on a Jupyter notebook server. The Databases notebook lays the foundation for using Structured Query Language (SQL) to query data. The Dataset Exploration and Visualization notebook further explores the North Carolina Department of Corrections data, demonstrating how to work with missing values and date variables and how to join tables by using SQL in Python. Although some of the SQL from the Databases notebook is revisited, the focus is on practicing Python code and using Python for data analysis. The Machine Learning—Creating Labels workbook is the first of a three-part Machine Learning workbook sequence; it starts with how to create an outcome variable for a machine learning task by using SQL in Python.
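To give a flavor of the tasks listed above, the following is a minimal sketch of joining tables, handling missing values and dates, and creating an outcome label with SQL from Python. It is not taken from the workbooks: the in-memory SQLite database, the table names `person` and `sentence`, their columns, the sample rows, and the 365-day window are all hypothetical stand-ins chosen for illustration.

```python
import sqlite3
from datetime import date

# Hypothetical toy data in an in-memory SQLite database (not the real corrections data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, birth_date TEXT);
CREATE TABLE sentence (person_id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO person VALUES (1, '1980-05-17'), (2, NULL), (3, '1975-09-30');
INSERT INTO sentence VALUES
    (1, '2010-01-10', '2011-01-10'),
    (1, '2011-06-01', '2012-06-01'),
    (2, '2012-03-15', '2013-03-15'),
    (3, '2014-07-01', NULL);
""")

# Join the two tables, drop sentences with a missing end date, and build a
# binary label: 1 if the same person starts another sentence within 365 days
# of this sentence ending, else 0.
query = """
SELECT s.person_id, p.birth_date, s.start_date, s.end_date,
       EXISTS (
           SELECT 1 FROM sentence AS nxt
           WHERE nxt.person_id = s.person_id
             AND nxt.start_date > s.end_date
             AND julianday(nxt.start_date) - julianday(s.end_date) <= 365
       ) AS returned_within_year
FROM sentence AS s
JOIN person AS p ON p.person_id = s.person_id
WHERE s.end_date IS NOT NULL
ORDER BY s.person_id, s.start_date
"""
rows = conn.execute(query).fetchall()

# Parse the ISO date strings in Python and keep missing birth dates as None
# rather than guessing a value.
records = []
for person_id, birth, start, end, label in rows:
    start_d = date.fromisoformat(start)
    age = None if birth is None else (start_d - date.fromisoformat(birth)).days // 365
    records.append({
        "person_id": person_id,
        "start_date": start_d,
        "end_date": date.fromisoformat(end),
        "approx_age_at_start": age,
        "label": label,
    })

for record in records:
    print(record)
```

In this sketch, the sentence with a missing end date is excluded before labeling, while the missing birth date is carried through as `None`; whether to drop, flag, or impute missing values in practice depends on the analysis at hand.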