Encoding Culture

This class is oversubscribed for Spring 2023, and only a limited number of slots are available.

Priority will be given to students who pre-registered for the class, and then to students who contact the instructor (rahmed@mit.edu) to express interest before the first week of classes. A limited number of spaces will be available for students who did not pre-register.

To participate in the lottery, you must attend the first day of class (M 2/6, 2:30-4 pm). We will distribute a questionnaire and determine enrollment based on the answers and a lottery.

Learn to...

Analyze texts, images, audio, and datasets using Python
Use tools like NumPy, TensorFlow, NLTK, and more
Apply your coding skills to real-world problems
Combine your love of tech with the humanities
Think critically about computing in the humanities

Basic Info

Spring 2023
MW 2:30 - 4
HASS-E (HASS Elective - counts towards your additional 1-2 HASS subjects needed after you've fulfilled your concentration and distribution requirements)
12 units (3-0-9)
Course Staff
- Lecturer: Ryaan Ahmed (rahmed@mit.edu)
- Teaching Assistant: Hannah Shumway
- Undergrad TA: Justice Vidal
Prerequisites: This class requires the ability to write basic programs in Python, as demonstrated by successfully completing 6.100A (or 6.100A Advanced Standing Exam), 6.100B, or 6.100L.

This class is in development for Spring 2023. The schedule, course description, and other information below are subject to change.

Description

Computers allow scholars and artists to study and play with media such as texts, images, audio, and numerical datasets with unprecedented scale and speed. These affordances open a world of opportunity for cultural production: artists can sketch, remix, and make on machines, and an individual scholar can access and analyze a larger and more varied set of cultural artifacts than ever before.

But what does it mean to model, create, or analyze these media on a computer? The humanities and arts are built on the fundamental understanding that nothing is binary, but computers only understand 1s and 0s!

What happens when we digitally encode culture?

This course explores this question both in the technical sense of how we represent these media as bits on a hard drive and by considering the consequences of doing so. Students will learn the history and current practice of digitally encoding text, images, audio, and tabular datasets, along with the cultural and social issues implicit in these systems. They will apply computational methods for manipulating and analyzing encoded media, drawing from a wide range of practices including computational linguistics, audio processing, computer vision, and machine learning. In doing this work, students will confront underlying issues of what is lost and gained when we encode culture, and equip themselves to think critically about their own computing work.

After taking this course, you should be able to:

Think and write critically about the opportunities afforded by and challenges inherent in digitally encoding and analyzing culture
Describe the digital encoding schemes for the most common kinds of cultural artifacts
Write Python programs that use common libraries to perform quantitative analyses on text, audio, image, and tabular datasets, and interpret and present the results of such analyses

Schedule

Unit 0: Introduction

Lecture: Introduction to digital encoding and analysis of cultural artifacts.

Unit 1: Text

Lecture 1: What are close and distant reading? (Guest lecturer from Literature)

Lecture 2: History and current practices of character encoding. ASCII. Unicode.

Lab 1: Working with text in Python programs. Character encoding “by hand.” Text manipulation with standard string library. Basic text analysis with standard library. Weird art moment: Unicode art.
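To give a flavor of what character encoding "by hand" can look like, here is a minimal sketch using only built-in Python. It is an illustration, not actual lab material: the helper name describe_characters and the sample string are assumptions made for this example.

```python
# Illustrative sketch only: the helper name and sample text are assumptions,
# not part of the course materials.

def describe_characters(text):
    """Return (character, Unicode code point, UTF-8 bytes as hex) per character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"        # the abstract Unicode number
        utf8_hex = ch.encode("utf-8").hex(" ")  # the bytes actually stored
        rows.append((ch, code_point, utf8_hex))
    return rows


# "A", "e with acute accent", and the euro sign, written as escapes
sample = "A\u00e9\u20ac"
for ch, code_point, utf8_hex in describe_characters(sample):
    print(ch, code_point, utf8_hex)
```

Each of the three characters takes a different number of bytes in UTF-8 (one, two, and three), which is one concrete way to see that "a character" and "a byte" are not the same thing.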

Lecture 3: Machine reading. Introduction to the Python Natural Language Toolkit.

Lab 2: Text analysis using NLTK. Visualization of results using matplotlib.

Unit 2: Audio

Lecture 4: How do musicians listen? (Guest lecturer from Music)

Lecture 5: What is digital audio? WAV Files. Sampling.
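As a rough illustration of sampling and the WAV format, the sketch below generates a short sine tone and stores it as 16-bit mono WAV data in memory. It uses Python's standard wave module rather than NumPy to stay dependency-free. The sample rate, frequency, duration, and function names are all assumptions chosen for the example, not course material.

```python
import io
import math
import struct
import wave

# All constants below are assumed values for illustration.
SAMPLE_RATE = 8000   # samples per second
FREQUENCY = 440.0    # tone frequency in Hz
DURATION = 0.5       # seconds
AMPLITUDE = 0.5      # fraction of the 16-bit full scale


def sine_samples(frequency, duration, sample_rate, amplitude):
    """Sample a sine wave at regular intervals as signed 16-bit integers."""
    count = int(duration * sample_rate)
    return [
        int(amplitude * 32767 * math.sin(2 * math.pi * frequency * i / sample_rate))
        for i in range(count)
    ]


def write_wav(samples, sample_rate):
    """Pack the samples into mono 16-bit WAV data and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


def read_header(wav_bytes):
    """Read back (channels, bytes per sample, sample rate, frame count)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return (wav.getnchannels(), wav.getsampwidth(),
                wav.getframerate(), wav.getnframes())


samples = sine_samples(FREQUENCY, DURATION, SAMPLE_RATE, AMPLITUDE)
wav_bytes = write_wav(samples, SAMPLE_RATE)
print(read_header(wav_bytes))
```

A continuous sound becomes a finite list of integers here; the choice of sample rate and bit depth decides how much of the original signal survives.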

Lab 3: Exploring audio with NumPy. Audio manipulation. Weird art moment: mangling audio in NumPy.

Lecture 6: Metadata. Compression. Introduction to librosa.

Lab 4: Audio analysis using librosa.

Unit 3: Images

Lecture 7: How do historians view photos? (Guest lecturer from History)

Lecture 8: What is a digital image?

Lab 5: Image processing. Filters and edge detection. Weird art moment: image mangling through bit manipulation and terrible masks.
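One simple kind of bit-level image mangling can be sketched in plain Python by treating a grayscale image as a grid of 8-bit integers and masking away the low bits of each pixel. The tiny image and the function name keep_high_bits are hypothetical, used only for illustration.

```python
# Illustrative sketch: the image data and function name are assumptions.

def keep_high_bits(pixels, bits):
    """Keep only the top `bits` bits of each 8-bit grayscale pixel value."""
    mask = (0xFF << (8 - bits)) & 0xFF  # e.g. bits=2 gives 0b11000000
    return [[value & mask for value in row] for row in pixels]


# A made-up 2x4 grayscale image, values from 0 (black) to 255 (white)
image = [
    [0, 37, 100, 255],
    [200, 129, 64, 15],
]

posterized = keep_high_bits(image, 2)
for row in posterized:
    print(row)
```

With only two bits kept, every pixel collapses to one of four gray levels, which produces a harsh, posterized look on a real photograph.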

Lecture 9: Object detection and image similarity. Introduction to TensorFlow.

Lab 6: Image analysis using TensorFlow.

Unit 4: People

Lecture 10: How do social scientists collect and analyze data? (Guest lecturer from Social Sciences)

Lecture 11: History and current practices of structured data. CSV. Excel. XML. JSON. Databases.

Lab 7: Loading and manipulating structured data using the Python standard library. Visualization in matplotlib. Make and parse your own data format.
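As a minimal sketch of working with structured data using only the standard library, the example below parses a small CSV table and converts it to JSON and back. The records are invented for illustration and do not describe real artifacts; the function name load_records is likewise an assumption.

```python
import csv
import io
import json

# Made-up records for illustration only.
CSV_TEXT = """title,year,medium
Untitled Study,1921,print
Harbor at Dusk,1954,painting
Signal,1998,video
"""


def load_records(csv_text):
    """Parse CSV text into a list of dicts, converting `year` to an int."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        row["year"] = int(row["year"])  # CSV stores everything as text
        records.append(row)
    return records


records = load_records(CSV_TEXT)
as_json = json.dumps(records, indent=2)   # same data, different encoding
round_tripped = json.loads(as_json)
print(as_json)
```

Note that CSV has no notion of data types, so the year must be converted by hand, while JSON preserves the distinction between numbers and strings.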

Lecture 12: Gathering data: web scraping, using web APIs. k-nearest neighbors and logistic regression.
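The idea behind k-nearest neighbors can be sketched in a few lines of plain Python: to classify a new point, find the k closest labeled points and take a majority vote. The training data, labels, and function name below are assumptions for illustration; a real analysis would more likely rely on an established library.

```python
import math
from collections import Counter

# Illustrative sketch: the data points and labels are made up.

def knn_predict(training, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest points."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


training = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"),
    ((7.0, 5.5), "B"),
    ((6.5, 7.0), "B"),
]

print(knn_predict(training, (1.2, 1.4)))
print(knn_predict(training, (6.4, 6.1)))
```

The choice of distance measure and of k are modeling decisions, and they shape which items the method treats as "similar."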

Lab 8: Create a full data analysis pipeline, from scraping to statistics to visualization.