Data-Driven DJ

by Brian Foo

Color Balance
A Soundtrack For Gender, Race, and Ethnicity in Movies

This song was created using data on gender, race, and ethnicity of actors and actresses featured in the last decade of blockbuster movies. The goal of this song and visualization is to give an intuitive and visceral understanding of how different groups of people are represented (or underrepresented) in mainstream Hollywood movies.

You can also explore the data in an interactive web interface.

The Challenges

This song and visualization were particularly challenging for a few reasons. First, the data I was looking for was very hard to find. Though a number of reports are published about gender, race, and ethnicity in movies, there are few, if any, free, reliable, and open datasets with this type of information, particularly on how actors and actresses identify in terms of race or ethnicity. The result was a lot of manual research on my part.

Given that, the next major challenge is that it is very hard to confidently identify someone's race. First, there is the question of identity. Someone can be half black and half white, but may identify as either black, white, mixed race, or simply a person of color. Though many sources give information about a person's racial background, they rarely give information about their racial identity.

And then there is the issue of ethnicity versus race, particularly with individuals who identify as Hispanic. The U.S. Census does not include Hispanic as a race, but as an ethnicity. So an individual can be white and Hispanic, black and Latino, etc. Individuals from Spain may or may not identify as Hispanic. In these cases self-identification cannot be overlooked.

For the above reasons and for the purposes of this song, I decided to focus on determining whether individuals identified as people of color or not, rather than try to categorize by race or ethnicity. The data is not perfect, but for the sake of transparency, I listed all the data sources I could find in this public spreadsheet. I encourage you to send me corrections or improvements to this data. I also created an interactive way to explore the data, which contains notes for each person.

The last thing I'll say about this is that, despite all the challenges of identification, one thing we cannot deny is what we see on screen. The bottom line is that many individuals don't see people who look like them in popular media. There may be a few cast members with complicated racial or ethnic backgrounds, but as a whole, there are obvious imbalances of genders, races, and ethnicities in popular media as compared to the actual demographics of our country.

The Data

Given all the caveats above, let's dive into the data. Here is the high-level approach of how I chose the data:

I retrieved the top 10 highest-grossing domestic (U.S.) films from the past 10 years (2006-2015), for a total of 100 films. Source: Box Office Mojo.
For each film, I retrieved the top 5 featured actors and actresses. Source: IMDb, which lists the movie's credits in order of top billing. Sometimes, IMDb lists the credits in order of appearance, in which case I referred to Wikipedia's entry for the movie for the featured cast.
I included animated movies. I decided these movies are important and should not be ignored, especially given that children represent a large portion of that audience. In the data, I noted the voice actor's gender and race/ethnicity. I also noted whether the actor's gender/race/ethnicity is reflected in the character.
For the sake of transparency, I made the data and sources public for anyone to use and scrutinize.
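
The selection steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the project: the record layout (`title`, `year`, `gross`), the function names, and the placeholder films are all hypothetical.

```python
from collections import defaultdict


def top_films_per_year(films, per_year=10):
    """Group films by release year and keep the highest-grossing ones."""
    by_year = defaultdict(list)
    for film in films:
        by_year[film["year"]].append(film)

    selected = []
    for year in sorted(by_year):
        # Rank each year's films from highest to lowest gross, then cut.
        ranked = sorted(by_year[year], key=lambda f: f["gross"], reverse=True)
        selected.extend(ranked[:per_year])
    return selected


def featured_cast(billing, size=5):
    """Keep the first `size` names from a credit list in billing order."""
    return billing[:size]


# Hypothetical placeholder records, not real box office figures.
films = [
    {"title": "Film A", "year": 2006, "gross": 300},
    {"title": "Film B", "year": 2006, "gross": 450},
    {"title": "Film C", "year": 2006, "gross": 120},
    {"title": "Film D", "year": 2007, "gross": 200},
]

picked = top_films_per_year(films, per_year=2)
print([film["title"] for film in picked])
```

With `per_year=10` and ten years of data, the same function would yield the 100 films described above, and `featured_cast` would give 5 roles per film for 500 roles in total.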


Some striking takeaways:

All minority groups (male or female) as well as white females are under-represented in these 100 movies compared to the latest U.S. Census data from 2014.
White males are the only group that is over-represented in these 100 movies. White males make up about 59% of the roles in these movies, but they only make up about 30.6% of the population according to the U.S. Census.
While women make up 50.8% of the overall U.S. population, they only make up 29.4% of these featured roles.
And while people of color make up about 38% of the overall population, they only make up 14% of these featured roles.
And looking at an extreme example, exactly one of these 500 roles was given to a female Asian or Pacific Islander, and that role was not an on-screen role.
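
One way to put the percentages above on a common scale is to divide each group's share of roles by its share of the population. The sketch below does that with the figures quoted in the list. The "representation ratio" is a convenience introduced here for illustration, not a measure used in the original analysis, and several of the inputs are approximate ("about 59%", "about 38%").

```python
# Percentages quoted in the takeaways above (some are approximate).
share_of_roles = {"white males": 59.0, "women": 29.4, "people of color": 14.0}
share_of_population = {"white males": 30.6, "women": 50.8, "people of color": 38.0}


def representation_ratio(role_pct, population_pct):
    """Above 1 means over-represented, below 1 means under-represented."""
    return role_pct / population_pct


ratios = {
    group: round(representation_ratio(share_of_roles[group], share_of_population[group]), 2)
    for group in share_of_roles
}

for group, ratio in ratios.items():
    print(f"{group}: {ratio}")
```

By this measure, white males appear at roughly 1.9 times their share of the population, women at a little under 0.6 times, and people of color at a little under 0.4 times.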

Composition

Essentially I wanted the gender/racial diversity of these blockbuster films to control the sonic diversity of the song. Here is a high-level view of how the song is composed:

Different instruments/sounds were defined and mapped to different groups of cast members:
- Piano: Identifies as white
- Guitar: Identifies as person of color
- Female Vocal: Identifies as female
Additional instrumentation was added for movies that were more diverse
The 100 movies (as defined above) were listed in chronological order from 2006 to 2015. Within each year, the movies were listed from lowest to highest gross sales.
The song goes through each movie in sequence, along with each movie's top 5 featured cast members. For each movie, the song plays the instruments that are associated with the movie's cast (as defined above). For example, if a movie has 5 white males, the song will play the piano 5 times. For a movie with 3 white males, 1 white female, and 1 female person of color, you will hear the piano play 3 times, 2 female voices, and 1 guitar.
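
The ordering and the cast-to-instrument mapping described above can be sketched in Python as follows. This is a hypothetical illustration, not the project's actual code: the field names and function names are made up, and the exact rule is an assumption inferred from the worked example (a white female triggers the female vocal but not the piano, which is the reading that yields 3 pianos, 2 voices, and 1 guitar). The extra instrumentation for more diverse movies is not modeled.

```python
def playback_order(movies):
    """Sort by year, then from lowest to highest gross within each year."""
    return sorted(movies, key=lambda m: (m["year"], m["gross"]))


def sounds_for_cast(cast):
    """Count how many times each sound plays for one movie's featured cast.

    Assumed rule, inferred from the worked example in the text:
    - female cast member: one female vocal
    - person of color: one guitar
    - white male: one piano
    """
    counts = {"piano": 0, "guitar": 0, "female_vocal": 0}
    for person in cast:
        if person["female"]:
            counts["female_vocal"] += 1
        if person["person_of_color"]:
            counts["guitar"] += 1
        elif not person["female"]:
            counts["piano"] += 1
    return counts


white_male = {"female": False, "person_of_color": False}
white_female = {"female": True, "person_of_color": False}
female_poc = {"female": True, "person_of_color": True}

# The two examples from the text.
print(sounds_for_cast([white_male] * 5))
print(sounds_for_cast([white_male] * 3 + [white_female, female_poc]))
```

Under this reading, a cast of five white males produces nothing but piano, which is why low-diversity movies sound repetitive.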

This song was challenging because if the diversity of the data controls the diversity of sound in the song, the song will sound bland and repetitive when movies do not have a lot of diversity (which is the case). My goal, then, was to find sounds and genres that work well when they are repetitive or bland.

Style & Sounds

I found Brian Eno's Music for Films, a conceptual music album intended as a soundtrack for imaginary films. The album consists of 18 short tracks, mostly ambient with a broad sonic palette. Sonically and conceptually, I thought this was a perfect fit. Soundtrack music is often ambient and atmospheric in nature and is meant to support, not overpower, the scene. Thus, it may be perceived as bland or repetitive (like the data) when experienced alone.

Here are some of the samples that I took to create this song:

11 samples from From the Same Hill by Brian Eno in Music for Films
12 samples from Events in Dense Fog by Brian Eno in Music for Films
7 samples from Final Sunset by Brian Eno in Music for Films
4 samples from 73 Poems by Joan La Barbara, an American vocalist and composer whose works often involve multiple layers of her own voice, creating a kind of sonic canvas on which she throws splashes of vocal colors.

Tools & Code

This song was algorithmically generated in that I wrote a computer program that took data and music samples as input and generated the song as output. I did not manually compose any part of this song. I used Python to crunch the data, ChucK to sequence and build the song, and Processing to generate the visualization.

For those interested in replicating, adapting, or extending my process, all of the code and sound files are open source and can be found here. The code also comes with a comprehensive README to guide you through the setup and configuration.

If you happen to use my code and create something new, please shoot me an email at hello@brianfoo.com. I'd love to see and share your work!

Questions & Feedback

I'd love to hear from you. I'm sure I've also made some erroneous statements somewhere, so please correct me. You can email me at hello@brianfoo.com.

Data-Driven DJ is a series of music experiments that combine data, algorithms, and borrowed sounds.

My goal is to explore new experiences around data consumption beyond the written and visual forms by taking advantage of music's temporal nature and capacity to alter one's mood. Topics will range from social and cultural to health and environmental.

Each song will be made out in the open: the creative process will be documented and published online, and all custom software written will be open-source. Stealing, extending, and remixing are inevitable, welcome, and encouraged. Check out the FAQs for more information.

About me

My name is Brian Foo and I am a programmer and visual artist living and working in New York City. Learn more about what I do on my personal website. You can also follow my work on Twitter, Facebook, Soundcloud, or Vimeo.
