Thoughts on ImageNet Roulette

This post originated as a tweet thread – I’m reposting here for posterity.

A couple thoughts on this piece from Wired discussing the awesome ImageNet Roulette by @katecrawford and @trevorpaglen, mentioning some of my research analyzing ImageNet (and inc. a quote from me!)

First, I don’t think changing social norms from the development of WordNet in the 80s to today explain many of the issues with the taxonomy of ImageNet. As Kate and Trevor explain, many classes are not merely outdated, but fundamentally offensive or immaterial (how do you visualize a “hypocrite”?), leading to the perpetuation of stereotypes and phrenological ideas.

This was true in the 80s and is true today. In the rush to crowdsource labels and get AI “to work”, these issues were overlooked. We’ve all seen many examples recently of technologists not considering the negative implications of their inventions – this is but another example.

Second, efforts by ImageNet creators to debias the “person” category are a good step, but removing data from public access in the interim, without transparency, is a bad way to ensure this important issue is not lost to history.

We are doomed to repeat these mistakes if we remove the “bad parts” from this story – I would hope the original data is still accessible in the future for researchers to study.

And again on debiasing the “person” class… what about the rest of ImageNet? As noted in the article, the ImageNet Challenge subset (1k classes, conventionally used for pre-training) contains only 3 classes specifically of people: “scuba diver”, “groom” & “baseball player”…

However, in my studies, I’ve found that a person appears in closer to **27%** of all images in the ImageNet Challenge subset.

Top categories containing people, found by running an object detection model on the data (% of images in a category containing >= 1 person):

Lots of fish classes… people like photos of them holding fish! (typically older white men, that is)
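For anyone curious how percentages like these could be tallied, here is a minimal sketch. It assumes the object detector has already been run and that its output is available as `(imagenet_class, detected_labels)` pairs; the function name `person_rates`, the `"person"` label string, and the toy records are all hypothetical and are not my actual pipeline or real ImageNet results.

```python
from collections import defaultdict


def person_rates(detections, person_label="person"):
    """Return (% of images with >= 1 person per class, overall %).

    `detections` is an iterable of (imagenet_class, detected_labels) pairs,
    one pair per image. This input format is an assumption for illustration.
    """
    totals = defaultdict(int)
    with_person = defaultdict(int)
    for image_class, labels in detections:
        totals[image_class] += 1
        # Count the image once, no matter how many people were detected.
        if person_label in labels:
            with_person[image_class] += 1

    per_class = {c: 100.0 * with_person[c] / totals[c] for c in totals}
    n_images = sum(totals.values())
    overall = 100.0 * sum(with_person.values()) / n_images if n_images else 0.0
    return per_class, overall


# Hypothetical toy records, NOT real ImageNet detections.
toy = [
    ("tench", ["person", "fish"]),
    ("tench", ["fish"]),
    ("groom", ["person", "person"]),
    ("volcano", []),
]

per_class, overall = person_rates(toy)

# Rank classes by the share of their images that contain a person.
top = sorted(per_class.items(), key=lambda kv: kv[1], reverse=True)
```

Sorting `per_class` gives a "top categories" ranking, and `overall` is the dataset-wide share of images with at least one detected person. Any real number depends heavily on the detector and its confidence threshold.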

As Cisse et al. note, underrepresentation of black persons in ImageNet in classes other than “basketball” may lead to models learning a narrow and biased association.

So while problematic classes are not included in the subset of ImageNet used by practitioners, biases remain. What does this mean for ML development when the standard process is to start with a model pre-trained on this data?

Maybe nothing, as downstream tasks update these weights and potentially biased feature representations reduce to edge and shape detectors. But maybe something. I think it’s an area worth exploring.

TLDR: Even if problematic ImageNet classes are seldom used in practice, the many social & political implications of classification that this work brings to light are incredibly valuable as we move forward in the field.