“The Chopsticks Dilemma” in Generative Image AI

Kevin Auyeung · Published in Prototypr · Mar 19, 2024


For the last few years I’ve been trying to develop my understanding of AI, taking courses on transformer models and exploring the depths of ChatGPT and Midjourney prompts. Along the way, I’ve come across an interesting phenomenon that I’d like to coin “The Chopsticks Dilemma”.

But first, here’s a funny gag showing the dilemma. https://youtu.be/VaADDSuQqYE?si=cSPNotngNk3KznJB

The Chopsticks Dilemma comes from the struggle AI models have generating a realistic bowl of ramen without a pair of chopsticks in the photo. Here are some example prompts and their results:

a bowl of ramen without chopsticks

a bowl of japanese ramen, but I don’t want to include any cutleries in the picture

give me just a bowl of ramen but don’t include anything else in the photo (this is the closest, but the chopsticks still appear in the background)

someone holding a bowl of ramen (finally no chopsticks, but we got freaky hands instead)
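
As an aside, some image generators expose a dedicated “negative prompt” channel, which tends to work better than phrasing the exclusion inside the prompt itself (Midjourney’s `--no chopsticks` parameter is the same idea). Here’s a minimal sketch using the open-source diffusers library; the model name and prompt wording are just illustrative, not the exact setup I used above:

```python
# A minimal sketch using Hugging Face's diffusers library to show
# "negative prompting", one common workaround for unwanted objects.
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model choice
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe(
    prompt="a bowl of japanese ramen, studio food photography",
    # The negative prompt steers the model away from these concepts,
    # rather than mentioning "without chopsticks" in the main prompt
    # (which can actually make chopsticks MORE likely to appear).
    negative_prompt="chopsticks, cutlery, spoon, fork, hands",
    num_inference_steps=30,
).images[0]
image.save("ramen_no_chopsticks.png")
```

Even with a negative prompt, results aren’t guaranteed, which brings us back to the underlying question.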

So even though it’s not impossible, why is something so simple for us to visualise so difficult for AI to generate consistently?

I think there are a few issues causing this dilemma:

  1. Training data bias — Generative AI models learn from the data they have access to, and the quality and diversity of that data significantly influence their outputs. If certain things frequently appear together in the training data, the models tend to generate them together. In this instance, the training data for bowls of ramen predominantly consists of images featuring a pair of chopsticks.
  2. The majority rule — Related to the point above: even if we were to introduce variation into the training data, AI models often discard outliers in favor of the dominant result. If we add images of ramen without chopsticks, or images of ramen with forks, the overwhelming amount of ramen-with-chopsticks imagery means the model treats the alternatives as outliers and largely stops factoring them in (see the toy sketch after this list).
  3. Lack of contextual understanding — AI lacks the contextual understanding humans possess regarding minor details. Humans can recognise chopsticks and ramen as two different things, but the AI may not make that distinction and may default to including chopsticks regardless of the prompt, because it has effectively learned that chopsticks are part of Asian food.
  4. Unexplained detail — Another issue with AI-generated ramen, besides the compulsory chopsticks, is the lack of understanding of chopsticks themselves. Chopsticks always come in pairs, are always the same length, and in Chinese culture should never cross, as a matter of etiquette. None of this is something an AI can easily articulate.
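
To make the “majority rule” point concrete, here’s a toy sketch with completely made-up counts, showing how skewed the statistics a model learns from can be:

```python
# Toy illustration of the "majority rule" point above: if almost every
# training caption pairs ramen with chopsticks, the conditional
# probability the model implicitly learns is heavily skewed.
# The counts below are invented purely for illustration.
from collections import Counter

captions = (
    ["a bowl of ramen with chopsticks"] * 950
    + ["a bowl of ramen, no utensils"] * 40
    + ["a bowl of ramen with a fork"] * 10
)

has_chopsticks = Counter("chopsticks" in c for c in captions)
p = has_chopsticks[True] / len(captions)
print(f"P(chopsticks | ramen) in this toy dataset: {p:.0%}")  # 95%
```

With odds like these, the rare chopstick-free examples barely register against the dominant pattern.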

In conclusion, the Chopsticks Dilemma illustrates the limitations of current AI’s ability to understand and replicate nuances and contextual details. Addressing it means creating a more diverse set of training data, enhancing contextual understanding, and refining AI algorithms to better capture and represent the outliers within the training data.

Now, I am by no means an AI expert or an AI journalist; this is just a lighthearted take on a more serious article about AI stereotypes and data bias by Victoria Turk on restofworld.org, which you can read here :) https://restofworld.org/2023/ai-image-stereotypes/
