Explore the dataset

On our validation set of around 13K images, we took all object descriptions, extracted the nouns, encoded them with GPT-2, and applied k-means clustering to group the data. Below are the word clouds for the 15 resulting clusters. Click on a word cloud to explore the annotated images of that cluster ...
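For readers who want a feel for the grouping step, here is a minimal sketch, not the code used to build the page. It assumes the nouns have already been extracted and encoded: the hand-made 2-D vectors in `toy_vectors` are only stand-ins for GPT-2 encodings, the helper names `kmeans` and `group_words` are hypothetical, and the farthest-first initialisation is a simplification chosen to keep the example deterministic.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means; returns one cluster label per input vector."""
    # Farthest-first initialisation (an assumption of this sketch):
    # start from the first vector, then repeatedly add the vector that
    # is farthest from all centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_words(words, labels):
    """Collect the words of each cluster, e.g. to feed one word cloud each."""
    clusters = defaultdict(list)
    for word, label in zip(words, labels):
        clusters[label].append(word)
    return dict(clusters)


# Toy stand-ins for GPT-2 noun encodings (hypothetical values).
toy_vectors = {
    "dog": [0.9, 0.1], "cat": [1.0, 0.0], "horse": [0.8, 0.2],
    "car": [0.1, 0.9], "bus": [0.0, 1.0], "truck": [0.2, 0.8],
}

words = list(toy_vectors)
labels = kmeans([toy_vectors[w] for w in words], k=2)
clusters = group_words(words, labels)

for label, members in sorted(clusters.items()):
    print(label, members)
```

With real data, the toy vectors would be replaced by one GPT-2 encoding per noun and `k` set to 15; each resulting word list could then be rendered as one word cloud.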