The next generation of perception systems should understand complex, free-form object descriptions rather than a fixed set of categories. To accelerate this vision, we propose a novel and challenging benchmark. Check out our task description and paper for more details.


Our evaluation dataset annotates free-form text descriptions of objects on more than 25K images (~13K validation and ~12K test). The descriptions are challenging and can refer to multiple objects. Explore and download the dataset to try it out, and see our paper for more details.


How do you evaluate your method? We provide a simple Python toolkit that lets you interact with the data, visualize samples, compute statistics, and evaluate your method.
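As an illustration of the kind of data exploration the toolkit supports, here is a minimal, self-contained sketch that counts images and descriptions in an annotation file. The field names (`images`, `descriptions`) and the COCO-style JSON layout are assumptions for this example, not the toolkit's actual API or schema; refer to the toolkit documentation for the real interface.

```python
import json

def dataset_stats(annotation_path):
    """Return simple counts from an annotation file.

    Assumes a COCO-style JSON layout with top-level "images" and
    "descriptions" lists; these names are illustrative assumptions,
    not the official OmniLabel schema.
    """
    with open(annotation_path) as f:
        data = json.load(f)
    return {
        "num_images": len(data.get("images", [])),
        "num_descriptions": len(data.get("descriptions", [])),
    }
```

For example, calling `dataset_stats("val_annotations.json")` on a validation split would return the number of annotated images and free-form descriptions it contains.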


We are organizing a challenge on the OmniLabel benchmark alongside our CVPR23 workshop. Participate and compare your method against others.