The next generation of perception systems should understand complex free-form object descriptions rather than a fixed set of categories. To accelerate this vision, we propose a novel and challenging benchmark. Check out our task description and paper for more details.


Our evaluation dataset annotates free-form text descriptions of objects on more than 25K images (~13K validation and ~12K test). The descriptions are challenging and can refer to multiple objects. Explore and download the dataset to try it out, and see our paper for more details.


How do you evaluate your method? We provide a simple Python toolkit that lets you interact with the data, visualize samples, compute statistics, and evaluate your method.
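To give a feel for what description-level evaluation involves, here is a minimal, self-contained sketch of IoU-based matching between predicted and ground-truth boxes for a single description. This is purely illustrative: the function names, the greedy matching protocol, and the 0.5 IoU threshold are assumptions for this sketch, not the toolkit's actual API or the benchmark's official metric.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def match_description(gt_boxes, pred_boxes, thr=0.5):
    """Greedily match predictions to ground-truth boxes of one description.

    A description may refer to multiple objects, so there can be several
    ground-truth boxes. Returns (true positives, false positives,
    false negatives). Protocol here is a simplifying assumption.
    """
    unmatched = list(range(len(gt_boxes)))
    tp = 0
    for p in pred_boxes:
        best, best_iou = None, thr
        for i in unmatched:
            v = iou(p, gt_boxes[i])
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            unmatched.remove(best)  # each GT box is matched at most once
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(unmatched)
    return tp, fp, fn

# Toy example: one description referring to two objects, two predictions.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(match_description(gt, pred))  # → (1, 1, 1)
```

The actual toolkit handles the full annotation format and aggregates such per-description counts into the benchmark's official metrics; refer to its documentation for the real interface.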


We are organizing a challenge in conjunction with our ECCV'24 workshop. We'd love to see you participate and compare your method against others.

Results of last year's challenge, held at CVPR'23, are here.