VQA
VQA: Visual Question Answering. VQA is a dataset of open-ended questions about images; answering them requires an understanding of vision, language, and commonsense knowledge. The dataset provides:
- 265,016 images (COCO and abstract scenes)
- at least 3 questions per image (5.4 on average)
- 10 ground-truth answers per question
- 3 plausible (but likely incorrect) candidate answers per question
- an automatic evaluation metric (sketched below)
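The evaluation metric scores a predicted answer against the 10 human answers: an answer gets full credit if at least 3 annotators gave it, i.e. Acc(ans) = min(#humans who said ans / 3, 1). A minimal Python sketch of this formula follows; the function and variable names are illustrative rather than taken from the official VQA evaluation code, which additionally normalizes answer strings and averages the score over all 9-answer subsets of the 10 annotations.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """VQA accuracy for one question: min(#matching human answers / 3, 1).

    `human_answers` is the list of 10 ground-truth answers collected per
    question. Answer normalization (lowercasing, stripping punctuation and
    articles) is omitted here for brevity.
    """
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)

# Example: 5 of 10 annotators answered "2", so "2" earns full credit.
answers = ["2", "2", "two", "2", "2", "3", "two", "2 dogs", "2", "3"]
print(vqa_accuracy("2", answers))    # 1.0   (5 matches >= 3)
print(vqa_accuracy("two", answers))  # 0.67  (2 matches / 3)
```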
References in zbMATH (referenced in 5 articles, 1 standard article)
- Ras, Gabrielle; Xie, Ning; van Gerven, Marcel; Doran, Derek: Explainable deep learning: a field guide for the uninitiated (2022)
- Ostovar, Ahmad; Bensch, Suna; Hellström, Thomas: Natural language guided object retrieval in images (2021)
- Gulcehre, Caglar; Chandar, Sarath; Cho, Kyunghyun; Bengio, Yoshua: Dynamic neural Turing machine with continuous and discrete addressing schemes (2018)
- Agrawal, Aishwarya; Lu, Jiasen; Antol, Stanislaw; Mitchell, Margaret; Zitnick, C. Lawrence; Batra, Dhruv; Parikh, Devi: VQA: Visual Question Answering (2015) arXiv
- Plummer, Bryan A.; Wang, Liwei; Cervantes, Chris M.; Caicedo, Juan C.; Hockenmaier, Julia; Lazebnik, Svetlana: Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models (2015) arXiv