Authors:
Rösch, Philipp J.; Libovický, Jindřich
Document type:
Conference Paper
Title:
Probing the Role of Positional Information in Vision-Language Models
Title of conference publication:
Findings of the Association for Computational Linguistics: NAACL 2022
Conference title:
Conference of the North American Chapter of the Association for Computational Linguistics (2022, Seattle, WA)
Venue:
Seattle, WA, United States
Year of conference:
2022
Date of conference beginning:
10.07.2022
Date of conference ending:
15.07.2022
Publisher:
Association for Computational Linguistics (ACL)
Year:
2022
Pages from - to:
1031-1041
Language:
English
Abstract:
In most Vision-Language models (VL), the understanding of the image structure is enabled by injecting the position information (PI) about objects in the image. In our case study of LXMERT, a state-of-the-art VL model, we probe the use of the PI in the representation and study its effect on Visual Question Answering. We show that the model is not capable of leveraging the PI for the image-text matching task on a challenge set where only position differs. Yet, our experiments with probing confirm...
URL:
https://aclanthology.org/2022.findings-naacl.77/
Department:
Fakultät für Elektrotechnik und Technische Informatik
Institute:
ETTI 2 - Institut für Verteilte Intelligente Systeme
Chair:
Oswald, Norbert
Open Access:
Yes
Type of OA license:
CC BY 4.0
Licence URL:
https://creativecommons.org/licenses/by/4.0/
Miscellaneous:
https://www.unibw.de/vis-en/naacl2022