Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning, by Yian Li and 7 other authors
Abstract: Recently, Multimodal Large Language Models (MLLMs) have achieved significant success across multiple disciplines due to their exceptional instruction-following capabilities and extensive world knowledge. However, whether these MLLMs possess human-like compositional reasoning abilities remains an open problem. To unveil their reasoning behaviors, we first curate a Multimodal Assumptive Reasoning Benchmark (MARS-Bench) in this paper. Interestingly, we find that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question, whereas such presuppositions appear naive to human reasoning. We also propose a simple yet effective method, Active Deduction (AD): a novel reinforcement learning paradigm that encourages the model to actively perform composite deduction before reaching a final decision. Equipped with the proposed AD method, an MLLM demonstrates significant improvements in assumptive reasoning abilities without compromising its general-purpose question-answering performance. We also provide extensive evaluations of both open-source and private MLLMs on MARS-Bench, along with experimental analyses of the AD method.
Submission history
From: Yian Li
[v1] Fri, 19 Apr 2024 15:53:27 UTC (2,067 KB)
[v2] Wed, 24 Apr 2024 10:33:26 UTC (2,123 KB)
[v3] Fri, 30 Aug 2024 09:00:38 UTC (1,762 KB)
[v4] Tue, 19 Nov 2024 15:22:16 UTC (471 KB)
[v5] Thu, 17 Apr 2025 08:05:10 UTC (4,096 KB)