MAR 2026 - Multimodal Algorithmic Reasoning

About MAR 2026

In this workshop, we plan to gather researchers working in neural algorithmic learning, multimodal reasoning, and cognitive models of intelligence to showcase their cutting-edge research, discuss the latest challenges, as well as bring to the forefront problems in perception and language modeling that are often overlooked but are pivotal in achieving true artificial general intelligence. An emphasis of this workshop is on the emerging topic of multimodal algorithmic reasoning, where a reasoning agent is required to automatically deduce new algorithms/procedures for solving real-world tasks, e.g., algorithms that use multimodal foundational models for analysis, synthesis, and planning, new approaches towards solving challenging vision-and-language mathematical (Olympiad type) reasoning problems, deriving winning strategies in multimodal games, procedures for using tools in robotic manipulation, etc. We hope to deep dive into this exciting topic at the intersection of multimodal learning and cognitive science to understand what we have achieved thus far in machine intelligence and what we are lacking in relation to the human way of thinking -- through talks from outstanding researchers and faculty that could inspire the audience to search for the missing rungs on the ladder to true intelligence.

Where

Room 601, Colorado Convention Center, Denver, CO, USA

When

8:55 AM - 12:30 PM MDT on June 4, 2026

Keynote Speakers

Juan Carlos Niebles

Salesforce AI Research

Melanie Mitchell

Santa Fe Institute

Jiayuan Mao

University of Pennsylvania

Jialong Wu

Tsinghua University

MAR 2026 Schedule

[in Denver local time]

08:55 AM

Opening Remarks Anoop Cherian

09:00 AM

Keynote Juan Carlos Niebles

Efficient Understanding & Action for Agentic Ambient Intelligence.

09:30 AM

Keynote Jiayuan Mao

Learning, Reasoning, and Planning with Neuro-Symbolic Concepts.

10:00 AM

Oral Paper Presentation

OrigamiBench: An Interactive Environment to Synthesize Flat-Foldable Origamis.

Focus Ambiguity in Visual Questions: A Disambiguation Problem, Not Instance Segmentation.

On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs.

10:25 AM

Break

10:45 AM

Keynote Melanie Mitchell

Six Principles for Evaluating Cognitive Capabilities in AI Models.

11:15 AM

Keynote Jialong Wu

Bridging World Models and Multimodal Reasoning.

11:45 AM

Closing Remarks Suhas Lohit

11:50 AM - 12:30 PM

Poster Session All the accepted papers

Location: Exhibit Hall A (poster board numbers: 175-182).

Submissions

[Call for Contributions PDF]

Call for Contributions

Large AI frameworks have been rapidly increasing their data modeling capabilities in recent years, with compelling applications emerging frequently, some of which may even appear to challenge human intelligence. Yet, despite this impressive performance, there remain open questions about whether these models possess the foundations of general intelligence, or whether they succeed without human-like understanding. This motivates the development of better tools for assessing such models, alongside continued advances in model design.

This workshop focuses on multimodal algorithmic reasoning, where an agent must assimilate information from multiple modalities for complex problem solving. Real-world examples of such problems include: (i) chain-of-thought reasoning across modalities, (ii) vision-and-language problem solving, (iii) agentic reasoning and tool use, and (iv) reasoning under physical constraints, among others. Over the past year, we have seen rapid advances in AI that more effectively bridge modalities, inspiring both optimism about superhuman capabilities and skepticism about the limits of current approaches. This is an opportune moment to explore critical challenges, including new architectures for visual and physical reasoning, data generation via simulators, and the theoretical limits of reasoning in large models.

Through talks by outstanding researchers and faculty, we aim to delve deeply into this topic at the intersection of multimodality, algorithmic foundations, and cognitive science, to better understand what has been achieved in machine intelligence and what remains missing relative to human cognition, as we seek the next rungs on the ladder toward advancing AI to the next frontier.

Important Dates

Paper submission deadline: ~~February 27, 2026~~ March 5, 2026 (AoE time).
Notification to authors: March 20, 2026.
Camera-ready deadline: April 10, 2026.

Topics

We invite submissions of original and high-quality research papers in the topics related to multimodal algorithmic reasoning. The topics for MAR-CVPR 2026 include, but are not limited to:

Multimodal structured and multi-step reasoning across vision, language, audio, and other modalities, including compositional and programmatic inference.
Multimodal foundation models and world models for reasoning, planning, and decision-making, and their connections to general intelligence.
Reasoning under physical, geometric, and causal constraints, including embodied agents, simulators, and digital twins.
Multi-agent reasoning and collaboration, including debate, coordination, mixture-of-experts, and reward- or critique-based aggregation.
Extreme generalization and concept learning, including few-shot, zero-shot, and out-of-distribution multimodal reasoning.
Scaling laws, efficiency, and test-time reasoning, including inference-time optimization, self-refinement, and tool-augmented reasoning.
Benchmarks, datasets, diagnostics, and evaluation, including synthetic data, interpretability, and systematic analysis of shortcomings and failure modes in multimodal AI models.
Theoretical and cognitive perspectives on multimodal reasoning, including limits of current models and insights from human cognition.
Human–AI reasoning comparisons and foundations, including perspectives from psychology, neuroscience, and child development; theoretical limits of reasoning in large models; and position papers on how current multimodal AI reasoning differs from human cognition.

Submission Instructions

We are inviting submissions of both original and previously published works.

All submissions are handled via the workshop’s OpenReview website.
Submissions should be made in PDF format and must follow the CVPR 2026 submission style provided here.
We allow three types of submissions:

(1) Original and unpublished papers of up to 8 pages, which will be published as part of the CVPR 2026 workshop proceedings and will be released on the workshop website upon acceptance.
(2) Original and unpublished papers of up to 4 pages, which will not be included in the CVPR workshop proceedings and will be released only on the workshop website upon acceptance.
(3) Previously accepted or published papers of up to 8 pages, which will be released only on the workshop website upon acceptance to our workshop.

All the page limits above are excluding references, acknowledgements, and other non-technical content (e.g., scope, limitations, impact statement).
Authors may upload an optional Appendix, containing additional details, proofs, images, etc. as part of the submission pdf (after the references) or in a separate zip file (with a max of 50MB in size). The deadline for submitting these supplementary materials is the same as that for the main paper.
All submissions should maintain author anonymity and should abide by the CVPR 2026 conference guidelines for double-blind review.
Accepted papers will be presented as either an oral, spotlight, or poster presentation. At least one author of each accepted submission must present the paper at the workshop in-person.
Presentation of accepted papers at our workshop will follow the same policy as that for accepted papers at the CVPR 2026 main conference.
Accepted papers will be made publicly accessible on the workshop website shortly after the camera-ready deadline.
The submitting authors are expected to also be reviewers for the workshop.

Contact

Email: smart101@googlegroups.com

Accepted Papers

[Workshop Proceedings]

All the accepted papers will be presented in the poster session.

The number in front of each paper is the poster number. Each board face is for two posters.

Oral Papers

[175] OrigamiBench: An Interactive Environment to Synthesize Flat-Foldable Origamis.
Naaisha Agarwal, Yihan Wu, Yichang Jian, Yikuan Hu, Nishad Mansoor, Mohan Li, Yifei Peng, Wang-Zhou Dai, Yao-Xiang Ding, Emanuele Sansone

[176] Focus Ambiguity in Visual Questions: A Disambiguation Problem, Not Instance Segmentation.
Yu-Yun Tseng, Danna Gurari
[supplement]

[177] On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs.
Rosie Zhao, Anshul Shah, Xiaoyu Zhu, Xinke Deng, Zhongyu Jiang, Yang Yang, Joerg Liebelt, Arnab Kumar Mondal
[supplement]

Poster Papers

[178] POVQA: Preference-Optimized Video Question Answering with Rationales for Data Efficiency.
Ashim Dahal, Ankit Ghimire, Saydul Akbar Murad, Nick Rahimi
[supplement]

[178] Cross-Modal Programmatic Reasoning for Zero-Shot Physical Problem Solving.
Mahule Roy, Subhas Roy

[179] When Negation Is a Geometry Problem in Vision Language Models.
Fawaz Sammani, Tzoulio Chamiti, Paul Gavrikov, Nikos Deligiannis

[179] InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning.
Gautam Sreekumar, Vishnu Boddeti

[180] HoliGround: Holistic Assessment for Grounded Chain-of-Thought.
Tom Hodemon, Mohamed Chaouch, Aboubacar Tuo, Angelique Loesch

[180] Neural Algorithmic Learning for Contact-Rich Manipulation: Multi-step Visuo-Tactile Reasoning via Cross-Attention.
Yiwen Liu, YuanFu Yang

[181] Executable World Generation from Layout Sketches: Topology-Aware Multimodal Reasoning for Industrial Simulation.
AnJui Wang, YuChe Hsu, YuanFu Yang

[181] Theory of Space: Benchmarking Multimodal Spatial Belief Construction through Active Exploration.
Pingyue Zhang, Zihan Huang, Yue Wang, Jieyu Zhang, Letian Xue, Zihan Wang, Qineng Wang, Keshigeyan Chandrasegaran, Ruohan Zhang, Yejin Choi, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Manling Li
[supplement]

[182] UniDoc-Bench: A Unified Benchmark for Document-Centric Multimodal RAG.
Xiangyu Peng, Can Qin, Zeyuan Chen, Ran Xu, Caiming Xiong, Chien-Sheng Wu.

MAR 2026 Venue

Room 601, Colorado Convention Center, Denver, CO, USA

MAR 2026 will be held at Room 601 of the Colorado Convention Center, Denver, CO, USA at 8:55 AM - 12:30 PM MDT on June 4, 2026.

Organizers

[Contact Email: smart101@googlegroups.com]

Program Committee

Yuyou Zhang	Carnegie Mellon University
Rosie Zhao	Harvard University
Shentong Mo	Carnegie Mellon University
Jiahao Zhang	Australian National University
Yuhang He	Microsoft
Claas Beger	Santa Fe Institute
Nick Rahimi	University of Southern Mississippi
Ibraheem Muhammad Moosa	Pennsylvania State University
Danrui Li	Rutgers University
Kathakoli Sengupta	State University of New York at Stony Brook
Moitreya Chatterjee	Mitsubishi Electric Research Laboratories
Antonia Wüst	Technische Universität Darmstadt
Tim K. Marks	Mitsubishi Electric Research Laboratories
Ryan Yi	Santa Fe Institute
Gautam Sreekumar	Michigan State University
Xiangyu Peng	Salesforce AI
Qinhong Zhou	University of Massachusetts at Amherst
Yue Yang	University of North Carolina at Chapel Hill
Thai Quoc Hoang	Salesforce Research

Multimodal Algorithmic Reasoning

(MAR)

About MAR 2026

Where

When

Keynote Speakers

Juan Carlos Niebles

Melanie Mitchell

Jiayuan Mao

Jialong Wu

MAR 2026 Schedule

Opening Remarks Anoop Cherian

Keynote Juan Carlos Niebles

Keynote Jiayuan Mao

Oral Paper Presentation

Break

Keynote Melanie Mitchell

Keynote Jialong Wu

Closing Remarks Suhas Lohit

Poster Session All the accepted papers

Submissions

Call for Contributions

Important Dates

Topics

Submission Instructions

Contact

Accepted Papers

Oral Papers

Poster Papers

MAR 2026 Venue

Room 601, Colorado Convention Center, Denver, CO, USA

Sponsors

Organizers

Anoop Cherian

Suhas Lohit

Kuan-Chuan Peng

Honglu Zhou

Kevin Smith

Joshua B. Tenenbaum

Program Committee