About MAR 2025
In this workshop, we plan to gather researchers working on neural algorithmic learning, multimodal reasoning, and cognitive models of intelligence to showcase their cutting-edge research, discuss the latest challenges, and bring to the forefront problems in perception and language modeling that are often overlooked yet pivotal to achieving true artificial general intelligence. The workshop emphasizes the emerging topic of multimodal algorithmic reasoning, in which a reasoning agent must automatically deduce new algorithms or procedures for solving real-world tasks: for example, algorithms that use multimodal foundation models for analysis, synthesis, and planning; new approaches to challenging vision-and-language mathematical (Olympiad-type) reasoning problems; winning strategies in multimodal games; and procedures for tool use in robotic manipulation. We hope to dive deep into this exciting topic at the intersection of multimodal learning and cognitive science to understand what we have achieved thus far in machine intelligence and what we still lack relative to the human way of thinking, through talks from outstanding researchers and faculty who could inspire the audience to search for the missing rungs on the ladder to true intelligence.
Where
Upper Level Room 11AB, San Diego Convention Center, San Diego, CA, USA
When
December 7, 2025
Keynote Speakers
[More information about the keynote speakers will be added here]

Subbarao Kambhampati
Arizona State University

Yu Cheng
Chinese University of Hong Kong

Noah Goodman
Stanford University

Max Tegmark
MIT

Kristen Grauman
University of Texas at Austin
MAR 2025 Schedule
[Tentative; subject to change]

Opening Remarks Anoop Cherian

Keynote Noah Goodman
TBD.

Keynote TBD
TBD.
Oral Paper Presentation Tin Nguyen et al.
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs.
Spotlight Paper Presentation
Md Tanvirul Alam et al., Sphinx: Visual Perception and Reasoning Gym.
Claas Beger et al., Investigating Abstraction Capabilities of the o3 Model Using Textual and Visual Modalities.
Yamei Chen et al., Symbolic Graphics Programming with Large Language Models.
Qi Cao et al., DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning.
Break

Keynote Max Tegmark
TBD.

Keynote Subbarao Kambhampati
TBD.
Oral Paper Presentation Mingyuan Wu et al.
Aha Moment Revisited: Are Vision Language Models Truly Capable of Self-verification in Inference Scaling?
Lunch Break
Oral Paper Presentation Antonia Wüst et al.
Learning Visual Concepts via Vision Language Programs.
Oral Paper Presentation Rachneet Kaur et al.
ChartAgent: A Multimodal Agent for Complex Visual Question Answering in Charts.
Spotlight Paper Presentation
Fuwen Luo et al., MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.
Kibum Kim et al., Data Scaling Isn't Enough: Towards Improving Compositional Reasoning in Video-Language Models.
Zihao Lin et al., MLPEdit-Bench: Benchmarking Reasoning-Based Layer-wise Poster Editing.

Keynote Kristen Grauman
TBD.

Keynote Yu Cheng
TBD.
Coffee Break

Keynote TBD
TBD.

Closing Remarks Kuan-Chuan Peng
Poster Session
All the accepted papers.
Call for Contributions
Large AI frameworks have been improving their data modeling abilities with ever greater vigor in recent times, with compelling applications emerging frequently, many of which may even appear to challenge human intelligence. Yet despite such impressive performance, there remain open questions about whether these models possess the foundations of general intelligence, or whether they perform these tasks without human-like understanding. This necessitates the development of better tools for assessing these models in tandem with developing the models themselves.
This workshop focuses on the topic of multimodal algorithmic reasoning, where an agent needs to assimilate information from multiple modalities to derive reasoning algorithms for complex problem solving. Some real-world examples of such problems include: i) chain-of-thought reasoning using multiple modalities, ii) solving Olympiad-type vision-and-language problems, and iii) distributed agentic reasoning and tool use. Previous editions of this workshop emphasized the challenges in building generalizable AI for solving vision-and-language problems. In the last year, however, we have seen rapid advances in AI capabilities that better bridge across modalities, bringing both optimism about superhuman capabilities and skepticism about the limits of current approaches. This is an opportune moment to explore critical challenges, including new architectures for visual reasoning, data generation via self-play, and the theoretical limits of reasoning in large models. Through talks from outstanding researchers and faculty, we hope to dive deep into this exciting topic at the intersection of theory, multimodal machine learning, and cognitive science to understand what we have achieved thus far in machine intelligence and what we still lack relative to the human way of thinking, towards finding the missing rungs on the ladder to truly intelligent reasoning.
Important Dates
Paper submission deadline: August 31, 2025 (AoE).
Notification to authors: September 22, 2025.
Camera-ready deadline: 11:59 PM on November 3, 2025 (Pacific Time).
Topics
We invite submissions of original, high-quality research papers on topics related to multimodal algorithmic reasoning. The topics for MAR-NeurIPS 2025 include, but are not limited to:
- Multimodal algorithmic and mathematical reasoning.
- Representations of algorithms for neural processing.
- Comparisons between AI and human problem solving, including: i) perspectives from psychology and neuroscience, ii) children’s cognitive development, and iii) limits of reasoning in large models.
- Extreme generalization to new tasks and few-shot concept induction.
- Shortcomings in AI models.
- Agentic AI, including multi-agent collaboration and distributed problem solving.
- Scaling laws and efficient algorithms for improving reasoning at test-time.
- Foundation models of intelligence, including vision, language, and other modalities.
- Physical reasoning and planning using language models.
- Multimodal AI applications, including new tasks, datasets, benchmarks, and models for multimodal reasoning.
Submission Instructions
We are inviting submissions of original and previously unpublished works.
- All submissions are handled via the workshop’s CMT website.
- Submissions should be made in PDF format and must follow the MAR 2025@NeurIPS submission style provided here (except for the NeurIPS checklist, which is optional).
- Submissions should not exceed 4 pages in length (excluding references).
- Authors may upload an optional Appendix containing additional details, proofs, videos, images, etc., as a separate zip file (at most 50 MB in size); the deadline for submitting this supplementary material is the same as that for the main paper.
- All submissions should maintain author anonymity and should abide by the NeurIPS conference guidelines for double-blind review.
- Accepted papers will be presented as an oral, spotlight, or poster presentation. At least one author of each accepted submission must present the paper at the workshop in person.
- Presentation of accepted papers at our workshop will follow the same policy as that for accepted papers at the NeurIPS 2025 main conference.
- Accepted papers will be made publicly accessible on the workshop website shortly after the camera-ready deadline, but will not have any archival proceedings.
- The submitting authors are also expected to serve as reviewers for the workshop, if needed.
Contact
Email: smart101@googlegroups.com
Accepted Papers
All accepted papers will be presented in the poster session. The number in front of each paper is its poster number.
Oral Papers
- [1] HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs.
Nguyen, Tin; Nguyen, Anh
- [2] Aha Moment Revisited: Are Vision Language Models Truly Capable of Self-verification in Inference Scaling?
Wu, Mingyuan; Li, Meitang; Yang, Jingcheng; Jiang, Jize; Yan, Kaizhuo; Li, Zhaoheng; Yu, Hanchao; Zhang, Minjia; Nahrstedt, Klara
- [3] Learning Visual Concepts via Vision Language Programs.
Wüst, Antonia; Shindo, Hikaru; Stammer, Wolfgang; Dhami, Devendra Singh; Kersting, Kristian
- [4] ChartAgent: A Multimodal Agent for Complex Visual Question Answering in Charts.
Kaur, Rachneet; Srishankar, Nishan; Zeng, Zhen; Ganesh, Sumitra; Veloso, Manuela
Spotlight Papers
- [5] Sphinx: Visual Perception and Reasoning Gym.
Alam, Md Tanvirul; Chae, Justin; Rastogi, Nidhi
- [6] Investigating Abstraction Capabilities of the o3 Model Using Textual and Visual Modalities.
Beger, Claas; Fu, Shuhao; Yi, Ryan; Moskvichev, Arseny; Mitchell, Melanie
- [7] Symbolic Graphics Programming with Large Language Models.
Chen, Yamei; Zhang, Haoquan; Huang, Yangyi; Qiu, Zeju; Zhang, Kaipeng; Wen, Yandong; Liu, Weiyang
- [8] DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning.
Cao, Qi; Wang, Ruiyi; Zhang, Ruiyi; Xie, Pengtao
- [9] MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.
Luo, Fuwen; Lou, Shengfeng; Chen, Chi; Wang, Ziyue; Li, Chenliang; Shen, Weizhou; Guo, Jiyue; Li, Peng; Yan, Ming; Huang, Fei; Liu, Yang
- [10] Data Scaling Isn't Enough: Towards Improving Compositional Reasoning in Video-Language Models.
Kim, Kibum; Min, Kyle; Park, Chanyoung
- [11] MLPEdit-Bench: Benchmarking Reasoning-Based Layer-wise Poster Editing.
Lin, Zihao; Zhu, Wanrong; Gu, Jiuxiang; Kil, Jihyung; Tensmeyer, Chris; Zhang, Ruiyi; Huang, Lifu; Morariu, Vlad; Sun, Tong
Poster Papers
- [12] Online Reinforcement Learning for Autoformalization.
Sorg, Simon; Li, Wenda; Banerjee, Soumya
- [13] DEPART: Hierarchical Multi-Agent System for Multi-Turn Interaction.
Hsu, Hao-Lun; Xu, Jing; Vichare, Nikhil; Carbone, Francesco; Pajic, Miroslav; Carenini, Giuseppe
- [14] Watch Wider and Think Deeper: Collaborative Cross-modal Chain-of-Thought for Complex Visual Reasoning.
Lu, Wenting; Zhu, Didi; Shen, Tao; Zhu, Donglin; Ye, Ayong; Wu, Chao
- [15] Exploring Ego-Exo View-Invariant Temporal Understanding in Video LLMs.
Jung, Minjoon; Xiao, Junbin; Kim, Junghyun; Zhang, Byoung-Tak; Yao, Angela
- [16] Audio Flamingo Sound-CoT: Improving Chain-of-Thought Reasoning in Sound Understanding.
Kong, Zhifeng; Goel, Arushi; Santos, João Felipe; Ghosh, Sreyan; Valle, Rafael; Ping, Wei; Catanzaro, Bryan
- [17] Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding.
Tong, Bingkui; Xia, Jiaer; Zhou, Kaiyang
- [18] Improvisational Reasoning with Vision-Language Models for Grounded Procedural Planning.
Rahman, Md Masudur; Zhuo, Yupeng; Wachs, Juan
- [19] Do LLMs Benefit from User and Item Embeddings in Recommendation Tasks?
Hossain, Mir Rayat Imtiaz; Feng, Leo; Sigal, Leonid; Ahmed, Mohamed Osama
- [20] MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models.
Cohen, Vanya; Mooney, Raymond
- [21] PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits.
Li, Loka; Wong, Yu Kang; Fu, Minghao; Chen, Guangyi; Chen, Zhenhao; Luo, Gongxu; Sun, Yuewen; Khan, Salman; Spirtes, Peter; Zhang, Kun
- [22] Visual Abstract Thinking Empowers Multimodal Reasoning.
Liu, Dairu; Wang, Ziyue; Ruan, Minyuan; Luo, Fuwen; Chen, Chi; Li, Peng; Liu, Yang
- [23] SlideAgent: Hierarchical Agentic Framework for Multi-Page Slide Deck Understanding.
Jin, Yiqiao; Kaur, Rachneet; Zeng, Zhen; Ganesh, Sumitra
- [24] DA-CoTD: Efficient Chain-of-Thought Reasoning with Difficulty-Aware CoT-Distillation.
Waheed, Abdul; Mitra, Chancharik; Wang, Laurie
- [25] What Makes a Good Generated Image? Studying Human & LLM Image Preference Alignment.
Parthasarathy, Rishab; Collins, Jasmine; Stephenson, Cory
- [26] An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM.
Liu, Jiawei; Çoban, Enis; Schevchenko, Zarina; Tang, Hao; Zhu, Zhigang; Mandel, Michael; Devaney, Johanna
- [27] When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning.
Zhang, Chenyu; Kim, Minsol; Ghorbani, Shohreh; Wu, Jingyao; Maes, Patricia; Liang, Paul
- [28] Text-to-Scene with Large Reasoning Models.
Berdoz, Frédéric; Lanzerdörfer, Luca; Tuniga, Nick; Wattenhofer, Roger
- [29] ASCII-Bench: A Symbolic Benchmark for Multimodal Structural Reasoning.
Luo, Kerry; Peguero, Joshua; Fu, Michael; Malik, Husnain; Patil, Anvay; Lin, Joyce; Van Overborg, Megan; Sarmiento, Ryan; Zhu, Kevin
MAR 2025 Venue
San Diego Convention Center, San Diego, CA, USA
MAR 2025 will be held in Upper Level Room 11AB at the San Diego Convention Center, San Diego, CA, USA, on December 7, 2025.
Organizers
[Contact Email: smart101@googlegroups.com]

Anoop Cherian
Mitsubishi Electric Research Laboratories (MERL)

Kuan-Chuan Peng
Mitsubishi Electric Research Laboratories (MERL)

Suhas Lohit
Mitsubishi Electric Research Laboratories (MERL)

Honglu Zhou
Salesforce AI Research

Kevin Smith
MIT
Program Committee
Abdul Waheed | Carnegie Mellon University |
Antonia Wüst | TU Darmstadt |
Artemis Panagopoulou | University of Pennsylvania |
Christoph Boeddeker | Paderborn University |
Danrui Li | Rutgers University |
Devesh Jha | Mitsubishi Electric Research Laboratories |
Didi Zhu | Zhejiang University |
Frédéric Berdoz | ETH Zurich |
Fuwen Luo | Tsinghua University |
Gordon Wichern | Mitsubishi Electric Research Laboratories |
Hao-Lun Hsu | Duke University |
Haomeng Zhang | Purdue University |
Haoquan Zhang | CUHK |
Ibraheem Muhammad Moosa | Penn State University |
Jiahao Zhang | Australian National University |
Jing Liu | Mitsubishi Electric Research Laboratories |
Kanchana Ranasinghe | Stony Brook University |
Kibum Kim | KAIST |
Kobe Knowles | University of Auckland |
Loka Li | Mohamed bin Zayed University of Artificial Intelligence |
Md Masudur Rahman | Purdue University |
Md Tanvirul Alam | Rochester Institute of Technology |
Minjoon Jung | Seoul National University |
Mir Rayat Imtiaz Hossain | University of British Columbia |
Mohammad Shahab Sepehri | University of Southern California |
Moitreya Chatterjee | Mitsubishi Electric Research Laboratories |
Qi Cao | UC San Diego |
Qinhong Zhou | Tsinghua University |
Rachneet Kaur | J.P. Morgan AI Research |
Shanka Subhra Mondal | Princeton University |
Shijie Wang | Brown University |
Siddarth Jain | Mitsubishi Electric Research Laboratories |
Sina Rismanchian | University of California, Irvine |
Soumya Banerjee | University of Cambridge |
Tim Marks | Mitsubishi Electric Research Laboratories |
Vanya Cohen | The University of Texas at Austin |
Weitai Kang | University of Illinois Chicago |
Yada Pruksachatkun | Salesforce Research |
Ye Wang | Mitsubishi Electric Research Laboratories |
Yifan Jiang | Southern University of Science and Technology |
Yiqiao Jin | Georgia Institute of Technology |
Yoshiki Masuyama | Mitsubishi Electric Research Laboratories |
Yu Zhou | UCLA |
Yunkee Chae | Seoul National University |
Yuyou Zhang | Carnegie Mellon University |
Zhengye Yang | Rensselaer Polytechnic Institute |
Zhifeng Kong | NVIDIA |
Zihao Lin | UC Davis |
Ziyang Wang | University of North Carolina at Chapel Hill |
Ziyue Wang | Tsinghua University |