About MAR 2025
In this workshop, we plan to gather researchers working on neural algorithmic learning, multimodal reasoning, and cognitive models of intelligence to showcase their cutting-edge research, discuss the latest challenges, and bring to the forefront problems in perception and language modeling that are often overlooked yet pivotal to achieving true artificial general intelligence. The workshop emphasizes the emerging topic of multimodal algorithmic reasoning, in which a reasoning agent must automatically deduce new algorithms or procedures for solving real-world tasks: for example, algorithms that use multimodal foundation models for analysis, synthesis, and planning; new approaches to challenging vision-and-language mathematical (Olympiad-type) reasoning problems; winning strategies in multimodal games; and procedures for tool use in robotic manipulation. We hope to dive deep into this exciting topic at the intersection of multimodal learning and cognitive science to understand what we have achieved thus far in machine intelligence and what we still lack relative to the human way of thinking, through talks from outstanding researchers and faculty who could inspire the audience to search for the missing rungs on the ladder to true intelligence.
Where
Upper Level Room 11AB, San Diego Convention Center, San Diego, CA, USA
When
December 7, 2025
Keynote Speakers
[More information about the keynote speakers will be added here]

Subbarao Kambhampati
Arizona State University

Yu Cheng
Chinese University of Hong Kong

Noah Goodman
Stanford University

Max Tegmark
MIT

Kristen Grauman
University of Texas at Austin
MAR 2025 Schedule
[Tentative; subject to change]

Opening Remarks Anoop Cherian

Keynote Noah Goodman
TBD.

Keynote TBD
TBD.
Oral Paper Presentation Tin Nguyen et al.
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs.
Spotlight Paper Presentation
Md Tanvirul Alam et al., Sphinx: Visual Perception and Reasoning Gym.
Claas Beger et al., Investigating Abstraction Capabilities of the o3 Model Using Textual and Visual Modalities.
Yamei Chen et al., Symbolic Graphics Programming with Large Language Models.
Qi Cao et al., DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning.
Break

Keynote Max Tegmark
TBD.

Keynote Subbarao Kambhampati
TBD.
Oral Paper Presentation Mingyuan Wu et al.
Aha Moment Revisited: Are Vision Language Models Truly Capable of Self-verification in Inference Scaling?
Lunch Break
Oral Paper Presentation Antonia Wüst et al.
Learning Visual Concepts via Vision Language Programs.
Oral Paper Presentation Rachneet Kaur et al.
ChartAgent: A Multimodal Agent for Complex Visual Question Answering in Charts.
Spotlight Paper Presentation
Fuwen Luo et al., MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.
Kibum Kim et al., Data Scaling Isn't Enough: Towards Improving Compositional Reasoning in Video-Language Models.
Zihao Lin et al., MLPEdit-Bench: Benchmarking Reasoning-Based Layer-wise Poster Editing.

Keynote Kristen Grauman
TBD.

Keynote Yu Cheng
TBD.
Coffee Break

Keynote TBD
TBD.

Closing Remarks Kuan-Chuan Peng
Poster Session
All the accepted papers.
Call for Contributions
Large AI frameworks have been improving their data modeling abilities with ever greater vigor in recent times, with compelling applications emerging frequently, many of which may even appear to challenge human intelligence. Yet despite such impressive performance, there remain open questions about whether these models possess the foundations of general intelligence, or whether they perform these tasks without human-like understanding. This necessitates the development of better tools for assessing these models in tandem with developing the models themselves.
This workshop focuses on the topic of multimodal algorithmic reasoning, where an agent needs to assimilate information from multiple modalities to derive reasoning algorithms for complex problem solving. Some real-world examples of such problems include: i) chain-of-thought reasoning using multiple modalities, ii) solving Olympiad-type vision-and-language problems, and iii) distributed agentic reasoning and tool use. Previous editions of this workshop emphasized the challenges in building generalizable AI for solving vision-and-language problems. In the last year, however, we have seen rapid advances in AI capabilities that better bridge across modalities, bringing both optimism about superhuman capabilities and skepticism about the limits of current approaches. This is an opportune moment to explore critical challenges, including new architectures for visual reasoning, data generation via self-play, and the theoretical limits of reasoning in large models. Through talks from outstanding researchers and faculty, we hope to dive deep into this exciting topic at the intersection of theory, multimodal machine learning, and cognitive science to understand what we have achieved thus far in machine intelligence and what we still lack relative to the human way of thinking, towards finding the missing rungs on the ladder to truly intelligent reasoning.
Important Dates
Paper submission deadline: August 31, 2025 (AoE).
Notification to authors: September 22, 2025.
Camera-ready deadline: 11:59 PM on November 3, 2025 (Pacific Time).
Topics
We invite submissions of original, high-quality research papers on topics related to multimodal algorithmic reasoning. The topics for MAR-NeurIPS 2025 include, but are not limited to:
- Multimodal algorithmic and mathematical reasoning.
- Representations of algorithms for neural processing.
- Comparisons between AI and human problem solving, including: i) perspectives from psychology and neuroscience, ii) children’s cognitive development, and iii) limits of reasoning in large models.
- Extreme generalization to new tasks and few-shot concept induction.
- Shortcomings in AI models.
- Agentic AI, including multi-agent collaboration and distributed problem solving.
- Scaling laws and efficient algorithms for improving reasoning at test-time.
- Foundation models of intelligence, including vision, language, and other modalities.
- Physical reasoning and planning using language models.
- Multimodal AI applications, including new tasks, datasets, benchmarks, and models for multimodal reasoning.
Submission Instructions
We are inviting submissions of original and previously unpublished works.
- All submissions are handled via the workshop’s CMT website.
- Submissions should be made in PDF format and must follow the MAR 2025@NeurIPS submission style provided here (except for the NeurIPS checklist, which is optional).
- Submissions should not exceed 4 pages in length (excluding references).
- Authors may upload an optional Appendix containing additional details, proofs, videos, images, etc., as a separate zip file (at most 50 MB in size); the deadline for submitting this supplementary material is the same as that for the main paper.
- All submissions should maintain author anonymity and should abide by the NeurIPS conference guidelines for double-blind review.
- Accepted papers will be presented as an oral, spotlight, or poster presentation. At least one author of each accepted submission must present the paper at the workshop in person.
- Presentation of accepted papers at our workshop will follow the same policy as that for accepted papers at the NeurIPS 2025 main conference.
- Accepted papers will be made publicly accessible on the workshop website shortly after the camera-ready deadline, but will not have any archival proceedings.
- The submitting authors are also expected to serve as reviewers for the workshop, if needed.
Contact
Email: smart101@googlegroups.com
Accepted Papers
All accepted papers will be presented in the poster session. The number in front of each paper is its poster number.
Oral Papers
- [1] HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs.
Nguyen, Tin; Nguyen, Anh
- [2] Aha Moment Revisited: Are Vision Language Models Truly Capable of Self-verification in Inference Scaling?
Wu, Mingyuan; Li, Meitang; Yang, Jingcheng; Jiang, Jize; Yan, Kaizhuo; Li, Zhaoheng; Yu, Hanchao; Zhang, Minjia; Nahrstedt, Klara
- [3] Learning Visual Concepts via Vision Language Programs.
Wüst, Antonia; Shindo, Hikaru; Stammer, Wolfgang; Dhami, Devendra Singh; Kersting, Kristian
- [4] ChartAgent: A Multimodal Agent for Complex Visual Question Answering in Charts.
Kaur, Rachneet; Srishankar, Nishan; Zeng, Zhen; Ganesh, Sumitra; Veloso, Manuela
Spotlight Papers
- [5] Sphinx: Visual Perception and Reasoning Gym.
Alam, Md Tanvirul; Chae, Justin; Rastogi, Nidhi
- [6] Investigating Abstraction Capabilities of the o3 Model Using Textual and Visual Modalities.
Beger, Claas; Fu, Shuhao; Yi, Ryan; Moskvichev, Arseny; Mitchell, Melanie
- [7] Symbolic Graphics Programming with Large Language Models.
Chen, Yamei; Zhang, Haoquan; Huang, Yangyi; Qiu, Zeju; Zhang, Kaipeng; Wen, Yandong; Liu, Weiyang
- [8] DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning.
Cao, Qi; Wang, Ruiyi; Zhang, Ruiyi; Xie, Pengtao
- [9] MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.
Luo, Fuwen; Lou, Shengfeng; Chen, Chi; Wang, Ziyue; Li, Chenliang; Shen, Weizhou; Guo, Jiyue; Li, Peng; Yan, Ming; Huang, Fei; Liu, Yang
- [10] Data Scaling Isn't Enough: Towards Improving Compositional Reasoning in Video-Language Models.
Kim, Kibum; Min, Kyle; Park, Chanyoung
- [11] MLPEdit-Bench: Benchmarking Reasoning-Based Layer-wise Poster Editing.
Lin, Zihao; Zhu, Wanrong; Gu, Jiuxiang; Kil, Jihyung; Tensmeyer, Chris; Zhang, Ruiyi; Huang, Lifu; Morariu, Vlad; Sun, Tong
Poster Papers
- [12] Online Reinforcement Learning for Autoformalization.
Sorg, Simon; Li, Wenda; Banerjee, Soumya
- [13] DEPART: Hierarchical Multi-Agent System for Multi-Turn Interaction.
Hsu, Hao-Lun; Xu, Jing; Vichare, Nikhil; Carbone, Francesco; Pajic, Miroslav; Carenini, Giuseppe
- [14] Watch Wider and Think Deeper: Collaborative Cross-modal Chain-of-Thought for Complex Visual Reasoning.
Lu, Wenting; Zhu, Didi; Shen, Tao; Zhu, Donglin; Ye, Ayong; Wu, Chao
- [15] Exploring Ego-Exo View-Invariant Temporal Understanding in Video LLMs.
Jung, Minjoon; Xiao, Junbin; Kim, Junghyun; Zhang, Byoung-Tak; Yao, Angela
- [16] Audio Flamingo Sound-CoT: Improving Chain-of-Thought Reasoning in Sound Understanding.
Kong, Zhifeng; Goel, Arushi; Santos, João Felipe; Ghosh, Sreyan; Valle, Rafael; Ping, Wei; Catanzaro, Bryan
- [17] Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding.
Tong, Bingkui; Xia, Jiaer; Zhou, Kaiyang
- [18] Improvisational Reasoning with Vision-Language Models for Grounded Procedural Planning.
Rahman, Md Masudur; Zhuo, Yupeng; Wachs, Juan
- [19] Do LLMs Benefit from User and Item Embeddings in Recommendation Tasks?
Hossain, Mir Rayat Imtiaz; Feng, Leo; Sigal, Leonid; Ahmed, Mohamed Osama
- [20] MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models.
Cohen, Vanya; Mooney, Raymond
- [21] PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits.
Li, Loka; Wong, Yu Kang; Fu, Minghao; Chen, Guangyi; Chen, Zhenhao; Luo, Gongxu; Sun, Yuewen; Khan, Salman; Spirtes, Peter; Zhang, Kun
- [22] Visual Abstract Thinking Empowers Multimodal Reasoning.
Liu, Dairu; Wang, Ziyue; Ruan, Minyuan; Luo, Fuwen; Chen, Chi; Li, Peng; Liu, Yang
- [23] SlideAgent: Hierarchical Agentic Framework for Multi-Page Slide Deck Understanding.
Jin, Yiqiao; Kaur, Rachneet; Zeng, Zhen; Ganesh, Sumitra
- [24] DA-CoTD: Efficient Chain-of-Thought Reasoning with Difficulty-Aware CoT-Distillation.
Waheed, Abdul; Mitra, Chancharik; Wang, Laurie
- [25] What Makes a Good Generated Image? Studying Human & LLM Image Preference Alignment.
Parthasarathy, Rishab; Collins, Jasmine; Stephenson, Cory
- [26] An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM.
Liu, Jiawei; Çoban, Enis; Schevchenko, Zarina; Tang, Hao; Zhu, Zhigang; Mandel, Michael; Devaney, Johanna
- [27] When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning.
Zhang, Chenyu; Kim, Minsol; Ghorbani, Shohreh; Wu, Jingyao; Maes, Patricia; Liang, Paul
- [28] Text-to-Scene with Large Reasoning Models.
Berdoz, Frédéric; Lanzerdörfer, Luca; Tuniga, Nick; Wattenhofer, Roger
- [29] ASCII-Bench: A Symbolic Benchmark for Multimodal Structural Reasoning.
Luo, Kerry; Peguero, Joshua; Fu, Michael; Malik, Husnain; Patil, Anvay; Lin, Joyce; Van Overborg, Megan; Sarmiento, Ryan; Zhu, Kevin
MAR 2025 Venue
San Diego Convention Center, San Diego, CA, USA
MAR 2025 will be held in Upper Level Room 11AB at the San Diego Convention Center, San Diego, CA, USA, on December 7, 2025.
Organizers
[Contact Email: smart101@googlegroups.com]

Anoop Cherian
Mitsubishi Electric Research Laboratories (MERL)

Kuan-Chuan Peng
Mitsubishi Electric Research Laboratories (MERL)

Suhas Lohit
Mitsubishi Electric Research Laboratories (MERL)

Honglu Zhou
Salesforce AI Research

Kevin Smith
MIT
Program Committee
Abdul Waheed | Carnegie Mellon University |
Antonia Wüst | TU Darmstadt |
Artemis Panagopoulou | University of Pennsylvania |
Christoph Boeddeker | Paderborn University |
Danrui Li | Rutgers University |
Devesh Jha | Mitsubishi Electric Research Laboratories |
Didi Zhu | Zhejiang University |
Frédéric Berdoz | ETH Zurich |
Fuwen Luo | Tsinghua University |
Gordon Wichern | Mitsubishi Electric Research Laboratories |
Hao-Lun Hsu | Duke University |
Haomeng Zhang | Purdue University |
Haoquan Zhang | CUHK |
Ibraheem Muhammad Moosa | Penn State University |
Jiahao Zhang | Australian National University |
Jing Liu | Mitsubishi Electric Research Laboratories |
Kanchana Ranasinghe | Stony Brook University |
Kibum Kim | KAIST |
Kobe Knowles | University of Auckland |
Loka Li | Mohamed bin Zayed University of Artificial Intelligence |
Md Masudur Rahman | Purdue University |
Md Tanvirul Alam | Rochester Institute of Technology |
Minjoon Jung | Seoul National University |
Mir Rayat Imtiaz Hossain | University of British Columbia |
Mohammad Shahab Sepehri | University of Southern California |
Moitreya Chatterjee | Mitsubishi Electric Research Laboratories |
Qi Cao | UC San Diego |
Qinhong Zhou | Tsinghua University |
Rachneet Kaur | J.P. Morgan AI Research |
Shanka Subhra Mondal | Princeton University |
Shijie Wang | Brown University |
Siddarth Jain | Mitsubishi Electric Research Laboratories |
Sina Rismanchian | University of California, Irvine |
Soumya Banerjee | University of Cambridge |
Tim Marks | Mitsubishi Electric Research Laboratories |
Vanya Cohen | The University of Texas at Austin |
Weitai Kang | University of Illinois Chicago |
Yada Pruksachatkun | Salesforce Research |
Ye Wang | Mitsubishi Electric Research Laboratories |
Yifan Jiang | Southern University of Science and Technology |
Yiqiao Jin | Georgia Institute of Technology |
Yoshiki Masuyama | Mitsubishi Electric Research Laboratories |
Yu Zhou | UCLA |
Yunkee Chae | Seoul National University |
Yuyou Zhang | Carnegie Mellon University |
Zhengye Yang | Rensselaer Polytechnic Institute |
Zhifeng Kong | NVIDIA |
Zihao Lin | UC Davis |
Ziyang Wang | University of North Carolina at Chapel Hill |
Ziyue Wang | Tsinghua University |