Multimodal Algorithmic Reasoning (MAR)

In Conjunction with the Conference on Neural Information Processing Systems 2025

Upper Level Room 11AB, San Diego Convention Center, San Diego, CA, USA

December 7, 2025

About MAR 2025

In this workshop, we plan to gather researchers working in neural algorithmic learning, multimodal reasoning, and cognitive models of intelligence to showcase their cutting-edge research, discuss the latest challenges, and bring to the forefront problems in perception and language modeling that are often overlooked yet pivotal to achieving true artificial general intelligence. An emphasis of this workshop is the emerging topic of multimodal algorithmic reasoning, where a reasoning agent is required to automatically deduce new algorithms or procedures for solving real-world tasks: for example, algorithms that use multimodal foundation models for analysis, synthesis, and planning; new approaches to challenging vision-and-language mathematical (Olympiad-type) reasoning problems; winning strategies in multimodal games; and procedures for tool use in robotic manipulation. Through talks from outstanding researchers and faculty, we hope to dive deep into this exciting topic at the intersection of multimodal learning and cognitive science, to understand what we have achieved thus far in machine intelligence and what we still lack relative to the human way of thinking -- and to inspire the audience to search for the missing rungs on the ladder to true intelligence.

Where

Upper Level Room 11AB, San Diego Convention Center, San Diego, CA, USA

When

December 7, 2025

Keynote Speakers

[More info about keynote speakers will be updated here]

Subbarao Kambhampati

Arizona State University

Yu Cheng

Chinese University of Hong Kong

Noah Goodman

Stanford University

Kristen Grauman

University of Texas at Austin

MAR 2025 Schedule

[Tentative; subject to change]

Opening Remarks: Anoop Cherian

Keynote: Noah Goodman

TBD.

Keynote: TBD

Oral Paper Presentation: Tin Nguyen et al.

HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs.

Spotlight Paper Presentations

Md Tanvirul Alam et al., Sphinx: Visual Perception and Reasoning Gym.

Claas Beger et al., Investigating Abstraction Capabilities of the o3 Model Using Textual and Visual Modalities.

Yamei Chen et al., Symbolic Graphics Programming with Large Language Models.

Qi Cao et al., DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning.

Break

Keynote: Max Tegmark

TBD.

Keynote: Subbarao Kambhampati

TBD.

Oral Paper Presentation: Mingyuan Wu et al.

Aha Moment Revisited: Are Vision Language Models Truly Capable of Self-verification in Inference Scaling?

Lunch Break

Oral Paper Presentation: Antonia Wüst et al.

Learning Visual Concepts via Vision Language Programs.

Oral Paper Presentation: Rachneet Kaur et al.

ChartAgent: A Multimodal Agent for Complex Visual Question Answering in Charts.

Spotlight Paper Presentations

Fuwen Luo et al., MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.

Kibum Kim et al., Data Scaling Isn't Enough: Towards Improving Compositional Reasoning in Video-Language Models.

Zihao Lin et al., MLPEdit-Bench: Benchmarking Reasoning-Based Layer-wise Poster Editing.

Keynote: Kristen Grauman

TBD.

Keynote: Yu Cheng

TBD.

Coffee Break

Keynote: TBD

Closing Remarks: Kuan-Chuan Peng

Poster Session

All the accepted papers.

Call for Contributions

Large AI frameworks have been advancing in their data modeling abilities with ever more vigor in recent times, with compelling applications emerging frequently, many of which may even appear to challenge human intelligence. Yet, despite such impressive performance, open questions remain about whether these models possess the foundations of general intelligence, or whether they perform these tasks without human-like understanding. This necessitates the development of better tools for assessing these models in tandem with developing the models themselves.

This workshop focuses on the topic of multimodal algorithmic reasoning, where an agent needs to assimilate information from multiple modalities to derive reasoning algorithms for complex problem solving. Real-world examples of such problems include: i) chain-of-thought reasoning using multiple modalities, ii) solving Olympiad-type vision-and-language problems, and iii) distributed agentic reasoning and tool use, among others. Previous editions of this workshop emphasized the challenges in building generalizable AI for solving vision-and-language problems. In the last year, however, we have seen rapid advances in AI capabilities that better bridge modalities, bringing both optimism about superhuman capabilities and skepticism about the limits of current approaches. This is an opportune moment to explore critical challenges, including new architectures for visual reasoning, data generation via self-play, and the theoretical limits of reasoning in large models. Through talks from outstanding researchers and faculty, we hope to dive deep into this exciting topic at the intersection of theory, multimodal machine learning, and cognitive science, to understand what we have achieved thus far in machine intelligence and what we still lack relative to the human way of thinking, towards finding the missing rungs on the ladder to truly intelligent reasoning.


Important Dates

Paper submission deadline: August 31, 2025 (AoE time). 
Notification to authors: September 22, 2025.
Camera-ready deadline: 11:59PM on November 3, 2025 (Pacific Time).


Topics

We invite submissions of original and high-quality research papers on topics related to multimodal algorithmic reasoning. The topics for MAR-NeurIPS 2025 include, but are not limited to:
  • Multimodal algorithmic and mathematical reasoning.
  • Representations of algorithms for neural processing.
  • Comparisons between AI and human problem solving, including: i) perspectives from psychology and neuroscience, ii) children’s cognitive development, and iii) limits of reasoning in large models.
  • Extreme generalization to new tasks and few-shot concept induction.
  • Shortcomings in AI models.
  • Agentic AI, including multi-agent collaboration and distributed problem solving.
  • Scaling laws and efficient algorithms for improving reasoning at test-time.
  • Foundation models of intelligence, including vision, language, and other modalities.
  • Physical reasoning and planning using language models.
  • Multimodal AI applications, including new tasks, datasets, benchmarks, and models for multimodal reasoning.

Submission Instructions

We are inviting submissions of original and previously unpublished works.
  • All submissions are handled via the workshop’s CMT website.
  • Submissions should be made in PDF format and must follow the MAR 2025@NeurIPS submission style provided here (except for the NeurIPS checklist, which is optional).
  • Submissions should not exceed 4 pages in length (excluding references).
  • Authors may upload an optional Appendix containing additional details, proofs, videos, images, etc., in a separate zip file (at most 50 MB in size); the deadline for submitting these supplementary materials is the same as that for the main paper.
  • All submissions should maintain author anonymity and should abide by the NeurIPS conference guidelines for double-blind review.
  • Accepted papers will be presented as either an oral, spotlight, or poster presentation. At least one author of each accepted submission must present the paper at the workshop in-person.
  • Presentation of accepted papers at our workshop will follow the same policy as that for accepted papers at the NeurIPS 2025 main conference.
  • Accepted papers will be made publicly accessible on the workshop website shortly after the camera-ready deadline, but will not have any archival proceedings.
  • Submitting authors are also expected to serve as reviewers for the workshop, if needed.

Contact

Email: smart101@googlegroups.com




Accepted Papers

All the accepted papers will be presented in the poster session. The number in front of each paper is the poster number.


Oral Papers

  • [1] HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs.
    Nguyen, Tin; Nguyen, Anh

  • [2] Aha Moment Revisited: Are Vision Language Models Truly Capable of Self-verification in Inference Scaling?
    Wu, Mingyuan; Li, Meitang; Yang, Jingcheng; Jiang, Jize; Yan, Kaizhuo; Li, Zhaoheng; Yu, Hanchao; Zhang, Minjia; Nahrstedt, Klara

  • [3] Learning Visual Concepts via Vision Language Programs.
    Wüst, Antonia; Shindo, Hikaru; Stammer, Wolfgang; Dhami, Devendra Singh; Kersting, Kristian

  • [4] ChartAgent: A Multimodal Agent for Complex Visual Question Answering in Charts.
    Kaur, Rachneet; Srishankar, Nishan; Zeng, Zhen; Ganesh, Sumitra; Veloso, Manuela


Spotlight Papers

  • [5] Sphinx: Visual Perception and Reasoning Gym.
    Alam, Md Tanvirul; Chae, Justin; Rastogi, Nidhi

  • [6] Investigating Abstraction Capabilities of the o3 Model Using Textual and Visual Modalities.
    Beger, Claas; Fu, Shuhao; Yi, Ryan; Moskvichev, Arseny; Mitchell, Melanie

  • [7] Symbolic Graphics Programming with Large Language Models.
    Chen, Yamei; Zhang, Haoquan; Huang, Yangyi; Qiu, Zeju; Zhang, Kaipeng; Wen, Yandong; Liu, Weiyang

  • [8] DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning.
    Cao, Qi; Wang, Ruiyi; Zhang, Ruiyi; Xie, Pengtao

  • [9] MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding.
    Luo, Fuwen; Lou, Shengfeng; Chen, Chi; Wang, Ziyue; Li, Chenliang; Shen, Weizhou; Guo, Jiyue; Li, Peng; Yan, Ming; Huang, Fei; Liu, Yang

  • [10] Data Scaling Isn't Enough: Towards Improving Compositional Reasoning in Video-Language Models.
    Kim, Kibum; Min, Kyle; Park, Chanyoung

  • [11] MLPEdit-Bench: Benchmarking Reasoning-Based Layer-wise Poster Editing.
    Lin, Zihao; Zhu, Wanrong; Gu, Jiuxiang; Kil, Jihyung; Tensmeyer, Chris; Zhang, Ruiyi; Huang, Lifu; Morariu, Vlad; Sun, Tong


Poster Papers

  • [12] Online Reinforcement Learning for Autoformalization.
    Sorg, Simon; Li, Wenda; Banerjee, Soumya

  • [13] DEPART: Hierarchical Multi-Agent System for Multi-Turn Interaction.
    Hsu, Hao-Lun; Xu, Jing; Vichare, Nikhil; Carbone, Francesco; Pajic, Miroslav; Carenini, Giuseppe

  • [14] Watch Wider and Think Deeper: Collaborative Cross-modal Chain-of-Thought for Complex Visual Reasoning.
    Lu, Wenting; Zhu, Didi; Shen, Tao; Zhu, Donglin; Ye, Ayong; Wu, Chao

  • [15] Exploring Ego-Exo View-Invariant Temporal Understanding in Video LLMs.
    Jung, Minjoon; Xiao, Junbin; Kim, Junghyun; Zhang, Byoung-Tak; Yao, Angela

  • [16] Audio Flamingo Sound-CoT: Improving Chain-of-Thought Reasoning in Sound Understanding.
    Kong, Zhifeng; Goel, Arushi; Santos, João Felipe; Ghosh, Sreyan; Valle, Rafael; Ping, Wei; Catanzaro, Bryan

  • [17] Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding.
    Tong, Bingkui; Xia, Jiaer; Zhou, Kaiyang

  • [18] Improvisational Reasoning with Vision-Language Models for Grounded Procedural Planning.
    Rahman, Md Masudur; Zhuo, Yupeng; Wachs, Juan

  • [19] Do LLMs Benefit from User and Item Embeddings in Recommendation Tasks?
    Hossain, Mir Rayat Imtiaz; Feng, Leo; Sigal, Leonid; Ahmed, Mohamed Osama

  • [20] MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models.
    Cohen, Vanya; Mooney, Raymond

  • [21] PersonaX: Multimodal Datasets with LLM-Inferred Behavior Traits.
    Li, Loka; Wong, Yu Kang; Fu, Minghao; Chen, Guangyi; Chen, Zhenhao; Luo, Gongxu; Sun, Yuewen; Khan, Salman; Spirtes, Peter; Zhang, Kun

  • [22] Visual Abstract Thinking Empowers Multimodal Reasoning.
    Liu, Dairu; Wang, Ziyue; Ruan, Minyuan; Luo, Fuwen; Chen, Chi; Li, Peng; Liu, Yang

  • [23] SlideAgent: Hierarchical Agentic Framework for Multi-Page Slide Deck Understanding.
    Jin, Yiqiao; Kaur, Rachneet; Zeng, Zhen; Ganesh, Sumitra

  • [24] DA-CoTD: Efficient Chain-of-Thought Reasoning with Difficulty-Aware CoT-Distillation.
    Waheed, Abdul; Mitra, Chancharik; Wang, Laurie

  • [25] What Makes a Good Generated Image? Studying Human & LLM Image Preference Alignment.
    Parthasarathy, Rishab; Collins, Jasmine; Stephenson, Cory

  • [26] An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM.
    Liu, Jiawei; Çoban, Enis; Schevchenko, Zarina; Tang, Hao; Zhu, Zhigang; Mandel, Michael; Devaney, Johanna

  • [27] When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning.
    Zhang, Chenyu; Kim, Minsol; Ghorbani, Shohreh; Wu, Jingyao; Maes, Patricia; Liang, Paul

  • [28] Text-to-Scene with Large Reasoning Models.
    Berdoz, Frédéric; Lanzerdörfer, Luca; Tuniga, Nick; Wattenhofer, Roger

  • [29] ASCII-Bench: A Symbolic Benchmark for Multimodal Structural Reasoning.
    Luo, Kerry; Peguero, Joshua; Fu, Michael; Malik, Husnain; Patil, Anvay; Lin, Joyce; Van Overborg, Megan; Sarmiento, Ryan; Zhu, Kevin

MAR 2025 Venue

San Diego Convention Center, San Diego, CA, USA

MAR 2025 will be held in the Upper Level Room 11AB at San Diego Convention Center, San Diego, CA, USA on December 7, 2025.

Sponsor

Organizers

[Contact Email: smart101@googlegroups.com]

Anoop Cherian

Mitsubishi Electric Research Laboratories (MERL)

Kuan-Chuan Peng

Mitsubishi Electric Research Laboratories (MERL)

Suhas Lohit

Mitsubishi Electric Research Laboratories (MERL)

Honglu Zhou

Salesforce AI Research



Program Committee

Abdul Waheed Carnegie Mellon University
Antonia Wüst TU Darmstadt
Artemis Panagopoulou University of Pennsylvania
Christoph Boeddeker Paderborn University
Danrui Li Rutgers University
Devesh Jha Mitsubishi Electric Research Laboratories
Didi Zhu Zhejiang University
Frédéric Berdoz ETH Zurich
Fuwen Luo Tsinghua University
Gordon Wichern Mitsubishi Electric Research Laboratories
Hao-Lun Hsu Duke University
Haomeng Zhang Purdue University
Haoquan Zhang CUHK
Ibraheem Muhammad Moosa Penn State University
Jiahao Zhang Australian National University
Jing Liu Mitsubishi Electric Research Laboratories
Kanchana Ranasinghe Stony Brook University
Kibum Kim KAIST
Kobe Knowles University of Auckland
Loka Li Mohamed bin Zayed University of Artificial Intelligence
Md Masudur Rahman Purdue University
Md Tanvirul Alam Rochester Institute of Technology
Minjoon Jung Seoul National University
Mir Rayat Imtiaz Hossain University of British Columbia
Mohammad Shahab Sepehri University of Southern California
Moitreya Chatterjee Mitsubishi Electric Research Laboratories
Qi Cao UC San Diego
Qinhong Zhou Tsinghua University
Rachneet Kaur J.P. Morgan AI Research
Shanka Subhra Mondal Princeton University
Shijie Wang Brown University
Siddarth Jain Mitsubishi Electric Research Laboratories
Sina Rismanchian University of California, Irvine
Soumya Banerjee University of Cambridge
Tim Marks Mitsubishi Electric Research Laboratories
Vanya Cohen The University of Texas at Austin
Weitai Kang University of Illinois Chicago
Yada Pruksachatkun Salesforce Research
Ye Wang Mitsubishi Electric Research Laboratories
Yifan Jiang Southern University of Science and Technology
Yiqiao Jin Georgia Institute of Technology
Yoshiki Masuyama Mitsubishi Electric Research Laboratories
Yu Zhou UCLA
Yunkee Chae Seoul National University
Yuyou Zhang Carnegie Mellon University
Zhengye Yang Rensselaer Polytechnic Institute
Zhifeng Kong NVIDIA
Zihao Lin UC Davis
Ziyang Wang University of North Carolina at Chapel Hill
Ziyue Wang Tsinghua University