MAR 2024 - Multimodal Algorithmic Reasoning

About MAR 2024

In this workshop, we plan to gather researchers working in neural algorithmic learning, multimodal reasoning, and cognitive models of intelligence to showcase their cutting-edge research, discuss the latest challenges, as well as bring to the forefront problems in perception and language modeling that are often overlooked but are pivotal in achieving true artificial general intelligence. An emphasis of this workshop is on the emerging topic of multimodal algorithmic reasoning, where a reasoning agent is required to automatically deduce new algorithms/procedures for solving real-world tasks, e.g., algorithms that use multimodal foundational models for analysis, synthesis, and planning, new approaches towards solving challenging vision-and-language mathematical (Olympiad type) reasoning problems, deriving winning strategies in multimodal games, procedures for using tools in robotic manipulation, etc. We hope to deep dive into this exciting topic at the intersection of multimodal learning and cognitive science to understand what we have achieved thus far in machine intelligence and what we are lacking in relation to the human way of thinking -- through talks from outstanding researchers and faculty that could inspire the audience to search for the missing rungs on the ladder to true intelligence.

Where

West Building Exhibit Hall A, Vancouver Convention Center, Vancouver, BC, Canada

When

8:25 AM - 5:10 PM PST, Sunday, December 15, 2024

Keynote Speakers

[More info about keynote speakers will be updated here]

Joshua B. Tenenbaum

MIT

Ranjay Krishna

University of Washington

Stefanie Jegelka

MIT

Sergey Levine

UC Berkeley

David Duvenaud

University of Toronto

MAR 2024 Schedule

[in Vancouver local time (Pacific Time)]

08:25 AM

Opening Remarks Anoop Cherian

08:30 AM

Keynote Josh Tenenbaum

Scaling Intelligence the Human Way.

09:15 AM

Coffee Break

09:30 AM

Keynote Stefanie Jegelka

Learning Algorithms with GNNs and Transformers.

[slides]

10:15 AM

Oral Paper Presentation Eunice Yiu et al.

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models.

10:25 AM

Oral Paper Presentation Sullam Jeoung et al.

AVUA: Adaptive Video Understanding Agent Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning.

10:35 AM

Oral Paper Presentation Mikel Bober-Irizar et al.

Neural Networks for Abstraction & Reasoning.

11:00 AM

Keynote Ranjay Krishna

Prioritizing Perception in Multimodal Language Models.

[slides]

11:45 AM

Spotlight Paper Presentation

Zirui Wang et al., CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs.

Mohammadmostafa Rostamkhani et al., Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions.

Lluis Castrejon et al., HAMMR: HierArchical MultiModal React agents for generic VQA.

Alexander Taylor et al., Are Large-Language Models Graph Algorithmic Reasoners?

Sina Rismanchian et al., TurtleBench: A Visual Programming Benchmark in Turtle Geometry.

Hammad Ayyubi et al., ENTER: Event Based Interpretable Reasoning for VideoQA.

12:15 PM

Lunch Break

01:30 PM

Keynote Sergey Levine

Training Robots to Think Harder.

[slides]

02:15 PM

Poster Session & Coffee Break

04:15 PM

Keynote David Duvenaud

LLM Posteriors over Functions as A New Output Modality.

[slides]

05:00 PM

Closing Remarks Anoop Cherian

Submissions

[Call for Contributions PDF]

Call for Contributions

Large deep learning based AI frameworks have been increasing in their data modeling abilities at an ever more vigor in recent times, with compelling applications emerging frequently, many of which may even appear to challenge human intelligence. Yet despite such impressive performances, there remain open questions about whether these models include the foundations of general intelligence, or whether they perform these tasks without human-like understanding. This necessitates development of better tools for assessing these models in tandem with developing the models themselves.

In this workshop, we plan to gather researchers working in neural algorithmic learning, multimodal reasoning, and cognitive models of intelligence to showcase their cutting-edge research, discuss the latest challenges, as well as bring to the forefront problems in perception and language modeling that are often overlooked but are pivotal in achieving true artificial general intelligence. An emphasis is on the emerging topic of multimodal algorithmic reasoning, where a reasoning agent is required to automatically deduce new algorithms/procedures for solving real-world tasks, e.g., algorithms that use multimodal foundational models for analysis, synthesis, and planning, new approaches towards solving challenging vision-and-language mathematical (Olympiad type) reasoning problems, deriving winning strategies in multimodal games, procedures for using tools in robotic manipulation, etc. We hope to dive deep into this exciting topic at the intersection of theory, multimodal machine learning architectures, and cognitive science to understand what we have achieved thus far in machine intelligence and what we are lacking in relation to the human way of thinking – through talks from outstanding researchers and faculty that could inspire the audience to search for the missing rungs on the ladder to true intelligence.

Important Dates

Submission deadline (both main paper & (optional) supplementary material): ~~August 30~~ September 12, 2024 (AoE time).
(Optional) Rebuttal starts: September 26, 2024.
(Optional) Rebuttal deadline: September 27, 2024.
Notification to authors: October 9, 2024.
Camera ready deadline: November 1, 2024 (AoE time).

Topics

We invite submissions of high-quality research papers in the topics related to multimodal algorithmic reasoning. The topics for MAR-NeurIPS 2024 include, but are not limited to:

Multimodal cognition and learning.
Multimodal Large language models.
Large language models and Cognition.
Large language models and algorithmic reasoning.
Shortcomings in AI models.
Large language models, neuroscience, and vision.
Multimodal machine cognition and learning.
Foundation models of intelligence, including vision, language, and other modalities.
Artificial general intelligence / general-purpose problem solving architectures.
Neural architectures for solving vision & language or language-based IQ puzzles.
Embodiment and AI.
Functional and algorithmic / procedural learning in vision.
Abstract visual-language reasoning, e.g., using sketches, diagrams, etc.
Perceptual reasoning and decision making.
New vision-and-language abstract reasoning tasks and datasets.
Vision-and-language applications.

Submission Instructions

We are inviting submissions of original and previously unpublished works, shorter versions of published papers at other venues, or shorter versions of papers submitted to the NeurIPS 2024 main conference.

All submissions are handled via the workshop’s CMT website.
Submissions should be made in PDF format and must follow the MAR 2024@NeurIPS submission style provided here.
Original and previously unpublished paper submissions should not exceed 9 pages in length (excluding references).
Resubmission of previously published papers or papers submitted to the main conference must be limited to a maximum of 4 pages in length (excluding references).
Authors may upload an optional Appendix, containing additional details, proofs, videos, images, etc. in a separate zip file (with a max of 50MB in size); the deadline for submitting these supplementary materials is the same as that for the main paper.
All submissions should maintain author anonymity and should abide by the NeurIPS conference guidelines for double-blind review.
Accepted papers will be presented as either an oral, spotlight, or poster presentation. At least one author of each accepted submission must present the paper at the workshop in-person.
Presentation of accepted papers at our workshop will follow the same policy as that for accepted papers at the NeurIPS main conference.
Accepted papers will be made publicly accessible on the workshop website shortly after the camera-ready deadline, but will not have any archival proceedings.

Contact

Email: smart101@googlegroups.com

Accepted Papers

[The number in front of each paper is the poster number]

Oral Papers

[1] AVUA: Adaptive Video Understanding Agent Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning.
Jeoung, Sullam; Huybrechts, Goeric; Valmeekam, Karthik; Ganesh, Bhavana; Galstyan, Aram; Bodapati, Sravan Babu.

[2] KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models.
Yiu, Eunice; Qraitem, Maan; Wong, Charlie; Majhi, Anisa; Bai, Yutong; Ginosar, Shiry; Gopnik, Alison; Saenko, Kate.

[3] Neural Networks for Abstraction & Reasoning.
Bober-Irizar, Mikel; Banerjee, Soumya.
[supplement]

Spotlight Papers

[4] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs.
Wang, Zirui; Xia, Mengzhou; He, Luxi; Chen, Howard; Liu, Yitao; Zhu, Richard; Liang, Kaiqu; Wu, Xindi; Liu, Haotian; Malladi, Sadhika; Chevalier, Alexis; Arora, Sanjeev; Chen, Danqi.

[5] Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions.
Rostamkhani, Mohammadmostafa; Ansari, Baktash; Sabzevari, Hoorieh; Rahmani, Farzan; Eetemadi, Sauleh.

[6] HAMMR: HierArchical MultiModal React agents for generic VQA.
Castrejon, Lluis; Mensink, Thomas; Zhou, Howard; Ferrari, Vittorio; Araujo, Andre; Uijlings, Jasper.

[7] Are Large-Language Models Graph Algorithmic Reasoners?
Taylor, Alexander; Cuturrufo, Anthony; Yathish, Vishal; Ma, Mingyu Derek; Wang, Wei.

[8] TurtleBench: A Visual Programming Benchmark in Turtle Geometry.
Rismanchian, Sina; Razeghi, Yasaman; Singh, Sameer; Doroudi, Shayan.

[9] ENTER: Event Based Interpretable Reasoning for VideoQA.
Ayyubi, Hammad; Liu, Junzhang; Wang, Zhecan; Alomari, Hani; Tang, Chia-Wei; Asgarov, Ali; Atabuzzaman, Md.; Sarker, Najibul H; Hakim, Zaber; Chang, Shih-Fu; Thomas, Chris.
[supplement]

Poster Papers

[10] Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities.
Saporta, Adriel; Puli, Aahlad; Goldstein, Mark; Ranganath, Rajesh.

[11] Smart Vision-Language Reasoners.
Olteanu Roberts, Denisa A; Roberts, Lucas R.

[12] Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding.
Sun, Shenghuan; Schubert, Alexander; Goldgof, Gregory; Sun, Zhiqing; Hartvigsen, Thomas; Butte, Atul; Alaa, Ahmed.
[supplement]

[13] Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering.
Awal, Md Rabiul; Zhang, Le; Agrawal, Aishwarya.

[14] ViLAaD: Enhancing Attracting-and-Dispersing Source-Free Domain Adaptation with Vision-and-Language Model.
Tarashima, Shuhei; Shu, Xinqi; Tagawa, Norio.

[15] Chitrarth: Bridging Vision and Language for a Billion People.
Khan, Shaharukh; Tarun, Ayush; Ravi, Abhinav; Faraz, Ali; Pokala, Praveen Kumar; Bhangare, Anagha; Kolla, Raja; Khatri, Chandra; Agarwal, Shubham.

[16] LVM-Net: Efficient Long-Form Video Reasoning.
Gurukar, Saket; Kadav, Asim.

[17] Vision-LLMs Can Fool Themselves with Self-Generated Text.
Qraitem, Maan; Tasnim , Nazia ; Teterwak, Piotr; Saenko, Kate; Plummer, Bryan.

[18] LLAVIDAL: Benchmarking Large LAnguage VIsion Models for Daily Activities of Living.
Chakraborty, Rajatsubhra; Sinha, Arkaprava; Reilly, Dominick; Govind, Manish; Wang, Pu; Bremond, Francois; Das, Srijan.

MAR 2024 Venue

Vancouver Convention Center, Vancouver, BC, Canada

MAR 2024 will be held at the West Building Exhibit Hall A, Vancouver Convention Center, Vancouver, BC, Canada at 8:25 AM - 5:10 PM PST on Sunday, December 15, 2024.

Sponsor

Organizers

[Contact Email: smart101@googlegroups.com]

Program Committee

Anas Awadalla	University of Washington
Artemis Panagopoulou	University of Pennsylvania
Asim Kadav	Samsung Research
Dobrik G. Georgiev	University of Cambridge
Feipeng Ma	University of Science and Technology of China
Hao Tang	Cornell University
Jeonghwan Kim	University of Illinois Urbana-Champaign
Jiayi Pan	University of Berkeley, California
Jun Wang	Salesforce Research
Juntao Tan	Rutgers University
Junwen Chen	Michigan State University
Keshigeyan Chandrasegaran	Stanford University
Kimon Fountoulakis	University of Waterloo
Michal Shlapentokh-Rothman	University of Illinois at Urbana-Champaign
Nithin Gopalakrishnan Nair	Johns Hopkins University
Pulkit Madan	Qualcomm
Sameer Khurana	Mitsubishi Electric Research Laboratories
Scott O. Murray	University of Washington
Siddharth Nagar Nayak	Massachusetts Institute of Technology
Song Wen	Rutgers University
Thomas Mensink	Google Research
Wentao Bao	Michigan State University
Xiulong Liu	University of Washington
Yao Ni	Australian National University
Ye Wang	Mitsubishi Electric Research Laboratories
Yuhang He	University of Oxford
Zhiwei Liu	Salesforce

Multimodal Algorithmic Reasoning

(MAR)

About MAR 2024

Where

When

Keynote Speakers

MAR 2024 Schedule

Opening Remarks Anoop Cherian

Keynote Josh Tenenbaum

Coffee Break

Keynote Stefanie Jegelka

Oral Paper Presentation Eunice Yiu et al.

Oral Paper Presentation Sullam Jeoung et al.

Oral Paper Presentation Mikel Bober-Irizar et al.

Keynote Ranjay Krishna

Lunch Break

Keynote Sergey Levine

Poster Session & Coffee Break

Keynote David Duvenaud

Closing Remarks Anoop Cherian

Submissions

Call for Contributions

Important Dates

Topics

Submission Instructions

Contact

Accepted Papers

Oral Papers

Spotlight Papers

Poster Papers

MAR 2024 Venue

Vancouver Convention Center, Vancouver, BC, Canada

Sponsor

Organizers

Program Committee