Speaker Details

[more info about the speakers will be announced here]

Petar Veličković

Google DeepMind

Bio:
Dr. Petar Veličković is a Staff Research Scientist at Google DeepMind, an Affiliated Lecturer at the University of Cambridge, and an Associate of Clare Hall, Cambridge. He holds a PhD in Computer Science from the University of Cambridge (Trinity College), obtained under the supervision of Pietro Liò. His research concerns geometric deep learning—devising neural network architectures that respect the invariances and symmetries in data (a topic on which he has co-written a proto-book). For his contributions, he is recognised as an ELLIS Scholar in the Geometric Deep Learning Program. In particular, he focuses on graph representation learning and its applications in algorithmic reasoning (featured in VentureBeat). He is the first author of Graph Attention Networks—a popular convolutional layer for graphs—and Deep Graph Infomax—a popular self-supervised learning pipeline for graphs (featured in ZDNet). His research has been used to substantially improve travel-time predictions in Google Maps (featured in CNBC, Engadget, VentureBeat, CNET, The Verge and ZDNet) and to guide the intuition of mathematicians towards new top-tier theorems and conjectures (featured in Nature, Science, Quanta Magazine, New Scientist, The Independent, Sky News, The Sunday Times, la Repubblica and The Conversation).

Keynote Title:
Embracing multimodality in Neural Algorithmic Reasoning.

Tom Griffiths

Princeton University

Bio:
Dr. Tom Griffiths is the Henry R. Luce Professor of Information Technology, Consciousness and Culture in the Departments of Psychology and Computer Science at Princeton University. His research explores connections between human and machine learning, using ideas from statistics and artificial intelligence to understand how people solve the challenging computational problems they encounter in everyday life. Tom completed his PhD in Psychology at Stanford University in 2005, and taught at Brown University and the University of California, Berkeley before moving to Princeton. He has received awards for his research from organizations ranging from the American Psychological Association to the National Academy of Sciences, and he is a co-author of the book Algorithms to Live By, which introduces ideas from computer science and cognitive science to a general audience.

Keynote Title:
Abstraction in Humans and Machines.

Keynote Abstract:
Machine learning has made great strides in creating systems that achieve high performance on tasks previously performed only by humans. However, are the solutions these systems find comparable to those that humans use? In this talk I will summarize recent work analyzing the behavior of deep neural networks on multimodal tasks, showing that these models fail to capture important abstractions that guide human performance on those tasks. I will also present some ideas on how we can better guide systems towards developing those abstractions.

Emilien Dupont

Google DeepMind

Bio:
Dr. Emilien Dupont is a Research Scientist at Google DeepMind working on machine learning for solving problems in mathematics and theoretical computer science. He has also spent time working on neural compression as well as scientific applications of machine learning, particularly in geoscience. Before joining Google DeepMind, he completed a PhD in machine learning at Oxford and studied computational mathematics at Stanford.

Keynote Title:
FunSearch: Mathematical discoveries from program search with LLMs.

Keynote Abstract:
Large language models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from confabulations (or hallucinations), which can result in them making plausible but incorrect statements. This hinders the use of current large models in scientific discovery. Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pretrained LLM with a systematic evaluator. We demonstrate the effectiveness of this approach in surpassing the best-known results on important problems, pushing the boundary of existing LLM-based approaches. Applying FunSearch to a central problem in extremal combinatorics—the cap set problem—we discover new constructions of large cap sets going beyond the best-known ones, in both finite-dimensional and asymptotic cases. This shows that it is possible to make discoveries for established open problems using LLMs. We showcase the generality of FunSearch by applying it to an algorithmic problem, online bin packing, finding new heuristics that improve on widely used baselines. In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is. Beyond being an effective and scalable strategy, the discovered programs tend to be more interpretable than raw solutions, enabling feedback loops between domain experts and FunSearch, and the deployment of such programs in real-world applications.
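The abstract describes the core mechanics: an evolutionary loop that pairs a pretrained LLM (proposing candidate programs) with a systematic evaluator (scoring them), re-prompting from the best programs found so far. The Python sketch below is purely illustrative of such a loop under those assumptions; `llm_propose_program` and `evaluate` are hypothetical stand-ins rather than the published FunSearch API, and the real system distributes sampling and evaluation across many workers and maintains an islands-based program database rather than a single pool.

```python
def funsearch_style_loop(seed_program, llm_propose_program, evaluate,
                         iterations=1000, pool_size=20):
    """Minimal, illustrative sketch of a FunSearch-style loop.

    llm_propose_program(prompt) -> str : hypothetical call to a pretrained
        LLM that returns a new candidate program as source code.
    evaluate(program) -> float | None : hypothetical systematic evaluator;
        returns a score, or None if the program crashes or times out.
    seed_program is assumed to be a valid program, so the pool starts with
    at least one scored entry.
    """
    pool = [(evaluate(seed_program), seed_program)]  # (score, program) pairs

    for _ in range(iterations):
        # Prompt the LLM with a few of the highest-scoring programs so far,
        # asking it to propose an improved variant ("best-shot" prompting).
        exemplars = sorted(pool, reverse=True)[:2]
        prompt = "\n\n".join(program for _, program in exemplars)
        candidate = llm_propose_program(prompt)

        score = evaluate(candidate)
        if score is None:
            continue  # discard programs that fail systematic evaluation

        # Evolutionary selection: keep only the strongest candidates.
        pool.append((score, candidate))
        pool = sorted(pool, reverse=True)[:pool_size]

    return max(pool)  # (best_score, best_program)
```

Note that, as the abstract emphasises, the object being evolved is a program describing how to solve the problem (e.g. a bin-packing heuristic), so the evaluator runs each candidate on problem instances rather than checking a single static answer.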

Lijuan Wang

Microsoft GenAI

Bio:
Dr. Lijuan Wang serves as a Principal Researcher and Research Manager, leading a multimodal generative AI research group within Microsoft GenAI. After earning her PhD from Tsinghua University, China, she began her tenure at Microsoft Research Asia in 2006 and later joined Microsoft Research in Redmond in 2016. Her research primarily focuses on multimodal understanding and generation, encompassing a wide range of areas from 3D talking heads to vision-language pre-training, vision foundation models and image/video generation. As a pivotal contributor to vision-language pre-training, image captioning, and object detection, her work has been integral to the development of various Microsoft products, including Cognitive Services and Office 365. Her recent explorations into GPT-4V's advanced capabilities, contributions to the development of DALL-E 3, and work on multimodal agents have garnered significant attention.

Keynote Title:
Recent Advances in Multimodal Foundation Models.

Keynote Abstract:
Humans interact with the world through multiple modalities, naturally synchronizing and integrating diverse information. A key goal in artificial intelligence is to develop algorithms capable of understanding and generating multimodal content. Research in this area encompasses a broad range of tasks, from visual understanding (including image classification, image-text retrieval, image captioning, visual question answering, object detection, and various segmentation tasks) to visual generation (such as text-to-image and text-to-video generation). Recent advances include significant improvements in model capabilities and versatility, novel benchmarks for emergent capabilities, and a trend toward integrating understanding and generation. The computer vision community is now emphasizing the development of general-purpose vision foundation models, influenced by the success of large-scale pre-training and large language models. These efforts are moving from specialized models to versatile general-purpose assistants. This talk will explore cutting-edge learning and application strategies for multimodal foundation models. Topics include learning models for multimodal understanding and generation, benchmarking these models to evaluate emergent abilities in understanding and generation tasks, and developing advanced systems and agents based on vision foundation models.

Chelsea Finn

Stanford University

Bio:
Dr. Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University and a co-founder of Physical Intelligence (Pi). Her research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has pioneered end-to-end deep learning methods for vision-based robotic manipulation, meta-learning algorithms for few-shot learning, and approaches for scaling robot learning to broad datasets. Her research has been recognized by awards such as the Sloan Fellowship, the IEEE RAS Early Academic Career Award, and the ACM Doctoral Dissertation Award. Prior to joining Stanford, she received her Bachelor's degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley.

Keynote Title:
Robotic Reasoning with Vision-Language Models.